How to Keep Your Master Branch Green with Git

Posted on 2020-08-15

The CI setup I see most often leads to fairly frequent failed builds on the master branch. It looks like this:

1. Prepare a patch/change/PR/MR.
2. Run tests on that change. (And get reviews.)
3. Apply the change to master.
4. Run the tests again on master.

This probably became the de facto standard because it is very easy to implement: just run CI on every commit to the repo. However, it makes it fairly easy for logical “merge conflicts” to get into master: two changes that each pass on their own but break once combined. These cause the tests to fail in step 4, and everyone who pulls that branch will see unrelated failures when they run tests on their own changes.

Latest-Green

The solution is quite simple: run the tests on the merged result before applying the changes to master. A simple version replaces steps 3 and 4 with:

3. Apply the changes to staging.
4. When staging is green, fast-forward master to staging.
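
As a rough illustration of the fast-forward rule, here is a minimal Python sketch. It models staging as an ordered list of commit IDs and CI results as a dictionary; the function name `latest_green` and the commit IDs are made up for this example and are not part of any real tool.

```python
from __future__ import annotations

from typing import Optional, Sequence


def latest_green(staging: Sequence[str], ci_passed: dict[str, bool]) -> Optional[str]:
    """Return the newest commit on staging whose CI run passed, or None.

    `staging` is ordered oldest to newest. Commits with no recorded result
    (CI still running) are treated as not green yet.
    """
    for commit in reversed(staging):
        if ci_passed.get(commit, False):
            return commit
    return None


# Hypothetical commit IDs, oldest first.
staging = ["a1", "b2", "c3", "d4"]
results = {"a1": True, "b2": True, "c3": False}  # d4 has no result yet

# master may be fast-forwarded to this commit and no further.
master = latest_green(staging, results)
print(master)  # b2
```

In a real setup the final step would be a fast-forward-only update of master, so master never contains a commit that did not pass CI in exactly that state.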

This is already a big improvement: master is now the “latest green”, and developers will not be blocked by broken code. However, once staging goes red, master stops being updated until someone fixes it. That still delays changes from reaching both production and other developers, potentially for an extended period of time.

Auto-Revert

To make this strategy really shine, you can apply auto-revert to the staging branch. The exact policy can get complicated, but here is a simple one to start from.

If CI fails in staging, remove the failing patch and re-queue all the patches behind it.

This assumes that the commit that failed was the problem, and optimistically assumes that the other commits are good.
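
The policy can be sketched in a few lines of Python. This is only a model under stated assumptions: patches are plain strings, and `run_ci` is a hypothetical callback that takes the list of patches applied on top of master and reports whether the tests pass. The name `auto_revert` is also made up for this example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical CI hook: True if the tests pass with these patches applied in order.
RunCI = Callable[[Sequence[str]], bool]


def auto_revert(queue: Sequence[str], run_ci: RunCI) -> Tuple[List[str], List[str]]:
    """Return (merged, rejected) after applying the simple auto-revert policy."""
    merged: List[str] = []
    rejected: List[str] = []
    pending = list(queue)
    while pending:
        for i, patch in enumerate(pending):
            if not run_ci(merged + pending[: i + 1]):
                merged += pending[:i]      # everything ahead of the failure is good
                rejected.append(patch)     # blame the patch whose run failed
                pending = pending[i + 1:]  # re-queue the patches behind it
                break
        else:
            # No failure in this pass: the whole queue is green.
            merged += pending
            pending = []
    return merged, rejected


# A logical conflict: "rename" and "caller" each pass alone but not together.
def fake_ci(patches: Sequence[str]) -> bool:
    return not ("rename" in patches and "caller" in patches)


print(auto_revert(["rename", "docs", "caller"], fake_ci))
# (['rename', 'docs'], ['caller'])
```

Note that the patch that gets removed is simply the one whose run went red, which matches the optimistic assumption above; in the example, "caller" is rejected even though it is only broken in combination with "rename".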

Downsides

The major downside is that it can take a lot of CI resources or add a lot of latency to merges if commits are failing often. This is because you basically have two approaches to running CI when multiple patches are queued.

Optimistic: Merge all of the patches into staging and start CI immediately.
Pessimistic: Don’t start CI until the previous patch has passed.

If your CI is fast, either approach will work. However, if the average time between patches is shorter than your CI latency, the queue will keep growing under the pessimistic strategy. The solution is to use the optimistic strategy, which is optimal if you have no failures. However, it will waste a lot of CI resources when a patch fails, because all of the jobs behind it will (likely) fail as well and then have to be retried on a different base.
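
To make the trade-off concrete, here is a small simulation in Python. It is a simplified model, not a real scheduler: every CI run is assumed to take one "round", the optimistic strategy is assumed to have enough runners to test the whole queue in parallel, and `run_ci` is a hypothetical callback that reports whether a given list of patches passes.

```python
from typing import Callable, List, Sequence, Tuple


def simulate(queue: Sequence[str],
             run_ci: Callable[[Sequence[str]], bool],
             optimistic: bool) -> Tuple[int, int]:
    """Return (ci_jobs_started, rounds_of_ci_latency) needed to drain the queue."""
    merged: List[str] = []
    pending = list(queue)
    jobs = rounds = 0
    while pending:
        rounds += 1
        # Optimistic: start a job for every queued patch, each on top of the ones before it.
        # Pessimistic: only start a job for the patch at the head of the queue.
        batch = pending if optimistic else pending[:1]
        jobs += len(batch)
        good = 0
        for i in range(len(batch)):
            if not run_ci(merged + batch[: i + 1]):
                break
            good += 1
        merged += batch[:good]
        if good < len(batch):
            # Drop the failing patch; jobs behind it were wasted and are retried next round.
            pending = pending[good + 1:]
        else:
            pending = pending[good:]
    return jobs, rounds


queue = ["p1", "p2", "p3", "p4", "p5"]


def p2_is_broken(patches: Sequence[str]) -> bool:
    return "p2" not in patches


print(simulate(queue, p2_is_broken, optimistic=True))   # (8, 2)
print(simulate(queue, p2_is_broken, optimistic=False))  # (5, 5)
```

With one bad patch out of five, the optimistic run finishes in two rounds instead of five but starts eight jobs instead of five; the three jobs behind the failing patch are the wasted work described above.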

More complex strategies can be employed as well, such as limiting the number of queued patches that CI runs on at once. If you choose this limit based on your expected failure rate, it can be very effective.
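
One possible way to pick such a limit is sketched below in Python. It rests on an assumption that is mine, not a property of any particular tool: patches fail independently with a fixed probability. Under that assumption, the job for the k-th queued patch is only useful if the k-1 patches ahead of it all pass. The function name and the `min_useful` and `max_window` parameters are hypothetical.

```python
def speculation_window(failure_rate: float,
                       min_useful: float = 0.5,
                       max_window: int = 50) -> int:
    """Number of queued patches to run CI on at once.

    The job for the k-th queued patch is useful with probability
    (1 - failure_rate) ** (k - 1). The window grows for as long as the
    next job would still be useful with probability >= min_useful,
    and is capped at max_window.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    if not 0.0 < min_useful <= 1.0:
        raise ValueError("min_useful must be in (0, 1]")
    window = 1  # the head of the queue is always worth testing
    while window < max_window and (1.0 - failure_rate) ** window >= min_useful:
        window += 1
    return window


print(speculation_window(0.1))  # 7
print(speculation_window(0.5))  # 2
```

So with a 10% failure rate and a willingness to start any job that has at least an even chance of being useful, CI would run on the first seven queued patches and leave the rest waiting.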

Support

GitLab's non-free editions support this.
Bors supports this on GitHub.