How to Keep Your Master Branch Green with Git
Most often I see a CI setup that leads to relatively common failed builds on the `master` branch. That setup is:

1. Prepare a patch/change/PR/MR.
2. Run tests on that change (and get reviews).
3. Apply the change to `master`.
4. Run the tests again on `master`.
This probably got adopted as the de facto standard because it is very easy to implement: just run CI on every commit to the repo. However, it makes it fairly easy for logical “merge conflicts” to get into `master`, which then causes the tests to fail in step 4, and everyone who pulls that branch will see unrelated failures when they run tests on their own changes.
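To make the failure mode concrete, here is a rough sketch of that flow as plain git commands; the branch name and the `run_ci` step are placeholders, not part of any particular CI system:

```sh
# Sketch of the common setup: CI runs on the change, the change is merged,
# and only afterwards does CI run on the combined result.
git checkout my-feature        # hypothetical change branch
run_ci                         # hypothetical CI step; tests pass against an old base
git checkout master
git merge --no-ff my-feature   # master may have moved since those tests ran
run_ci                         # step 4: the first time CI sees the merged result
```

If `master` picked up a conflicting change between steps 2 and 3, the second `run_ci` is the first place the breakage can show up, and by then it is already on `master`.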
Latest-Green
The solution is quite simple: run the tests on the merged result before applying the changes to `master`. A simple version replaces steps 3 and 4 with:
3. Apply the changes to `staging`.
4. When `staging` is green, fast-forward `master` to `staging`.
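Here is a minimal sketch of those two steps, again as plain git commands with a placeholder `run_ci` step:

```sh
# Sketch of the latest-green flow: test the merged result on staging,
# then only fast-forward master once that result is green.
git checkout staging
git merge --no-ff my-feature   # hypothetical change branch; test the real merge result
run_ci                         # hypothetical CI step; must pass before master moves
git checkout master
git merge --ff-only staging    # master only ever points at a commit that passed CI
```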
This is already a big improvement: `master` is now the “latest green”, and developers will not be blocked by broken code. However, once `staging` goes red, `master` will stop being updated until someone fixes it. This means it can still delay changes being published, both to production and to other developers, for an extended period of time.
Auto-Revert
To make this strategy really shine you can apply auto-revert to the `staging` branch. The exact policy here can be complicated, but here is a simple one to start from:
If CI fails in `staging`, remove that patch and re-queue all the patches behind it.
This assumes that the failing commit was the problem and, optimistically, that the other commits are good.
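A sketch of what that re-queue could look like, assuming a `last-green` ref that tracks the newest commit known to pass CI, and hypothetical queued branches with `patch-1` being the one that failed:

```sh
# Sketch of auto-revert: drop the failing patch and rebuild staging from
# the last green commit with the remaining queued patches.
git checkout staging
git reset --hard last-green        # hypothetical ref marking the last green commit
for patch in patch-2 patch-3; do   # the queue, minus the failed patch-1
  git merge --no-ff "$patch"
done
run_ci                             # hypothetical CI step on the rebuilt queue
```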
Downsides
The major downside is that it can take a lot of CI resources, or add a lot of latency to merges, if commits are failing often. This is because you basically have two approaches to running CI when multiple patches are queued:
- Optimistic: Merge all of the patches into `staging` and start CI immediately.
- Pessimistic: Don’t start CI until the previous patch has passed.
If your CI is fast either approach will work. However, if the average time between patches is shorter than your CI latency, the queue will keep growing under the pessimistic strategy. The solution is to use the optimistic strategy, which is optimal if you have no failures. However, it wastes a lot of CI resources when a patch does fail, because all of the jobs behind it will (likely) fail as well and then be retried with a different base.
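As an illustration, the optimistic strategy could be sketched like this, with hypothetical queued branches and a hypothetical `start_ci` trigger that runs asynchronously:

```sh
# Sketch of the optimistic strategy: each queued patch is merged on top of
# the previous speculative result and its CI run starts immediately,
# instead of waiting for the run ahead of it to finish.
base=staging
for patch in patch-1 patch-2 patch-3; do
  git checkout -b "ci/$patch" "$base"   # speculative branch for this queue position
  git merge --no-ff "$patch"
  start_ci "ci/$patch" &                # hypothetical async CI trigger
  base="ci/$patch"
done
wait
```

If `patch-1` fails, every speculative branch built on top of it has to be rebuilt and retested, which is exactly the wasted work described above.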
More complex strategies can be employed as well, such as limiting the number of queued patches that CI runs on; if you base this limit on your expected failure rate it can be very effective.
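As a rough back-of-the-envelope check (assuming, purely for illustration, independent failures with a per-patch failure probability p), a speculative batch of N patches all passes with probability

$$(1 - p)^N, \qquad \text{e.g. } (1 - 0.05)^{10} \approx 0.60,$$

so even a 5% failure rate gives a ten-deep chain roughly a 40% chance of needing at least one rerun, which is why capping the batch size helps.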
Support
- GitLab’s non-free editions support this.
- Bors supports this on GitHub.