How to Keep Your Master Branch Green with Git
The CI setup I see most often leads to relatively frequent failed builds on the master branch. It looks like this:
1. Prepare a patch/change/PR/MR.
2. Run tests on that change (and get reviews).
3. Apply the change to master.
4. Run the tests again on master.
This probably became the de facto standard because it is very easy to implement: just run CI on every commit to the repo. However, it makes it fairly easy for logical “merge conflicts” to land on master, which then causes the tests to fail in step 4, and everyone who pulls that branch will see unrelated failures when they run tests on their changes.
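To make the failure mode concrete, here is a toy sketch (all names hypothetical) of a logical merge conflict: two patches that each pass tests against the same base, but break when combined on master.

```python
import re

def run_tests(code):
    """Pass iff every function called anywhere in the codebase is defined."""
    defined = set(code)
    called = {name for body in code.values()
              for name in re.findall(r"(\w+)\(\)", body)}
    return called <= defined

base = {"greet": "hello()", "hello": "pass"}  # greet calls hello

def rename_hello(code):
    """Patch A: rename hello -> hi_there, updating its only known caller."""
    code = dict(code)
    code["hi_there"] = code.pop("hello")
    code["greet"] = "hi_there()"
    return code

def add_welcome(code):
    """Patch B: add a new caller of hello."""
    code = dict(code)
    code["welcome"] = "hello()"
    return code

print(run_tests(rename_hello(base)))              # True: A passes alone
print(run_tests(add_welcome(base)))               # True: B passes alone
print(run_tests(add_welcome(rename_hello(base)))) # False: combined, welcome
                                                  # calls a deleted function
```

There is no textual conflict here, so git merges both patches cleanly, and each passed CI on its own, yet master is now red.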
The solution is quite simple: run the tests on the merged result before applying the changes to master. A simple version replaces steps 3 and 4 with:
3. Apply the change to staging.
4. If staging is green, fast-forward master to staging.
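These two steps can be sketched as follows, modeling history as a list of patch names and CI as an abstract `run_tests` callable (both hypothetical):

```python
def try_merge(master, patch, run_tests):
    """Apply the patch to a staging copy, test the merged result, and
    fast-forward master only if the merged result is green."""
    staging = master + [patch]   # apply to staging, not to master
    if run_tests(staging):       # test the *merged* result
        return staging           # green: fast-forward master to staging
    return master                # red: master stays untouched and green

run_tests = lambda code: "bad-patch" not in code
master = ["base"]
master = try_merge(master, "good-patch", run_tests)
master = try_merge(master, "bad-patch", run_tests)
print(master)  # ['base', 'good-patch'] -- master never went red
```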
This is already a big improvement: master is now the “latest green”, and developers will not be blocked by broken code. However, once staging goes red, master stops being updated until someone fixes it. This means that changes are still delayed, both to production and to other developers, for an extended period of time.
To make this strategy really shine you can apply auto-rollback to the staging branch. The exact policy here can be complicated, but here is a simple one to start from:
If CI fails on staging, remove that patch and re-queue all the patches behind it.
This treats the commit that failed as the problem, and optimistically assumes that the other commits are good.
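The rollback policy can be sketched as a small simulation, processing one patch at a time, where `failing` marks the (hypothetical) patches whose CI run fails:

```python
def process_queue(master, queue, failing):
    """Merge queued patches onto master, dropping any patch that fails CI
    and re-trying the patches behind it on the still-green base."""
    for patch in queue:
        staging = master + [patch]
        if patch in failing:     # CI failed on staging:
            continue             # roll the patch back; the rest of the queue
                                 # is re-queued on the same green base
        master = staging         # green: fast-forward master
    return master

master = process_queue(["base"], ["a", "b", "c"], failing={"b"})
print(master)  # ['base', 'a', 'c'] -- 'b' was dropped, 'c' still landed
```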
The major downside is that it can consume a lot of CI resources, or add a lot of latency to merges, if commits are failing often. This is because there are basically two approaches to running CI when multiple patches are queued:
- Optimistic: merge all of the queued patches into staging and start CI immediately.
- Pessimistic: don’t start CI on a patch until the previous patch has passed.
If your CI is fast, either approach will work. However, if the average time between patches is shorter than your CI latency, the queue will grow without bound under the pessimistic strategy. The solution is to use the optimistic strategy, which is optimal when you have no failures. However, it will waste a lot of CI resources when a patch fails, because all of the jobs queued behind it will (likely) fail as well and then be retried with a different base.
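A back-of-the-envelope model of the difference, assuming every patch passes: patches arrive every `gap` minutes and a CI run takes `ci` minutes. Under the pessimistic strategy each patch waits for all the runs ahead of it, so when `gap < ci` latency grows without bound; under the optimistic strategy each patch waits exactly one CI run.

```python
def pessimistic_latency(n, gap, ci):
    """Minutes from arrival to merge for each of n patches, serial CI."""
    finish, latencies = 0.0, []
    for i in range(n):
        arrival = i * gap
        start = max(arrival, finish)  # wait for the previous patch to pass
        finish = start + ci
        latencies.append(finish - arrival)
    return latencies

def optimistic_latency(n, gap, ci):
    """With no failures, CI starts on the merged result immediately,
    so every patch waits one CI run regardless of arrival rate."""
    return [ci] * n

print(pessimistic_latency(4, gap=5, ci=20))  # [20.0, 35.0, 50.0, 65.0]
print(optimistic_latency(4, gap=5, ci=20))   # [20, 20, 20, 20]
```

With patches landing faster than CI completes, each pessimistic merge waits 15 minutes longer than the one before it, while the optimistic queue stays flat.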
More complex strategies can be employed as well, such as limiting the number of queued patches that CI runs on at once; if you choose this limit based on your expected failure rate, it can be very effective.
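One rough way to pick that limit (a back-of-the-envelope model of my own, not from the post): if each patch fails CI independently with probability `p`, the chance that a batch of `k` optimistically-merged patches contains at least one failure, wasting the runs behind it, is 1 − (1 − p)^k. Cap the queue at the largest `k` that keeps this under some budget:

```python
def batch_failure_probability(p, k):
    """Chance that at least one of k independent patches fails CI."""
    return 1 - (1 - p) ** k

def max_batch(p, budget=0.5):
    """Largest batch size whose chance of containing a failure
    stays under `budget`."""
    assert 0 < p < 1
    k = 1
    while batch_failure_probability(p, k + 1) <= budget:
        k += 1
    return k

print(max_batch(p=0.05))  # 13: with a 5% failure rate, running CI on more
                          # than ~13 queued patches at once means a coin-flip
                          # chance of throwing some of that work away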