Delinearized Rollouts

This post is largely implied by the previous two, but I wanted to take a bit of time to tie everything together. Since changes are no longer committed at a set time, they no longer need to be committed in a fixed order. Combining unordered patches with testing changes on small fractions of production traffic creates a very powerful rollout system.

Example

Imagine that each patch goes through the following process:

  1. Quick review-time tests are run.
  2. Human reviewers approve the patch.
  3. Integration tests run against the patch (likely in a batch).
  4. Patch is built and tested against a small slice of production traffic.
  5. Metrics are monitored until the probability of a regression falls below a chosen threshold.
  6. Periodically, patches that don’t cause regressions are built into a rollup release and deployed to production.
  7. If the rollup is regression-free it is rolled out to an increasing amount of production traffic.
  8. The rollup is added to the main development branch.
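Step 5 leaves open how "the probability of a regression" is computed. Purely as an illustration, the sketch below assumes the metric is a request error rate and compares the canary slice against the baseline with a Bayesian Beta-Binomial model (uniform prior), estimating P(canary error rate > baseline error rate) by Monte Carlo sampling. The function name `probability_of_regression` and its parameters are hypothetical, not part of any real tool.

```python
import random


def probability_of_regression(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    samples: int = 20_000,
    seed: int = 0,
) -> float:
    """Estimate P(canary error rate > baseline error rate).

    Hypothetical helper. Each error rate gets a Beta(1 + errors, 1 + successes)
    posterior (uniform prior). We draw from both posteriors and count how
    often the canary's sampled error rate is worse than the baseline's.
    """
    rng = random.Random(seed)  # seeded so repeated evaluations are reproducible
    worse = 0
    for _ in range(samples):
        base = rng.betavariate(1 + baseline_errors,
                               1 + baseline_requests - baseline_errors)
        canary = rng.betavariate(1 + canary_errors,
                                 1 + canary_requests - canary_errors)
        if canary > base:
            worse += 1
    return worse / samples


# Baseline: 50 errors in 100,000 requests (0.05%).
clearly_bad = probability_of_regression(50, 100_000, 40, 2_000)  # canary at 2%
clearly_fine = probability_of_regression(50, 100_000, 0, 20_000)  # no canary errors
```

A patch would stay in step 5 while this value sits between the two thresholds. Note that repeatedly re-checking a fixed-sample comparison as data arrives inflates false positives; a production system might prefer a proper sequential test, which is a design choice the post does not specify.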

The key fact is that step 6 doesn’t just take all pending patches. It only takes the ones that have high confidence of being “good”. Patches that are not selected are not necessarily rejected either. Patches with a high confidence of being “bad” will be rejected and the author will need to take action, but patches without high confidence in either direction will be allowed to run until the required confidence is met in one direction or the other.

Benefits

On top of the benefits of multi-version rollouts, the main benefit of delinearized rollouts is that they reduce head-of-line blocking. If Change-A gets “unlucky” and throws a bunch of errors (maybe there was an unrelated database blip, or a few requests hit an existing bug), it can be held in evaluation for longer. It doesn’t need to be rolled back; it can simply be tested until a verdict is reached. During this extended testing, other changes don’t need to be blocked. They can be selected for the rollup release ahead of Change-A, and Change-A can then land in the next rollup if it is found to be good.
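A toy comparison makes the head-of-line blocking concrete. This sketch assumes each in-flight patch already has a verdict (good, bad, or undecided) and models an ordered pipeline as one that rejects bad patches but cannot ship anything queued behind an undecided one. The names `Verdict`, `linear_rollup`, and `delinearized_rollup` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    GOOD = "good"
    BAD = "bad"
    UNDECIDED = "undecided"


def linear_rollup(queue: list[tuple[str, Verdict]]) -> list[str]:
    """Ordered rollout: patches must land in queue order."""
    shipped = []
    for name, verdict in queue:
        if verdict is Verdict.UNDECIDED:
            break  # everything queued behind this patch has to wait
        if verdict is Verdict.GOOD:
            shipped.append(name)
        # BAD patches are rejected and dropped from the queue.
    return shipped


def delinearized_rollup(queue: list[tuple[str, Verdict]]) -> list[str]:
    """Delinearized rollout: every patch known to be good can ship."""
    return [name for name, verdict in queue if verdict is Verdict.GOOD]


queue = [
    ("Change-A", Verdict.UNDECIDED),  # "unlucky": still being evaluated
    ("Change-B", Verdict.GOOD),
    ("Change-C", Verdict.GOOD),
]
print(linear_rollup(queue))        # []
print(delinearized_rollup(queue))  # ['Change-B', 'Change-C']
```

Change-A stays in evaluation in both models; the difference is that only the ordered pipeline makes Change-B and Change-C wait for it.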

Downsides

The main downside here is the complexity of the tooling. It requires moving from a simple set of promotions to some sort of promotion manager. However, it is still conceptually quite simple. At each rollup step you basically just evaluate each in-flight patch and include the ones that are above a given confidence of being “good”. You also reject any change whose confidence falls below a (different, lower) threshold. Finally, you want some status reporting so that authors can track how a particular change is doing in the evaluation.
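The promotion manager described above could look roughly like the following sketch. It assumes each in-flight patch already carries its current stage and a probability of being “good” (for example, one minus an estimated probability of regression). The class and function names, the stage labels, and the default thresholds of 0.99 and 0.05 are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    name: str
    stage: str      # hypothetical stage label, e.g. "canary"
    p_good: float   # current confidence that the patch is "good", in [0, 1]


@dataclass
class RollupPlan:
    include: list[str] = field(default_factory=list)
    reject: list[str] = field(default_factory=list)
    keep_evaluating: list[str] = field(default_factory=list)


def plan_rollup(patches: list[Patch],
                include_above: float = 0.99,
                reject_below: float = 0.05) -> RollupPlan:
    """Split in-flight patches into include / reject / keep evaluating."""
    if not reject_below < include_above:
        raise ValueError("reject threshold must be below include threshold")
    plan = RollupPlan()
    for patch in patches:
        if patch.p_good >= include_above:
            plan.include.append(patch.name)          # confidently good
        elif patch.p_good <= reject_below:
            plan.reject.append(patch.name)           # confidently bad
        else:
            plan.keep_evaluating.append(patch.name)  # no verdict yet
    return plan


def status_report(patches: list[Patch]) -> list[str]:
    """One dashboard line per patch: where it is and how confident we are."""
    return [f"{p.name}: {p.stage}, P(good) = {p.p_good:.1%}" for p in patches]


patches = [
    Patch("Change-A", "canary", 0.60),
    Patch("Change-B", "canary", 0.995),
    Patch("Change-C", "canary", 0.01),
]
plan = plan_rollup(patches)
# plan.include == ["Change-B"], plan.reject == ["Change-C"],
# plan.keep_evaluating == ["Change-A"]
for line in status_report(patches):
    print(line)  # e.g. "Change-A: canary, P(good) = 60.0%"
```

Using two separate thresholds is what lets undecided patches keep running instead of being forced into a premature accept or reject.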

It is also slightly more complex for authors to understand, but since each patch still progresses linearly through the stages, I don’t think it adds much confusion. The dashboard just needs to tell them where their patch is in the submission process and its current confidence level.