Service management with Nix.

I have often wondered how best to manage a cluster of machines. I have taken a lot of inspiration from Google's approach of treating them as one giant pool of resources, where no machine is reserved for any particular task. This adds a level of abstraction so that you can focus on what your service needs in order to run instead of worrying about where it should run. I also believe that this provides better utilization: an automated scheduler can pack services onto machines very effectively, whereas a human is inclined to use a simple algorithm and just buy more computers as the existing ones appear to get "full".

Another goal of my infrastructure is to avoid virtualization, which just adds complexity and overhead to the problem. For some trust models virtualization makes sense, but within a single organization users and groups should provide enough access control to create a secure system.

My early visions were pretty simple: use a standard package manager such as apt or pacman to download the required packages from a custom repository. This means that dependencies could be pulled down automatically and shared dependencies would only be downloaded and installed once. While this basic idea is good, it has a major problem when it comes to versioning. If you wanted to update a package that was used by multiple services, you would run the risk that one of them is incompatible with the newer version. One workaround would be to give each version of the package a new name, which would work but quickly gets messy.

Some people use Docker to solve this problem. While Docker does solve the dependency problem, it also adds a bunch of overhead that I would rather avoid. Running multiple Docker containers often means having many copies of common libraries, both on disk and in cache. The isolation also makes introspection of services difficult.

This is where Nix comes in. I have used Nix to implement my ideal infrastructure. It is composed of two main layers to solve two different problems.

Infrastructure Layer

The first layer is the infrastructure layer: a core collection of programs and services that is identical across all machines. It runs the cluster management software and global services such as log collection and forwarding, and it provides tools for system administrators and other users who might log into a machine to debug a problem.

These provide a base on which to run the services. I manage this layer via NixOps, which provides an easy way to manage these static servers; however, this layer could effectively be any distribution, as the application layer only depends on Nix and the packages in the Nix store.

Application Layer

The application layer is where the business happens. These services are not static but are instead scheduled across the available machines by a cluster manager. Currently I am using fleet, as it has a small resource footprint and I only need very basic scheduling. It works well for my simple situation; however, if I were going to expand my cluster I would switch to Mesos, as it provides much more advanced features.

fleet and Mesos are excellent building blocks, but they still need a bit of magic to create a full management solution. Nix provides all that is needed for this glue, allowing me to run services that are largely independent of the underlying system. The basic premise is that wherever a service gets scheduled, it downloads all of its dependencies and starts to run. Nix ensures that there will be no conflicts between "packages", as they are all stored under unique names in the Nix store, and it de-duplicates dependencies so that anything already present on the system is reused rather than downloaded again.

I will show you a quick example of how I got this to work using Nix. The service I will be showing here is etcd-cloudflare-dns, a Ruby script I wrote that continuously updates my CloudFlare DNS settings to match which services are running in etcd. The details of the script itself aren't important, but it does require a couple of dependencies to run (Ruby) and I always want exactly one instance running.

The first step is building the package that will be used to distribute the service. This is done using the default.nix expression in the project, which contains instructions for packaging the script and, through the import of <nixpkgs>, instructions for building all of its dependencies. The package can be built by running nix-build at the top level of the repo, which builds it and creates a symlink called result in the current directory pointing at the output in the Nix store.
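To give a feel for what such an expression looks like, here is a minimal sketch of a default.nix for a small Ruby script. This is not the actual expression for etcd-cloudflare-dns (which isn't shown in this post); the attribute values and file names are illustrative:

```nix
# Hypothetical sketch of a default.nix for a small Ruby service.
# Names and paths here are illustrative, not the real project's.
with import <nixpkgs> {};

stdenv.mkDerivation {
  name = "etcd-cloudflare-dns";
  src = ./.;

  buildInputs = [ ruby ];

  installPhase = ''
    mkdir -p $out/bin
    cp etcd-cloudflare-dns $out/bin/
    # Pin the script to the exact Ruby interpreter from the Nix store,
    # so the service does not depend on anything outside /nix/store.
    substituteInPlace $out/bin/etcd-cloudflare-dns \
      --replace "#!/usr/bin/env ruby" "#!${ruby}/bin/ruby"
  '';
}
```

Because the interpreter path is baked in, the resulting store path carries its full dependency closure with it, which is exactly what lets the service run on any machine with a Nix store.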

I wrote a script called b2-nix-cache which will build the expression and upload the new Nix archive files to a Backblaze b2 bucket which I use as a Nix binary cache. To use it I simply run ~/p/b2-nix-cache/upload.sh my-nix-cache-bucket /tmp/nix-cache-key in the directory of the project I wish to upload.

Now that all of the dependencies are uploaded, I schedule the service to start using fleet. I generate the service file, specifying the exact path of the binary in the Nix store, and use a service that already exists on all of my machines to "realize" all of the required paths and pin them so that they don't get garbage collected while the service is running. This is what the generation script looks like:

#! /bin/bash

set -e

# Get the store path from the result of the build.
pkg="$(readlink -f result)"

# Generate the service description.
cat >etcd-cloudflare-dns.service <<END
[Unit]
Description=Keep CloudFlare DNS up to date with values in etcd.
After=nix-expr@${pkg##*/}.service
Requires=nix-expr@${pkg##*/}.service

[Service]
Environment=DOMAIN=kevincox.ca
Environment=GEM_HOME=$pkg/gems
EnvironmentFile=/etc/kevincox-environment
EnvironmentFile=/run/keys/cloudflare

User=etcd-cloudflare-dns
ExecStart=$pkg/bin/etcd-cloudflare-dns
Restart=always
END

# Remove the old service and start the new one.
fleetctl "$@" destroy etcd-cloudflare-dns.service || true
fleetctl "$@" start etcd-cloudflare-dns.service

This script is incredibly simple: it just inserts the package path into the right places and uploads the service to fleet. The interesting part is nix-expr@.service, a parameterized (template) service that exists on all of my machines (it is put there by the infrastructure layer). Its job is to download the required dependencies and keep them around as long as the service is running. It consists of the simple systemd unit below.
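The instance name passed to nix-expr@ is just the final component of the store path, which is what the `${pkg##*/}` expansion in the generator script extracts. A quick standalone illustration (the store hash below is made up):

```shell
#!/bin/bash
# Demonstrate how the generator script derives the systemd instance
# name from a Nix store path. The hash is made up for illustration.
pkg="/nix/store/abc123xyz-etcd-cloudflare-dns-1.0"

# ${pkg##*/} strips the longest prefix matching '*/', i.e. everything
# up to and including the last slash.
name="${pkg##*/}"
echo "$name"                     # abc123xyz-etcd-cloudflare-dns-1.0
echo "nix-expr@${name}.service"  # nix-expr@abc123xyz-etcd-cloudflare-dns-1.0.service
```

systemd then substitutes that instance string back in for %i inside the template unit, recovering the store path to realize.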

[Unit]
Description=Install and keep installed the specified path.
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=${pkgs.nix}/bin/nix-store -r '/nix/store/%i' \
             --add-root '/nix/var/nix/gcroots/tmp/%n'
ExecStop=${pkgs.coreutils}/bin/rm -v '/nix/var/nix/gcroots/tmp/%n'

This service consists of just a start command and a stop command. The start command "realizes" a store path, which is Nix's term for "make it exist". Since I have both https://cache.nixos.org and my Backblaze bucket configured as binary caches, the required derivations will be downloaded from whichever cache has them (preferring https://cache.nixos.org). Once that is done the nix-expr@.service instance will be considered "running" and etcd-cloudflare-dns will be allowed to start.
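For reference, the binary cache setup amounts to roughly the following in the Nix configuration. This is an illustrative sketch: the bucket URL and custom signing key are placeholders, not my real values, and older Nix versions spell these options binary-caches and binary-cache-public-keys:

```
# /etc/nix/nix.conf (illustrative; bucket URL and second key are placeholders)
substituters = https://cache.nixos.org https://my-nix-cache-bucket.example.com
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= my-cache:XXXXXXXX
```

With that in place, nix-store -r can fetch any signed store path from either cache without building it locally.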

Once etcd-cloudflare-dns is done running, either because it has been stopped or because it has failed, the associated nix-expr@.service instance becomes unneeded (assuming no other service is using the same derivation). Since StopWhenUnneeded is true it will then be stopped, removing the GC root and allowing the associated store paths to be freed the next time the garbage collector runs.

It is also important to note that because /nix/var/nix/gcroots/tmp/ is cleared on each boot, power failures or other unexpected stops won't slowly fill your drive. While it is possible that systemd failures could leak GC roots, I am assuming they are rare enough not to cause much of an issue between reboots.

Downsides

The major downside of this approach is that I currently don't have a good secret-management solution. At the moment every secret I want to use has to be deployed to all machines and is accessed via a well-known path. While each secret is protected by filesystem permissions and never written to disk, it would be nice to be able to deploy new secrets without modifying the infrastructure layer configuration, and to avoid having every secret on every machine. I don't want to put secrets into the Nix store because it is world readable, and fleet service descriptions don't seem like the right place either. For now I am satisfied with my solution. I have considered a couple of other methods, including running a secret-management service, or storing a single encryption key on each node and putting encrypted secrets in the Nix store (downloadable from my binary cache), but none of those seems great to me.

Another downside to this approach is the difficulty of performing security updates. Since the services themselves specify the exact versions of their dependencies, there is no way to globally update a package. In a situation where an update is required, you would have to get the maintainer of every affected service to release a new version, instead of having the option to just update the package on every system and have it take effect. However, this is a well-known tradeoff of pinning exact dependencies and is not unique to this solution. It would be nice to have a script that scans the Nix stores of all machines and reports any vulnerable packages it finds. Tracing those packages back to the services that use them would be very easy, as Nix maintains a full store-path dependency graph.
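The core of such a scanner could be as simple as the sketch below: filter a list of store paths for a known-vulnerable package name. Here the list is hard-coded for illustration; on a real machine it would come from something like nix-store -q --requisites over the pinned GC roots:

```shell
#!/bin/bash
# Sketch of scanning store paths for a vulnerable package version.
# On a real machine the input would come from, e.g.:
#   nix-store -q --requisites /nix/var/nix/gcroots/tmp/*
find_vulnerable() {
  # Print any store path matching the given pattern; succeed either way.
  local pattern="$1"
  grep -- "$pattern" || true
}

# Hard-coded sample store paths for illustration (hashes omitted).
store_paths="/nix/store/aaa-openssl-1.0.1f
/nix/store/bbb-openssl-1.0.2g
/nix/store/ccc-ruby-2.3.0"

# Flag the Heartbleed-vulnerable OpenSSL 1.0.1f.
echo "$store_paths" | find_vulnerable 'openssl-1\.0\.1f'
```

From each flagged path, nix-store -q --referrers walks the dependency graph back toward the services that pulled it in.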

Conclusion

This solution separates the infrastructure layer from the applications so that changes in each are largely independent (except for required coupling). It abstracts away the actual hardware of the cluster, letting you focus on what your services actually need in order to run. This trivializes deployment and dependency management and, when combined with NixOS, provides a very robust and pure architecture.