Typed Config Languages

A configuration language is a serialization format that serves as an interface between humans (usually technical ones) and computers.

One of the most popular configuration languages is YAML. It looks something like this:

allowed-countries:
- ca
- us
user-overrides:
  kevincox:
    admin: true
  angry:
    banned-until: 2022-02-01

Overall I think the core YAML syntax is OK. It is fairly easy to read and edit, even if you have never seen it before. One defining feature of YAML is that you know the types just by looking at the syntax. Just this snippet is enough to tell you how every value will be parsed.

The biggest problem I have with YAML is that it tries too hard to be convenient. For example, suppose my imaginary startup just got all the paperwork done and we are ready to launch in Norway. Easy enough, we update our list:

allowed-countries:
- ca
- no # New!
- us

This is a well-known problem. While ca and us are strings, in YAML (at least under the YAML 1.1 rules that many parsers still follow) no is a boolean. In JSON this list would be ["ca", false, "us"]. Oops.

This isn’t the only place that this can happen. See some other examples:

string_vs_number: [1.2.3, 1.2]
string_vs_null: [Bull, Null]

The problem here is that the config language doesn’t know what type you expect. The language designers wanted to make things minimal and clean, so everything defaults to a string, but some inputs are parsed as other types. For example, 1.2 is a number and null is a null value. In theory this isn’t too complicated, but unless you have memorized the YAML spec and are paying attention, it is an easy mistake to make and an easy one to overlook.
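To make the guessing concrete, here is a rough sketch in Rust of how an untyped parser might classify bare scalars from their spelling alone. It is a simplified approximation of YAML 1.1-style rules, not the real spec (for instance, it is looser about letter case), and the Value enum and infer function are made-up names for illustration.

```rust
/// A value as an untyped config parser might classify it (hypothetical type).
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
}

/// Guess the type of a bare scalar from its spelling alone.
/// Assumption: a simplified approximation of YAML 1.1-style rules, not the spec.
fn infer(scalar: &str) -> Value {
    match scalar.to_ascii_lowercase().as_str() {
        "null" | "~" => Value::Null,
        "true" | "yes" | "on" => Value::Bool(true),
        "false" | "no" | "off" => Value::Bool(false),
        _ => {
            // Only try a numeric parse when there is at least one digit, so
            // that words such as "inf" or "nan" stay strings in this sketch.
            let has_digit = scalar.chars().any(|c| c.is_ascii_digit());
            match scalar.parse::<f64>() {
                Ok(n) if has_digit => Value::Number(n),
                _ => Value::Str(scalar.to_string()),
            }
        }
    }
}

fn main() {
    // The parser never asks what the application wanted; it only looks at spelling.
    for scalar in ["ca", "no", "us", "1.2.3", "1.2", "Bull", "Null"] {
        println!("{} -> {:?}", scalar, infer(scalar));
    }
}
```

Nothing in infer knows whether the caller wanted a country code or a flag, which is exactly why no and Null change type while ca and Bull do not.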

Intro to Typed Configs

What if we didn’t need to make this tradeoff? What if the language knew what type it was expecting?

I made a language for this. It is called Simple Config (I know, awful name). The current implementation gets its schema from serde but there is no reason that it couldn’t use a separate schema file in a dynamic language or for cross-language use.

Knowing the type you are parsing allows a lot of language rules to be simplified and it makes the result much more predictable.

For example if I define the following:

#[derive(serde::Deserialize)]
struct Example {
	countries: Vec<String>,
	bools: Vec<bool>,
}

I can then do:

countries:
	ca
	no
	us
bools:
	yes
	no
	false

There is no ambiguity. (Actually, Simple Config only supports true and false because I like to keep things simple, but it would be a trivial change to make yes and no work.)
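As a rough illustration of the idea (not Simple Config’s actual implementation, which goes through serde), a parser that is told the expected type can read the same token differently depending on where it appears. The Expected and Typed enums and the parse_as function below are hypothetical names, and accepting yes and no as booleans is an assumption made to match the example above.

```rust
/// The type the schema expects at some position (hypothetical).
#[derive(Debug, Clone, Copy)]
enum Expected {
    Str,
    Bool,
}

/// The typed result of parsing one raw token (hypothetical).
#[derive(Debug, PartialEq)]
enum Typed {
    Str(String),
    Bool(bool),
}

/// Parse a raw token according to the type the schema asks for.
fn parse_as(expected: Expected, raw: &str) -> Result<Typed, String> {
    match expected {
        // A string field takes the text verbatim: no keyword is special.
        Expected::Str => Ok(Typed::Str(raw.to_string())),
        // A bool field can afford to be generous, because nothing else
        // competes for these spellings here. (Assumption: yes/no accepted.)
        Expected::Bool => match raw {
            "true" | "yes" => Ok(Typed::Bool(true)),
            "false" | "no" => Ok(Typed::Bool(false)),
            other => Err(format!("expected a boolean, got {:?}", other)),
        },
    }
}

fn main() {
    // The same token yields different values depending on the expected type.
    println!("{:?}", parse_as(Expected::Str, "no"));
    println!("{:?}", parse_as(Expected::Bool, "no"));
    println!("{:?}", parse_as(Expected::Bool, "ca"));
}
```

A mistyped boolean becomes an error at parse time instead of silently turning into a string.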

If you want to parse version numbers you can use Vec<String> or, even better, Vec<Version>, and there is no risk of 1 or 2.3 being parsed as a number. The user doesn’t need to remember whether they need quotes or not; it just works.
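Here is a minimal sketch of what such a Version type could look like. This Version struct is hypothetical and deliberately naive (dot-separated unsigned integers only), and it uses the standard FromStr trait as a stand-in for a serde Deserialize implementation.

```rust
use std::str::FromStr;

/// A deliberately naive version: dot-separated unsigned integers (hypothetical).
#[derive(Debug, PartialEq)]
struct Version(Vec<u32>);

impl FromStr for Version {
    type Err = std::num::ParseIntError;

    /// Parse every dot-separated component as an integer; fail on the first bad one.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.split('.')
            .map(|part| part.parse::<u32>())
            .collect::<Result<Vec<_>, _>>()
            .map(Version)
    }
}

fn main() {
    // "1" and "2.3" become versions, never a bare integer or float.
    for raw in ["1", "2.3", "1.2.3", "1.x"] {
        println!("{} -> {:?}", raw, raw.parse::<Version>());
    }
}
```

Because the field’s type drives parsing, 2.3 never passes through a floating-point number on its way to becoming a version.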

Downsides

Tooling Needs to Know the Schema

The biggest downside is that you need to know the schema to understand the file. For example, consider the following:

#[derive(serde::Deserialize)]
struct Example {
	map: std::collections::HashMap<String, String>,
	string: String,
}

map:
	a: 1
	b: 2
string:
	a: 1
	b: 2

The string field is a multi-line string (contents "a: 1\nb: 2"). You can see how this makes even simple formatters impossible. If I had written a : 1, a formatter might want to correct it to a: 1. However, without knowing the schema it doesn’t know where it can do that. (The fix would be correct for map but not for string.)
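The formatter problem can be sketched by reading the same indented lines both ways. The as_map and as_string functions are hypothetical helpers, not part of Simple Config, and they assume the indentation has already been stripped from each line.

```rust
use std::collections::BTreeMap;

/// Read a block as a map: split each line on the first ':' and trim both sides.
fn as_map(lines: &[&str]) -> BTreeMap<String, String> {
    lines
        .iter()
        .filter_map(|line| {
            let (key, value) = line.split_once(':')?;
            Some((key.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Read a block as a multi-line string: every character is kept verbatim.
fn as_string(lines: &[&str]) -> String {
    lines.join("\n")
}

fn main() {
    let original = ["a : 1", "b: 2"];
    let reformatted = ["a: 1", "b: 2"]; // what a formatter might produce

    // As a map, the reformatting is harmless.
    println!("maps equal: {}", as_map(&original) == as_map(&reformatted));
    // As a string, the reformatting silently changes the value.
    println!("strings equal: {}", as_string(&original) == as_string(&reformatted));
}
```

Only the schema says which of the two readings applies, so a formatter that has no schema cannot tell whether its cleanup is safe.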

Confusing to Humans

Just as schema-dependent parsing can confuse automated tools, it can easily confuse humans as well. A user may think that null is a null value when it is actually the string "null". Or suppose a human sees something like:

some-low-level-setting-that-no-one-understands: 3

They may attempt to help the situation by adding a comment:

some-low-level-setting-that-no-one-understands:
	# This value is 3 because it is my lucky number.
	3

However, if that field is actually a string, Simple Config will treat the comment as part of the string.
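The following sketch shows how the same two lines could come out differently depending on the field type. It is not Simple Config’s real parser: Kind, Parsed, and parse_block are hypothetical names, and skipping # lines inside an integer block is an assumption made for illustration.

```rust
/// The type the schema expects for a field (hypothetical).
#[derive(Debug, Clone, Copy)]
enum Kind {
    Int,
    Str,
}

/// The parsed value of a field (hypothetical).
#[derive(Debug, PartialEq)]
enum Parsed {
    Int(i64),
    Str(String),
}

/// Parse the lines of an indented block according to the expected type.
fn parse_block(kind: Kind, lines: &[&str]) -> Result<Parsed, String> {
    match kind {
        // A string keeps every line verbatim, including anything starting with '#'.
        Kind::Str => Ok(Parsed::Str(lines.join("\n"))),
        // Assumption: for an integer, blank lines and '#' comment lines are skipped.
        Kind::Int => {
            let values: Vec<&str> = lines
                .iter()
                .map(|line| line.trim())
                .filter(|line| !line.is_empty() && !line.starts_with('#'))
                .collect();
            match values.as_slice() {
                [single] => single
                    .parse::<i64>()
                    .map(Parsed::Int)
                    .map_err(|e| e.to_string()),
                _ => Err(format!("expected one integer, found {} values", values.len())),
            }
        }
    }
}

fn main() {
    let block = ["# This value is 3 because it is my lucky number.", "3"];
    println!("{:?}", parse_block(Kind::Int, &block));
    println!("{:?}", parse_block(Kind::Str, &block));
}
```

The person adding the comment cannot tell from the file alone which of these two outcomes they will get.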

Some of these can be explained away as Simple Config design decisions, but I think the point still stands. If a user sees 30min they may assume that the field is a Duration and make edits that are wrong if it is really a string (one that may be passed verbatim to another system that supports a different set of suffixes and formatting).

Conclusion

I think this is an interesting idea and I would like to see it taken further. Statically typed programming languages are catching on, so why don’t we extend this typing to our config files? Lots of common issues I see in config files could be avoided by doing this. Using different logic to parse different types also allows much more flexible parsing rules, because you don’t need to worry about accidental type switches. I’ve been using Simple Config for some of my projects for a while and am very happy with the result. I’m sure typed config languages could be even better, so I would love to see future iterations.