#+TITLE: On Docker Databases and NixOS
While learning HashiCorp Nomad+Vault+Consul, I decided to convert all the Docker containers I currently use into their
Nix-ified forms. In other words, I'd rewrite the ones I had, but base them on NixOS, a truly declarative environment,
unlike /ehm/ all the other base images... Well, I didn't realize how *hard* it is to "dockerize" databases. Databases
are programs that deal almost exclusively with state, as opposed to Nix and Docker, which are both declarative systems
(one of them is trying, and failing, really hard).
** Configuration
Configuration is a rather big part of what a systems administrator (DevOps engineer for the cool kids) does: one must
correctly configure a program, most of the time dynamically. Because this is such an important task, it baffles me
why the vast majority of containers primarily use and support environment variables. I get that it's a convenient way
to do it; it's simple, widely supported, uniform, all-around great, but *really* cumbersome. Say your config file looks
like this:
#+BEGIN_SRC conf
# stripped-down Gitea configuration; I kept the parts that nicely illustrate my point
[server]
APP_DATA_PATH = /data/gitea
ROOT_URL = https://gitea.redalder.org/
DOMAIN = gitea.redalder.org
[database]
DB_TYPE = postgres
HOST = database-postgres
NAME = gitea
#+END_SRC
The config has two sections, =server= and =database=, and each of these sections has =n= key-value pairs. It has
structure, multiple layers even, and configuration files can get much, much more complex than this. Now, for the sake
of argument, let's imagine that we want to "environment-ize" this config. The natural, and frankly only, way to do this
is to flatten the config file by prefixing each key with its section name, so we'd get something like this:
#+BEGIN_SRC conf
SERVER_APP_DATA_PATH="/data/gitea"
SERVER_ROOT_URL="https://gitea.redalder.org/"
SERVER_DOMAIN="gitea.redalder.org"
DATABASE_DB_TYPE="postgres"
DATABASE_HOST="database-postgres"
DATABASE_NAME="gitea"
#+END_SRC
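The flattening itself is purely mechanical. Here's a minimal Python sketch of it, assuming a naive =SECTION_KEY= naming
scheme; =flatten= and =GITEA_INI= are my own illustrations, not something Gitea or Docker provides. It reads the Gitea
config above with the standard library's =configparser= and joins each section name and key with an underscore.

#+BEGIN_SRC python
import configparser

# The Gitea config from above, as INI text.
GITEA_INI = """\
[server]
APP_DATA_PATH = /data/gitea
ROOT_URL = https://gitea.redalder.org/
DOMAIN = gitea.redalder.org
[database]
DB_TYPE = postgres
HOST = database-postgres
NAME = gitea
"""


def flatten(ini_text: str) -> dict[str, str]:
    """Turn '[section] KEY = value' into 'SECTION_KEY' -> value (hypothetical naming scheme)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # configparser lowercases keys by default; keep them as written
    parser.read_string(ini_text)
    env: dict[str, str] = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            env[f"{section.upper()}_{key.upper()}"] = value
    return env


if __name__ == "__main__":
    # Print in the same NAME="value" form as the flattened block above.
    for name, value in flatten(GITEA_INI).items():
        print(f'{name}="{value}"')
#+END_SRC

Note that the section boundary survives only as a naming convention: a hypothetical section =server_app= with the key
=DATA_PATH= would flatten to the very same =SERVER_APP_DATA_PATH=.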
Now, you might think that this is completely fine, even reasonable, but let me explain why it's horrible. First off,
there is no imposed structure, and structure is always good: nothing prevents you from mixing the =server= and
=database= variables, and while a "good" admin should not do that, it's best if they don't even have the ability.
Next, depending on the parser that reads this "configuration", you might run into issues when you leave out the =""=;
some parsers handle this better than others, but once again, it's an implicit rule, which is bad. I hope you're
starting to see the bigger picture: implicitness is *bad*, period. It introduces unnecessary mistakes that could have
been avoided if only the computer had yelled at you. So why not give it the option to do that?
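To make the quoting problem concrete, here's a small Python sketch of two hypothetical parsers (=parse_literal= and
=parse_shell_like= are my own illustrations, not any particular tool's implementation) that read the same line
differently: one keeps the quotes as part of the value, the other strips them. The third, =parse_strict=, is the
"computer yelling at you" option: it checks names against an assumed schema and refuses unquoted values.

#+BEGIN_SRC python
# Two hypothetical parsers that disagree about quotes, plus a strict one.


def parse_literal(line: str) -> tuple[str, str]:
    """Keep everything after the first '=' verbatim, quotes included."""
    name, _, value = line.partition("=")
    return name.strip(), value


def parse_shell_like(line: str) -> tuple[str, str]:
    """Strip one pair of matching surrounding quotes, a common convenience."""
    name, _, value = line.partition("=")
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return name.strip(), value


# Hypothetical schema: the only variables this "config" may contain.
SCHEMA = {
    "SERVER_APP_DATA_PATH", "SERVER_ROOT_URL", "SERVER_DOMAIN",
    "DATABASE_DB_TYPE", "DATABASE_HOST", "DATABASE_NAME",
}


def parse_strict(line: str) -> tuple[str, str]:
    """Raise ValueError instead of guessing."""
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {line!r}")
    if name not in SCHEMA:
        raise ValueError(f"unknown variable {name!r}")
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        raise ValueError(f"value of {name} must be double-quoted")
    return name, value[1:-1]


if __name__ == "__main__":
    line = 'DATABASE_HOST="database-postgres"'
    print(parse_literal(line))     # ('DATABASE_HOST', '"database-postgres"')
    print(parse_shell_like(line))  # ('DATABASE_HOST', 'database-postgres')
    # A typo in the name and a missing pair of quotes: both get rejected loudly.
    for bad in ['DATABSE_HOST="database-postgres"', "DATABASE_HOST=database-postgres"]:
        try:
            parse_strict(bad)
        except ValueError as err:
            print("rejected:", err)
#+END_SRC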
** Configuration and Databases
Now, it's finally time to combine databases with configuration. What we get is an all-out war between immutable,
declarative environments and the stateful, ever-changing data they hold. You must make sure that your configuration
gets applied only on first start, so you must keep state yourself! In a Docker container! Madness! Then comes the joy
of updating the configuration. Say you give your user the option to specify the default authentication method, as the
official [[https://hub.docker.com/_/postgres][PostgreSQL]] image does. The user specifies that they want =scram-sha-256=;
that's nice and all, so you apply it, but *only* on the first boot. Why? Because once the value is in the config file,
if the user later changed it, you'd first have to figure out *whether* they changed it and then update the
configuration file, and that's really hard. The user might have gone into the directory where they store PostgreSQL's
state and edited the config file by hand, or they might have deleted your configuration entirely and replaced it with
their own. What should you do then? Most Docker containers just take the easy way out and do what the PostgreSQL image
does, applying the setting on first boot and never again, and I don't blame them; there really isn't anything you can do.
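To show how much bookkeeping "apply on first boot, but also support updates" would require, here's a minimal Python
sketch. The =reconcile= function, the stamp file holding the hash of the config the entrypoint last wrote, and the
=pg_hba.conf= contents are all hypothetical illustrations, not how the PostgreSQL image's entrypoint works. Even with
the extra state, once the user has touched the file, the only honest answer is to leave it alone.

#+BEGIN_SRC python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def reconcile(config_file: Path, stamp_file: Path, desired: str) -> str:
    """Decide what to do with the generated config on container start.

    stamp_file stores the hash of the config this entrypoint last wrote;
    it is exactly the extra state that has to live next to the database.
    """
    if not config_file.exists() and not stamp_file.exists():
        config_file.write_text(desired)
        stamp_file.write_text(sha256_of(desired))
        return "first boot: wrote config"
    last_written = stamp_file.read_text() if stamp_file.exists() else None
    current = sha256_of(config_file.read_text()) if config_file.exists() else None
    if current != last_written:
        # Someone else edited, replaced or deleted the file; there is no sane merge.
        return "file was touched by the user: leaving it alone"
    if current == sha256_of(desired):
        return "unchanged"
    config_file.write_text(desired)
    stamp_file.write_text(sha256_of(desired))
    return "settings changed: rewrote config"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "pg_hba.conf"
        stamp = Path(tmp) / ".config-hash"
        print(reconcile(conf, stamp, "host all all all scram-sha-256\n"))  # first boot: wrote config
        print(reconcile(conf, stamp, "host all all all scram-sha-256\n"))  # unchanged
        print(reconcile(conf, stamp, "host all all all md5\n"))            # settings changed: rewrote config
        conf.write_text("# edited by hand\n")
        print(reconcile(conf, stamp, "host all all all scram-sha-256\n"))  # file was touched by the user: leaving it alone
#+END_SRC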
** Nix - The Solution?
Like some good literature, this rant has come full circle. We're back at the start, back on the topic of Nix. How can
Nix save us? By removing unnecessary state. The mutable configuration file? Gone, it's immutable now. Not knowing
whether a setting changed? Poof, gone too: Nix is fully declarative and identifies *everything* by a cryptographic hash
of its inputs in addition to its developer-configured name, so a changed setting means a different hash. Nix also
serves as a single source of truth, which means that even if the user modifies the config files, they will be
overwritten before they are used again. This makes mix-ups impossible. Messy, flat "configuration" files? Solved too:
the Nix expression language can be as flat or as deep as you need it to be, and you can create complex APIs with
functions and all that jazz. Basically, Nix is awesome!
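For contrast, here's a rough Python analogue of what Nix buys us, assuming hypothetical =to_ini= and =store_path=
helpers (in Nix itself you'd write the config in the Nix language and render it with something like nixpkgs'
=lib.generators.toINI=). The config stays a nested structure, gets rendered to text, and is addressed by a hash of that
text, so "did a setting change?" becomes a simple path comparison. The path only imitates the shape of a Nix store
path; Nix computes its real store-path hashes differently.

#+BEGIN_SRC python
import hashlib


def to_ini(config: dict[str, dict[str, str]]) -> str:
    """Render a nested {section: {key: value}} mapping as INI text."""
    lines: list[str] = []
    for section, pairs in config.items():
        lines.append(f"[{section}]")
        lines.extend(f"{key} = {value}" for key, value in pairs.items())
    return "\n".join(lines) + "\n"


def store_path(name: str, content: str) -> str:
    """Content-addressed path: same text -> same path, any change -> new path.

    Only imitates the look of /nix/store/<hash>-<name>; not Nix's real scheme.
    """
    digest = hashlib.sha256(content.encode()).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}"


# The Gitea config from earlier, kept as structure instead of flat variables.
GITEA = {
    "server": {
        "APP_DATA_PATH": "/data/gitea",
        "ROOT_URL": "https://gitea.redalder.org/",
        "DOMAIN": "gitea.redalder.org",
    },
    "database": {"DB_TYPE": "postgres", "HOST": "database-postgres", "NAME": "gitea"},
}

if __name__ == "__main__":
    before = store_path("app.ini", to_ini(GITEA))
    # Hypothetical change: point Gitea at a different database host.
    changed = {**GITEA, "database": {**GITEA["database"], "HOST": "other-postgres"}}
    after = store_path("app.ini", to_ini(changed))
    print(before)
    print(after)
    print(before != after)  # True
#+END_SRC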
** Conclusion
The takeaway from this rant is that the best course of action would be to figure out how to completely replace Docker
and all the other container runtimes with something based on Nix, but since I'm a realist, I propose another possible
solution: we must get Nix to work nicely with Docker. Instead of clumsy environment variables, you'd write your
configuration in the form of Nix expressions and build a new Docker image from one common base. This would ensure that
all configuration is properly hashed and declarative, while allowing for much more complex config files than
environment variables, or even templates, can express.