Since, in the absence of an explicit message to the contrary, the internet magnifies vitriol and minimizes nuance, let me begin by saying that I think Eelco Dolstra & al.’s Nix package manager is a brilliant accomplishment. The idea of throwing the traditional Unix file system hierarchy to the wind and storing packages by cryptographic hashes is genius, both theoretically beautiful and practically consequential, and I’ve been captivated by the notion ever since I heard about it. This is a critique of some aspects of Nix, but it is only offered out of appreciation for the insight of the core idea. I should also say that my programming skills are amateur. I am trained as a philosopher, not a programmer; I’ve only taken one class in CS, and that was on complexity theory. I don’t offer this critique because I think I should be telling Dolstra & co. what they should be doing (clearly they’ve done very well for themselves), but simply to explain why I am building my own tool like Nix, and what I want to explore in the design space that Nix is pioneering.
That said, I do think there are numerous aspects of Nix’s design where making a different design decision than Nix did would lead to a system that is more elegant and more productive for users. I think this is true across the breadth of functionality that Nix provides, and I will try to provide my critiques in conceptual order, from building a package to working with installed packages.
The fundamental idea of Nix is that each package should be principally identified by a cryptographic hash. Nix provides two ways of producing a hash to satisfy this requirement. One option is that you can hash the entire contents of the package in a well-defined and reproducible way. The other is that you can hash all of the inputs needed to build a package (and Nix provides a way to assure that you have not overlooked any inputs).
What I find backwards is that the second method is the default. The vast majority of packages in Nix are associated with hashes derived from their build inputs, not their contents. If you want to have a package hash be based on the package’s own contents, you have to enable special package config flags. I think it should be the other way around. In this era of Reproducible Builds, nearly all software is a reproducible function of build sources, build options, and compiler. Some software is still not reproducible, of course, so there’s a need to have the ability to identify a package with a hash from its inputs, but this should be the case that requires special treatment. Deriving a package’s identification hash from its contents should be the default.
This is not a mere aesthetic concern. There is a serious problem lurking here that has to do with the provenance of compilers. This is best illustrated by example. Suppose we had a Nix package of clang 11.0.0, itself built by clang 11.0.0. Call this Nix package clang-3XHCNGQ6HUCUZLRUSYQ4DGG5KI. Now suppose we used clang-3XHCNGQ6HUCUZLRUSYQ4DGG5KI to build clang 11.0.0 again. Despite being the same program, because this new clang binary was compiled by a different compiler, it will have a different identifying hash, let’s say clang-C7YPJOUKL4JBH6WKLENVROSSU4. Now we have, in the eyes of Nix, two distinct compilers, despite the fact that they are both clang 11.0.0, and should in principle produce identical outputs on identical inputs. This division then propagates to the compilers’ outputs. Let’s say they both compile zlib, and let’s say, ex hypothesi, that the two outputs are bitwise identical. Nevertheless, Nix will, by default, assign two distinct hashes to these builds, and we’ll have zlib-TZNZJCW6TQ6UDXTGC63I65U6KU and zlib-BZHBMDVSTW3ARWQ3JEXL2UIDXY, which, in the eyes of Nix, despite being bitwise identical, are not the same package. An executable binary that depends on one cannot be satisfied by the other, and so if you have a mix of binaries that depend on each, then you must download both.
Nix mitigates this through close scrutiny of the Nixpkgs collection, but look at what’s happened: cryptographic hashes are frequently associated with distributed systems, but here the work of producing packages has been centralized into a single Git repo. This is a consequence that I’ll refer back to later.
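To make the clang and zlib example concrete, here is a minimal sketch of the two identification schemes. It is not Nix's actual hashing algorithm: the functions input_hash and content_hash, the SHA-256 truncation, and the placeholder byte strings are all assumptions made for illustration. Only the two compiler identifiers come from the example above.

```python
import hashlib


def input_hash(name: str, compiler_id: str, source: bytes) -> str:
    """Identify a build by what went into it, including the compiler's identifier."""
    h = hashlib.sha256()
    for part in (name.encode("utf-8"), compiler_id.encode("utf-8"), source):
        # A length prefix keeps the concatenation of parts unambiguous.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()[:16]


def content_hash(output: bytes) -> str:
    """Identify a build purely by the bytes it produced."""
    return hashlib.sha256(output).hexdigest()[:16]


# Hypothetical stand-ins: one zlib source tree, one set of output bytes,
# and the two compiler identifiers from the example.
zlib_source = b"zlib source tree"
zlib_output = b"bitwise identical zlib build"
compiler_a = "clang-3XHCNGQ6HUCUZLRUSYQ4DGG5KI"
compiler_b = "clang-C7YPJOUKL4JBH6WKLENVROSSU4"

by_input_a = input_hash("zlib", compiler_a, zlib_source)
by_input_b = input_hash("zlib", compiler_b, zlib_source)
by_content_a = content_hash(zlib_output)
by_content_b = content_hash(zlib_output)

print(by_input_a == by_input_b)      # False: the compiler identifier differs
print(by_content_a == by_content_b)  # True: the output bytes are the same
```

Under the input-based scheme the difference between the two compiler identifiers leaks into everything they build; under the content-based scheme it disappears as soon as the outputs agree.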
Fundamentally, a Nix package is built by calling a function with 5 arguments. Those are: the system a package is for, the name of the package, the sources for building the package, the command to build the package, and the arguments to the build command. (There are also many rarer, optional arguments, but those are not immediately relevant.) Many newcomers to Nix have remarked on the Nix programming language, a domain-specific functional language used to write Nix build scripts, finding it a hurdle (not insurmountable, but a hurdle nonetheless). The primary distinguishing feature of Guix is that it uses Guile Scheme instead of the custom Nix language. But none of this is necessary. The whole language boils down to that single function call with 5 arguments. You could serialize those arguments in anything – JSON, or TOML, or XML, whatever. That would, I think, make the whole approach Nix is developing much more accessible.
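As a rough sketch of what I mean, here are those five arguments written as plain data and serialized to JSON. The field names, the example values, and the helper functions canonical and spec_id are all hypothetical; this is not a format that Nix reads.

```python
import hashlib
import json

# Hypothetical field names and values for the five arguments.
package = {
    "system": "x86_64-linux",
    "name": "hello-2.10",
    "sources": ["hello-2.10.tar.gz"],
    "builder": "/bin/sh",
    "args": ["-c", "./configure && make install"],
}


def canonical(spec: dict) -> str:
    """Serialize with sorted keys and fixed separators so equal specs give equal text."""
    return json.dumps(spec, sort_keys=True, separators=(",", ":"))


def spec_id(spec: dict) -> str:
    """Derive an identifier from the canonical serialization of the spec."""
    return hashlib.sha256(canonical(spec).encode("utf-8")).hexdigest()[:16]


# The same spec written with its keys in a different order.
reordered = dict(reversed(list(package.items())))

print(canonical(package))
print(spec_id(package) == spec_id(reordered))  # True: key order does not matter
```

The canonical form matters because the identifier should depend on what the spec says, not on how someone happened to lay out the file.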
Nix builds its packages in a sandbox. How does it construct that sandbox? Well, it makes a temporary build directory, cd’s to it, clears the environment variables, and then:
NIX_BUILD_TOP contains the path of the temporary directory for this build.
Also, TMPDIR, TEMPDIR, TMP, TEMP are set to point to the temporary directory. This is to prevent the builder from accidentally writing temporary files anywhere else. Doing so might cause interference by other processes.
PATH is set to /path-not-set to prevent shells from initialising it to their built-in default value.
HOME is set to /homeless-shelter to prevent programs from using /etc/passwd or the like to find the user’s home directory, which could cause impurity. Usually, when HOME is set, it is used as the location of the home directory, even if it points to a non-existent path.
NIX_STORE is set to the path of the top-level Nix store directory (typically, /nix/store).
Certainly, that’s an effort to provide a sandbox, and I’m sure it works well enough, but it’s not exactly thorough. With Linux namespaces, bind mounts, and chroot, one could have a much more tightly sealed sandbox.
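The environment setup quoted above is small enough to sketch in a few lines. The function name build_environment is my own invention and this is not Nix's code; the variable names and values are the ones from the quoted list. Nothing here touches namespaces, bind mounts, or chroot.

```python
import tempfile


def build_environment(build_dir: str, store: str = "/nix/store") -> dict:
    """Return a fresh environment holding only the variables from the quoted list."""
    env = {"NIX_BUILD_TOP": build_dir}
    # Point every common temp-dir variable at the build directory.
    for var in ("TMPDIR", "TEMPDIR", "TMP", "TEMP"):
        env[var] = build_dir
    # Deliberately invalid values, so nothing falls back to a built-in default.
    env["PATH"] = "/path-not-set"
    env["HOME"] = "/homeless-shelter"
    env["NIX_STORE"] = store
    return env


with tempfile.TemporaryDirectory() as build_dir:
    env = build_environment(build_dir)
    for name in sorted(env):
        print(f"{name}={env[name]}")
```

A dictionary like this could be handed to subprocess.run through its env and cwd parameters to start a builder with nothing inherited from the parent process, but the builder would still see the whole file system, which is the gap I am pointing at.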
When a user installs a Nix package, the package is then provided in a more typical Unix location via symbolic link. As with the sandbox, this works, but it’s a missed opportunity to further the natural evolution of Unix by using mount points. At the end of the day, a symbolic link is some bytes on a disk containing the target of the link. Using symbolic links is in keeping with the more conventional view that the root directory of a system represents a particular disk, the contents of which are visible to all processes and limited by file permissions, and that it’s only rarely that a mountpoint or a chroot leads to a FS structure that is not the reflection of a disk.
By contrast, Unix has, throughout its history, been moving away from that model, towards a model where the system FS hierarchy doesn’t represent any disk at all, and can appear quite differently from process to process. This model was taken to its extreme in Plan 9, and is also being used in Google’s Fuchsia OS, but Linux also has the ability to do such things with its namespaces.
Nix provides a great way for application binaries to be shipped with the (hopefully) known-good libraries that they were built with. However, there was already just as good a way for application binaries to be shipped with the libraries they were built with: static linking.
Especially now that standards like semantic versioning exist and are widely used, I think an improvement in the Nix-related space would be for binaries to have automatic compatible upgrades done to their dependencies (without changing the hash of the application binary itself), while keeping a known-good version in store to fall back upon.
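Here is a minimal sketch of that selection rule, assuming plain MAJOR.MINOR.PATCH version strings. The helper names parse, compatible, and pick_upgrade are hypothetical, and the sketch ignores pre-release tags, build metadata, and the special status semantic versioning gives to 0.x releases.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def compatible(candidate: Version, known_good: Version) -> bool:
    """Same major version, and not older than the known-good version."""
    return candidate[0] == known_good[0] and candidate >= known_good


def pick_upgrade(known_good: str, available: List[str]) -> str:
    """Choose the newest compatible version, falling back to the known-good one."""
    base = parse(known_good)
    candidates = [parse(v) for v in available if compatible(parse(v), base)]
    best = max(candidates, default=base)
    return ".".join(str(n) for n in best)


print(pick_upgrade("1.2.11", ["1.2.8", "1.2.13", "1.3.0", "2.0.1"]))  # 1.3.0
print(pick_upgrade("1.2.11", ["2.0.1"]))                              # 1.2.11
```

The known-good version never leaves the store, so if the upgraded dependency misbehaves, the application can be pointed back at it without rebuilding anything.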
If you use NixOS, then the state of your entire system is centralized in a single file (or optionally split across several files, but with clear import statements). This is great, but the system config file is still just a text file. If you’re going to have all the state in one place, why not go all the way and turn that one place into a database, so the system state can be manipulated with proper types (booleans, ints, etc.) instead of having everything string typed?
This is, from what I understand, what the Windows Registry is. I’ve never developed on Windows so I don’t have any first-hand experience. I get the impression that people dislike it. But it’s not clear to me if the problem is the concept of a central system registry, or the particular implementation of the Windows Registry. The idea seems to have some support from Unix people though; here’s Theodore Ts’o supporting the idea of databases over text files for configuration. And there’s a tool for doing something like the Windows Registry for Linux: LibElektra. I don’t know if this would work out in the end, but it’s something that I think would be cool to try.
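To give a feel for the typed-database idea, here is a small sketch using SQLite through Python's standard sqlite3 module. The table layout, the setting names, and the helper set_setting are all hypothetical; this is neither how NixOS nor how LibElektra stores anything. It only shows a store that refuses a value of the wrong type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The 'val' column has no declared type, so SQLite stores values as given;
# the CHECK constraint then enforces the type named in 'kind'.
conn.execute("""
    CREATE TABLE settings (
        name TEXT PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('bool', 'int', 'text')),
        val  NOT NULL,
        CHECK (
            (kind = 'bool' AND typeof(val) = 'integer' AND val IN (0, 1)) OR
            (kind = 'int'  AND typeof(val) = 'integer') OR
            (kind = 'text' AND typeof(val) = 'text')
        )
    )
""")


def set_setting(name: str, kind: str, val) -> None:
    """Insert or overwrite one setting; the database rejects mistyped values."""
    conn.execute("INSERT OR REPLACE INTO settings VALUES (?, ?, ?)", (name, kind, val))


# Hypothetical setting names.
set_setting("services.sshd.enable", "bool", 1)
set_setting("services.sshd.port", "int", 22)

try:
    set_setting("services.sshd.port", "int", "twenty-two")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

port = conn.execute(
    "SELECT val FROM settings WHERE name = ?", ("services.sshd.port",)
).fetchone()[0]
print(rejected, port, type(port).__name__)  # True 22 int
```

The mistyped write fails at the moment it is attempted, and the previous value stays in place, which is the behavior I would want from a system configuration store.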
Again, I don’t think I have any standing to dictate to Nix how to build their thing, nor am I saying that Nix couldn’t change to do anything above. I only write this to explain why I am making my own tool. If you are interested, join me.