Guix Introduction Part 2: What is Guix? A closer look
Contents
This is the second part of a brief introduction to the Guix functional package manager, and how it could be used to manage dependencies of projects, much like virtual environments for Python, but with much larger scope.
This is written in a way that I wish Guix was introduced to me back when I first learned it. This time we take a closer look at Guix, and briefly introduce the various main components of Guix.
Promised Features of Guix
We first recall the promised features of Guix as a functional package manager:
- Guix is in fact a package building system, being a good package manager is a side effect
- designed to properly address the “dependency hell” problem
- aims at reproducible builds
- the exact same set of packages could be reproduced at a later time or on a different machine (of the same architecture), by just using two small text files
- allows atomic installation/uninstallation/upgrade of packages
- each action may involve one or more packages, and is a transaction, i.e. either it succeeds as a whole, or does not succeed, there would not be half installed packages
- allows easy rollback of install/uninstall/upgrade actions
- each transaction results in new generation, which is like a snapshot of the current set of packages (in a profile)
- can easily rollback to previous generations
- so there is little fear of accidentally installing/uninstalling/upgrading the wrong packages
- allows different versions of the “same” package to coexist and used by different packages, without conflict
- encourages declarative package management
- a set of packages can be specified in a manifest file, and can be installed in one transaction
- currently only works on GNU/Linux
- can be installed on any GNU/Linux distribution such as Debian, Ubuntu, Arch, etc
- can coexist with the existing package manager of the distribution
- allows each user to manage his/her own packages
- without root privilege
- without interfering other users
- each user can have multiple profiles of packages
- each profile has its own list of generations, and can be rolled back separately
- allows easy creation of isolated environments with designated packages
- useful for per-project dependency management
- Guix system is a GNU/Linux distribution built on top of the Guix package manager
- uses a config file to declaratively specify the whole system, e.g. the system services, user accounts, etc
The Parts of Guix
We introduce the main parts of Guix through a hypothetical Q&A, so that the parts are better motivated, and it is clearer how each part fits into the whole.
Package definition
- Q: We are making a package manager to help us manage
dependencies, where should we start?
- A: Eventually a package needs to be built, we may as well make a package building tool which also manages dependencies.
- Q: Each package has some way of building, e.g. the usual
configure-make or cmake for most C or C++ projects, Python
packages have their usual way of building, R package can be built
by
R CMD build
, so we can have commands for each type of build system to reduce repetitive work, so that for each package we only specify the unique parts and we should be done?-
A:It is a good idea to have commands for different build systems, so we can have
gnu-build-system
(for configure-make builds),cmake-build-system
,python-build-system
,r-build-system
,julia-build-system
,texlive-build-system
,emacs-build-system
,maven-build-system
,linux-module-build-system
,node-build-system
and others. -
A: But packages often have dependencies (which is the whole point of package manager), so we should also specify the dependencies of each package, separately for build-time and run-time dependencies. It is also nice to have a package definition to specify these in a sort of declarative way. We may also add some metadata to the package definition such as package name, version number, a short description.
-
A: An example of package definition is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
(define-public r-jsonlite (package (name "r-jsonlite") (version "1.7.2") (source (origin (method url-fetch) (uri (cran-uri "jsonlite" version)) (sha256 (base32 "1lhzcpz9clwq04i5m6jzkvw9x03pwlqrixv4l9xzchjr8d84nd86")))) (build-system r-build-system) (native-inputs `(("r-knitr" ,r-knitr))) (home-page "https://arxiv.org/abs/1403.2805") (synopsis "Robust, high performance JSON parser and generator for R") (description "The jsonlite package provides a fast JSON parser and generator optimized for statistical data and the web. It offers flexible, robust, high performance tools for working with JSON in R and is particularly powerful for building pipelines and interacting with a web API. In addition to converting JSON data from/to R objects, jsonlite contains functions to stream, validate, and prettify JSON data. The unit tests included with the package verify that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.") (license license:expat)))
-
- Q: So with the metadata, we can first ensure the dependencies are
available and build them if necessary, before building our target
package. But how to prevent incorrectly specified dependencies?
- A: To avoid unspecified dependencies, the package can be built in an isolated environment (e.g. a chroot environment) where only the explicitly specified dependencies are available, so that if a needed dependency is not specified, the package will not build successfully, so that we will be aware of the problem at package development time.
- A: Overly specified dependencies is less of a problem, because the target package can still be built if there are extra packages available. So one way is to ignore this problem, and simply let package definition writer add dependencies as needed while testing the build.
- Q: How to name the package in a unique way? Can we use the
package name and the version number, e.g.
r-jsonlite-1.7.2
?- A: Even for the same version of a package, the building parameters (e.g. compiler options) or available dependencies (e.g. optional dependencies, some of which can be turned on or off when building) can still result in different package artifact. If we always use a canonical building parameters and options, then using only the name and version would suffice. But it would be good to have a much finer-grained identity for each exact version of each package, and it can be used for other purposes such as caching.
- A: We can pre-pend some kind of hash calculated from the inputs (or
their hashes) to the build process, including:
- source
- dependencies
- build system and parameters such as compiler options
- environment variables to be defined
- target system type
- where to store the built package
- A: The hash transitively encodes the exact versions of package and all its direct and indirect dependencies.
- A: an example of such a unique name is
2am1s5hqgkzxzbyvcfbhxq72diny117q-r-yaml-2.2.1
Derivation
- Q: Maybe we just calculate the hash of the built artifacts? Oh,
but then we will always need to build the package in order to
calculate the hash, which greatly reduces the usefulness of the
hash?
-
A: Instead of calculating the hash of the final built artifact, we can produce a derivation which has sufficient details to exactly reproduce the package when built. The derivation can be produced efficiently, without building the package. As long as the build process is deterministic, the hash of the derivation is as good as the hash of the final built artifacts in uniquely identifying the exact version of the package.
-
A: an example of derivation is (note that it is in a canonical way without unnecessary whitespace):
1
Derive([("out","/gnu/store/2am1s5hqgkzxzbyvcfbhxq72diny117q-r-yaml-2.2.1","","")],[("/gnu/store/07byz0yy984h3d8mkbsdxml18wp1nac7-make-4.3.drv",["out"]),("/gnu/store/24lms3l4ylxwb7ddrja4iq92syzya7xd-yaml_2.2.1.tar.gz.drv",["out"]),("/gnu/store/5bbi7iwazfrvc9fg8y4fg4lp6j01d3x3-grep-3.4.drv",["out"]),("/gnu/store/8psdnbc4jhcz3k0ghkd9ha5mdm2r4pd0-bzip2-1.0.8.drv",["out"]),("/gnu/store/8vfk6231k05m6ik6k0dfk5lvm8n7822y-gzip-1.10.drv",["out"]),("/gnu/store/9whhajxkjkxb9vwdb0z5ashcmigj81pa-coreutils-8.32.drv",["out"]),("/gnu/store/afj3j7ghv6zm1iq6p3m5dbqsnrriy9ds-bash-minimal-5.0.16.drv",["out"]),("/gnu/store/b5nnbpgkvgdpzgvj67539ylcaqacj90l-guile-3.0.2.drv",["out"]),("/gnu/store/bhs8mjdhm37wk16qg8jzn9fdcgmllj50-diffutils-3.7.drv",["out"]),("/gnu/store/fdmz5blhzfczkpjb9jj6bdbhqlpv3i7l-gcc-7.5.0.drv",["out"]),("/gnu/store/fvi7sqvk9m1w93xaf8565ai7742zqc2i-xz-5.2.4.drv",["out"]),("/gnu/store/h1vn41niaqhm75b4syvl1cg7f9rbzc0z-glibc-2.31.drv",["out","static"]),("/gnu/store/jd1fm999bf0k2vqlgzqbcacbjrqmai11-module-import-compiled.drv",["out"]),("/gnu/store/mkq3s7av2l1vhcxns84k5q3j7r92imxm-patch-2.7.6.drv",["out"]),("/gnu/store/mm8flcvndb2mr53xhf2zilx263s88bf3-findutils-4.7.0.drv",["out"]),("/gnu/store/mv12ligm0jzz762rh46i09iddhxvaim2-ld-wrapper-0.drv",["out"]),("/gnu/store/n0h0fjvxk93jzl8jp9n6p1g52dlj1m6l-gawk-5.0.1.drv",["out"]),("/gnu/store/pqyd0rq2aqx8rbgdgjzpcjizhq6wzhv9-file-5.38.drv",["out"]),("/gnu/store/qlf9sxffyy9h6cw4zm5jnbilzbimgbil-binutils-2.34.drv",["out"]),("/gnu/store/r7i0jcdvnwkm2k1h4wx42w5m9fnsanmq-glibc-utf8-locales-2.31.drv",["out"]),("/gnu/store/vchlnxh5gsi6m12jk5x66dxswxx32h61-sed-4.8.drv",["out"]),("/gnu/store/y3mdvds4kj82mk76f4rfqlv9n5m19n44-r-minimal-4.0.3.drv",["out"]),("/gnu/store/y9d0im1z8f4bvv7a74s0yycl3d0z4yh0-linux-libre-headers-5.4.20.drv",["out"]),("/gnu/store/zp2jf7vmqm0q386d0snlmhfdsgykdv2a-tar-1.32.drv",["out"])],["/gnu/store/2arrpvah49pfchmlfnppaynwhjx4gw3x-module-import","/gnu/store/rv0awncdchqfd8j32dqjr77s0x44f24r-r-yaml-2.2.1-guile-builder"],"x86_64-linux","/gnu/store/0m0vd873jp61lcm4xa3ljdgx381qa782-guile-3.0.2/bin/guile",["--no-auto-compile","-L","/gnu/store/2arrpvah49pfchmlfnppaynwhjx4gw3x-module-import","/gnu/store/rv0awncdchqfd8j32dqjr77s0x44f24r-r-yaml-2.2.1-guile-builder"],[("GUILE_LOAD_COMPILED_PATH","/gnu/store/k2xsq0ab5yvjhs8km8d74ayardb2n22h-module-import-compiled"),("out","/gnu/store/2am1s5hqgkzxzbyvcfbhxq72diny117q-r-yaml-2.2.1")])
-
- Q: by “reproduce” do you mean building the bit-by-bit identical artifacts?
- A: Yes, Guix aims at bit-by-bit reproducibility. Since we are controlling also the build dependencies (e.g. the exact compiler version) and building parameters, for a given architecture (e.g. x86_64), it ought to be able to build the exact same artifacts at a different time, possibly on a different machine with the same architecture.
- A: Reproducibility helps with testing, because once a package has been tested in an environment, we have high confidence that it will behave the same in the same environment, even if it was built at a later time.
- A: But this strict reproducibility depends on having a deterministic build process using the derivation. So the build cannot for example involve randomness, or write current timestamp in any of the build artifact. So for some packages the build system may need to be adjusted for remove these. Guix provides hooks to specify in the package definition any adjustments of the different building phases of any build system.
- Q: From the above example of package definition, the dependency
specifies only the package name but not the exact version?
- A: Yes, when the dependency only lists the name of the package, the exact version is implicit, i.e. it is whatever the version that is built together with the target package.
- A: Most package managers have the ability to specify version ranges for dependencies, and a constraint solver is needed to determine whether a certain set of packages have conflicts. But to my knowledge this is not available in Guix. In Guix, it is assumed that when a package definition is developed, it is tested against a particular version of package definitions as dependencies. If none of the package definition of direct or indirect dependency is changed, the package can be exactly reproduced. If any of the dependency was later changed, depending on how many packages it may affect, different levels of testing would be performed to minimize the adverse effect of breaking other packages.
Channels for package definitions
- Q: The set of package definitions is critical, how to manage them in a sane way?
- A: The package definitions can be organized as a set of files, each containing a set of related packages (e.g. one file for R CRAN packages, one file for Python PyPi packages, etc). In Guix these files are in fact code, so it is a good idea to manage them with source control system such as git.
- A: In Guix, a repository of package definitions is maintained as a git repository, the official one is https://git.savannah.gnu.org/git/guix.git
- A: Another benefit of maintaining package definitions in git repository is that a git commit represents a snapshot of all the package definitions at a time point, which allows easy pinning of package versions of a set of packages.
- Q: So the set of packages form a graph with the dependency links,
is there a way to query this graph programmatically?
- A: Yes, the dependencies among the packages form a graph (which
should be a direct acyclic graph, otherwise we would have
cyclic dependency) in form of Guile (a dialect of Scheme) data
structure, and Guix provides programmatical access to this
graph for various kinds of manipulations, e.g.:
- query the direct and indirect dependencies of a set of packages
- query the set of packages that depend on a package, e.g. to see which package may be affected if a package is updated
- plot the dependency graph of a set of packages
- A: Yes, the dependencies among the packages form a graph (which
should be a direct acyclic graph, otherwise we would have
cyclic dependency) in form of Guile (a dialect of Scheme) data
structure, and Guix provides programmatical access to this
graph for various kinds of manipulations, e.g.:
- Q: Can I maintain my private list of packages?
-
A: Guix allows using multiple channels at the same time, and creating a channel is basically as simple as creating a git repository. So you can easily create your own channel(s) as a (public or private) git repository, for whatever package definitions that you want to maintain, as long as the computer where Guix is installed can access the repository. Moreover, you can also add third-party channels (similar to PPA in Debian based distribution) for extra packages.
-
A: The channels of Guix is recorded as a text file, including the current commit of each channel. This file facilitates version controlling of the states of the channels.
-
A: An example of a file with two channels is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(list (channel (name 'nonguix) (url "https://gitlab.com/nonguix/nonguix") (commit "51dc6fb07ea1984f2ce55a44b0ce998200fb0e5c") (introduction (make-channel-introduction "897c1a470da759236cc11798f4e0a5f7d4d59fbc" (openpgp-fingerprint "2A39 3FFF 68F4 EF7A 3D29 12AF 6F51 20A0 22FB B2D5")))) (channel (name 'guix) (url "https://git.sjtu.edu.cn/sjtug/guix.git") (commit "0efd68681dcec50d445a4fd080c315b999164828") (introduction (make-channel-introduction "9edb3f66fd807b096b48283debdcddccfea34bad" (openpgp-fingerprint "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
-
Substitution servers
- Q: But I do not want to build the package from source all the
time, can I download pre-built binary instead?
- A: Of course, it is possible to download pre-built packages
called substitution from server. And this is where the
package hash comes into handy, because it serves as a key to
identify the package. When a package is wanted either directly
or indirectly, roughly Guix does the following:
- check whether the local
/gnu/store
already has that exact package. If so, then no further action is needed for the package. - if the package is not in
/gnu/store
yet, then check whether there is a pre-built substitute from the official substitution server (or additionally configured server(s)). If so, then download the pre-built package. - if the package is not in any of the substitution servers, then build the package locally.
- check whether the local
- A: You may also create your own substitution server, which
roughly amounts to having a machine with Guix, then running
guix publish
, check Invoking guix publish for details.
- A: Of course, it is possible to download pre-built packages
called substitution from server. And this is where the
package hash comes into handy, because it serves as a key to
identify the package. When a package is wanted either directly
or indirectly, roughly Guix does the following:
- Q: Can I easily share built packages within my network, so that
worker nodes need not build the packages themselves?
- A: Yes, you can have one or more machines in your network that
runs
guix publish
, and add it (or them) as a substitution server(s) in the Guix of the work nodes.
- A: Yes, you can have one or more machines in your network that
runs
Package upgrading and grafting
- Q: Since in Guix each exact version of a package has an
associated hash, maybe we can use this to avoid conflicting
dependency versions?
- A: Of course. The main problem with dynamic dependency is that they are specified with only package name and major version, and resolved at run-time. If another package using the same dependency requires a newer version of it, then it will be updated, and other packages which depend on it will now resolve to the new version, which may cause breakage, even though the newer version is supposed to be backward compatible.
- A: The stable and conservative way of managing dependency is to
ensure the dependencies always resolve to the same exact
version as at built time, so that we need not worry that it suddenly
breaks due to any other (un)related updates. In Guix, the
derivation has already pinned the exact versions of the
dependencies using the hash, and in building the package, Guix
build system tries to hard-code the paths of the dependencies (to
/gnu/store
) as much as possible, so it is similar to “static linking”. - A: Note that if some direct or indirect dependency is updated, and we use the same package definition to build the package, the resulting hash will be different (unless we have a hash collision which is exceedingly rare).
- Q: This “static linking” does not sound good, are we giving up
the benefits of dynamically linked libraries, namely the same
code need only has one copy in memory, to be shared between many
different programs, which is useful for GUI programs which depend
on the widget library?
- A: Whether a library is dynamically or statically linked depends on the options in linking. The Guix way is “static”, but still uses dynamic linking if the library is dynamically linked. Therefore, if exactly the same dynamic library is a dependency of several different programs, at run-time, the same dynamic library is still only loaded into memory once and shared.
- A: On the other hand, if different programs need to use different minor versions of the same dynamic library, then Guix’s way just works with no other handling needed.
- Q: How about easy updating of dependency, e.g. to fix security vulnerability?
- A: In true static linking, if a library needs to be updated (e.g. due to security vulnerability), all the packages that directly or indirectly depend on the library need to be rebuilt. Guix is similar, with the difference that Guix carefully and accurately tracks the dependencies, so re-building is less of a hassle, and only takes some time. Also note that, the rebuilt packages will have different hashes, and the updated dependencies are again carefully and accurately tracked.
- A: But it is true that having to rebuild packages could be inconvenient, especially for lower level libraries that is dependency of many other packages (e.g. glibc). That’s why Guix allows grafting, which basically allows replacing some dependencies without rebuilding the whole package, i.e. reusing most of other components, if applicable. Also see Security Updates for more descriptions. This may save substantial time in rebuilding packages, and is pretty much the same as replacing a dynamic library, except that the dependencies are still accurately tracked. Also note that grafted packages have different hashes from rebuilt package.
Per-user profiles
- Q: So all my built packages are in
/gnu/store
with a long path with package hash, and they (mostly) will statically link to each other, but the long paths seems very inconvenient in using, E.g. how do I executeemacs
?-
A: Of course it is extremely inconvenient to type
/gnu/store/ccg56ki80zshgkpbbaabh9dd6frmfxc3-emacs-27.2/bin/emacs
to invoke emacs, so Guix uses a bunch of symbolic links and suitably setting thePATH
environment so that you can still conveniently invoke emacs just by typingemacs
in your shell. -
A: In a typical Linux, the program binaries are installed in locations such as
/bin
,/usr/bin
, etc, and these paths are added to thePATH
environment variable, so that we can simply type the program name to invoke a program. But putting all binaries at a global location causes trouble when we want different versions of the same program to be installed (and they have the same name, think different versions of R, the executable are all calledR
) and choose which to use at different times. -
A: In order to avoid this problem, Guix uses the idea of profile to hold a set of packages, which is essentially a directory containing sub-directories such as
bin
to hold symbolic links to binaries,etc
,include
,lib
, etc which holds (symbolic links) to things for the set of packages. E.g. currently on my system the default profile/home/peter/.guix-profile
points to/gnu/store/iw0r9yprbhsy5vlqp1dkg7maajnf3hkb-profile
(found byreadlink -f /home/peter/.guix-profile
). And we can have a peek of what is inside:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
$ ls -l /gnu/store/iw0r9yprbhsy5vlqp1dkg7maajnf3hkb-profile total 164 dr-xr-xr-x 2 root root 36864 Jan 1 1970 bin dr-xr-xr-x 4 root root 4096 Jan 1 1970 etc dr-xr-xr-x 2 root root 4096 Jan 1 1970 include dr-xr-xr-x 7 root root 12288 Jan 1 1970 lib dr-xr-xr-x 2 root root 4096 Jan 1 1970 libexec -r--r--r-- 2 root root 89125 Jan 1 1970 manifest dr-xr-xr-x 2 root root 4096 Jan 1 1970 sbin dr-xr-xr-x 19 root root 4096 Jan 1 1970 share lrwxrwxrwx 7 root root 61 Jan 1 1970 var -> /gnu/store/xnrw9pmw6zjc2x7f7w9bzq0sqjx9cbrl-openssh-8.5p1/var $ tree /gnu/store/iw0r9yprbhsy5vlqp1dkg7maajnf3hkb-profile | head -n 20 /gnu/store/iw0r9yprbhsy5vlqp1dkg7maajnf3hkb-profile ├── bin │ ├── a2ping -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/a2ping │ ├── a5toa4 -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/a5toa4 │ ├── adhocfilelist -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/adhocfilelist │ ├── afm2afm -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/afm2afm │ ├── afm2pl -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/afm2pl │ ├── afm2tfm -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/afm2tfm │ ├── ag -> /gnu/store/bk09ij4jxmpvxij0q3k2022ivrj5mfag-the-silver-searcher-2.2.0/bin/ag │ ├── aleph -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/aleph │ ├── allcm -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/allcm │ ├── allec -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/allec │ ├── allneeded -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/allneeded │ ├── arara -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/arara │ ├── arlatex -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/arlatex │ ├── authorindex -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/authorindex │ ├── autoinst -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/autoinst │ ├── autopep8 -> /gnu/store/isb01kkmcx4x9b4b4hc86z3ayk8659za-python-autopep8-1.5.3/bin/autopep8 │ ├── autosp -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/autosp │ ├── axohelp -> /gnu/store/rnk9lj36z59ikmj4izs3r0knn0klkig2-texlive-20190410/bin/axohelp
We see that the programs inside
bin
of the profile are really symbolic links to the real binary in the packages in/gnu/store
. And therefore if myPATH
contains/home/peter/.guix-profile/bin
, I can use the programs in the profile just as those installed globally. Note that the real profile directory is also in/gnu/store
with a hash, so if you install a different set of packages, the real path will be different.
-
- Q: But then what advantage does a profile provide? Oh, “most
problems in computer science can be solved with one more level of
indirection”, since profiles are behind at least one level of
symbolic links, that means we can have multiple profiles for
different sets (and versions) of packages, to be used either
independently or together, by setting
PATH
and related environment variables suitably, right? I guess that’s how Guix provides the generation and roll-back features?- A: Exactly. Also, when we make any install/upgrade/remove
actions on a profile, essentially we are asking for a different
set of packages, so it will have a different real directory
under
/gnu/store
with its own hash, and we can make a new symbolic link for it (whether the set of packages have already been created or not), and give it a new generation number, while keeping the symbolic link for the previous generation. This way, we can easily roll-back to the previous set of packages by just changing a symbolic link. This removes a lot of trouble of accidentally upgrading packages and breaking your workflow. - A: Moreover, since a symbolic link is cheap, we can easily
create as many as we like, and each profile can have its own
generation numbers, and be changed separately. There is great
flexibility in how the profiles can be used. Some profiles can
be used together if we put their
bin
directories inPATH
; or some profiles can be meant to be used separately, activated only when needed, much like virtualenv in Python. The profile organization is entirely up to the user. For example, currently on my system I have the default profile for my frequently used tools such asemacs
and manyemacs
packages, a separate profile for data science related packages such as R and R packages.
- A: Exactly. Also, when we make any install/upgrade/remove
actions on a profile, essentially we are asking for a different
set of packages, so it will have a different real directory
under
- Q: Given that the
/gnu/store
is a global location for all the packages, do I need root privilege to install package in Guix?- A: No, each user can create as many profiles as he or she
likes, because the default profile is put under the user’s home
directory, and extra profiles can be placed basically wherever
the user likes. Although
/gnu/store
is the global cache of all the packages and other stuffs, it is meant to be immutable from the perspective of normal user (i.e. normal user cannot modify/gnu/store
directly), so it can be shared with different users and profiles. And writing to the/gnu/store
is managed by the Guix daemon, so that its consistency can be maintained. - A: When a user do any actions that need to update
/gnu/store
(e.g. downloading a pre-built package, or creating a profile with a different set of packages), theguix
command will communicate with the Guix daemon as needed. Therefore, even on a shared system such as a server, Guix allows each user to install his or her own sets of packages, organized in however many profiles desired, and those packages that are exactly the same can still be safely shared.
- A: No, each user can create as many profiles as he or she
likes, because the default profile is put under the user’s home
directory, and extra profiles can be placed basically wherever
the user likes. Although
- Q: Can I have two different versions of a package installed at the same time?
-
A: The short answer is yes, the long answer is more complicated. It is certainly possible to have two different versions of a package in the system, because they will have different hashes, and therefore different paths in
/gnu/store
. Also, for most programs, their dependencies are essentially hard-coded (in a static way), so they mostly can co-exist without problems, but installing them to the same profile may still cause inconvenience. -
A: For example, even if you managed to install both R 3.6.3 and R 4.0.2 in the same profile, since there is only one
PATH
environment variable, when you typeR
, you will only be invoking one of them (whichever one that appears earlier inPATH
). -
A: For packages such as R packages, which are more dynamic in nature, the story is more complicated, where packages in the same profile need to be somewhat compatible with each other. To my understanding, the R packages in a profile are placed in a
site-library
directory in the profile, and inside are symbolic links to real directory of each R package. Currently on my system, myds
profile for data science things:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
$ which R /home/peter/guix_extra_profiles/ds/bin/R $ ls /home/peter/guix_extra_profiles/ds bin etc include lib libexec manifest sbin share site-library $ ls /home/peter/guix_extra_profiles/ds/site-library/ abind cachem commonmark devtools forcats ggsci gtools jquerylib lintr miniUI pillar ps readxl RPostgreSQL SHAPforxgboost survMisc utf8 xtable askpass callr conquer dials foreach ggsignif hardhat jsonlite listenv minqa pkgbuild purrr recipes rprojroot shiny svglite vctrs yaml assertthat car corrplot DiceDesign foreign ggtext haven KernSmooth lme4 modeldata pkgconfig quantreg rematch rsample shinyjs sys viridisLite yardstick backports carData covr diffobj formatR ggthemes highr km.ci lubridate modelr pkgload R6 rematch2 RSelenium slider systemfonts waldo zip base64enc caTools cowplot digest fs gh hms KMsurv magrittr munsell plyr ranger remotes rstatix sourcetools testthat warp zoo BBmisc cellranger cpp11 doMC furrr gitcreds htmltools knitr maptools mvtnorm png rappdirs repr rstudioapi sp tibble wdman BH checkmate crayon dplyr future glmnet htmlwidgets labeling markdown nlme polyclip rbenchmark reprex R.utils SparseM tidymodels whisker binman class credentials DT gdata globals httpuv languageserver MASS nloptr polynom R.cache rex rversions spatial tidyr withr bitops cli crosstalk e1071 generics glue httr later Matrix nnet praise rcmdcheck rio rvest SQUAREM tidyselect workflows blob clipr curl ellipsis gert gmodels infer lattice MatrixModels numDeriv prettyunits RColorBrewer rlang sass statmod tidyverse xfun boot cluster cyclocomp evaluate ggExtra gower ini lava matrixStats openssl pROC Rcpp rmarkdown scales stringi timeDate xgboost brew codetools data.table exactRankTests ggforce GPfit ipred lazyeval maxstat openxlsx processx RcppArmadillo R.methodsS3 selectr stringr tinytex XML brio collections DBI fansi ggplot2 gridExtra isoband lhs memoise parallelly prodlim RcppEigen R.oo semver styler tune xml2 broom colorspace dbplyr farver ggpubr gridtext iterators lifecycle mgcv parsnip progress RCurl roxygen2 sessioninfo survival tweenr xmlparsedata bslib colourpicker desc fastmap ggrepel gtable jpeg lightgbm mime pbkrtest promises readr rpart shape survminer usethis xopen $ tree /home/peter/guix_extra_profiles/ds/site-library/ | head /home/peter/guix_extra_profiles/ds/site-library/ ├── abind -> /gnu/store/54kzkqlfds1da34g66hy881b51q844ly-r-abind-1.4-5/site-library/abind ├── askpass -> /gnu/store/hznmrksfikc75lvs6plywp09vwzhcjbj-r-askpass-1.1/site-library/askpass ├── assertthat -> /gnu/store/bcy712pqmjs86xwchjkq8af701zs76n3-r-assertthat-0.2.1/site-library/assertthat ├── backports -> /gnu/store/1ld20n5abycl0x9ma67zk17mgm390hji-r-backports-1.2.1/site-library/backports ├── base64enc -> /gnu/store/3pkyzwliy76mrcc56rxp2yh4w0g9130f-r-base64enc-0.1-3/site-library/base64enc ├── BBmisc -> /gnu/store/7lm3ivlg7iwz0g9m1h3yl2h5jmn0cp9n-r-bbmisc-1.11/site-library/BBmisc ├── BH -> /gnu/store/d9mcnkkn1crvnjz337h3cznfdnvm2y4c-r-bh-1.75.0-0/site-library/BH ├── binman -> /gnu/store/mwzvczl535j0cqr4yjw48yvs71b5m290-r-binman-0.1.2/site-library/binman ├── bitops -> /gnu/store/mdj7ad7pdm4ljqmxl41gdvi7bk3djwx1-r-bitops-1.0-6/site-library/bitops
And I have experienced Guix errors about R package version conflicts if I installed some very old and very new R packages in the same profile, that’s why I now put data science packages in a separate profile.
-
A: To avoid problems, we can either create different profiles for different sets of compatible packages, and activate the profile as needed. Alternatively, we can use
guix environment
to spawn a temporary shell where the designated packages are accessible, which is a great way to manage per-project dependencies.
-
- Q: Is it possible to record a set of packages in a file, and then
instantiate the packages as a profile in just one command?
-
A: Sure, this can be done by using a manifest file, which records the list of packages wanted, then you can use
guix package -m your_manifest
to change the default (you can change the profile to act on by using the-p
option) profile. An example of a manifest file which contains R and some R packages is:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
;; for data science stuffs, here mainly R, and some python (specifications->manifest (list ;;"r-languageserver" "r" "r-yaml" "r-xml2" "r-xgboost" "r-tidymodels" "r-tidyverse" "r-survminer" "r-styler" "r-shiny" "r-rvest" "r-rbenchmark" "r-ranger" "r-purrr" "r-lubridate" "r-jsonlite" "r-gridextra" "r-gmodels" "r-glue" "r-glmnet" "r-ggthemes" "r-ggplot2" "r-foreach" "r-formatr" "r-e1071" "r-domc" "r-devtools" "r-data-table" "r-blob" "r-shapforxgboost" "r-rcpp" "r-rselenium" "r-rpostgresql" "r-dt" ;; "python" "python-ipython" ))
-
Package version pinning and guix environment
- Q: I understand that we can use profiles to avoid package version
conflicts, much like we do in virtualenv, can you elaborate more
on
guix environment
?-
A: You can image
guix environment
as a temporary profile (just as every profile has its own path in/gnu/store
), and you are put a shell with the environmentGUIX_ENVIRONMENT
pointing to its path, andPATH
is setup such that thebin
of this temporary profile in at the front ofPATH
. The set of packages can be specified either as arguments in calling the command, or in a manifest file. Once we exit the shell, the temporary profile is still cached in/gnu/store
, just as with other packages, so that the next time you want an environment with the same set of packages, it can be reused (with a little time for Guix to figure out that the profile is already there) instead of rebuilding all the packages. E.g. if I do spawn an environment with R byguix environment --ad-hoc r
on my current system, once in the shell:1 2 3 4 5 6 7
$ guix environment --ad-hoc r [dev]$ echo $GUIX_ENVIRONMENT /gnu/store/zwz604am03q0n3vwyca3hvfd5lpb0k8z-profile [dev]$ echo $PATH /gnu/store/zwz604am03q0n3vwyca3hvfd5lpb0k8z-profile/bin:/home/peter/guix_extra_profiles/other/bin:/home/peter/guix_extra_profiles/other/sbin:/home/peter/guix_extra_profiles/games/bin:/home/peter/guix_extra_profiles/games/sbin:/home/peter/guix_extra_profiles/ds/bin:/home/peter/guix_extra_profiles/ds/sbin:/home/peter/.guix-profile/bin:/home/peter/.guix-profile/sbin:/run/setuid-programs:/home/peter/.config/guix/current/bin:/home/peter/.guix-profile/bin:/home/peter/.guix-profile/sbin:/run/current-system/profile/bin:/run/current-system/profile/sbin [dev]$ which R /gnu/store/zwz604am03q0n3vwyca3hvfd5lpb0k8z-profile/bin/R
Note that in this temporary environment, what packages R can see depends on whether there are other R packages installed in other profiles already in effect.
-
A: Note that it is possible to control how “isolated” this environment is. E.g. by limiting the
PATH
not to include thePATH
of the current shell, the spawn shell is more isolated. Moreover, by using container capability of the Linux kernel, it is possible to also isolate the file system and the network, in addition to the accessible packages.
-
- Q: How to pin versions of packages?
- A: Recall that the package definitions are maintained as channels which are essentially git repositories, and therefore if we record the commit of each of the git repositories, we then have a snapshot of package definitions, which allows us to reproducibly build the exact versions of selected packages.
- A: In Guix, the commits of channels can be conveniently
recorded in a channels file using the
guix describe
command, and you can use the-f
option to choose a convenient format, e.g. usingguix describe -f channels
results in a channel file suitable for theguix time-machine
command. Theguix time-machine
allows Guix actions on packages definitions at certain point in time, which can be specified as commit or more conveniently a channels file. - A: E.g. by combining
guix time-machine
andguix environment
, we can easily reproduce the exact versions of desired set of packages, by using one channel file for commits, and one manifest file for set of packages. These two files can be easily version controlled per-project, and therefore can be conveniently used for per-project dependency management. We will see example in a later post of this series.
Transactional package management
- Q: What about the “transactional” part?
- A: “Transactional” means each set of action(s) either succeeds
as a whole, or fails as a whole, there would not be half
states. E.g. when you try to install a list of packages in
one
guix package
command invocation, if it was interrupted in the middle (e.g. by the user, or there was a power outage), then none of the packages will be “installed”, although some of the completed ones may then be present in/gnu/store
. And you can then repeat the same command, the already completed and cached packages will be quickly finished. Therefore, there is no worry of half-completed states of the set of installed packages. - A: Guix achieves this transactional behavior by using similar
techniques as in database management system to maintain the
integrity of the cache. When each package is being built, lock
files would be created to indicate that it is under
construction, and once a package is completed and then the
content of its path under
/gnu/store
will not longer be modified, the lock file is removed. Therefore Guix knows which paths are complete and valid, and can maintain the integrity of the cache. - A: Another part to the transactional behavior is the
“atomicity” of renaming a (local) file, which is an assumed
property of Linux. Since each set of action(s) will create a
profile, either the temporary profile for
guix environment
, or the another generation of a profile, which is basically a set of symbolic links. Therefore, Guix can create a set of paths ingnu/store
as needed, and rename the symbolic link for the profile to finish the action(s), in an essentially atomic way.
- A: “Transactional” means each set of action(s) either succeeds
as a whole, or fails as a whole, there would not be half
states. E.g. when you try to install a list of packages in
one
Guix system
- Q: What about Guix system? Let me see, since a Linux distribution
is more or less a set of packages and some associated
configuration working together (besides the Linux kernel), so we
can apply the Guix method to manage a whole Linux distribution?
- A: Exactly. For example, we use a manifest file to ease our management of packages, imagine applying similar idea to users, groups, and different services of a Linux distribution, i.e. use a plain text system configuration file to “declaratively” specify these (well, maybe user password are better specified in other ways, because we often want to version control this configuration file). Whenever we want to update the “state”, e.g. add or remove users, or change some information of users, add or remove system services, then instead of mutating the system imperatively, we instead modify the system configuration file, and re-apply it to reconfigure the system. This has the advantage that the system state should be recorded in the system configuration at all time, which allows us to re-apply the same configuration to a different machine to replicate the setup. Furthermore, the changes to the system can be easily recorded by version controlling this system configuration file.
What’s next?
In this second part we had a closer look at what Guix is, next time we will discuss why bother with Guix when there are alternatives that solve similar problems.
Author Peter Lo
LastMod 2021-05-10