Intro

On a Linux install, should top-level directories {/bin,/sbin,/lib} hold only “system critical executables/libraries” (traditional approach), or should they simply be symlinks to corresponding subdirectories under /usr where the whole set of executables/libraries lives (“unified usr”)?

The Case For The /usr Merge blog posting from 2014 discusses this issue (also known as “usrmove” or “unified usr”). I’ve been working my way through the Linux From Scratch tutorial, which started me thinking about this topic too. I’ve come to the conclusion that the unified approach is a good idea; the old approach works but there are some use-cases where unified is better. Below are my reasons, and some extra information on the related topics of initramfs and the Filesystem Hierarchy Standard.

This article expands on The Case for the Usr Merge and Fedora’s UsrMove, adding info from various email threads, some additional background and, FWIW, my personal opinions.

Who should care about this issue?

Frankly, this affects very few people. For 99.99% of Linux users, it makes no difference whatsoever whether executables are in a “unified usr” or not. Even for 95% of professional sysadmins it makes no difference.

It does affect the following people:

sysadmins who are administering large datacenters
distribution maintainers, ie people who write package-specs for distributions
people who regularly install software on their Linux systems “from source” using upstream-provided makefiles (includes all Linux From Scratch users, but not Gentoo etc AFAIK)
embedded systems developers
sysadmins who recover “broken” computers by booting into rescue-mode and reconfiguring the system
people with very old computers where the root filesystem is on a tiny storage device (yes, this is a trivial percentage, but such people can be found on any email thread on this topic)
and those who are just curious about operating systems and how they work

The Issue

A running system always has one “root filesystem” (aka “rootfs”) which starts at /, has some directories, and possibly some files in those directories. Other filesystems may be mounted on top of directories defined in the rootfs or on directories defined by other mounts.

So where are executable files such as “sh”, “ping”, “vi”, and the libraries they depend on, stored?

The traditional answer is to define directories {/bin,/sbin,/lib} on the root filesystem, and put “critical” files (eg sh) into those directories. The /etc directory is also always on the root filesystem. Non-critical applications (eg “vi”) either go into subdirs of local directory /usr directly (ie the files are really held on the rootfs), or go onto a separate filesystem that gets mounted on /usr.

The problems with the traditional approach are:

It isn’t easy to decide which apps are “critical” and which are not, for example should wireless networking be included or not?
It is significant work for a distribution to ensure the packaging for all “critical” applications puts the files in the correct locations
It is easy to make mistakes such that applications in the “critical” category accidentally rely on files in the “non-critical” category, particularly in rarely-used corner cases and in race-conditions.
The rootfs is a mix of potentially sharable executables and host-specific files (eg /etc/hostname); this mixing makes it impossible to share the executables with other systems via common network storage
Executables which are stored on a centrally-managed (shared) usr filesystem but depend on libraries on a host-local rootfs are difficult to update

The difficulty in deciding what qualifies as “critical” means that different distributions make different decisions, which can be confusing for users. It also means that unusual ways of mounting a separate usr filesystem may be supported on some systems (which included the required software in the “critical” category) but not on others (which pushed the required software into the usr filesystem). There is nothing here that can’t be worked around, but it isn’t elegant.

The “critical depends on non-critical” point is a theoretical problem only for systems where the usr directory is on the root filesystem, and a real (though fairly rare) problem for systems with a separate usr filesystem.

The “corner cases” referred to above are executables that reside on the rootfs but accidentally depend on libraries, helper apps, or configuration which is on the usr filesystem. Such misconfigurations for commonly-used tools are quickly detected, but problems with less-frequently-used tools can be missed. Debian had such a problem that existed for years before being fixed.

The “race conditions” referred to above are cases where the init sequence starts some background application from the rootfs, then continues on towards the step of mounting /usr. If that background application happens to depend on data or a helper application that is only on the usr filesystem then the data might or might not be present, depending on how fast the background process works with respect to the rest of the init process. This can result in a system that usually works but not always. Simple cases are where an executable on the rootfs requires a dynamic library that is under /usr. There are trickier cases such as udev or system daemons started via init; these can rely on scripts which can in turn depend on external files or executables. One possible kind of breakage is booting a system with a USB device attached, and that device not initially working because the necessary config was not available at the time udev processed it. If unplugging and replugging such a device after boot resolves the issue, then it is definitely worthwhile checking where the udev-related config for that device is stored!

The blog posting “separate usr is broken” states that Fedora15 had at least 23 instances of the above “corner case”/“race condition” problems. Presumably other distributions have (or had) such problems too; I couldn’t find any such information with a short search. I did try the recommended command on a debian8+systemd install, and found a bunch of errors including:

virtualbox: will fail to mount USB devices present at boot (/etc/udev/rules.d/60-vboxdrv.rules relies on /usr/share/virtualbox/..)
usbmux: AFAICT, an Apple iphone which is connected via USB at boot will not be correctly handled (/lib/udev/rules.d/39-usbmuxd.rules tries to run /usr/bin/pkill ...)
ceph distributed filestore: if the kernel detects an rbd device before usr is mounted, then the ceph service-file tries to execute /usr/bin/ceph-rbdnamer which won’t end well..
alsa sound-card handling that runs for each /dev/controlC* node also looks potentially broken (state saved on previous shutdown potentially not restored)

I would guess that the sysv-init scripts that the same packages install will have similar problems, ie this isn’t a systemd-specific problem. So this issue is indeed real and not limited to Fedora.
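
A check of this kind can be approximated with a short script. The following is only a rough sketch, not the command recommended in the blog posting: it assumes that udev rules live in /etc/udev/rules.d and /lib/udev/rules.d, and it simply reports every /usr path mentioned in a non-comment line. The function name and the regular expression are my own choices for illustration.

```python
import os
import re

# Assumed locations of udev rules; adjust for your distribution.
DEFAULT_RULE_DIRS = ("/etc/udev/rules.d", "/lib/udev/rules.d")

# A /usr/... path that is not part of a longer path such as /opt/usr/...
USR_PATH = re.compile(r"(?<![\w/])/usr/[^\s\"',]*")


def find_usr_references(rule_dirs=DEFAULT_RULE_DIRS):
    """Return (file, line_number, path) for each /usr path used in a rule."""
    hits = []
    for rule_dir in rule_dirs:
        if not os.path.isdir(rule_dir):
            continue
        for name in sorted(os.listdir(rule_dir)):
            if not name.endswith(".rules"):
                continue
            full = os.path.join(rule_dir, name)
            with open(full, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if line.lstrip().startswith("#"):
                        continue  # comment lines cannot break anything
                    for match in USR_PATH.finditer(line):
                        hits.append((full, lineno, match.group(0)))
    return hits
```

Calling find_usr_references() with no arguments scans the two assumed directories. Each hit is only a candidate problem: it matters only if /usr is a separate filesystem that may not be mounted when the rule fires.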

None of these are huge points, except for admins of large data-centers where having “centrally managed executables” is fairly important. Fixing them would nevertheless be nice. So can we do better? In the “unified usr” approach:

All (distribution-managed) executables go on the same filesystem, without bothering to distinguish between critical and non-critical
If sharing of executables with other systems isn’t needed, then the executables can simply go on the rootfs (and an initramfs is optional)
For clustered environments, all executables are on a usr filesystem that gets mounted on top of the rootfs (via a mandatory initramfs)

Of course for backwards compatibility, the executables need to be reachable via the traditional paths such as /bin, etc. Regardless of what is mounted where, a few appropriate symbolic links can take care of that.
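
Whether a given system already uses such symlinks is easy to test. Here is a minimal sketch, assuming that only bin, sbin and lib need checking (some distributions also link lib64 or lib32); the function names are hypothetical.

```python
import os

# Top-level directories expected to be symlinks in a unified layout.
# This list is an assumption; extend it for lib64, lib32 etc. if needed.
MERGED_DIRS = ("bin", "sbin", "lib")


def usr_merge_status(root="/"):
    """Map each top-level directory to True if it is a symlink into usr."""
    usr = os.path.realpath(os.path.join(root, "usr"))
    status = {}
    for name in MERGED_DIRS:
        path = os.path.join(root, name)
        target = os.path.realpath(path)
        inside_usr = target == usr or target.startswith(usr + os.sep)
        status[name] = os.path.islink(path) and inside_usr
    return status


def is_merged_usr(root="/"):
    """True only if every directory in MERGED_DIRS points into usr."""
    return all(usr_merge_status(root).values())
```

The root parameter exists so that the check can also be run against a mounted image or a chroot directory rather than the running system.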

There are of course some disadvantages to the unified approach too; the tradeoffs are discussed later.

To Partition or not to Partition

Let’s get one simple question out of the way first: when installing Linux, should a separate usr filesystem be created or not? For most people, the answer is simply no; a separate usr filesystem brings nothing for normal desktop users. It is mostly applicable for:

server clusters in datacenters, where the usr filesystem is on network storage and shared across multiple servers
embedded systems (maybe)

Note that it can be a good idea to separate mostly-stable files from rapidly-changing files. The /var directory is intended to hold changing (“variable”) data; allocating a separate filesystem for mounting on /var can be a good idea. Alternatively, /var can be left on the root filesystem and /usr split off into a different filesystem instead (also achieving a split of stable/variable) but that is more complicated.

The Filesystem Hierarchy Standard

The Filesystem Hierarchy Standard (FHS) is a fairly short document, and well worth reading. It has some relevance to this discussion, so here is a quick summary.

static files are those that do not change without sysadmin involvement. variable are “not static”
sharable are files that can be shared between hosts; nonsharable are host-specific.
- /opt, /usr are static and sharable
- /var contains variable files
- /boot and /etc are obviously host-specific

To summarize, the traditional concept is that:

/bin and /sbin hold “critical” applications (even though they otherwise qualify as static + sharable)
/usr/bin contains apps that are not needed for boot/recovery, ie are sharable
/lib holds libraries needed for boot, while /usr/lib holds all others
/bin holds apps that may be of use to normal users as well as sysadmins, eg “sh”
/sbin holds apps that only a sysadmin would need; /sbin is not on the PATH for normal users

Kernel modules go in /lib/modules because they are not sharable (like /boot).

Strangely, /lib also contains a number of executable apps and configuration files (ie it is not just libraries). The problem is that the FHS forbids any subdirectories at the top-level, and in /bin or /sbin, which leaves only /lib as an option. Examples of unusual (non-library) things found under /lib:

/lib/udev has “rules files”, executables
/lib/systemd also has config files and many executables
firmware binary blobs

Usage in Current Distributions

The very early releases of Unix, as made by its original inventors Thompson and Ritchie, had executables partitioned into “critical” (stored under /bin and friends) and non-critical (stored under /usr). Some later derivatives such as Solaris got rid of this distinction. Most Linux distributions have followed the traditional approach but several (most prominently Fedora 17 and later) no longer bother with this separation.

The unified approach was discussed extensively for Debian v8 (Jessie). The email thread showed significant support for unified-usr, including from several well-known debian developers. However a number of changes would be required to the initramfs-generating tools and to dpkg. Debian v8 does not have a unified usr structure (yet).

The current LFS book (v7.7) sets up the traditional “split”.

The Traditional Layout

In the traditional approach, the bootloader is responsible for somehow mounting the root filesystem. After that point, the root filesystem has enough software to manage the rest of the boot process alone, including mounting any other filesystems which are required. In particular, if “non-critical” software has been pushed out to a separate usr filesystem, then the root filesystem must be capable of mounting that usr filesystem.

The kernel started by the bootloader must somehow be able to mount the root filesystem (ie somehow read datablocks from the device on which it is stored, and interpret that data as a filesystem). Mounting the rootfs may therefore require kernel device driver modules, network drivers if the root filesystem is remote, decryption libraries if the root filesystem is encrypted, etc. For reasonably simple setups, the necessary support can just be compiled into the kernel. For more complex setups, or cases where a “stock kernel” is preferred, an initramfs is used which provides a temporary minimal userspace that can mount the root filesystem. See later for further information on initramfs.

Usually the root filesystem also has sufficient sysadmin-specific commandline tools so that if the step of mounting the separate usr filesystem fails then basic diagnostics and recovery can be performed via an attached terminal (or maybe via SSH), and maybe backups can be restored from external storage.

After reading various discussion threads on this topic (eg for debian8 and on lwn), it seems that the primary reasons given for retaining the split approach are:

people want to wring every little bit of performance out of their system, want a separate usr filesystem, and consider even a minimal initramfs too big a price to pay
people are concerned about having to use an initramfs when they have a separate usr filesystem, and in particular keeping it up-to-date
people are comfortable with how things are currently done and the benefits of the unified-usr approach don’t apply to their use-cases
people are comfortable with performing “recovery” via the limited tools on a rootfs and aren’t interested in other solutions

The “performance sensitive” group sometimes refer to the setup in which an embedded system has limited internal storage (yet big enough for a traditional rootfs), with the usr filesystem on an SD-card. I personally would consider this an excellent candidate for a minimal initramfs, a rootfs that just holds /etc and /var plus a unified usr filesystem on the SD-card. An initramfs has a minor boot-time performance hit, but it really is very small. It also requires a small amount of memory during boot - but less than the device will need during normal operation anyway, and that memory is released as soon as the root filesystem is mounted.

Creating and configuring an initramfs is pretty simple; most distributions provide a generic one and also provide suitable tools to create custom ones when needed. Hooking one up to the boot process is a single line in the grub config file - and distro tools normally add that automatically anyway. In general, an initramfs does not have to be kept “up-to-date” unless the usr filesystem is mounted via some very unusual process - but distro tools automate this anyway. Note that an initramfs is not required unless a separate usr filesystem is also being used.

It is true that many people are not affected by the problems of the traditional approach (listed early in this article); the burden is mostly on distribution maintainers to get the setup right in their package definitions. The greatest benefits of the unified approach are seen by distribution maintainers and by sysadmins in datacenter environments, but normal end-users also benefit occasionally, when they would otherwise run into mistakes made by distro maintainers.

Although a unified approach doesn’t allow recovery tools to be present on the rootfs, there are several other alternatives for performing recovery on a broken system:

for desktop-like systems, plug in a storage device containing a “live distro” and boot from that
include the necessary recovery tools on the system’s normal initramfs
include an extensive “recovery initramfs” image on the boot partition, but don’t normally use it during boot. On boot failure, reboot the system with that initramfs active.
include a grml-rescueboot image (a simple “live distro”) on the boot partition, and on system problems boot that image

It is important to remember that “I’m comfortable with the current setup” is a valid argument. When an experienced sysadmin knows how to use a system, then changing it has a negative impact for them even when the new system is theoretically better. Even for newbies an old approach can be better if they have access to more help/support/documentation for that environment.

The Unified Layout

Having all executable files stored under /usr (which might be a separate filesystem or might be a plain directory on the rootfs):

is simpler for distribution maintainers (no more deciding what goes where, and tweaking the build processes to install files appropriately)
works more reliably when installing from source-code (which almost always installs into /usr or /usr/local)
is slightly easier to understand for users and less-experienced sysadmins
has no “corner cases” in which an incorrectly configured rootfs will fail intermittently or in unusual circumstances (see “The Issue” above)
permits sharing of the filesystem holding all executables between systems (via network-mounted storage)
doesn’t require a package-manager to scatter stuff across separate rootfs and usr filesystems (yes, this already occurs with /etc but the fewer filesystems the better)
means apps on a separate usr filesystem never link against dynamic libraries on the rootfs (avoids potential upgrade issues when only the usr filesystem is accessible to the package-manager)
makes it possible to “snapshot” the usr filesystem to capture the entire set of executables, eg to allow rollback on upgrade failure (with the exception of /etc)
is mostly compatible with packaging designed for Solaris (eg some open-source software)
is mostly compatible with packaging designed for other unixy systems (with /bin->/usr/bin, installing into /bin still works fine)
makes it simple to mount the filesystem holding all executables as read-only

The most important thing to note is that a “unified usr” cleanly separates sharable files from host-specific (/etc, /var, maybe /boot). Avoiding this mixing makes some interesting use-cases possible and simplifies other problems.

As noted, the “snapshotting” feature above does not cover changes to /etc and similar directories. This means it is not bulletproof, but still useful. The cases of installing, updating and removing a package should be considered separately. It is rare that installing a package overwrites or deletes a file from /etc; therefore although rollback might leave some garbage in /etc (new files or new lines in existing files) it is not likely to break anything. Updating is similar. Removing a package usually does not remove config files, so restoring a snapshot will “undo the remove”. Using this to undo a “purge removal” will probably not be so successful. In the end, this should be considered a “free bonus” rather than a real feature.

As already mentioned, having a unified /usr and having it on a separate filesystem means an initramfs is mandatory. The traditional setup could have a separate /usr filesystem without an initramfs. However having an initramfs is no big deal; it is a minor amount of work and brings many advantages. See later for a quick overview of initramfs.

Thoughts on Unifying into the Root Filesystem

Some have suggested that rather than having executables in /usr/bin and /bin -> /usr/bin the reverse could be done: all executables in /bin and /usr/bin -> /bin. This would also work, but does not cleanly separate host-specific from sharable. The sharable are (bin,sbin,lib) and the host-specific include (etc,var). The unified-usr approach ensures all sharable stuff is under one top-level directory.

A possible solution is to have /etc and /var on separate partitions, and to have an initramfs mount each of those on top of a read-only root filesystem. AFAICT, that approach would have very similar properties to the everything-in-usr design. On the positive side, it does result in a slightly cleaner path to the executables: one directory-lookup fewer which is nice. However IMO it seems natural for the rootfs to belong to the local system: the local system is “keeping control” of itself via its /boot partition (containing its kernel and initramfs) and local rootfs. When the local system chooses to mount an external filesystem and happens to choose a mount-point of /usr, that is its choice. Mounting a rootfs from elsewhere then reclaiming control by mounting a local /etc seems odd. This is perhaps analogous to invoking a library function vs using a framework and registering callbacks; the framework approach reminds me of the midlayer design antipattern. The mount-etc approach also makes it impossible to add new mount-points at the root level, though that isn’t very important.

Data Center Environments

Unless you are a distribution maintainer, the benefits of a “unified usr” are most visible in large data-centres. In this case, the usr filesystem is stored on a SAN and shared across many machines, while the rootfs is on a device attached locally to each host (or maybe is a small per-host filesystem also on a SAN). During boot, the initramfs on each host mounts the host-specific rootfs and then mounts the “usr filesystem” from the SAN read-only on /usr. To do updates to the usr filesystem for all servers concurrently, the system administrator can do something roughly like:

make a copy of the usr filesystem (eg via snapshot)
use a package-manager (apt, dnf, or other) to install or update the software on this filesystem. Note that:
- the admin needs the corresponding “package database” files locally
- the admin needs a suitable “/etc” directory which may be updated during package changes
- the chroot command may be used to make the package-manager work in this scenario
manually inspect the changes made to /etc, and if necessary push appropriate changes to all hosts via a tool like puppet/chef/etc. Most packages don’t need host-specific files in /etc.
make the updated usr filesystem “live” on the SAN

(Warning: I’m a developer not a sysadmin; this is just my understanding of how such a process could work)

Working with Containers

On a host that runs multiple identical containers, it can be useful for the container images to share a read-only view of the executables.

With the “unified usr” approach, each container image gets a master usr filesystem bind-mounted (read-only of course) onto its private /usr, and also requires the relevant symlinks (/bin->/usr/bin etc). When the container image wants to use a “traditional layout” then four bind-mounts need to be established instead. This is, however, no big deal - ie AFAICT, in the container case the traditional system works fine. This is different from the clustering approach, as:

having multiple bind-mounts is still efficient while having multiple network-mounted filesystems is not;
in the container case /usr is expected to be mounted by the container’s host (as an initramfs would), while on real hardware the rootfs must mount it
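
The difference between the two layouts can be made concrete with a small planning helper. This is just a sketch under my own assumptions: the “four bind-mounts” are taken to be bin, sbin, lib and usr, the function name is hypothetical, and nothing is actually mounted; the function only returns what a container manager would have to set up.

```python
import posixpath

TOP_LEVEL = ("bin", "sbin", "lib")


def container_mounts(layout, master, container_root):
    """Plan the read-only bind mounts and symlinks needed for one container.

    master is the directory holding the shared executables;
    container_root is the private root directory of the container.
    """
    if layout == "unified":
        binds = [(posixpath.join(master, "usr"),
                  posixpath.join(container_root, "usr"))]
        # Relative link targets keep working wherever the root is mounted.
        links = [(posixpath.join(container_root, d), posixpath.join("usr", d))
                 for d in TOP_LEVEL]
    elif layout == "traditional":
        binds = [(posixpath.join(master, d), posixpath.join(container_root, d))
                 for d in TOP_LEVEL + ("usr",)]
        links = []
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    return {"bind_mounts": binds, "symlinks": links}
```

For example, container_mounts("unified", "/srv/master", "/srv/c1") yields one bind mount plus three symlinks, while the traditional layout yields four bind mounts and no symlinks (the /srv paths are placeholders).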

Cohabitation

As Helmut Grohne pointed out, it seems technically possible for distros to allow users to choose which approach they want. The difference in packaging is very minor - those few packages that install “critical” executables will still work fine when symlinks are in place. The only case that needs updating is where such packages then create symlinks from /usr/* to the critical executables outside of /usr; they should first check whether /bin is a symlink or similar.

One impact of supporting both approaches is in testing; supporting split-directories means that somebody has to verify that no bad cross-dependencies have been introduced. Fully committing to the unified approach means such testing is unnecessary.

Installing from Source

Most sysadmins simply use a Linux distribution in which experts pre-package software, and the admin just has to use a package-manager to install and update. In this case, the new/updated executables can be expected to be placed in the correct locations.

However sometimes it is necessary to install or update directly from the upstream source code - and some distributions (eg LFS) are primarily built this way. In this case, the sysadmin needs to be sure that the “make install” step places executables in the correct locations. With the “unified usr” approach, this generally happens automatically - just use “--prefix=/usr” and the executables will be placed in the appropriate location under /usr. When using a “traditional” filesystem layout, however:

there may be a symlink under /usr which points to the corresponding file on the rootfs; installing carelessly could overwrite this symlink
it may be necessary to manually move the installed executable out of /usr into the rootfs and then create/fix symlinks.

A system using the unified layout therefore makes installing from source a little bit easier and less error-prone.
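
On a traditional layout, the symlinks at risk can be listed before running “make install”. A minimal sketch, assuming that only the bin, sbin and lib subdirectories of /usr are of interest; the function name is hypothetical.

```python
import os


def symlinks_leaving_usr(usr_dir="/usr", subdirs=("bin", "sbin", "lib")):
    """List (link, resolved_target) for symlinks that resolve outside usr_dir.

    On a traditional layout these are typically compatibility links such as
    /usr/bin/foo -> /bin/foo, which a careless install could overwrite.
    """
    usr_real = os.path.realpath(usr_dir)
    found = []
    for sub in subdirs:
        directory = os.path.join(usr_dir, sub)
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if target != usr_real and not target.startswith(usr_real + os.sep):
                found.append((path, target))
    return found
```

On a unified system the result should normally be empty or nearly so, since /bin and friends resolve back into /usr.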

Mounting Executables Read-Only

Some people include in their list of unified-usr advantages the ability to mount the usr filesystem read-only and thus improve security.

When executable files are writable then a security hole can lead to a persistent security problem that remains even after a reboot. Truly unwritable executables are definitely a security boost.

The traditional layout does support mounting the rootfs read-only (currently with some limitations). This provides some improvement in security, and in particular will prevent “oops” moments when logged in as root. However when the storage device is local to the host (ie not a network-mount) then there are various ways for an attacker who gains root privileges to mess with the content of that device anyway. A truly unwritable filesystem can be achieved with special hardware (rare), or by using network storage; the remote server then decides when to allow writing and an attack on the client can’t work around that.

Placing the rootfs on a network server certainly can be done with the traditional layout. It is probably best done with an initramfs, but that’s not 100% necessary. More problematic is the fact that in the traditional layout the rootfs has a mix of host-specific files and sharable executables; the filesystem can therefore not be shared with other hosts even when the vast majority of the content (bin,sbin,lib) is identical across hosts. This point is irrelevant in some use-cases but rather important in others.

The unified approach can achieve the same goals with less effort: the rootfs can be a local storage device and just the usr filesystem be mounted remotely. An attacker with root privileges on the host can at worst mess with the contents of the etc directory (which can itself be somewhat ugly, eg adding scripts to the init-system configuration dirs) but all executables are safely protected remotely. Of course the contents of the rootfs could also be protected by network-hosting that too, in which case the difference in effort between traditional and unified layouts is minimal.

In summary: hosting executables on a network server that serves them read-only is a very nice security boost for “centrally managed” environments (rather than using puppet/chef/etc to manage each host’s files). It is slightly easier to achieve with a unified usr layout.
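
Whether a filesystem is currently mounted read-only on the client side can be read from /proc/mounts. The sketch below parses the usual whitespace-separated format (device, mount point, type, options, two numeric fields); the function names are my own, and mount points containing spaces (escaped as \040 in that file) are not handled. As discussed above, a client-side “ro” flag is only a convenience; the real protection comes from the server refusing writes.

```python
def mount_options(mount_point, mounts_text):
    """Return the option list of the last mount on mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # later mounts shadow earlier ones
    return options


def is_read_only(mount_point, mounts_path="/proc/mounts"):
    """True or False for a mount point, None if nothing is mounted there."""
    with open(mounts_path, encoding="utf-8") as fh:
        options = mount_options(mount_point, fh.read())
    return None if options is None else "ro" in options
```

A return value of None from is_read_only("/usr") means /usr is just a directory on some other filesystem, in which case the mount point of that filesystem (often /) is the one to check.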

Note: AFAIK, some advanced network storage devices can auto-detect identical files and share them in a copy-on-write manner. This may mitigate some of the problems of network-mounting traditional-format root filesystems (containing a few host-specific files) in datacenter environments (ie with lots of near-identical servers).

Other Notes

It has been pointed out that the distinction between /bin and /sbin is also debatable; many apps traditionally in /sbin are actually useful for “power” non-root users. However that’s another topic.

Conclusion

I like the unified-usr approach. For my use-case (a plain desktop using LFS) it offers a simpler directory structure, easier install-from-source, and slightly easier setup of containers (which I use occasionally for development and experiments). The rest of the changes are irrelevant to me, as I don’t have a separate usr filesystem, and don’t network-mount anything. That’s three minor positives and no negatives. Hardly a world-changing impact but still a positive. And I can brag that my system is “cutting edge” :-)

It’s quite interesting that it took quite a lot of research (this article is many pages long) to come to a conclusion that isn’t more, well, interesting. Nevertheless, I’ve learned quite a few things along the way - hope you have too!

Appendix: the var directory

The /etc directory has been mentioned as a host-specific directory that is usually on the rootfs and thus prevents the rootfs from being shared. The /var directory has similar issues: it also has host-specific content. However there are some differences:

var is modified very frequently, thus raising the chance of filesystem corruption;
var can quite easily get filled to 100% by a malfunctioning application;
var is a non-trivial size (unlike /etc which just holds a few megabytes)
losing the contents of the var filesystem is often not a major problem; this depends on what is installed, and where it stores its files.

This behaviour means that having a separate var filesystem is often a good idea (rather than making var just a directory on the root filesystem).

When a separate var filesystem is used, then it is irrelevant whether the “merged usr” or traditional approach is used. However when /var is on the rootfs then the merged approach allows all executables to be on a separate read-only filesystem. These are exactly the same arguments as for /etc.
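
To see how a particular machine is actually laid out, one can ask which of these directories are separate mount points and how full their filesystems are. A small sketch using only the Python standard library; the default path list and the function name are my own choices.

```python
import os
import shutil


def filesystem_report(paths=("/", "/usr", "/var", "/etc")):
    """Report, per existing path, whether it is a mount point and how full it is."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            continue
        usage = shutil.disk_usage(path)  # usage of the filesystem holding path
        percent = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
        report[path] = {
            "separate_mount": os.path.ismount(path),
            "percent_used": percent,
        }
    return report
```

When /var is not a separate mount, its percent_used is simply that of the root filesystem, which is exactly the situation where a runaway application filling /var also fills the rootfs.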

Appendix: initramfs/initrd

An initramfs is an archive-file containing kernel modules, userspace executables and configuration-files that the kernel can invoke before the root and /usr filesystems are available. Normally, the initramfs holds just enough to mount the root and /usr filesystems (ie kernel modules for the relevant filesystems, possibly network drivers for mounting remote filesystems, decryption or LVM modules, etc). Some tools (eg Linux installers and tiny distributions such as Puppy Linux) run entirely from an initramfs, and this is also a valid option for small embedded systems.

An initramfs is simply a cpio-format archive (like tar/zip), compressed (with gzip or bzip2) and stored on the boot-partition next to the kernel image - ie when a bootloader can read the kernel image then it can also read the corresponding initramfs image. When the bootloader is configured with such a file, it reads the entire file into memory and simply passes the address/size of this memory to the kernel as a boot-parameter. During the boot process, the kernel:

allocates some memory
sets it up as an instance of a trivial built-in memory filesystem (ramfs, or tmpfs when that is enabled in the kernel)
mounts this in-memory filesystem at ‘/’
unpacks the initramfs archive into that filesystem (using a builtin cpio unpack function)
frees the original block of memory holding the archive file
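
The cpio format involved is simple enough to read by hand. As an illustration, here is a rough sketch of a reader for the “newc” variant of cpio that initramfs images use: each entry is a 110-byte ASCII header (a 6-character magic plus 13 fields of 8 hex digits), the NUL-terminated name, and the file data, with name and data each padded to a 4-byte boundary. The function name is my own. The sketch handles plain or gzip-compressed input and stops at the first trailer, so images built from several concatenated archives are only partly listed.

```python
import gzip

HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits each
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "check")


def _align4(offset):
    """Round offset up to the next multiple of four."""
    return (offset + 3) & ~3


def list_newc_entries(data):
    """Yield (name, mode, size) for each entry before the first trailer."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number: decompress first
        data = gzip.decompress(data)
    offset = 0
    while offset + HEADER_LEN <= len(data):
        magic = data[offset:offset + 6]
        if magic not in (b"070701", b"070702"):
            raise ValueError(f"bad cpio magic at offset {offset}: {magic!r}")
        raw = data[offset + 6:offset + HEADER_LEN].decode("ascii")
        header = {key: int(raw[i * 8:(i + 1) * 8], 16)
                  for i, key in enumerate(FIELDS)}
        name_start = offset + HEADER_LEN
        # namesize counts the terminating NUL, which is dropped here
        name = data[name_start:name_start + header["namesize"] - 1].decode()
        if name == "TRAILER!!!":
            return
        data_start = _align4(name_start + header["namesize"])
        yield name, header["mode"], header["filesize"]
        offset = _align4(data_start + header["filesize"])
```

Given the bytes of an image file, list(list_newc_entries(data)) returns the names, modes and sizes of the files that the kernel would unpack.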

The kernel then simply executes userspace application /init from this in-memory filesystem at the end of its boot process. This userspace code then can do whatever it wants - but usually:

loads relevant kernel modules
mounts devtmpfs and waits for the device on which the rootfs is stored to appear
mounts the rootfs somewhere (eg /mnt)
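deletes the no-longer-needed contents of the in-memory filesystem to free that memory
moves the mount at ‘/mnt’ onto ‘/’ and chroots into it (typically via the switch_root tool; pivot_root, as used by the older initrd scheme, does not work on the initial in-memory filesystem, which also cannot be unmounted)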
execs file /sbin/init (ie the real init executable from the real rootfs)

With this approach, the kernel requires only a built-in memory filesystem (ramfs is always available) and a (trivial) cpio-unpack function to mount the contents of the initramfs as a temporary rootfs. The initramfs then contains the necessary drivers to mount the real rootfs, and a simple “init program” (often a shell-script which relies on a shell-interpreter also on the initramfs).

Optionally, the initramfs image can be appended to the linux kernel image file rather than having it as a separate file. This is particularly useful for developers with lots of different kernel images floating around; it prevents accidentally running a kernel with the wrong initrd (in particular one containing kernel modules compiled for a different kernel version).

This approach cleanly isolates the kernel from the details of the rootfs and /usr partitions - and makes it possible to support things like network-mounting, encrypted-root, root-on-lvm and other interesting combinations with a totally standard kernel that has no drivers at all statically linked in. Of course, this relies on the initramfs image being correctly populated with the necessary tools/drivers, but that is a simpler task than recompiling the kernel. Creating an initramfs is reasonably simple to do by hand (just create a directory with the desired tools and then create a cpio archive from it), but various tools exist to automate the process.
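
To show how little is involved in the “by hand” route, here is a rough sketch that packs a directory into a gzip-compressed cpio archive in the “newc” format. It is an illustration under several simplifying assumptions of my own: all entries are recorded as owned by root with a zero timestamp, hard links are stored as separate copies, and device nodes and other special files are skipped. I have not booted an image produced this way, so for real use the cpio command or a distro tool is the safer choice.

```python
import gzip
import os
import stat


def _newc_entry(name, mode, data=b"", ino=0, nlink=1):
    """Encode one record: 110-byte ASCII header, NUL-terminated name, data."""
    name_bytes = name.encode() + b"\0"
    # ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
    # rdevmajor, rdevminor, namesize, check
    fields = (ino, mode, 0, 0, nlink, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0)
    record = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    record += b"\0" * (-len(record) % 4)  # pad header + name to 4 bytes
    record += data
    record += b"\0" * (-len(record) % 4)  # pad file data to 4 bytes
    return record


def build_initramfs(src_dir):
    """Return a gzip-compressed newc archive of src_dir."""
    out = bytearray()
    ino = 1
    for dirpath, dirnames, filenames in os.walk(src_dir):
        dirnames.sort()  # deterministic traversal order
        rel_dir = os.path.relpath(dirpath, src_dir)  # "." for the top level
        out += _newc_entry(rel_dir, os.lstat(dirpath).st_mode, ino=ino, nlink=2)
        ino += 1
        # os.walk lists symlinks to directories in dirnames without entering them
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in sorted(filenames + links):
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                data = os.readlink(path).encode()  # link target is the "content"
            elif stat.S_ISREG(mode):
                with open(path, "rb") as fh:
                    data = fh.read()
            else:
                continue  # device nodes, sockets etc. are skipped in this sketch
            rel = os.path.normpath(os.path.join(rel_dir, name))
            out += _newc_entry(rel, mode, data, ino=ino)
            ino += 1
    out += _newc_entry("TRAILER!!!", 0)  # end-of-archive marker
    return gzip.compress(bytes(out))
```

The returned bytes would be written to a file on the boot partition; the directory passed in needs at least an executable /init for the kernel to run.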

Kernels for embedded systems typically link all drivers statically, so for such systems an initramfs may not be necessary. However, for “desktop” systems there are advantages (and no real disadvantages) to the initramfs approach; the system does need enough RAM to hold the entire initramfs contents in memory, but this is freed as soon as the real root is mounted.

The (traditional) alternative was to compile most drivers as modules, but make a few static - the ones needed to mount the root filesystem at least. This then implies that the same kernel would no longer be capable of booting if the root filesystem was moved to a different technology.

If the kernel has compiled-in drivers which need firmware, then an initramfs is also required. Compiled-in drivers are initialised before the real rootfs is mounted, so a driver needing firmware will try to load it from /lib/firmware at a point when that directory doesn’t yet exist. An initramfs containing the needed firmware resolves this issue - it is unpacked and available before device-driver initialisation. This even supports holding the real rootfs on a device that requires firmware (including having the rootfs network-mounted where network access requires firmware).

Note that “initrd” files are an old/obsolete predecessor of initramfs (they contain a complete ext2/ext4/etc filesystem image, thus requiring the relevant filesystem driver to be statically linked into the kernel). For historical reasons, initramfs-format files are usually stored under the name /boot/initrd.img-{version}, even though they are not initrd-format at all. It doesn’t matter; the kernel auto-detects the format of the init-image passed to it by the bootloader.
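
If you are curious what a given /boot/initrd.img-{version} file really is, its first few bytes are a good hint. The sketch below (image_format is a hypothetical helper of mine, and it recognises only two cases) looks only at the outer container: a gzip result still has to be decompressed before you can see whether a cpio archive is inside.

```shell
# image_format FILE  (hypothetical helper name)
# Classifies an init-image by its leading magic bytes.
image_format() {
    # First six bytes as lowercase hex with no separators.
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case $magic in
        303730373031*) echo "cpio (newc, uncompressed)" ;;   # ASCII "070701"
        1f8b*)         echo "gzip-compressed" ;;
        *)             echo "unknown" ;;
    esac
}
```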

See:

dracut - Red Hat initramfs builder
yaird - Debian initramfs builder
mkinitrd?

Appendix: LFS

While Linux From Scratch sets up a traditional “split” system, I simply converted the result to a merged form and it works fine. After completing the initial LFS book (using just one partition for everything), I:

moved all files from /bin to /usr/bin
did the same for /sbin and /lib
created symlinks /bin->/usr/bin etc
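
For an offline tree (for example an LFS partition mounted at /mnt/lfs from another system) the three steps might look like the sketch below. The helper name merge_usr is my own invention; it does not handle /lib64, does not adjust any symlinks inside the moved directories, and must not be pointed at the root of the running system.

```shell
# merge_usr ROOT  (hypothetical helper name)
# ROOT is the top of an *offline* tree, never the running system's /.
merge_usr() {
    root=$1
    for d in bin sbin lib; do
        # Skip directories that are absent or already symlinks.
        [ -d "$root/$d" ] && [ ! -L "$root/$d" ] || continue
        mkdir -p "$root/usr/$d"
        # Move every entry, dotfiles included; -n refuses to overwrite.
        find "$root/$d" -mindepth 1 -maxdepth 1 \
            -exec mv -n {} "$root/usr/$d/" \;
        # rmdir fails if a name clash left something behind: stop there.
        rmdir "$root/$d" || return 1
        # Relative link, so it resolves under any mount point.
        ln -s "usr/$d" "$root/$d"
    done
}
```

If the same name exists on both sides, the helper stops with the old directory left in place so the clash can be resolved by hand.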

Of course, on a running system this must be done with great care and in exactly the right order, or the command line will break. I did see a script somewhere which does this with bind-mounts and rename calls, and which can actually be executed from within a running system.

I’m pretty sure that if these links are created right at the start (LFS chapter 2), then things would also work fine.

And if you do have a separate usr partition then an initramfs will be needed - see the BLFS book. However, as I noted earlier, a separate usr filesystem doesn’t bring any benefits for “normal” systems.

Personally I think it would be a good idea for the LFS book to use the “unified usr” approach - at least for the systemd variant. As noted by the above article, in a traditional system (in which usr may be mounted late), all udev scripts and init-scripts must be on the rootfs. From a quick look at the directory structure, it appears that LFS does get udev right (ie there are no deps from LFS-installed rules-files on /usr) - although there is always the danger that something in BLFS or other software will install additional broken udev rules.

Note that the “check command” recommended by this article reports use of *_FROM_DATABASE in udev config files as an error, but that is not a problem on LFS: the hwdb is safely installed under /lib.
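
My own “quick look” can be approximated with a small search. This is not the check command from the article, just a crude sketch under my own assumptions: the helper name rules_referencing_usr is hypothetical, and it simply lists rules files that mention a /usr/ path outside a comment.

```shell
# rules_referencing_usr DIR...  (hypothetical helper name)
# Prints the udev rules files under each DIR that mention /usr/ on a
# line before any '#' comment marker.
rules_referencing_usr() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # grep exits non-zero when nothing matches; that is not an error
        # here, hence the "|| true".
        find "$dir" -name '*.rules' \
            -exec grep -l '^[^#]*/usr/' {} + || true
    done
}
```

On a typical system it would be run as rules_referencing_usr /lib/udev/rules.d /etc/udev/rules.d; no output means no rules file was flagged.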

Checking the systemd .service files (or the sysv init scripts) is trickier. Whether a particular entry can safely reference usr or not depends on which “target” it is part of. However I cannot find any problems in the standard config files installed by lfs+systemd.

A unified-usr approach would save a large number of “mv” and “ln” commands. Simply creating links /bin->/usr/bin, /sbin->/usr/sbin and /lib->/usr/lib right at the start should be sufficient (untested!). Possibly the commands in the book “sourcecode” could have markup to allow a “systemd-unified” book variant to be generated, ie which simply omitted the marked-up commands from the generated book?

LFS section 2.2 “Creating a New Partition” recommends not creating a separate usr partition, in which case an initramfs is not needed, and the unified hierarchy could be used without an initrd.

A separate /usr partition is generally used if providing a server for a thin client or diskless workstation. It is normally not needed for LFS.

If the / and /usr directories are on the same partition, then keeping “critical” executables out of /usr is really irrelevant. However if the system being built uses separate partitions for / and /usr, and a unified /usr is wanted, then an initramfs is mandatory. The current LFS instructions do not include creating an initramfs but the BLFS book does cover this.

The LFS instructions do not include creating an initramfs (a filesystem-in-a-file which is mounted by the kernel before the partition on which the root filesystem is stored is available); LFS instead statically links the relevant drivers into the kernel. The make defconfig command in section 8.3.1 (“Installation of the kernel”) will configure ext4 to be built statically; grep EXT4 /boot/config-3.19 should show a y (static) rather than an m (module). As the root filesystem is a simple partition (not encrypted, no LVM, not remote, etc) this is all that is needed.

References

The Case for the Usr Merge - the original posting
separate-usr is broken
Why everyone must oppose … - Rusty’s very sarcastic opinion on the debate
LWN: The usr merge - LWN discussion on the topic
BLFS on initramfs for basic instructions on how to create an initramfs image.
IBM: ramdisks
grml-rescueboot
LWN: eudev
