Grub2

Booting Linux on x86 using Grub2

Updated 2010-04-18. Thanks to Seth, Vladimir for corrections.

Updated 2010-06-07. Thanks to Tom H. for information about load_env and save_env. Also added information about installing core.img into unformatted partitions, inspired by a question from Tom Coles.

Introduction

There is a lot of information on the web about how to use Grub2. The Grub2 site also has some very detailed info about how it works. However there appears to be no decent 15-minute overview of how an x86 PC boots a Linux kernel using Grub2; this article aims to provide that.

Like any other overview, there are many details that have been left out; a complete description of Grub2 would result in a textbook, not a quick introduction. For further information, please see the Grub2 site.

This article is NOT a how-to or reference manual for Grub2; it is for people who have read those docs and asked “but why does that work?”.

I hope you find this useful; all feedback welcome via email.

Intended Audience

This article is for:

  • People who want to understand the general principles of Grub2, so the commands needed to configure booting then make sense, rather than being simply magic.

  • People wanting to get into Grub2 development, as a first step before diving into the code 1.

This article is not for:

  • People who want to solve problem X as fast as possible, and just want some instructions to follow. This is not a reference manual on Grub commands, nor a problem-solving FAQ.

It is expected that:

  • The reader is comfortable using the command-line to administer their computer, and knows what partitions and filesystems are.

It is not expected that:

  • The reader is a programmer.

Environment and Assumptions

This information is based on Grub version 1.98, and Ubuntu 10.04 beta-1. It should apply to most versions of Grub2, and most Linux distributions.

It is assumed that:

  • The computer is an x86 PC. For other computer types, maybe 50% of the information below will still be relevant.

  • The computer has a traditional BIOS. For systems with coreboot firmware rather than BIOS the early stages are different but maybe 80% of the information below will still be relevant. For systems with EFI firmware, again the early stages are different but the remainder is still relevant.

  • The boot media is a local hard-drive. This article does not cover network booting, ie where the Grub code is itself on a network filesystem.

  • The boot disk has been formatted with an MBR Partition Table, ie the traditional format inherited from DOS days.

    Systems whose ROM supports EFI will have a GPT Partition table, and will use a somewhat different approach for the early parts of booting. Later parts (after core.img has been loaded) should be the same.

    See the Wikipedia Master Boot Record article.

    It is possible to use GRUB to boot a BIOS-based x86 PC with a GPT formatted hard-drive; this is briefly discussed later.

  • Grub’s files are installed on the same partition as the root filesystem of the Linux installation being booted, under directory /boot.

    It is also quite common for Grub’s files to be put on their own partition, and this makes only a small difference to the process described below. However in the name of simplicity, this article always writes /boot/grub and those who have a separate partition will have to mentally replace this with /grub while reading.

  • The operating system being booted is Linux. Actually, this article should be applicable to booting just about any OS, as it stops at the point where control is handed over to the target OS kernel.

Article Flow

The remainder of this article is broken into the following sections:

  • What happens when Grub is installed
  • What happens when the host computer boots
  • More about loadable Grub modules (chunks of executable code).
  • What happens when the “grub-mkconfig”, aka “update-grub” script is run (ie how grub.cfg gets generated)
  • Assorted other issues

Installing Grub

The Grub utilities provide a command “grub-install” which creates the files in /boot/grub and writes a program to a disk’s Master Boot Record (MBR).

When grub-install is run, the following occurs:

  • Script grub-install copies module (*.mod) files from some base location into /boot/grub. The base location is usually /usr/lib/grub/i386-pc. See the grub-install script for further information.

  • Various *.lst files are also copied literally from the base location into /boot/grub:

    • moddep.lst contains a list of the modules that each module depends on. When a module is loaded, the dependent modules are also automatically loaded.

    • fs.lst indicates which modules implement drivers for which filesystem types.

    • command.lst indicates which modules implement which “commands” that can be invoked from the Grub command-line or Grub shell scripts (eg grub.cfg).

    • handler.lst indicates which modules provide which “parsers” (ie different Grub shell interpreters). The parser named “parser.grub” is the default parser.

    • crypto.lst indicates which modules provide which encryption algorithms.

    • partmap.lst indicates which modules provide support for which disk partition schemes. The support is read-only, ie just being able to determine what partitions exist, what the (start, size) values are for each partition, etc. Tools for modifying partitions can be found in the modules listed in file parttool.lst.

    • video.lst indicates which modules provide a graphics driver. When using gfxterm or gfxmenu, the video.mod module is loaded, and it then tries each module in video.lst in turn until one of them returns “success”, ie detects that it can handle the graphics on this computer.

    • parttool.lst indicates which modules provide tools for modifying disk partitions, eg marking partitions as bootable/hidden.

    • terminal.lst indicates which modules provide “terminal” implementations, ie code that can display text and read input from the user.

  • A core.img file is dynamically generated 2. The generated file contains the basic Grub code, plus one or more linked-in modules, including the filesystem driver module needed to read the filesystem on which the Grub files are stored (/boot/grub) 3. Note that these built-in modules are also available in /boot/grub, under their normal names (eg “ext2.mod”).

  • Application grub-setup (invoked by grub-install) copies the contents of the core.img file to the sectors immediately following the MBR on the target disk, also known as “the post-MBR gap”. Installing in this way is named embedding in GRUB terminology.

    In most Linux systems, hard drives get partitioned so that roughly the first 32kb following the MBR are not part of any partition (traditionally the first partition starts at sector 63); this space is termed the “post-MBR gap”. Presumably other unixish OSes do this too (Windows is a little different 4). This “unmanaged” space is exactly the right sort of disk-space to put boot code that must be loaded by MBR startup code that has no support for filesystems.

    The grub-setup app will look for a post-MBR gap by default; if the gap is not there then grub-setup will issue a warning. Alternatively, grub-setup can be pointed at a normal hard-disk partition with a null “type” field; this is also “unmanaged” space that is suitable for boot code. This is particularly useful if the target hard-drive already has data on it, and doesn’t have a post-MBR gap; in this case any free space on the disk can be used. It also allows any size core.img file to be supported (ie files bigger than the traditional 32kb post-MBR gap size). See the options for the grub-setup application for further details.

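    Note: To manually check if the necessary gap is there, just use parted or gparted on Linux to inspect the partition table of any disk; the “start address” of partition 1 should be at about 32Kb or later (sector 63 with traditional partitioning; newer partitioning tools often start the first partition at 1MiB), ie there is a “missing” area not in any partition.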

    Before writing to disk, core.img is modified to insert hard-wired values for the disk# and partition# of the filesystem on which /boot/grub exists, ie which it should mount using the linked-in filesystem driver. Hard-wiring these values can be bad if storage devices are added to, or removed from, the system. This hard-wiring can be avoided by linking in the “search” module and a “config file”, making it possible to locate at boot time the partition to mount via label or UUID 5.

    File core.img is also modified so that its first sector (512 bytes) contains the addresses of all the other blocks of the file. This allows the Master Boot Record code to read this single sector and then load the rest of the file with simple direct read operations (no filesystem understanding needed).

    The alternative to “embedding” the core.img (ie copying it to space outside any filesystem) is using “blocklists” to load it from the original stored within a filesystem. As described above, the core.img file is modified so its first sector holds the raw addresses of the disk blocks containing the rest of the file 6. However the file is not copied; instead, it is left in its original position. To the Master Boot Record code, the effect is the same. However this approach is not recommended as it is much more vulnerable to breakage; GRUB will fail to boot if the original file gets moved (eg if a “defrag” is done, or restore-from-backup done, or the file is made larger, or even simply modified in the case of log-structured filesystems).

  • An MBR image is generated (aka boot.img), and grub-setup writes it to the MBR area of the computer’s boot device. This generated image contains code to load a single hard-wired sector from disk, which itself contains an array of (sector,len) address ranges. The MBR code then iterates over this array loading all those sector ranges into memory. The single sector to load is the address of the first block in the core.img file, effectively loading file core.img from disk. This MBR code is very architecture-specific; the PC one is written in assembly and invokes BIOS operations to load the specified blocks. Note that the MBR code does not need to understand the filesystem it is loading the blocks from; they are simply at fixed addresses 7. Note also that the MBR is a fixed size: just 512 bytes on PCs - and it must include the disk partition table too!

    The MBR can also contain “bios params” between offsets 0x03 and 0x5a; this is 87 bytes, or about 17% of the sector, further reducing the space for program code.

    A copy of the partition table entries from the old MBR is inserted into the new MBR before it is written, so the disk partition information is not lost.

    See: Wikipedia Master Boot Record

    It is possible to install Grub’s MBR code to the first sector of a partition (rather than the first sector of the entire disk); many bootloaders support “chain loading” where the boot code in the disk MBR forwards to code in the first sector of a partition. However in order for this to work, the filesystem on that partition must not use this first sector! Many filesystems do explicitly skip the first sector for just this reason; for example, fat, ntfs, and ext2 all leave the first sector free. When installing into a partition (eg hd0,1 rather than hd0), Grub checks the filesystem type in that partition; if the type is not one that is known to leave the first sector free then a warning will be displayed and grub-setup will exit. Even if the first sector is free, the necessary space to “embed” core.img will not be available, so a blocklist must be used to load core.img from its “in-place” location, which can be fragile.
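The way the old partition table survives the writing of a new MBR image can be modelled in a few lines of Python. This is only an illustrative sketch, not the code grub-setup actually runs: the function name install_boot_code is hypothetical, and the offsets used (a 446-byte code area, four 16-byte partition entries, and the 0x55 0xAA signature) are the standard MBR layout rather than anything taken from the Grub sources.

```python
SECTOR_SIZE = 512
PART_TABLE_OFFSET = 446      # partition table starts after the boot-code area
PART_TABLE_SIZE = 4 * 16     # four 16-byte partition entries
SIGNATURE = b"\x55\xaa"      # boot signature in the last two bytes


def install_boot_code(old_mbr: bytes, boot_img: bytes) -> bytes:
    """Return a new MBR: boot code from boot_img, partition table from old_mbr.

    Hypothetical helper for illustration only.
    """
    if len(old_mbr) != SECTOR_SIZE or len(boot_img) != SECTOR_SIZE:
        raise ValueError("both images must be exactly one 512-byte sector")
    table_end = PART_TABLE_OFFSET + PART_TABLE_SIZE
    new_mbr = bytearray(boot_img)
    # Carry the existing partition entries over so no partition is lost.
    new_mbr[PART_TABLE_OFFSET:table_end] = old_mbr[PART_TABLE_OFFSET:table_end]
    new_mbr[table_end:] = SIGNATURE
    return bytes(new_mbr)


# Synthetic example: an old MBR with one recognisable partition entry.
old_mbr = bytearray(SECTOR_SIZE)
old_mbr[PART_TABLE_OFFSET:PART_TABLE_OFFSET + 16] = b"P" * 16
boot_img = bytes([0x90]) * SECTOR_SIZE   # stand-in for the generated boot code

new_mbr = install_boot_code(bytes(old_mbr), boot_img)
```

The point of the sketch is simply that only the code area is replaced; the 64 bytes of partition entries are copied across unchanged.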

As part of installing Grub, a /boot/grub/grub.cfg file is also needed. See section “Generating A Grub Config File”.

Booting the Computer

When power is turned on, the following happens:

  • The hardware initialises, sets the CPU to real mode (no virtual memory) and jumps to fixed location 0xFFFF0 (hardwired in the CPU circuits)

  • BIOS code stored in a ROM or flash-memory mapped to that location is therefore executed.

  • The BIOS code looks at the BIOS config data to see which is the boot device. This BIOS config data can usually be edited by pressing some special key-sequence just after turning the power on, causing the BIOS configuration program to run. Among other things, the boot device can usually be selected here.

  • The BIOS code loads the MBR of the boot device into RAM. Remember that an MBR is just 512 bytes! The loaded data is of course the program & data that grub-install dynamically created and wrote there when the grub-install program was executed.

  • The BIOS code jumps to the start address of the loaded MBR (ie Grub code executes for the first time since power-on).

  • Grub’s MBR code loads a single sector whose address is hard-wired into the MBR block. It then loops over the (address,len) pairs in that sector loading all that data from the disk into memory (ie loads the contents of file /boot/grub/core.img, or its “embedded” copy). The MBR code then jumps to the loaded code, ie “executes” the program in core.img.

    As described in the “Installing Grub” section, this trick of embedding the raw disk block addresses makes it possible to store core.img in space that is not in a partition, and that has never been formatted as a filesystem at all (“embedding”). And in this case, if core.img is modified, as long as the new version is “embedded” at the same location, the MBR code does not need to be updated.

    Alternatively, it is possible for the core.img to be inside a real filesystem, and for Grub to read the core.img file contents without having a driver for that filesystem. However in this case, if core.img is modified then the first block of the file may well be given a new address on disk; if this happens then the MBR must be updated to point to this new location. Nevertheless, as core.img is usually updated by running grub-install, this is not usually a problem.

    Note that theoretically, if core.img is on a different device than the MBR, and new hardware is added then the Grub-generated MBR record might not be able to correctly load the core.img file; the device-id on which the first sector of core.img is to be found is hard-wired into the MBR, not searched for. However there is no solution for this; there is no way to embed the equivalent of the Grub “search” command into the 512-byte MBR. This problem is not likely though; normally the core.img is embedded on the same device as the MBR. And once core.img has been loaded it can use search.mod to find all further /boot/grub files, and is therefore immune to hardware rearrangements.

  • The executed core.img code now initialises all the modules that are built into it (linked into core.img); one of these modules will be a filesystem driver capable of reading the filesystem on which directory /boot/grub lives.

    It also registers a set of built-in commands: set, unset, ls, insmod.

  • If a “config file” has been linked into core.img, this is then passed to a very simple built-in script parser for processing. Scripting commands in the config file can only invoke built-in or linked-in commands. Simple scenarios (eg booting a typical desktop computer from a local drive) need no config file; this facility is used for things like booting via pxe, remote nfs or when /boot/grub is on an LVM device.

  • Core.img now loads file “/boot/grub/normal.mod” dynamically from disk, and jumps to its entry function. Note that this step requires the appropriate filesystem driver to be set up (ie built-in).

    The process of dynamically loading modules is discussed in the next section.

  • The normal.mod module:

    1. Registers command “menuentry” for use by scripts 8.
    2. Loads a script parser (aka an “interpreter”) named “parser.grub”. The handler.lst file is used to determine the name of the *.mod file to actually load (normally “sh.mod”).
    3. Loads file /boot/grub/grub.cfg and passes the contents to the script parser for execution.
    4. Puts up a menu on the screen showing an entry for each time the script parser had encountered a menuentry{...} block in the script (if no menuentry blocks exist in the script, then it just displays an interactive commandline).
    5. Lets the user select an option.
    6. Passes the contents of the {...} block for that menu-entry to the script parser again.
    7. Invokes the “boot” command from the boot.mod module.
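The core.img loading step in the list above (one hard-wired sector holding a list of address ranges, followed by plain sector reads) can be modelled roughly as follows. This is a sketch under stated assumptions: the “disk” is just a Python list of 512-byte sectors, and the on-disk encoding of the (address,len) pairs used here is invented for illustration; the real layout used by Grub differs in detail.

```python
import struct

SECTOR = 512
# Hypothetical encoding: 64-bit start sector followed by a 16-bit sector count.
ENTRY = struct.Struct("<QH")


def make_first_sector(ranges):
    """Pack (start, count) pairs into one sector, terminated by a zero count."""
    data = b"".join(ENTRY.pack(start, count) for start, count in ranges)
    data += ENTRY.pack(0, 0)
    return data.ljust(SECTOR, b"\0")


def load_core(disk, first_sector_lba):
    """Mimic the MBR code: read one hard-wired sector, then follow its list."""
    first = disk[first_sector_lba]
    image = bytearray(first)
    offset = 0
    while True:
        start, count = ENTRY.unpack_from(first, offset)
        if count == 0:
            break
        for lba in range(start, start + count):
            image += disk[lba]      # plain sector reads, no filesystem involved
        offset += ENTRY.size
    return bytes(image)


# Toy disk of 16 sectors; core.img occupies sector 1 plus sectors 2-3 and 7.
disk = [bytes([n]) * SECTOR for n in range(16)]
disk[1] = make_first_sector([(2, 2), (7, 1)])
core = load_core(disk, first_sector_lba=1)
```

Note that load_core never needs to know whether sectors 2, 3 and 7 lie inside a filesystem or in the post-MBR gap, which is exactly why the same MBR code works for both “embedding” and “blocklists”.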

The standard script parser implementation is named “parser.grub” and can be found in file /boot/grub/sh.mod. This implementation provides features very similar to a standard posix shell. Therefore grub.cfg’s syntax is very similar to a standard Unix shell script:

  • Environment variables are accessed via $name or ${name}.
  • Any of the commands provided by the *.mod files can be invoked like a shell script would invoke an external app or built-in command (eg “ps” or “ls”). If a command is issued which is not provided by a currently-loaded module, then file “command.lst” is used to determine which *.mod file provides that command, and that module is then loaded. See the following section for more details.
  • If/then/else/fi is supported as in normal shell scripts.
  • Script functions can be defined and invoked as in normal shell scripts.
  • Variables can be defined as in normal shell scripts.
  • Additional modules can be explicitly loaded via “insmod” commands, eg “insmod sleep.mod”. This is not necessary to access “commands”, as they are automatically loaded (see above). However insmod is needed to load modules providing filesystem drivers, parsers, crypto implementations, etc.

Module normal.mod also provides an interactive command-line with gnu readline support. Each line read from the keyboard is passed to the current script-parser for processing. The result is that Grub can provide an environment very similar to a standard unix interactive shell, where the commands that can be invoked are anything provided in the available modules. This is useful for diagnosing and fixing boot problems.

The list of available commands is documented on the Grub website. Any of these can be invoked from grub.cfg as long as the associated module has been loaded first, or if there is an entry in file command.lst for it.

The commands most useful to call from grub.cfg are:

command      description
search       find (hdx,y) value for a filesystem with specific label, UUID, etc
terminal     configure a textmode terminal
gfxterm      configure a graphical terminal
recordfail   save state for next boot
echo         write to terminal
test         interprets [ … ] expressions
linux        sets up Grub data-structures so that later execution of the “boot” command will boot into a specified Linux kernel.
chainloader  load and jump to the MBR of a different disk partition

Some commands useful to call from an interactive commandline are:

  • ls, cat, echo, hexdump, parttool, tar, reboot

Module Loading

Grub modules have some similarity to linux kernel modules, and in fact the commands to load and unload them into Grub are “insmod” and “rmmod” just as for linux modules. However in many respects they are more like “plugins” for an IDE or editor. It is this modular architecture, together with a built-in scripting syntax for invoking module functions, that makes Grub so flexible and extendable.

Initialization

When a module is loaded (from a *.mod file), or a linked-in module is initialised, the module can register commands, variables, parsers, filesystem drivers and cryptographic algorithm implementations.

A command is simply a mapping from a name to a function; Grub scripts can then invoke that function by name (eg “ls”, “boot”, “linux”). Often the command name has the same name as the .mod file (eg “search.mod” registers command “search”). However this is not always the case; for example module “loadenv.mod” registers both load_env and save_env commands. If a script uses a command that is not yet registered, then file command.lst is used to determine which module to load.
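The autoloading of commands can be pictured with a small Python model. Everything here is a simplification for illustration: the class Shell and its methods are hypothetical, and the “command: module” line format is an assumption about what command.lst looks like, so check the file on your own system before relying on it.

```python
def parse_command_list(text):
    """Map command name -> module name from 'command: module' lines (assumed format)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        name, module = line.split(":", 1)
        table[name.strip()] = module.strip()
    return table


class Shell:
    """Toy model of command registration and autoloading."""

    def __init__(self, command_list_text):
        self.autoload = parse_command_list(command_list_text)
        self.loaded_modules = []     # in load order
        self.registered = {}         # command name -> providing module

    def insmod(self, module):
        if module in self.loaded_modules:
            return
        self.loaded_modules.append(module)
        # A real module registers its own commands when loaded; this model
        # simply registers every command the table attributes to it.
        for name, owner in self.autoload.items():
            if owner == module:
                self.registered[name] = module

    def run(self, name):
        """Return the module providing a command, loading it on first use."""
        if name not in self.registered:
            if name not in self.autoload:
                raise KeyError(f"unknown command: {name}")
            self.insmod(self.autoload[name])
        return self.registered[name]


COMMAND_LST = """\
load_env: loadenv
save_env: loadenv
search: search
"""

shell = Shell(COMMAND_LST)
provider = shell.run("save_env")   # triggers loading of the loadenv module
```

After the call, both load_env and save_env are registered, because loading the one module registers every command it provides.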

A variable is a mapping from a name to a variable defined in the module; scripts can then access the variable as $name. Optionally, a module can ensure it gets a callback when a script assigns to that variable.

A module can register a “parser” to support additional non-standard script syntaxes (eg lua). If any script (eg grub.cfg or something loaded from grub.cfg) starts with “#!parserid” then the specified parser function will be used to process that script. If there is no parser function registered with that id, then normal.mod looks in handler.lst for a line mapping key “parser.parserid” to the name of a .mod file to load, ie autoloading works for parsers as well as commands.

A module can register filesystem drivers (eg ext2, reiserfs, befs, ntfs). Note that at least one filesystem driver module is embedded in core.img so the files in directory /boot/grub can be read. Filesystem drivers cannot be auto-loaded; the “insmod” command must be used to explicitly load the necessary modules. The filesystem driver modules for Grub support reading only; Grub does not need to write data back to the filesystem during boot, and skipping write support makes the modules a lot smaller and easier to implement.

A module can also register cryptographic algorithms. As for filesystem drivers, they must be explicitly loaded via insmod before being available.

Linking

Module (.mod) files are actually ELF format files; core.img contains the necessary code to parse the ELF headers, load the code and do dynamic-linking (ie resolve calls to functions and variables exported by the core.img for use by modules). The core.img exports lots of things to modules including functions for manipulating strings, iterating over disks, registering themselves, etc.
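If you want to convince yourself that a .mod file really is an ELF file, the first few header bytes are enough. The following Python sketch checks the ELF magic number and reads two standard header fields; the function name describe_elf is made up for this example, and the header used below is synthetic rather than taken from a real module.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def describe_elf(header: bytes):
    """Return (class, e_type) for an ELF header, or None if the magic is absent."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return None
    elf_class = ELF_CLASSES.get(header[4], "unknown")
    # Byte 5 gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return elf_class, e_type


# Synthetic 32-bit little-endian header with e_type = 1 (relocatable object).
header = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9) + struct.pack("<H", 1)
info = describe_elf(header)
```

To try it on a real module, pass in the first 18 bytes of a file such as /boot/grub/normal.mod, eg describe_elf(open("/boot/grub/normal.mod", "rb").read(18)).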

Dependency Handling

File “/boot/grub/moddep.lst” contains a list of inter-module dependencies; when a module is loaded, its dependencies are also automatically loaded.

Note that module “normal.mod” has dependencies, as declared in moddep.lst. Therefore when core.img loads normal.mod during startup those declared dependencies are also automatically loaded. One of the more significant ones is “boot.mod”, which provides the “boot” command that is automatically executed after a menuentry{...} block has been selected by the user, or ctrl-x has been pressed after editing a menuentry.
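Dependency loading amounts to a depth-first walk over the table in moddep.lst. The Python sketch below shows the idea; the “module: dep dep” line format is an assumption, and the dependency entries in the sample text are invented for illustration rather than copied from a real moddep.lst.

```python
def parse_moddep(text):
    """Map module -> list of direct dependencies from 'module: dep dep' lines."""
    deps = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        module, rest = line.split(":", 1)
        deps[module.strip()] = rest.split()
    return deps


def load_order(module, deps, order=None, seen=None):
    """Return modules in the order they would be loaded, dependencies first."""
    if order is None:
        order, seen = [], set()
    if module in seen:          # already handled (also guards against cycles)
        return order
    seen.add(module)
    for dep in deps.get(module, []):
        load_order(dep, deps, order, seen)
    order.append(module)
    return order


# Invented sample entries, for illustration only.
MODDEP = """\
normal: boot terminal
boot:
terminal:
linux: boot normal
"""

deps = parse_moddep(MODDEP)
normal_order = load_order("normal", deps)
linux_order = load_order("linux", deps)
```

With this sample table, asking for “normal” pulls in “boot” first, which matches the description above of boot.mod arriving as a side-effect of loading normal.mod.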

Loading

A module can be:

  • built-in to core.img (see grub-mkimage, called from grub-install),
  • dynamically loaded using “insmod” from a script,
  • loaded because a module that depends on it was loaded (as declared in moddep.lst)

Module “normal.mod” is a special exception and is automatically dynamically loaded by core.img. Module “sh.mod” is also a special exception, and is automatically dynamically loaded by “normal.mod” (via the mapping for parser-id “parser.grub” in file handler.lst).

Generating A Grub Config File

Normally, during boot Grub will load a file /boot/grub/grub.cfg which is a shell-script-like file that then configures the standard Grub boot menus.

This file can be written by hand if desired; the syntax is fairly easy for anyone familiar with shell scripts. However Grub also comes with a convenient shell script called “grub-mkconfig” which uses a set of helper scripts and templates to create a suitable grub.cfg file automatically. As most Linux distributions automatically invoke grub-mkconfig after updating kernel versions etc, causing any hand-written config file to be overwritten, it is generally better to customise the “templates” rather than modify grub.cfg directly.

On some Linux distros (eg Ubuntu) the name “update-grub” is used as an alias for grub-mkconfig.

Note that grub-mkconfig simply regenerates the /boot/grub/grub.cfg shell script that Grub interprets on boot. It does nothing else, as the MBR + core.img + modules are enough to get the interpreter up and running scripts out of normal filesystem files.

Script “grub-mkconfig” reads all files in /etc/grub.d, executing each in order. Each should be a shell script whose output ends up in the grub.cfg file. To add a custom entry just modify file /etc/grub.d/40_custom which is explicitly designed as a template for custom entries. And if you want your custom entries to appear before the standard boot entries in the list, then rename file 40_custom to 08_custom or similar.
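The effect of the numeric prefixes can be shown with a toy model in Python. The real grub-mkconfig is a shell script that runs executable files; here each “script” is just a Python function returning its output, and the function build_config is hypothetical. The names 00_header and 10_linux are typical of /etc/grub.d but are used here purely as examples.

```python
def build_config(scripts):
    """Concatenate the output of each script, visiting names in sorted order."""
    return "".join(scripts[name]() for name in sorted(scripts))


# Stand-ins for the shell scripts in /etc/grub.d; each returns its output text.
scripts = {
    "00_header": lambda: "# header\n",
    "10_linux": lambda: "menuentry 'Linux' { ... }\n",
    "40_custom": lambda: "menuentry 'Mine' { ... }\n",
}
before = build_config(scripts)

# Renaming 40_custom to 08_custom moves its output ahead of 10_linux.
scripts["08_custom"] = scripts.pop("40_custom")
after = build_config(scripts)
```

Since the menu is built in the order the menuentry blocks appear in grub.cfg, changing the file name is all that is needed to reorder the menu.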

You may not even need to customize anything; Grub’s default scripts automatically create menu entries for each vmlinuz-* file that exists in /boot.

All of the functionality of grub-mkconfig (aka update-grub) is implemented in shell script, so the best reference for this topic is the script files themselves. In addition, this configuration process is fairly well-documented elsewhere, so there is no need to repeat that information here.

Referencing drive partitions from grub.cfg

Grub understands only ‘(hdx,y)’ type expressions natively, where x is the physical disk device (0..n), and y is the partition. In Grub 1.98, y is just a number in range (1..n); in later versions of Grub, it also specifies the partition type, eg ‘(hd0,msdos1)’. And in complicated configurations, it can even look like ‘(hd0,msdos2,sunpc2)’, which specifies a particular solaris ‘slice’ embedded in msdos partition #2 on drive #0. As this article is specifically about Grub 1.98, the rest of this article will use the ‘(hd0,1)’ format in examples.

Files can be referenced via paths like “(hd0,1)/boot/mykernel”. If no device-id is present on the front, then “magic” shell variable $root is used by default.
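The rule for resolving a path can be written down in a few lines of Python. This is a sketch of the behaviour described above, not Grub's own parser; the function names resolve and split_device are hypothetical, and env is a plain dictionary standing in for Grub's variables.

```python
import re

# A leading "(device)" prefix, followed by the path within that device.
DEVICE_PREFIX = re.compile(r"^\(([^()]+)\)(.*)$")


def resolve(path, env):
    """Split a Grub-style path into (device, file path), defaulting to $root."""
    match = DEVICE_PREFIX.match(path)
    if match:
        return match.group(1), match.group(2)
    return env["root"], path


def split_device(device):
    """Split 'hd0,1' into the disk name and a list of nested partition specs."""
    disk, *partitions = device.split(",")
    return disk, partitions


env = {"root": "hd0,1"}
explicit = resolve("(hd1,2)/boot/mykernel", env)
implicit = resolve("/boot/mykernel", env)
nested = split_device("hd0,msdos2,sunpc2")
```

The nested case shows why the partition part is best treated as a list: each further comma-separated element names a partition inside the previous one.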

The ‘search’ module can be used to search all partitions for a filesystem that matches certain criteria, and to set a shell variable to the matching ‘hdx,y’ value. That variable can then be used later to reference the matched filesystem. Using the search command is not necessary if you are happy to reference filesystems by their raw ‘(hdx,y)’ descriptions.

By default, the “search” command stores its result in environment variable $root, ie it is intended for searching for the device on which /boot/grub/*.mod files live.

It can be used to search for devices for other purposes, but in these cases an alternative environment variable should always be specified, eg

search.label mylabel MYDEV

The “search” command can invoke any of the more specific search.* commands, so this command is equivalent to the one above:

search --set=MYDEV --label mylabel

The returned value is of form ‘hdx,y’. Note that there are no parentheses; if you wish to later use this value as part of a file-path, then you need to add the parentheses, eg:

somecommand ($MYDEV)…

Note that if you set $root to point to something other than the device on which Grub’s *.mod files live, then Grub will no longer be able to load any modules.

Note also that $root has absolutely nothing to do with the “root=” value passed to a Linux kernel. However the kernel’s image file (first parameter to the “linux” command) will be looked for on the $root device by default, ie

linux /mykernel root=xyz

means

linux ($root)/mykernel root=xyz

which means something like

linux (hd0,1)/mykernel root=xyz

The “linux” module

This module loads a kernel into memory and sets up Grub data-structures so that a later call to the “boot” command will jump to the loaded kernel.

The module registers two commands that can be called from scripts:

  • linux
  • initrd

The “linux” command

  • Expects its first argument to be the path to the kernel image itself.
  • Simply passes all other arguments through to the invoked kernel.
  • The specified kernel is loaded into memory by this command.

Example:

linux /bzImage-custom root=/dev/sda5 \
   rootfstype=ext4 ro \
   crashkernel=384M-2G:64M,2G-:128M \
   quiet splash

The “initrd” command

  • Optional if the kernel image to boot does not need any modules, ie if it has been compiled with the necessary modules for bootup compiled-in.
  • Otherwise, has one param that is the name of the disk image to use as an initial root filesystem. The specified file is loaded into memory at a suitable address, and the previously loaded kernel image is modified to set the address of the loaded disk image.

Hint 1

If your root filesystem is ext4, then you need rootfstype to be specified, as Linux itself will try to “guess” the filesystem, and will try ext3 first which will recognise the filesystem as ext3 and then fail to mount it with message “unsupported features”.

Hint 2

If no initrd is specified, then the “root” param to the linux command must not use the “root=LABEL=xxxx” or “root=UUID=xxxx” format. Those need udev up and running for the kernel to correctly map the label or UUID to a filesystem (udev creates directories /dev/disk/by-uuid and /dev/disk/by-label), but udev is not available if there is no initrd.

The solution is to just use a device-id like “linux root=/dev/sda5 …” 9.

Multiboot kernels

Grub2 boots kernels by invoking an appropriate module. There are several operating-system-specific modules (eg “linux”, “xnu”, “bsd”). There is also a “multiboot” module that is able to boot any multiboot-compliant operating system.

The original multiboot specification has some problems; at the current date work is in progress on a new multiboot design specification.

Using GPT partition tables

The traditional way of handling disk partitions on x86 pcs is the “msdos partition table”. The first sector of a disk (ie first 512 bytes) contains a “master boot record”, and the end of the MBR contains 4 “partition descriptors” of 16 bytes each. These specify the start and end of up to 4 “primary partitions”. One primary partition can actually be an “extended” partition, in which case it starts with the first record of a linked-list of “extended boot records” which can be used to define any number of additional “logical partitions”.
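For readers who like to see the bytes, here is a Python sketch that decodes the four primary partition descriptors from an MBR sector. The field layout used (status byte, CHS start, type byte, CHS end, 32-bit start sector, 32-bit sector count) is the standard msdos partition entry format; the function name and the sample values are made up for this example, and extended/logical partitions are not followed.

```python
import struct

# status, CHS start, type, CHS end, start sector (LBA), number of sectors
ENTRY = struct.Struct("<B3sB3sII")


def parse_partition_table(mbr: bytes):
    """Return the non-empty primary partition entries of a 512-byte MBR."""
    if len(mbr) != 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        raw = mbr[446 + 16 * index: 446 + 16 * (index + 1)]
        status, _chs_start, ptype, _chs_end, lba_start, sectors = ENTRY.unpack(raw)
        if ptype != 0:                       # type 0 marks an unused slot
            partitions.append({
                "number": index + 1,
                "bootable": status == 0x80,
                "type": ptype,
                "start_lba": lba_start,
                "sectors": sectors,
            })
    return partitions


# Synthetic MBR: one bootable Linux (type 0x83) partition starting at sector 63.
mbr = bytearray(512)
mbr[446:462] = ENTRY.pack(0x80, b"\0\0\0", 0x83, b"\0\0\0", 63, 1000)
mbr[510:512] = b"\x55\xaa"
table = parse_partition_table(bytes(mbr))
```

To run this against a real disk you would read the first 512 bytes of the device (which needs root privileges) and pass them to parse_partition_table.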

The old partition table design has many flaws, particularly that the start/size fields are 32-bit sector numbers, which with 512-byte sectors limits the values to 2TB.
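The 2TB figure follows directly from the field width. As a quick check in Python, assuming the traditional 512-byte sector size:

```python
SECTOR_SIZE = 512            # bytes; the traditional sector size
MAX_SECTORS = 2 ** 32        # a 32-bit field can hold values 0 .. 2**32 - 1

limit_bytes = MAX_SECTORS * SECTOR_SIZE
limit_tib = limit_bytes / 2 ** 40   # expressed in units of 2**40 bytes
```

Disks with larger sectors raise the limit proportionally, but the field width itself cannot change without breaking the format.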

A new partition format called GPT has been defined which solves the msdos partition table problems. It is supported in the firmware of most non-PC architectures already (eg Apple), and by very modern x86 architectures that have “EFI” firmware rather than BIOS.

In EFI-enabled systems, booting with GPT disks is “normal”, and Grub works fine. There is no MBR code in such systems; the firmware itself is capable of mounting a specified partition which is in a FAT32-like format, reading a bootable image out of that filesystem and executing it. The same partition can of course be mounted from the booted OS in order to modify boot code, so the whole “stage 1” 10 problem goes away.

In BIOS-based systems it is still possible to boot a GPT-formatted disk, but there are a few minor quirks.

One quirk is that this magic “32kb gap” between sector 1 and the first partition that is created by convention for msdos-partitioned disks does not exist in GPT. The solution is to create a specific partition to hold the “embedded” copy of core.img; this partition must have type BIOS_BOOT. The “grub-setup” utility (called by grub-install) searches the GPT for the first partition of that type and writes core.img there.
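The search for the BIOS_BOOT partition is essentially a scan of the GPT entries for a particular type GUID. The Python sketch below illustrates this on synthetic data; the function names are hypothetical, and it assumes the standard GPT entry layout in which the first 16 bytes of each entry hold the partition-type GUID in mixed-endian form. The two GUID values are the well-known ones for “BIOS boot” and “EFI System” partitions.

```python
import uuid

BIOS_BOOT = uuid.UUID("21686148-6449-6e6f-744e-656564454649")
EFI_SYSTEM = uuid.UUID("c12a7328-f81f-11d2-ba4b-00a0c93ec93b")


def type_guid(entry: bytes) -> uuid.UUID:
    """Decode the partition-type GUID held in the first 16 bytes of a GPT entry."""
    return uuid.UUID(bytes_le=entry[:16])


def find_bios_boot(entries):
    """Return the 1-based number of the first BIOS boot partition, or None."""
    for number, entry in enumerate(entries, start=1):
        if type_guid(entry) == BIOS_BOOT:
            return number
    return None


# Two synthetic 128-byte entries: an EFI System partition, then a BIOS boot one.
entries = [
    EFI_SYSTEM.bytes_le + bytes(112),
    BIOS_BOOT.bytes_le + bytes(112),
]
found = find_bios_boot(entries)
```

As an aside, the on-disk bytes of the BIOS boot GUID spell out a short ASCII message, which you can see by printing BIOS_BOOT.bytes_le.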

The GPT specification does reserve the first sector of each disk for backwards compatibility, so we can still write a standard GRUB “stage 1” MBR into this sector, and a standard BIOS will still read this sector into memory and execute it. And the core.img file can still have its first block dynamically generated to hold the addresses of all blocks in the file … well, one (address, len) pair. The Grub MBR code will then loop using the BIOS int13 API to load that first core.img block, then use that to load the rest of the core.img file, and then execute the loaded data. At this point, both the EFI and BIOS booting processes have accomplished the same task (loading core.img), and booting continues in the same way. The EFI approach is simpler, however, in that the special setup processes needed to create the MBR code, to write core.img as “raw data” to the disk, and to prefix core.img with a “blocklist” are not needed. Of course the EFI firmware is correspondingly more complex.

Of course the booted operating system must also understand the GPT partitioning scheme, or it will not be able to mount any filesystems. Linux has understood these for many years (CONFIG_EFI_PARTITION is enabled by default). Windows 64-bit versions also do, but Windows 32-bit versions do not, so you cannot dual-boot win32 on a disk that uses GPT. You can start up win32 as a virtual machine under Linux however; VMs do not see real disks, just virtual ones that are really standard files in the host operating system’s filesystem. These “fake disks” have their own partition tables which can be formatted in the traditional style.

GNU parted 1.8.8 (and gparted) is only partially GPT-enabled. It can create GPT partition tables and partitions within them. However, it still asks for “primary/extended/logical” as a type, which makes no sense for GPT, and it can neither set GPT partition names nor display partition GUID values. The “gdisk” tool should be used instead.

As with msdos partitions, the partition-type field indicates the format of data within that partition (but in GPT, the type field is a GUID, not an 8-bit value). The GUID for “EFI System” tells EFI firmware that it can mount that partition as a FAT filesystem (which it has built-in firmware for), and that there should be a file with a magic name that it can load and execute to boot the system. Note that this means that the core.img file cannot be on the filesystem of the kernel to be booted (as no-one sane would put a whole OS on a FAT filesystem!). Instead, either core.img goes into a partition on its own (lonely), or core.img plus *.mod, *.lst, and possibly grub.cfg go into the EFI partition.

Backgrounds and Fancy Graphics

The vbe.mod file provides drivers for standard VESA BIOS Extensions 2.0 graphics; other graphics cards can potentially have their own .mod files.

The video.mod file finds and manages one graphics driver.

The gfxterm module:

  • Registers itself as a “terminal handler” on load
  • Registers one command background_image for use by scripts.
  • Looks for shell variable $gfxmode to determine the resolution to use
  • Uses module video.mod to set the screen to that resolution; this in turn probes all loaded video modules until a suitable one is found (which will be vbe.mod on most PC systems).
  • Loads a font (“/boot/grub/unicode.pf2” is currently the only one).

After this process is finished, any text that is displayed by Grub will be handled by gfxterm (as the current “terminal handler”), and rendered in the specified resolution with the loaded font.

The grub.cfg script can optionally invoke command background_image to set a graphics file to use as the background for the Grub menu. This command takes one option and one argument; the argument is the filename. The option is:

  -m (stretch|normal)

Supported graphics file formats are .tga, .png, .jpg. Each format is handled by its own reader module (tga.mod, png.mod, jpeg.mod) rather than by gfxterm itself, so the matching module has to be loaded before background_image is called.

As an alternative to gfxterm, the gfxmenu module can display the same grub.cfg menu content in prettier form. It is not quite ready for production use yet though.

Miscellaneous Topics

Rescue Mode

Grub has a “rescue” mode, where core.img starts up, but “normal.mod” is not loaded. In this case, there is a commandline and a very primitive parser available.

About the only thing the parser can do is ‘insmod’ and execute loaded commands. To mimic the normal boot process from here:

  set prefix=(hdx,y)/boot/grub
  insmod normal
  normal

The Grub Emulator

There is something called “grub-emu” which is not used in booting. It is a tool that can be run after booting Linux; it emulates the behaviour of the commandline that you get when booting and going into a Grub interactive shell.

It allows you to experiment in order to see what kinds of Grub commands you might want to put into a grub.cfg file. Note that this is an emulator, and therefore does not do things 100% the same as a “real” Grub environment.

It can also be used for development/testing purposes.

The device.map File

The “/boot/grub/device.map” file specifies a mapping between BIOS device ids (hdx,y) and Linux device names (/dev/sdx). This file is usually auto-generated by grub-install, with the Linux device names being taken from the OS on which grub-install was run. However users can provide a hand-written one if Grub’s guesses are not correct. This file is used in the following situations:

  • by “the grub shell”, ie the interactive Grub emulation environment (which is not used for booting a computer).
  • by grub-setup (which is called from grub-install)
  • possibly by grub-mkconfig when creating the grub.cfg file (unconfirmed)

Note that because the mappings are appropriate only from the OS on which grub-install was run, this file of course cannot be used during booting; it is only used to figure out what (hdx,y) values to write into grub.cfg in order to reference files accessed by the grub-install script on /dev/sdx…

PXE Boot

Sometimes it is useful for a computer to boot off a disk available on a network rather than a local one. Most modern BIOSes support this; on start they just send out a “boot request” message, get back a suitable address, download a single file and execute it.

This downloaded file is the equivalent of the “core.img” file used for local booting. In fact it is built just like a core.img is for local booting, with file “pxeboot.img” inserted on the front, then deployed onto a suitable tftp server.

The pxe code effectively replaces the MBR’s process of loading core.img from a “blocklist”; everything else works as described elsewhere in this article for a local boot. Of course if core.img has been loaded from a remote system, then probably the rest of the /boot/grub files will also be loaded from that remote server, so core.img needs a network driver linked in (“pxe.mod”). This module then provides a “filesystem driver” (like the one loaded by insmod ext2), except that the retrieved data comes from a tftp server rather than a local hard drive.

The grubenv file and the save_env/load_env commands

A file “grubenv” is created during the normal grub install process. This is a fixed-size file of exactly one disk block.

The save_env command can then be used from a grub.cfg script to write the value of a grub script variable into this file. Because the file is fixed-size and only one block, no filesystem support is needed to write data into it; the plain BIOS facilities are sufficient.

The load_env command can be used to read data back out of it into grub script variables.
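The fixed-size property is easy to see in a small Python sketch. The layout assumed here is a signature line, then name=value lines, then '#' padding up to the original size; treat that layout, and the helper names parse_env_block and build_env_block, as assumptions made for this illustration rather than a specification.

```python
# Assumed first line of the environment block.
SIGNATURE = b"# GRUB Environment Block\n"


def parse_env_block(block: bytes) -> dict:
    """Extract name=value pairs; comment and padding lines are skipped."""
    if not block.startswith(SIGNATURE):
        raise ValueError("missing environment block signature")
    variables = {}
    for line in block[len(SIGNATURE):].split(b"\n"):
        if line.startswith(b"#") or b"=" not in line:
            continue
        name, _, value = line.partition(b"=")
        variables[name.decode()] = value.decode()
    return variables


def build_env_block(variables: dict, size: int) -> bytes:
    """Serialise the variables and pad with '#' to exactly `size` bytes."""
    body = SIGNATURE + b"".join(
        f"{name}={value}\n".encode() for name, value in variables.items()
    )
    if len(body) > size:
        raise ValueError("variables do not fit in the fixed-size block")
    return body + b"#" * (size - len(body))
```

Because the rewritten block always has the same length as the old one, saving a variable never needs to allocate or free disk blocks; the existing sectors are simply overwritten in place.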

The main purpose of this facility is for grub scripts to be able to record the user’s most recent menu option choice, so on next boot the same menu option can be executed as the default (ie the one run when no user input occurs). This feature is currently disabled by default; to enable, edit /etc/default/grub and set the GRUB_DEFAULT variable to “saved” and regenerate the grub.cfg file. The new grub.cfg file will then call save_env at appropriate times.

An alternate use is to support the “grub-reboot” command. This writes data into the grubenv file, and the standard grub.cfg script will then see this data after reboot and automatically run the specified menu option. Try this:

  cat /boot/grub/grubenv
  grub-reboot testing
  cat /boot/grub/grubenv

Working notes

  • minor bugs in /etc/grub.d/10_linux

    1. “loading initial ramdisk” echoed even when no initrd will be loaded

    2. for some reason, the “Loading Linux” message does not end up in grub.cfg. Maybe a syntax error in the $(printf “…”) command? or no gettext translation?

    3. looks only for vmlinux or vmlinuz, not for bzImage-*

  • the “search” command before the “linux somekernel” command is always output, but not always needed. Yes it is needed if the kernel is not on the same partition as /boot/grub, but otherwise we do a by-UUID search for no purpose. On the other hand, the search is probably just using cached data about the existing disks…

    Perhaps search might also be needed if some other command has modified $root?

Outstanding questions

Here are some things I am still trying to figure out…

  • what do load_env and save_env do? I think they access that funny fixed-size “/boot/grub/grubenv” file. But why would you want to save stuff in there?

  • what is grldr.img?

  • are there device drivers designed to do wear-levelling on flash at the OS level (ie manage block remapping inside the driver)? If so, these devices would be a bad place to put Grub’s core.img!

  • how does Grub localisation work?

  • what does linux kernel config option CONFIG_EDD do?

Footnotes

  1. Grub’s code is not too scary to work on. It is well structured (very modular), not very large, and 99.5% standard “c”. Yes, it does run in “real” mode, which is something a little unusual for Unix developers, but so was every DOS program ever written. Debugging may be trickier than usual, but for many tasks the new code can be run in an “emulated” Grub environment as a normal unix process (grub-emu). The qemu environment can also be used. And the whole of Grub is single-threaded, so there are no synchronization issues or races to worry about.

  2. The grub-mkimage binary app is what dynamically creates core.img. The command-line args to grub-mkimage list the modules to build in to core.img (together with the “core” Grub code). For example:

       grub-mkimage -o myimg.img search
    

    will build the “search.mod” module into the generated myimg.img file - plus all the modules that search has dependencies on. Usually grub-mkimage is invoked by the “grub-install” script which is responsible for figuring out what modules are needed for the current environment.

    It is important that core.img not be too large when embedding it into the traditional “post-MBR” gap between sector 1 (the MBR) of a disk and the start of the first partition. Most partitioning tools (including Windows and Linux) leave 32kb space here, so that is the max size for core.img in a “traditional” installation.

    There are potential workarounds if a larger core.img is absolutely necessary. For example, a raw (unformatted) partition can be created to hold core.img; the grub-setup tool just needs to be given the id of that partition as its target. An alternative is to leave the core.img file within the real filesystem that also holds /boot/grub; however this is more fragile than the “embedding” approach, as the file might accidentally get moved on disk, with ugly consequences!

  3. An ext4 filesystem can generally be mounted as a read-only ext2 filesystem. So to boot off an ext4 filesystem, only an (enhanced) ext2 driver is needed, as the boot partition is read-only (except possibly for the save-env command, but that writes only to a single block in a pre-allocated file).

  4. WARNING: the following info is best-guess only; I’m no expert on Windows!

    MS-Windows without EFI (ie pre windows-8) doesn’t support the “post-MBR gap” approach. Instead, it has the concept of a “system partition” and a “boot partition”. A “system partition” has its own MBR-like structure in the first sector of the partition; the remainder of the partition is formatted as a filesystem. The assembly code present in the master MBR of a disk normally scans the disk partition-table to find a partition marked as ‘active’, then loads sector 0 of that partition and jumps to it. The code in that sector presumably loads the windows bootloader from the same (formatted) partition using a system similar to Grub’s blocklists (as the MBR code’s 512 bytes of assembly-code can’t possibly understand a filesystem). The bootloader then scans all partitions looking for one marked with the ‘boot’ flag; such partitions are expected to contain a windows kernel that the bootloader can load & execute. It is possible for the “system partition” and “boot partition” to be the same, ie for the bootloader to be stored on the same partition as the windows kernel.

    Grub can “chain load” windows by simply loading the contents of the first sector of the windows “system partition” and executing it.
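The "scan the partition table for the active partition" step can be sketched in Python as follows. This uses the classic msdos layout (four 16-byte entries starting at byte 446, boot flag 0x80, signature 0x55 0xAA at the end of the sector). The function name is made up, and real MBR code does this in a few dozen bytes of assembly rather than like this.

```python
import struct


def find_active_partition(mbr: bytes):
    """Return (number, type, start_lba, sector_count) of the first
    partition flagged active in a 512-byte MBR sector, or None."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    for index in range(4):
        entry = mbr[446 + 16 * index:446 + 16 * (index + 1)]
        boot_flag, part_type = entry[0], entry[4]
        # Bytes 8..15 of an entry: starting LBA and length in sectors.
        start_lba, sectors = struct.unpack_from("<II", entry, 8)
        if boot_flag == 0x80:
            return index + 1, part_type, start_lba, sectors
    return None
```

The start_lba value is the sector that would then be loaded and jumped to, which is also the sector Grub reads when it chain loads.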

  5. When the core.img file is created, it is possible to embed a config-file within core.img. If this has been done, then before the first module is dynamically loaded, the text (script) in this config file is passed to a primitive built-in script parser. This script can then configure modules needed to access the /boot/grub directory. In the simplest cases, nothing is needed (ie no config file is necessary); just the right filesystem driver needs to be linked in and the drive/partition ids are hard-wired into the core.img. However if the /boot/grub filesystem should be identified via UUID or label then a script is needed which invokes a linked-in “search” command. And if /boot/grub is on an LVM drive, or mounted over NFS or other exotic combinations then other commands can be provided to set things up appropriately. Note that only linked-in modules can be referenced from this config-file, as /boot/grub is not accessible at this point.

  6. Creating blocklists for a file is an interesting process; normal OS filesystems do not expose info on the raw disk addresses of the blocks on which a file is stored. So Grub instead uses its own filesystem modules to read the same file directly via the raw partition device, while the OS also has that partition mounted. For safety, it checks that what it gets from its filesystem driver is byte-for-byte identical to what it gets from the OS-level read call on that same file. And Grub’s filesystem modules do have an option to return blocklists.
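Once the raw block numbers of a file are known, turning them into a blocklist is just a matter of merging consecutive blocks into (start, length) runs. A minimal Python sketch of that idea (the function name to_blocklist is hypothetical, and this is not Grub's code):

```python
def to_blocklist(blocks):
    """Collapse an ordered list of block numbers into (start, length) runs."""
    runs = []
    for block in blocks:
        if runs and runs[-1][0] + runs[-1][1] == block:
            # This block directly follows the previous run, so extend it.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # A gap or a backwards jump starts a new run.
            runs.append((block, 1))
    return runs
```

A file stored contiguously collapses to a single pair, which is the simple case the boot code can handle with one (address, len) entry.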

  7. This means that the core.img file must not be moved! When the standard “embedding” mode is used for core.img this is not a problem, and this is why embedding (copying core.img into a location outside of any filesystem) is strongly recommended.

    However even when the alternative is used (a blocklist pointing to a file inside a real filesystem), there is not usually a problem; on most filesystems files do not get moved unless they are modified. Doing a “defrag” operation would be bad … but posix filesystems do not generally support (or need) defrag. Log-structured filesystems do move blocks, but only when the file is modified, and core.img should not be modified without also rewriting the MBR. Solid-state hard-drives can move blocks around internally (“wear levelling”) even when they have not been modified, but this is invisible to the outside world; they provide a mapping from logical-to-physical blocks internally and therefore this causes no problems.

    There are a few filesystems designed explicitly to write directly to flash, and these can do wear-levelling at the OS level. Presumably it would be a very bad idea to use one of these filesystems to store core.img on! However perhaps such filesystems provide a way to prevent wear-levelling per-inode, or per-partition.

  8. A “menuentry” entry in grub.cfg can be thought of as similar to a function definition. The code in a function is not executed until someone calls the function; similarly the code in a menuentry block is not executed until normal.mod decides to - ie a user has selected that menu option for booting.

    Alternatively, “menuentry” can be thought of as a “command” like “ls” or “search” which calls the associated function registered by normal.mod, with the last function parameter being a lambda expression (a reference to an anonymous function) which is the code in the {...} block. The implementation just adds this data to a list for later display (and execution if it is selected).
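The "function definition" view of menuentry can be mimicked in Python. This is only an analogy with invented names (Menu, select); it shows that registering an entry stores the body without running it, and that the body runs only when the entry is chosen.

```python
class Menu:
    """Toy stand-in for the list of entries kept by normal.mod."""

    def __init__(self):
        self.entries = []

    def menuentry(self, title, body):
        # Only record the title and the deferred body; nothing runs yet.
        self.entries.append((title, body))

    def select(self, index):
        # Run the body of the chosen entry, as when a user picks it.
        title, body = self.entries[index]
        body()
        return title


executed = []
menu = Menu()
menu.menuentry("Linux", lambda: executed.append("load kernel"))
menu.menuentry("Memtest", lambda: executed.append("load memtest"))
```

After the two menuentry calls the executed list is still empty; calling menu.select(1) runs only the second body.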

  9. There are currently three ways to specify the root filesystem to a Linux kernel:

    • root=MAJOR:MINOR, eg “root=8:5”.

      The minor number is just the partition#. The major is dynamically assigned, ie will change as devices are inserted into or removed from the machine.

    • root=devnumber, eg “root=0x0805”.

      The low N bits of this number are the device MINOR number, and the high bits are the MAJOR number; see above.

    • Via a path like “root=/dev/sda”.

    Note that the kernel allows this although there is no “/dev” directory yet (there is no root filesystem at all!). Linux contains some hard-coded hackery that computes a “default name” for each device it has detected, although there is no absolute guarantee that after a /dev filesystem exists the device will in fact have that name. And this hackery is based on device enumeration order, so again the path will change as devices are added/removed.

    When a kernel has an initrd, then the root= param is actually completely ignored during booting; the initrd holds the root filesystem. User-space code in the initrd eventually peeks at the kernel’s original command-line params and extracts the root= argument, then mounts this filesystem and uses pivot_root to replace itself with the “real” root filesystem. This allows tricky things like “root=UUID=” or “root=LABEL=” or using network-mounted or lvm filesystems. However without an initrd, only the three forms listed above are supported.
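The three root= forms can be told apart with a few lines of Python. This is a toy classifier, not the kernel's parser. In particular it assumes the old 16-bit device-number layout, where the low 8 bits are the minor number and the next 8 bits the major number.

```python
def parse_root(value: str):
    """Classify a root= value as ('path', name) or ('devnum', major, minor)."""
    if value.startswith("/dev/"):
        return ("path", value)
    if ":" in value:
        # MAJOR:MINOR form, both in decimal.
        major, minor = value.split(":", 1)
        return ("devnum", int(major), int(minor))
    # Hexadecimal device number, with or without a leading "0x".
    number = int(value, 16)
    # Assumption: 8-bit minor in the low byte, major in the bits above it.
    return ("devnum", number >> 8, number & 0xFF)
```

Under that assumption “root=8:5” and “root=0x0805” name the same device.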

    Unfortunately (AFAIK) because of the above, the Grub “search” command cannot be used to figure out the root= parameter for Linux. It can correctly find the desired device by uuid/label/etc, but only returns that data in “hdx,y” form, and Linux does not accept identifiers of that form for its root= parameter. Grub cannot map an (hdx,y) into a “/dev/sd?” form because it just does not know what naming Linux will use when it enumerates the attached devices in the system; for example Linux code might get changed to enumerate buses in reverse order, thus making sda the device with the “highest” address rather than the “lowest”. And Linux cannot be enhanced to accept (hdx,y) forms because there is no 100%-reliable way of deciding what the BIOS would have called any specific physical device - and adding code to support a specific bootloader would probably be controversial.

    Using Grub’s “device.map” file to map (hdx,y) into /dev/sd? form is initially tempting, but unfortunately is not practical. The whole point of searching for the rootfs by some id (eg UUID or label) is that the hardware may get rearranged; therefore using a static text file which maps (hdx,y) to /dev/sd? is rather pointless; it will be wrong if the device order changes, or the Linux code changes device enumeration order. The (hdx,y) value will change in sync with the hardware changes, but the fixed mapping will not. In addition, this file can become “stale”; it obviously was correct at the time that Grub was installed (else Grub would not have installed correctly), but device.map is not actually used during the boot process, so might become obsolete. It may also be incomplete; device.map has to correctly identify the devices on which Grub installed its core.img and MBR, but other entries are not used, so if the kernel’s rootfs is on some other device then this may be incorrect.

    An approach that might work is to enhance Grub’s search command to optionally output some device “key”, like the PCI path to the device, rather than the (hdx,y) format. Then enhance Linux to accept a root= parameter that identifies the device by that key (eg bus path). However at the current time, neither feature exists. And “bus path” certainly would not work, as Grub does not bother enumerating devices by bus on x86; it just relies on the BIOS apis to access the drives. Similarly with “io port” or “config register address”; these are unique for each device, but Grub2 does not know these values as it just uses (hd0, sector) parameters to the BIOS.

    When using the GPT partition table format, each partition is assigned a unique identifier, dynamically generated when the partition is created. This could be the basis of a good solution; it just needs a Grub “map (hdx,y) to GUID” command, plus enhancement to Linux to accept a GPT GUID as the key for locating the corresponding device object. There probably would not be major complaints about supporting this in Linux. However you cannot dual-boot any OS that does not support GPT natively (including all 32-bit Windows versions, Windows 7 included); you can boot non-GPT OSes as virtual machines though. Another issue: a GUID is 16 bytes, which is quite a lot for a Linux kernel commandline. Hey, but root=UUID=n is used to support filesystem UUIDs now (via udev).

  10. In all PC-based systems, the BIOS firmware knows nothing about filesystems, and provides less than 512 bytes of space in the Master Boot Record (MBR) for boot code, which is far too small to implement support for a filesystem. Therefore the “first stage” of booting must be able to load the next step without filesystem support. The solution in all bootloaders for BIOS systems is to have the MBR code simply load a raw list of disk blocks by address.

    Non-PC systems often have more sophisticated firmware which can directly access some filesystem types, and therefore have no equivalent of the “stage 1” process on BIOS-based systems.