Bisecting a Linux Kernel

Categories: Linux

Intro

When tracking down a Linux kernel issue, it is often necessary to repeatedly build the kernel, install it, boot it, and test it. Using git-bisect to find the commit that introduced a problem is an extreme case of this.

There are some pages on the internet that describe how to do this, but they generally use the distribution packaging tools to build and install the kernel (eg fakeroot + dpkg + update-initramfs + update-grub). This is totally pointless: it clutters the package manager with garbage and scatters unnecessary files everywhere. The kernel being built is a temporary one, not one intended for regular use, so it should be built as quickly as possible and installed in a way that makes removing it again as easy as possible.

Simply running “make modules_install; make install” in the linux build directory is a major improvement. It assumes the existence of an /sbin/installkernel script, which most distributions do provide. This still does too much though (for example, on Ubuntu, /sbin/installkernel runs all scripts in /etc/kernel/postinst.d, which then triggers update-grub).

Below is my approach to installing temporary kernels.

WARNING: a kernel installed this way is not suitable for long-term use. For example, the kernel headers don’t get installed, so other software built against this kernel may not see the right headers.

Note: I’m currently running Ubuntu 12.04 on x86. The commands needed for your system might differ slightly (eg here it is assumed that ‘sudo’ works).

Setting up Grub

I create two permanent grub entries for booting a kernel named kimage-test. The actual kernel image is then overwritten during the testing, but the grub entry does not need to be updated - ie update-grub does NOT need to be run after each kernel build.

First create a new grub config template file:

cd /etc/grub.d
sudo -e 50_test

Now paste the following into the new file. The exact contents of the custom grub entries need to match your setup, so you’ll have to tweak the example below. Look at a typical entry in your /boot/grub/grub.cfg and copy its settings as appropriate.

#!/bin/sh
# The delimiter is quoted so the shell leaves $linux_gfx_mode for grub to expand.
cat << 'EOF'

menuentry 'Test - noinitrd' {
    recordfail
    gfxmode $linux_gfx_mode
    insmod gzio
    insmod part_msdos
    insmod ext2
    set root='(hd0,msdos6)'
    linux /boot/kimage-test root=/dev/sdaXXXXX ro splash
}

menuentry 'Test - initrd' {
    recordfail
    gfxmode $linux_gfx_mode
    insmod gzio
    insmod part_msdos
    insmod ext2
    set root='(hd0,msdos6)'
    search --no-floppy --fs-uuid --set=root XXXXXX
    linux   /boot/kimage-test root=UUID=XXXXX ro splash
    initrd  /boot/ramdisk-test
}

EOF

As noted above, the exact contents of the menuentry sections depend on your machine; in particular you’ll need to replace the XXXXX parts with appropriate values.
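
To see which settings to copy, it can help to print just the first menuentry from the existing configuration. The following is a minimal sketch: first_menuentry is a made-up helper name, the default path is the Ubuntu one, and it assumes the first entry contains no line beginning with a closing brace other than its own.

```shell
# Print the first menuentry block of a grub config file.
# first_menuentry is a hypothetical helper; pass another path as $1 if
# your distribution keeps grub.cfg somewhere else.
first_menuentry() {
    awk '
        /^[ \t]*menuentry[ \t]/  { printing = 1 }  # start at the first entry
        printing                 { print }
        printing && /^[ \t]*[}]/ { exit }          # stop at its closing brace
    ' "${1:-/boot/grub/grub.cfg}"
}
```

Running first_menuentry with no arguments should show the insmod, set root, search and linux lines that the two test entries need to mirror.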

Note that the “noinitrd” version has a device node as the root filesystem, not a filesystem UUID. This is necessary because the kernel does not natively support specifying the root filesystem via UUID - it relies on the initrd to map this, and in this case we have no initrd.
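
To find out which form the currently running kernel was booted with, the root= parameter can be pulled out of the kernel command line. This is only a sketch; root_param is a hypothetical helper name, and it assumes the parameters are separated by single spaces, as they normally are in /proc/cmdline.

```shell
# Print the value of the root= parameter found in a kernel command line.
# root_param is a hypothetical helper; the command line is passed as $1.
root_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^root=//p'
}

# Typical use on a running system:
#   root_param "$(cat /proc/cmdline)"
```

If this prints something starting with UUID=, the matching device node for the “noinitrd” entry can be looked up with “blkid -U” followed by the UUID value (this may need sudo).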

The kernel image file is named kimage-* rather than the traditional vmlinuz-* in order to avoid being auto-detected by the standard update-grub script. Similarly, ramdisk-* is used instead of initrd.img-*. The non-standard names also are useful for identifying which files were installed by this approach.

Make sure the new file is marked executable:

sudo chmod a+x /etc/grub.d/50_test

And recreate /boot/grub/grub.cfg:

sudo update-grub
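
A quick check that both entries actually made it into the generated file might look like the following sketch. The helper name test_entries is made up, and the default path assumes the Ubuntu location of grub.cfg.

```shell
# Print the menuentry lines of the 'Test - ...' entries in a grub config.
# Returns non-zero if none are found. test_entries is a hypothetical helper.
test_entries() {
    grep "menuentry 'Test - " "${1:-/boot/grub/grub.cfg}"
}
```

If this prints fewer than two lines, the usual suspects are a 50_test file that is not executable or a typo in the template.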

Installing a kernel with static modules

In this approach, all modules needed by the kernel are simply compiled into the kernel. There is then no need for an initrd file, and no need for a /lib/modules/{version} directory. And there is very little to “clean up” afterwards - just remove the kernel image from the boot directory.

Unfortunately:

  • some code cannot be compiled into the kernel.
  • some modules act differently when compiled into the kernel (see footnote 1)
  • modprobe complains if the /lib/modules/{version} directory doesn’t exist, even if it isn’t needed.

Still, avoiding modules is great when possible. So try it first…

Make sure that all devices you intend to use in the custom kernel are plugged in (eg mouse, webcam), then:

cd ...
make distclean
make localyesconfig

And now actually start the compilation:

make EXTRAVERSION=-test all

Hopefully everything built OK and no modules were produced (or at least none that you’ll actually need at runtime):

find . -name "*.ko"
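
When this check is repeated for every build, it may be handier as a function that fails loudly. A minimal sketch, with no_modules_built being a made-up name:

```shell
# Succeed only if no loadable modules (*.ko) exist under the build directory.
# no_modules_built is a hypothetical helper; $1 defaults to the current dir.
no_modules_built() {
    found=$(find "${1:-.}" -name '*.ko' -print)
    if [ -n "$found" ]; then
        printf 'modules were built:\n%s\n' "$found" >&2
        return 1
    fi
    return 0
}
```

It could then guard the copy step, eg “no_modules_built && sudo cp arch/x86/boot/bzImage /boot/kimage-test”, although a listed module is only a problem if it is actually needed at runtime.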

Now just copy the kernel:

sudo cp arch/x86/boot/bzImage /boot/kimage-test

Reboot and test…

Of course, you’ll need to select the “Test - noinitrd” option from grub after reboot.

When building further kernels, just repeat from the “make EXTRAVERSION=-test all” step.

Note: localyesconfig was broken around linux 3.5, and only fixed in linux 3.7.0-rc1. If you start your bisect in this area (ie have something in this range checked out when you run “make localyesconfig”) then you will effectively just get localmodconfig. A workaround is to just replace scripts/kconfig/streamline_config.pl with the version from HEAD.
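
The workaround could be scripted roughly as below. This is a sketch under assumptions: both function names are made up, and the default ref “master” stands for whatever branch or tag in your clone contains the fixed script. Since the replacement leaves the working tree modified, it is undone again before moving to the next bisect step, as git may otherwise refuse to check out the next revision.

```shell
# Overwrite the checked-out streamline_config.pl with the copy from a
# revision that has the fix. Hypothetical helper; $1 is the ref to take it
# from (the default "master" is an assumption about your clone).
use_fixed_streamline() {
    fixed_ref=${1:-master}
    script=scripts/kconfig/streamline_config.pl
    tmp=$(mktemp) || return 1
    if git show "$fixed_ref:$script" > "$tmp"; then
        cat "$tmp" > "$script"   # write in place, keeping the file mode
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Put the original script back so the tree is clean for the next checkout.
restore_streamline() {
    git checkout -- scripts/kconfig/streamline_config.pl
}
```

Both functions are meant to be run from the top of the kernel source tree: use_fixed_streamline before “make localyesconfig”, restore_streamline once the .config has been generated.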

Installing a kernel with loadable modules

In this approach, things are much more like a distribution kernel, where most kernel modules are stored as .ko files in /lib/modules/{version}, and are loaded by the modprobe tools when needed. The disadvantage of this approach is that you’ll need to manually clean up the /lib/modules/{version} directory (if you care) after testing is complete.

First, unpack an existing initrd image into a working directory:

VER=$(uname -r)
INITRD=$HOME/Linux/kernel/initrd
mkdir -p "$INITRD"
cd "$INITRD"
gunzip --stdout < "/boot/initrd.img-$VER" | cpio -i

Now go to the kernel source root dir, and create a .config file that compiles the minimum number of modules. Make sure that all devices you intend to use in the custom kernel are plugged in (eg mouse, webcam), then:

cd ...
make distclean
make localmodconfig

Now you can build your kernel:

make EXTRAVERSION=-test all

Setting the EXTRAVERSION variable on the make command line overrides the one in the Makefile; so instead of a kernel that thinks it is something like ‘3.6.0-rc5+’, it thinks it is ‘3.6.0-test+’. While this loses a little useful info, it means that the kernel modules installed into /lib/modules/{version} have this useful “-test” suffix and so won’t overwrite any files that are being used by other non-temporary kernels. It also makes it clear which dirs can be deleted when you get around to cleaning up /lib/modules after testing is complete.
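
For the eventual cleanup, the “-test” suffix makes the leftover directories easy to find. A small sketch, where list_test_module_dirs is a made-up name and the optional argument only exists so the function can be pointed at a different modules root:

```shell
# Print the module directories left behind by "-test" kernels.
# Hypothetical helper; $1 is the modules root (default /lib/modules).
list_test_module_dirs() {
    for dir in "${1:-/lib/modules}"/*-test*; do
        # An unmatched glob stays literal, so check it is a real directory.
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done
    return 0
}
```

The function only lists; after reviewing the output, each directory can be removed by hand with “sudo rm -rf”.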

After each compilation is finished, just run the following script (with BUILD and INITRD variables modified as appropriate):

BUILD=$HOME/Linux/kernel/linus
INITRD=$HOME/Linux/kernel/initrd

cd "$BUILD" || exit 1
VER=$(cat include/config/kernel.release)

# install the kernel image into /boot/kimage-test
sudo cp arch/x86/boot/bzImage /boot/kimage-test || exit 1

# install the kernel modules into /lib/modules/{version}
sudo make modules_install || exit 1

# install the kernel modules into initrd working directory
# (${INITRD:?} aborts if INITRD is empty, so rm never expands to /lib/modules/*)
rm -rf "${INITRD:?}"/lib/modules/*
INSTALL_MOD_PATH=$INITRD make modules_install || exit 1

# package the initrd temporary directory into /boot/ramdisk-test
cd "$INITRD" || exit 1
find . | cpio -o --format=newc | gzip | sudo dd of=/boot/ramdisk-test

printf 'Installed kernel version %s!\n' "$VER"

Other notes

While writing this article, I found that the “Linux Kernel In A Nutshell” book (published 2006) recommends something very similar (see chapter 5, “Installing and Booting from a Kernel”).

Footnotes

  1. For example, modules which load firmware into a device don’t work well when compiled in, because firmware-loading is implemented by sending a netlink event to the userspace udev process [for kernels prior to v3.7]. For compiled-in modules, however, initialisation is run before userspace is configured. The result is fairly obvious: nothing is there yet to answer the firmware request.