This Data Corruption Bug will Shock You
It's not common to find a severe bug in an Ubuntu LTS release, and it's even less common for that bug to corrupt data! We are used to the idea that data is stored reliably and safely on our computers. However, for the past week I have been pulling my hair out over a data corruption issue, and I was really surprised to discover the true cause.

I have been creating several Virtual Machines (VMs) to compartmentalize my essential network services (e.g. email is separated from contacts), and I decided to run them as QEMU instances on my Ubuntu 18.04 LTS system. I first provisioned a Windows Server VM, which installed flawlessly on a qcow2 image on an ext4 disk. However, I ran into issues when provisioning a Linux VM with a different virtual hard drive configuration. The libvirt disk definition for that VM was:
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/dev/rust/client'/>
  <target dev='sda' bus='sata'/>
</disk>
This disk image was stored on a ZVOL, i.e. a block device on a ZFS filesystem, and not as a file.
After installing Debian with disk encryption, I kept getting a kernel panic on first boot, but only when I had chosen a long encryption key:
[ 0.799754] Unpacking initramfs...
[ 0.800970] Initramfs unpacking failed: junk in compressed archive
...
[ 0.936387] List of all partitions:
[ 0.937590] No filesystem could mount root, tried:
[ 0.938747]
[ 0.939199] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 0.941199] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-3+deb9u2
[ 0.943290] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
...
[ 0.966656] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
The console line, "Initramfs unpacking failed: junk in compressed archive", is the kernel saying that its boot file is corrupt! Initramfs is an image file that provides essential modules and drivers during system boot. I decided to mount the guest drive and extract the contents of initrd.img, but I received more error messages from gzip:
gzip: initrd.img-4.9.0-7-amd64: invalid compressed data--crc error
gzip: initrd.img-4.9.0-7-amd64: invalid compressed data--length error
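For anyone who wants to run the same kind of integrity check from a script, here is a minimal Python sketch. It assumes the image is a single gzip stream, as the gzip errors above suggest. An image that starts with an uncompressed segment would fail this check even when healthy. The function name `gzip_is_intact` is my own invention for illustration, not part of any tool.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return (True, "") if the file decompresses cleanly, else (False, reason).

    Hypothetical helper: it only proves the gzip stream is self-consistent,
    not that the contents are what the installer meant to write.
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read to the end; the CRC and length trailer is verified at EOF.
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers a bad header or CRC mismatch, EOFError a truncated
        # file, and zlib.error a damaged compressed stream.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""
```

Calling `gzip_is_intact("initrd.img-4.9.0-7-amd64")` on a copy of the guest's image should return `False` with a reason if the file is damaged in the way gzip reported.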
At this point I gave up, thinking that the root cause was either of the following:
- a ZVOL/ZFS File System Bug (my first time using ZVOLs)
- a QEMU bug (suspicious update right before I used it)
- an initramfs-tools bug (I had a corrupt
- a LUKS disk encryption bug (only happens to encrypted disks with long encryption keys!)
I even contemplated submitting a bug report to the initramfs-tools package. However, it turns out I was completely mistaken in my guesses, and the fact that the bug only occurred with long encryption keys was a red herring. A Google search for "qemu corrupt", restricted to results from the past month, turned up a news article detailing the cause of the data corruption bug.
It turns out that my data corruption was due to a kernel bug! This bug only occurs when the virtual drive is set to cache='none' and located on a non-ext4 (mine was ZFS) file system. Like in any accident, there was a long chain of events: QEMU cache='none' -> buggy system call on a ZVOL/ZFS file system -> corrupt disk write -> corrupt initramfs -> kernel panic on VM boot.
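Since cache='none' is the first link in that chain, it can be handy to list which disks in a libvirt domain definition use it. The following is a rough Python sketch that only inspects the XML. It cannot tell which file system backs a disk or whether the running kernel is affected. The function name and the second disk entry (the qcow2 path) are made up for illustration; only the first disk comes from my actual configuration.

```python
import xml.etree.ElementTree as ET


def disks_with_cache_none(domain_xml):
    """Return the source path of every disk whose driver sets cache='none'."""
    root = ET.fromstring(domain_xml)
    flagged = []
    for disk in root.iter("disk"):
        driver = disk.find("driver")
        if driver is None or driver.get("cache") != "none":
            continue
        source = disk.find("source")
        path = None
        if source is not None:
            # libvirt uses a 'file' or 'dev' attribute depending on disk type.
            path = source.get("file") or source.get("dev")
        flagged.append(path)
    return flagged


# Example domain: the first disk is the one from this post, the second
# is a hypothetical qcow2 disk that keeps the default cache mode.
EXAMPLE = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/dev/rust/client'/>
      <target dev='sda' bus='sata'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/windows.qcow2'/>
      <target dev='sdb' bus='sata'/>
    </disk>
  </devices>
</domain>
"""

print(disks_with_cache_none(EXAMPLE))  # prints ['/dev/rust/client']
```

A flagged disk is not necessarily corrupt. It just marks a VM worth double-checking if its storage lives on something other than ext4.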
Even though I hoped that Linux (and by extension Linus!) would be infallible, and that such a serious data corruption bug would have been a package's fault, this bug is just one of many kernel bugs every year. This is the Ubuntu Bug Fix Report. The bug fix was actually released today and thankfully I did not lose any important data. In conclusion, as silly as it sounds, I learned that Linux isn't perfect.