As you might have seen, for the past 3 days I've been tackling a nasty
data corruption bug[1][2].
The bug occurs when ALL of the following conditions are true:
(a) You are using a qcow2 image file.
(b) You are writing out data to the image file using libguestfs or a
libguestfs-using tool like guestfish or virt-resize.
(c) The data is not being written to a filesystem (to files or
directories) but is being written directly to a block device
within libguestfs, e.g. updates to the partition table or writes
directly to /dev/sdaX.
(d) You are using qemu < 1.1.0.
When the guestfs handle is closed, the data might not be written to
the qcow2 image file. This data loss, if it happens, is silent.
This peculiar combination of factors happened to occur in the
virt-resize test program[3], and this was where I first spotted it[4]
although at first it didn't look like a data corruption bug at all.
After analysis I found that there are four separate bugs involved:
(i) qemu had a bug where it would segfault when you sent it a
SIGTERM signal. It turned out that when qemu was writing to a
qcow2 file, the qcow2 writeback cache was enabled [NB:
cache=none enables this cache], and write requests were in
flight at the point when the SIGTERM was received, it would crash.
** This bug has been fixed in qemu/qemu-kvm >= 1.1.0. It is highly
** recommended that you immediately upgrade to this version, not
** just for libguestfs but for all usage.
(ii) The Linux kernel sync(2) system call doesn't issue a write
barrier for dirty blocks that are written to a block device
directly, only for mounted filesystems.
This bug will probably be fixed if the following patch goes
upstream:
https://lkml.org/lkml/2012/7/3/277
(iii) libguestfs was issuing sync(2) in the expectation that it
flushed everything.
The implication for libguestfs is that, because of (ii), the qemu
cache still contains unwritten data at the point when we kill qemu,
which is exactly the condition that triggers (i). Bugs (i) and (ii)
unexpectedly interact.
(iv) libguestfs didn't check the return value for waitpid(2) so it
didn't know that qemu was segfaulting, so this loss of data was
silently ignored.
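
To illustrate the check that was missing in (iv), here is a minimal
sketch in Python (not the libguestfs code, which is C). It reaps a
child with waitpid(2) and inspects the status to tell a clean exit
from death by signal. The helper names describe_wait_status and
run_and_reap are made up for this example, and the "crasher" child is
only a stand-in for qemu.

```python
import os
import signal
import sys


def describe_wait_status(status: int) -> str:
    """Turn a raw waitpid(2) status into a human-readable verdict."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        return f"killed by signal {signal.Signals(signum).name}"
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
        return "exited cleanly" if code == 0 else f"exited with status {code}"
    return f"unexpected wait status {status}"


def run_and_reap(argv: list[str]) -> int:
    """Start a child process and return its raw wait status."""
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    # Hypothetical stand-in for qemu: a child that dies from SIGSEGV.
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(describe_wait_status(run_and_reap(crasher)))
```

The point is simply that the status has to be looked at: a caller
that discards it cannot distinguish a segfault from a normal exit.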
Bug (i) can be fixed by updating to qemu 1.1.0. Unfortunately we do
not know which precise commit between 1.0 and 1.1.0 fixed the bug, and
doing a git bisect is difficult because the data corruption bug is
very hard to reproduce reliably.
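
Regarding (ii) and (iii): one way to avoid depending on sync(2) when
writing straight to a block device is to call fsync(2) on the
descriptor you wrote through. The sketch below is in Python and only
shows the general pattern; I am not claiming it is how the libguestfs
patches do it. The function name write_and_flush is invented for the
example, and the path would be whatever device (or, for testing, a
regular file) you are writing to.

```python
import os


def write_and_flush(path: str, offset: int, data: bytes) -> None:
    """Write data at offset, then fsync that descriptor before closing it."""
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(data)
        while view:
            # pwrite may write fewer bytes than requested, so loop.
            written = os.pwrite(fd, view, offset)
            view = view[written:]
            offset += written
        # Explicit flush of this descriptor instead of relying on sync(2).
        os.fsync(fd)
    finally:
        os.close(fd)
```

Any error from fsync is raised as OSError, so a failed flush is
reported to the caller rather than lost.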
Bugs (iii) and (iv) will be fixed by forthcoming patches in libguestfs
>= 1.19.16, which will be backported to the 1.16 and 1.18 branches. Note
that this requires a new API, guestfs_shutdown[5]. If your program
wants to handle write errors correctly, it will need to use this new
API; otherwise an error will be printed and ignored. All libguestfs
tools that modify disk images have been updated to use the new API.
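
For programs using the Python bindings, the calling pattern would look
roughly like the sketch below: call shutdown before close so that a
failure is raised to the caller, and close the handle whether or not
shutdown succeeded. The wrapper name shutdown_and_close is made up for
this example, "disk.qcow2" is a placeholder image name, and the
commented usage assumes a libguestfs new enough to provide the
shutdown call.

```python
def shutdown_and_close(g) -> None:
    """Shut down the appliance so errors surface, then always close."""
    try:
        # Raises an exception if shutting down the appliance failed.
        g.shutdown()
    finally:
        # Release the handle whether or not shutdown succeeded.
        g.close()


# Usage with the real bindings (assumed installed, not exercised here):
#
#   import guestfs
#   g = guestfs.GuestFS(python_return_dict=True)
#   g.add_drive_opts("disk.qcow2", format="qcow2", readonly=0)
#   g.launch()
#   ... write to the block devices ...
#   shutdown_and_close(g)
```

With this ordering an exception from shutdown still propagates to the
caller after the handle has been closed.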
Hans de Goede is currently updating Fedora to qemu-kvm 1.1.0.
Versions of libguestfs which contain fixes will be announced
separately. It is likely that these versions will *require* qemu >= 1.1.0,
so effectively our baseline version of qemu has just increased from
1.0 to 1.1.0, and this change is noted in the README file.
(Thanks to Kevin Wolf, Paolo Bonzini, Avi Kivity and Padraig Brady for
their invaluable help.)
Rich.
[1] https://www.redhat.com/archives/libguestfs/2012-July/msg00005.html
[2] https://www.redhat.com/archives/libguestfs/2012-July/msg00008.html
[3] https://github.com/libguestfs/libguestfs/blob/cb24ceedd8a8ef7da71cfcce6db...
[4] https://bugzilla.redhat.com/show_bug.cgi?id=836710
[5] https://www.redhat.com/archives/libguestfs/2012-July/msg00014.html
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/