NOTICE: Data corruption bug when writing to qcow2 files
by Richard W.M. Jones
As you might have seen, for the past 3 days I've been tackling a nasty
data corruption bug[1][2].
The bug occurs when ALL of the following conditions are true:
(a) You are using a qcow2 image file.
(b) You are writing out data to the image file using libguestfs or a
libguestfs-using tool like guestfish or virt-resize.
(c) The data is not being written to a filesystem (to files or
directories) but is being written directly to a block device
within libguestfs, eg. updates to the partition table or writes
directly to /dev/sdaX.
(d) You are using qemu < 1.1.0.
When the guestfs handle is closed, the data might not be written to
the qcow2 image file. This data loss, if it happens, is silent.
This peculiar combination of factors happened to occur in the
virt-resize test program[3], and this was where I first spotted it[4]
although at first it didn't look like a data corruption bug at all.
After analysis I found that there are four separate bugs involved:
(i) qemu had a bug where it would segfault when you sent it a
SIGTERM signal. It turned out that when qemu was writing to a
qcow2 file, the qcow2 writeback cache was enabled [NB:
cache=none enables this cache], and write requests were in
flight at the point when the SIGTERM was received, it would crash.
** This bug has been fixed in qemu/qemu-kvm >= 1.1.0. It is highly
** recommended that you immediately upgrade to this version, not
** just for libguestfs but for all usage.
(ii) The Linux kernel sync(2) system call doesn't issue a write
barrier for dirty blocks that are written to a block device
directly, only for mounted filesystems.
This bug will probably be fixed if the following patch goes
upstream: https://lkml.org/lkml/2012/7/3/277
(iii) libguestfs was issuing sync(2) in the expectation that it
flushed everything.
The implication for libguestfs is that the qemu cache still
contains data at the point when we kill qemu. Bugs (i) and (ii)
unexpectedly interact.
(iv) libguestfs didn't check the return value for waitpid(2) so it
didn't know that qemu was segfaulting, so this loss of data was
silently ignored.
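To illustrate bug (iv), here is a minimal C sketch of checking the
status returned by waitpid(2), so that a child which segfaults is
reported rather than silently ignored. This is not the libguestfs
code: the helper names wait_and_check and spawn_child are
hypothetical, and spawn_child only stands in for launching qemu.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for a child and report how it ended.
 * Returns 0 only if the child exited normally with status 0.
 * On any other outcome returns -1 and writes a message to errbuf.
 */
static int
wait_and_check (pid_t pid, char *errbuf, size_t errlen)
{
  int status;
  pid_t r;

  if (pid <= 0) {
    snprintf (errbuf, errlen, "invalid child pid");
    return -1;
  }

  /* Retry if the wait is interrupted by a signal handler. */
  do {
    r = waitpid (pid, &status, 0);
  } while (r == -1 && errno == EINTR);

  if (r == -1) {
    snprintf (errbuf, errlen, "waitpid: %s", strerror (errno));
    return -1;
  }

  if (WIFEXITED (status)) {
    if (WEXITSTATUS (status) == 0)
      return 0;
    snprintf (errbuf, errlen, "child exited with status %d",
              WEXITSTATUS (status));
    return -1;
  }

  if (WIFSIGNALED (status)) {
    /* This is the case that was being missed: e.g. a segfault. */
    snprintf (errbuf, errlen, "child killed by signal %d",
              WTERMSIG (status));
    return -1;
  }

  snprintf (errbuf, errlen, "child ended with unexpected status");
  return -1;
}

/* Hypothetical stand-in for the qemu subprocess: the child exits
 * cleanly if sig == 0, otherwise it sends itself the given signal.
 */
static pid_t
spawn_child (int sig)
{
  pid_t pid = fork ();

  if (pid == 0) {
    if (sig != 0) {
      signal (sig, SIG_DFL);
      raise (sig);
    }
    _exit (0);
  }
  return pid;
}
```

With this helper, wait_and_check (spawn_child (SIGSEGV), buf, sizeof buf)
returns -1 and leaves an error message in buf, whereas a bare
waitpid (pid, NULL, 0) would discard that information.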
Bug (i) can be fixed by updating to qemu 1.1.0. Unfortunately we do
not know which precise commit between 1.0 and 1.1.0 fixed the bug, and
doing a git bisect is difficult because the data corruption bug is
very hard to reproduce reliably.
Bugs (iii) and (iv) will be fixed by forthcoming patches to libguestfs
>= 1.19.16 which will be backported to 1.16 and 1.18 branches. Note
that this requires a new API, guestfs_shutdown[5]. If your program
wants to handle write errors correctly it will need to use this new
API, otherwise an error will be printed and ignored. All libguestfs
tools that modify disk images have been updated to use the new API.
Hans de Goede is currently updating Fedora to qemu-kvm 1.1.0.
Versions of libguestfs which contain fixes will be announced
separately. It is likely that these versions will *require* qemu >= 1.1.0,
so effectively our baseline version of qemu has just increased from
1.0 to 1.1.0, and this change is noted in the README file.
(Thanks to Kevin Wolf, Paolo Bonzini, Avi Kivity, Padraig Brady for
invaluable help.)
Rich.
[1] https://www.redhat.com/archives/libguestfs/2012-July/msg00005.html
[2] https://www.redhat.com/archives/libguestfs/2012-July/msg00008.html
[3] https://github.com/libguestfs/libguestfs/blob/cb24ceedd8a8ef7da71cfcce6db...
[4] https://bugzilla.redhat.com/show_bug.cgi?id=836710
[5] https://www.redhat.com/archives/libguestfs/2012-July/msg00014.html
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/
12 years, 6 months
[PATCH 0/7 v2] Fix and workaround for qcow2 issues in qemu causing data corruption.
by Richard W.M. Jones
https://bugzilla.redhat.com/show_bug.cgi?id=836710
https://bugzilla.redhat.com/show_bug.cgi?id=836913
There are at least two related bugs going on:
(1) The Linux sync(2) system call doesn't send a write barrier to the
disk, so in effect it doesn't force the hard disk to flush its cache.
libguestfs used sync(2) to force changes to disk. We didn't expect
that qemu was caching anything because we used 'cache=none' for all
writable disks, but it turns out that qemu creates a writeback cache
anyway when you do this (you need to use 'cache=directsync' when you
don't want a cache at all).
(2) qemu's qcow2 disk cache code is buggy. If there are I/Os in
flight when qemu shuts down, then qemu segfaults or fails an assertion.
This can result in unwritten data. Unfortunately libguestfs ignored
the result of waitpid(2) so we didn't see this problem happening.
Patch 1/7 fixes the first problem by issuing fsync(2) on each whole
block device when we sync.
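A rough sketch of the approach in patch 1/7, assuming each whole
block device is opened and fsync(2) is called on the file descriptor.
The helper names fsync_path and fsync_devices are hypothetical, and
the real patch may differ in detail (for example in how it finds the
list of devices).

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open one device (or file) and fsync it.
 * Returns 0 on success, -1 on error (a message goes to stderr).
 */
static int
fsync_path (const char *path)
{
  int fd, r;

  fd = open (path, O_RDONLY);
  if (fd == -1) {
    fprintf (stderr, "open: %s: %s\n", path, strerror (errno));
    return -1;
  }

  r = fsync (fd);
  if (r == -1)
    fprintf (stderr, "fsync: %s: %s\n", path, strerror (errno));

  if (close (fd) == -1 && r == 0) {
    fprintf (stderr, "close: %s: %s\n", path, strerror (errno));
    r = -1;
  }

  return r;
}

/* Hypothetical helper: fsync every path in the list.  It carries on
 * after a failure so that the remaining devices are still flushed,
 * and returns -1 if any of them failed.
 */
static int
fsync_devices (const char *const *paths, size_t n)
{
  int ret = 0;
  size_t i;

  for (i = 0; i < n; i++)
    if (fsync_path (paths[i]) == -1)
      ret = -1;

  return ret;
}
```

For example, given const char *const devs[] = { "/dev/sda", "/dev/sdb" }
(example device names only), fsync_devices (devs, 2) returns 0 only if
every device could be opened and flushed.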
Patches 2/7 - 7/7 are needed to fix the second problem. We add a new
API (guestfs_shutdown) so that we can actually catch the case where
qemu is segfaulting instead of just ignoring it. Since qemu itself
isn't likely to be fixed any time soon, patch 7/7 adds a crude but
effective workaround to virt-resize.
Rich.
12 years, 6 months
unexpected end of file mounting vdi image
by B. M.
I am unable to mount a vdi image with virt-df. I get this error in the libvirt log:
2012-06-24 02:37:22.484+0000: 2364: error : qemuMonitorIO:583 : internal error End of file from monitor
I don't really know from this output what the problem is, and some assistance would be appreciated. I'm attaching the output of the libguestfs test program.
Brian
12 years, 6 months
New tool proposal
by Wanlong Gao
Hi Rich,
We just talked about a new tool, virt-diff, which can diff files
between two guests, suggested by Kamezawa Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>.
It would work something like this:
# virt-diff -Nru --guest=Guest0 --guest=Guest1 /etc/hosts
Any thoughts about this?
Another question: what do you think about implementing iptables,
firewall, yum, rpm, etc. in libguestfs?
Thanks,
Wanlong Gao
12 years, 6 months