Re: [Libguestfs] [PATCH 0/7 v2] Fix and workaround for qcow2 issues in qemu causing data corruption.

Wednesday, 4 July 2012

On 07/03/2012 07:03 PM, Richard W.M. Jones wrote:
...
 https://bugzilla.redhat.com/show_bug.cgi?id=836710
 https://bugzilla.redhat.com/show_bug.cgi?id=836913

 There are at least two related bugs going on:

 (1) Linux sync(2) system call doesn't send a write barrier to the
 disk, so in effect it doesn't force the hard disk to flush its cache.
 libguestfs used sync(2) to force changes to disk. 
Surprising. So sync(2) is currently async. Ho hum.
I just noticed Jan Kara's patch set today actually:
https://lkml.org/lkml/2012/7/3/272
Would that fix the issue at the kernel level?

...
  We didn't expect
 that qemu was caching anything because we used 'cache=none' for all
 writable disks, but it turns out that qemu creates a writeback cache
 anyway when you do this (you need to use 'cache=directsync' when you
 don't want a cache at all). 
And we're not using 'directsync' for performance reasons?
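
For readers following the cache-mode point, here is a small sketch of how qemu's -drive cache= values map onto a write cache and O_DIRECT. The table reflects the current qemu documentation as I understand it and may not match the 2012 behaviour exactly; the helper names (has_write_cache, drive_option) are made up for illustration and are not libguestfs or qemu APIs.

```python
# Assumed mapping of qemu "cache=" modes, taken from current qemu docs.
# writeback: qemu reports a volatile write cache to the guest
# direct:    host page cache is bypassed (O_DIRECT)
# no_flush:  guest flush requests are ignored
CACHE_MODES = {
    "writeback":    {"writeback": True,  "direct": False, "no_flush": False},
    "none":         {"writeback": True,  "direct": True,  "no_flush": False},
    "writethrough": {"writeback": False, "direct": False, "no_flush": False},
    "directsync":   {"writeback": False, "direct": True,  "no_flush": False},
    "unsafe":       {"writeback": True,  "direct": False, "no_flush": True},
}


def has_write_cache(mode):
    """Return True if the mode still leaves a writeback cache in play."""
    return CACHE_MODES[mode]["writeback"]


def drive_option(path, mode):
    """Build the value for a hypothetical 'qemu -drive <value>' argument."""
    if mode not in CACHE_MODES:
        raise ValueError(f"unknown cache mode: {mode}")
    # qemu option strings escape a literal comma by doubling it.
    escaped = path.replace(",", ",,")
    return f"file={escaped},cache={mode}"
```

Under these assumptions has_write_cache("none") is true while has_write_cache("directsync") is false, which is the distinction the quoted paragraph describes.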

...
 (2) qemu's qcow2 disk cache code is buggy.  If there are I/Os in
 flight when qemu shuts down, then qemu segfaults or assert fails.
 This can result in unwritten data.  Unfortunately libguestfs ignored
 the result of waitpid(2) so we didn't see this problem happening.

 Patch 1/7 fixes the first problem by issuing fsync(2) on each whole
 block device when we sync.
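
A minimal sketch of the idea in patch 1/7, assuming it amounts to opening each whole block device and calling fsync(2) on the descriptor. This is not the libguestfs code; flush_to_disk and flush_all are hypothetical names, and any device paths passed in (for example /dev/sda) are up to the caller.

```python
import os


def flush_to_disk(path):
    """Open `path` read-only and fsync it, raising OSError on failure."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def flush_all(paths):
    """fsync every path; return a dict of path -> OSError for failures."""
    failures = {}
    for path in paths:
        try:
            flush_to_disk(path)
        except OSError as exc:
            # Keep going so one bad device does not skip the others.
            failures[path] = exc
    return failures
```

Collecting failures instead of stopping at the first one is a design choice of this sketch: every device gets a flush attempt, and the caller decides how to report errors.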

 Patches 2/7 - 7/7 are needed to fix the second problem.  We add a new
 API (guestfs_shutdown) so that we can actually catch the case where
 qemu is segfaulting instead of just ignoring it.  Since qemu itself
 isn't likely to be fixed any time soon, patch 7/7 adds a crude but
 effective workaround to virt-resize. 
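
To illustrate why ignoring the result of waitpid(2) hid the problem, here is a rough sketch of checking how a child process ended. It uses Python's subprocess module, which reports a child killed by a signal as a negative return code. It is not guestfs_shutdown; classify_exit and run_checked are hypothetical names.

```python
import signal
import subprocess


def classify_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "clean exit"
    if returncode < 0:
        # subprocess encodes "killed by signal N" as -N.
        name = signal.Signals(-returncode).name
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_checked(argv):
    """Run a command, wait for it, and report how it terminated."""
    proc = subprocess.run(argv)
    return classify_exit(proc.returncode)
```

A caller that only treats "clean exit" as success would notice a child that died from SIGSEGV or SIGABRT rather than silently carrying on.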
Thanks for looking into this tricky issue so thoroughly,
Pádraig.
