On Wed, Sep 13, 2017 at 12:40:29PM +0300, Roman Kagan wrote:
On Wed, Sep 13, 2017 at 09:47:52AM +0100, Richard W.M. Jones wrote:
> On Wed, Sep 13, 2017 at 11:25:32AM +0300, Roman Kagan wrote:
> > On Tue, Sep 12, 2017 at 06:04:18PM +0100, Richard W.M. Jones wrote:
> > > v2 -> v3:
> > >
> > > - I addressed everything that Pino mentioned last time.
> > >
> > > - It's tricky to get a stable run when multiple copies of qemu are
> > > involved, because the same cache files get overwritten by parallel
> > > libguestfs. So I changed the names of the cache files to include
> > > the qemu binary key (size, mtime), which removes this conflict.
> > > This is in new patch 4/6.
> >
> > Sorry I must have missed the motivation part so could you please remind
> > why locking needed to be turned off?
>
> qemu 2.10 implements mandatory file locking, so if two instances of
> qemu open the same file for writing you'll get an error:
[...]
> However the exclusive lock prevents libguestfs from *reading* from an
> open disk image, eg. to do virt-df to monitor space or virt-tail to
> monitor log files. Such disk images are opened using the
> guestfs_add_drive_opts readonly=true flag, which is implemented by
> placing a qcow2 overlay on top of the disk image, the purpose of the
> overlay being to take any writes and protect the underlying disk image
> from being modified.
>
> This kind of access is safe[1]. However qemu mandatory locking
> prevents it by trying to acquire a lock on the backing file (because
> in general terms qemu could write to the backing file eg if you
> committed a snapshot, although libguestfs never does this).
>
> So in the case where readonly=true, this patch series uses
> file.locking=off to turn off locking on the backing file.
[...]
> [1] Safe from the point of view that it won't ever modify the disk
> image. It's not guaranteed that libguestfs won't see strange
> corruption, so users of this have to be prepared to retry operations
> if they see errors.
Yes that's exactly the scenario that's bothering me. The error message
about the image being used by another process is a pretty clear one, and
indicates exactly what the problem is and what to do to avoid it.
When libguestfs sees "strange corruption" it has no way of telling the
user what the actual problem is. This will likely make users report
issues with libguestfs and descendants failing to do their job for no
apparent reason. And no, I don't think "you'll have to be prepared to
retry" will work for them. How to tell a transient problem from a
permanent one? When to retry? How many times before giving up?
I think one can consider providing an option to forcefully disable
locking for power users who know what they're doing. But ignoring locks
universally doesn't look like a good idea to me.
We might consider the opposite (having a "don't ignore the locks"
flag), but anything else is an ABI break. Like it or not, this is how
readonly mode has worked in libguestfs since the beginning.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/