On Wed, Sep 13, 2017 at 09:47:52AM +0100, Richard W.M. Jones wrote:
On Wed, Sep 13, 2017 at 11:25:32AM +0300, Roman Kagan wrote:
> On Tue, Sep 12, 2017 at 06:04:18PM +0100, Richard W.M. Jones wrote:
> > v2 -> v3:
> >
> > - I addressed everything that Pino mentioned last time.
> >
> > - It's tricky to get a stable run when multiple copies of qemu are
> > involved, because the same cache files get overwritten by parallel
> > libguestfs. So I changed the names of the cache files to include
> > the qemu binary key (size, mtime), which removes this conflict.
> > This is in new patch 4/6.
>
> Sorry I must have missed the motivation part so could you please remind
> why locking needed to be turned off?
qemu 2.10 implements mandatory file locking, so if two instances of
qemu open the same file for writing you'll get an error:
[...]
However the exclusive lock prevents libguestfs from *reading* from
an
open disk image, eg. to do virt-df to monitor space or virt-tail to
monitor log files. Such disk images are opened using the
guestfs_add_drive_opts readonly=true flag, which is implemented by
placing a qcow2 overlay on top of the disk image, the purpose of the
overlay being to take any writes and protect the underlying disk image
from being modified.
This kind of access is safe[1]. However qemu mandatory locking
prevents it by trying to acquire a lock on the backing file (because
in general terms qemu could write to the backing file eg if you
committed a snapshot, although libguestfs never does this).
So in the case where readonly=true, this patch series uses
file.locking=off to turn off locking on the backing file.
[...]
[1] Safe from the point of view that it won't ever modify the
disk
image. It's not guaranteed that libguestfs won't see strange
corruption, so users of this have to be prepared to retry operations
if they see errors.
Yes that's exactly the scenario that's bothering me. The error message
about the image being used by another process is a pretty clear one, and
indicates exactly what the problem is and what to do to avoid it.
When libguestfs sees "strange corruption" it has no way of telling the
user what the actual problem is. This will likely make users report
issues with libguestfs and descendants failing to do their job for no
apparent reason. And no, I don't think "you'll have to be prepared to
retry" will work for them. How to tell a transient problem from a
permanent one? When to retry? How many times before giving up?
I think one can consider providing an option to forcefully disable
locking for power users who know what they're doing. But ignoring locks
universally doesn't look like a good idea to me.
Roman.