On Wed, Jun 9, 2021 at 9:01 PM Eric Blake <eblake(a)redhat.com> wrote:
When trying to reconstruct a qcow2 chain using information provided
over NBD, ovirt had been relying on an unsafe assumption that any
portion of the qcow2 file advertised as sparse would defer to the
backing image; this worked with what qemu 5.2 reports for a qcow2 BSD
loaded with "backing":null. However, in 6.0, commit 0da9856851 (nbd:
server: Report holes for raw images) also had a side-effect of
reporting unallocated zero clusters in qcow2 files as sparse. This
change is correct from the NBD spec perspective (advertising bits has
always been optional based on how much information the server has
available, and should only be used to optimize behavior when a bit is
set, while not assuming semantics merely because a bit is clear), but
means that a qcow2 file that uses an unallocated zero cluster to
override a backing file now shows up as sparse over NBD, and causes
ovirt to fail to reproduce that cluster (ie. ovirt was assuming it
only had to write clusters where the bit was clear, and the 6.0
behavior change shows the flaw in that assumption).
The correct fix is for ovirt to additionally use the
qemu:allocation-depth metadata context added in 5.2: after all, the
actual determination for what is needed to recreate a qcow2 file is
not whether a cluster is sparse, but whether the allocation-depth
shows the cluster to be local. But reproducing an image is more
efficient when handling known-zero clusters, which means that ovirt
has to track both base:allocation and qemu:allocation-depth metadata
contexts simultaneously. While NBD_CMD_BLOCK_STATUS is just fine
sending back information for two contexts in parallel, it comes with
some bookkeeping overhead at the client side: the two contexts need
not report the same length of replies, and it involves more network
traffic.
Since this change is not simple, and the chance that we also get the dirty
bitmap included in the result seems to be very low, I decided to check the
direction of merging multiple extents.
I started with merging "base:allocation" and "qemu:dirty-bitmap:xxx"
since
we already have both. It was not hard to do, although it is not completely
tested yet.
Here is the merging code:
https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/ovirt_imageio/...
To make merging easy and safe, we map the NBD_STATE_DIRTY bit to a private bit
so it cannot clash with the NBD_STATE_HOLE bit:
https://gerrit.ovirt.org/c/ovirt-imageio/+/115215/1/daemon/ovirt_imageio/...
Here is a functional test using qemu-nbd showing that it works:
https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/test/client_te...
I'll try to use "qemu:allocation-depth" in a similar way next week,
probably
mapping depth > 0 to EXTENT_EXISTS, to use when reporting holes in
single qcow2 images.
If this is successful, we can start using this in the next ovirt release, and we
don't need "qemu:joint-allocation".
Nir