On Fri, Jan 28, 2022 at 10:39 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
[NB: I think this is a failed attempt, so shouldn't go upstream, and
doesn't need to be reviewed.]
When nbdcopy writes to an NBD server it ignores the server's
minimum/preferred block size. This hasn't caused a problem until now,
but it turns out that in at least one case we care about (writing to
qemu-nbd backed by a compressed qcow2 file) we must obey the minimum &
preferred block size of 64K.
For the longer story on that see this thread:
https://lists.nongnu.org/archive/html/qemu-devel/2022-01/threads.html#06108
This patch attempts to fix this. The uncontroversial part of this
patch adds a global "blocksize" variable which is the destination
preferred block size.
The tricky part of this patch tries to ensure that writes to the
destination obey this block size constraint.
Since nbdcopy is driven by the extent map read from the source, the
theory behind this implementation is that we read the extent map and
then try to "adjust" it so that extents which are not aligned to the
block size grow, shrink or are deleted. It proved to be very
difficult to get that right, but you can see the implementation in the
new function "adjust_extents_for_blocksize".
Unfortunately not only is this difficult to implement, but the theory
is wrong. Read requests are actually split up into smaller
subrequests on write (look for "copy_subcommand" in
multi-thread-copying.c). So this doesn't really solve the problem.
I think the theory is right, but this should be solved in two steps. The first
step is to adapt the extent map to the minimum block size. This generates valid
read requests that are always aligned to the minimum block size.
The read replies are sparsified using --sparse (4k default), but this value
can be adjusted to the destination block size automatically - we can treat
this as a hint, or document that the value will be aligned to the destination
minimum block size.
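To make the two steps concrete, here is a rough sketch in Python (for
brevity; nbdcopy itself is C). It is not taken from the patch. The names
align_extents and effective_sparse_size are hypothetical, and it assumes the
extent map is a sorted, contiguous list of (offset, length, is_zero) tuples.
Data extents grow outward to block boundaries, and zero extents keep only
what is left between them.

```python
def round_up(value, multiple):
    """Round value up to the next multiple."""
    return -(-value // multiple) * multiple


def align_extents(extents, blocksize):
    """Return an extent map whose boundaries are multiples of blocksize.

    extents is assumed to be a sorted, contiguous list of
    (offset, length, is_zero) tuples starting at offset 0.
    Any block that contains some data is treated as data.
    """
    if not extents:
        return []
    size = extents[-1][0] + extents[-1][1]

    # Grow every data extent outward to block boundaries, merging
    # ranges that touch or overlap after rounding.
    data = []
    for offset, length, is_zero in extents:
        if is_zero or length == 0:
            continue
        start = offset // blocksize * blocksize
        end = min(round_up(offset + length, blocksize), size)
        if data and start <= data[-1][1]:
            data[-1][1] = max(data[-1][1], end)
        else:
            data.append([start, end])

    # Whatever lies between the grown data ranges is zero.
    aligned = []
    pos = 0
    for start, end in data:
        if start > pos:
            aligned.append((pos, start - pos, True))
        aligned.append((start, end - start, False))
        pos = end
    if pos < size:
        aligned.append((pos, size - pos, True))
    return aligned


def effective_sparse_size(sparse, blocksize):
    """Round the sparse detection size up to a multiple of blocksize."""
    return round_up(sparse, blocksize)
```

With a 64K block size, a 4K data extent at the start of the image grows to a
full 64K block, and the default 4K sparse size becomes 64K. The last extent
can still end on an unaligned offset if the image size itself is unaligned.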
So I think in my second version I will look at adjusting the NBD
destination driver (nbd-ops.c) directly so that it turns unaligned
writes into buffered read/modify/write operations (which can be
optimized quite a lot because we understand the write pattern and know
that the main program doesn't go backwards within blocks).
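For illustration only, the buffered read/modify/write idea could look
roughly like the Python sketch below. This is not the nbd-ops.c code. The
BufferedBlockWriter and MemDest classes and their pread/pwrite methods are
hypothetical stand-ins, and the sketch assumes the image size is a multiple
of the block size and that writes never go backwards within a block.

```python
class MemDest:
    """Hypothetical in-memory destination that records each write."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.writes = []

    def pread(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def pwrite(self, offset, data):
        self.writes.append((offset, len(data)))
        self.data[offset:offset + len(data)] = data


class BufferedBlockWriter:
    """Turn unaligned writes into whole-block writes on the destination."""

    def __init__(self, dest, blocksize):
        self.dest = dest
        self.blocksize = blocksize
        self.cur = None   # offset of the block currently buffered
        self.buf = None   # contents of that block

    def flush(self):
        """Write the buffered block back, if there is one."""
        if self.cur is not None:
            self.dest.pwrite(self.cur, bytes(self.buf))
            self.cur = None
            self.buf = None

    def _load(self, block):
        """Buffer the given block, flushing the previous one first."""
        if self.cur == block:
            return
        self.flush()
        self.cur = block
        self.buf = bytearray(self.dest.pread(block, self.blocksize))

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            block = (offset + pos) // self.blocksize * self.blocksize
            start = offset + pos - block
            n = min(self.blocksize - start, len(data) - pos)
            chunk = data[pos:pos + n]
            if n == self.blocksize:
                # Whole block: no read needed, drop any stale buffer.
                if self.cur == block:
                    self.cur = None
                    self.buf = None
                self.dest.pwrite(block, chunk)
            else:
                # Partial block: read (once), modify, write on flush.
                self._load(block)
                self.buf[start:start + n] = chunk
            pos += n
```

Two adjacent partial writes to the same block then cost one read and one
aligned write instead of two unaligned writes. The caller has to call
flush() at the end of the copy.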
Doing read-modify-write in nbdcopy sounds like a bad idea. This is best done
closer to the storage, avoiding reading blocks from qemu-nbd to nbdcopy just
to write them back.
Nir