Re: [Libguestfs] [PATCH libnbd] copy: Implement destination preferred block size

Monday, 31 January 2022

On 01/28/22 21:36, Richard W.M. Jones wrote:
...
 [NB: I think this is a failed attempt, so it shouldn't go upstream,
 and doesn't need to be reviewed.]

 When nbdcopy writes to an NBD server it ignores the server's
 minimum/preferred block size.  This hasn't caused a problem until now,
 but it turns out that in at least one case we care about (writing to
 qemu-nbd backed by a compressed qcow2 file) we must obey the minimum &
 preferred block size of 64K.

 For the longer story on that see this thread:
 https://lists.nongnu.org/archive/html/qemu-devel/2022-01/threads.html#06108

 This patch attempts to fix this.  The uncontroversial part of this
 patch adds a global "blocksize" variable which is the destination
 preferred block size.

 The tricky part of this patch tries to ensure that writes to the
 destination obey this block size constraint.

 Since nbdcopy is driven by the extent map read from the source, the
 theory behind this implementation is that we read the extent map and
 then try to "adjust" it so that extents which are not aligned to the
 block size grow, shrink or are deleted.  It proved to be very
 difficult to get that right, but you can see the implementation in the
 new function "adjust_extents_for_blocksize".
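For illustration only, the rounding that "grow, shrink or delete" implies can be sketched in C as below. The helper names (align_down, align_up, grow_extent, shrink_extent) are hypothetical and are not taken from the patch's adjust_extents_for_blocksize. The sketch assumes offsets well below UINT64_MAX, so that rounding up cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers (not from the patch): round a byte offset down
 * or up to a multiple of blocksize.  Works for any non-zero blocksize,
 * not only powers of two.  Assumes offset + blocksize does not overflow. */
uint64_t
align_down (uint64_t offset, uint64_t blocksize)
{
  assert (blocksize > 0);
  return offset - offset % blocksize;
}

uint64_t
align_up (uint64_t offset, uint64_t blocksize)
{
  return align_down (offset + blocksize - 1, blocksize);
}

/* "Grow": widen [*offset, *offset + *length) outwards to the
 * nearest block boundaries. */
void
grow_extent (uint64_t *offset, uint64_t *length, uint64_t blocksize)
{
  uint64_t start = align_down (*offset, blocksize);
  uint64_t end = align_up (*offset + *length, blocksize);

  *offset = start;
  *length = end - start;
}

/* "Shrink": narrow the extent to the whole blocks it contains.
 * Returns false when no whole block remains, i.e. the extent would
 * have to be deleted. */
bool
shrink_extent (uint64_t *offset, uint64_t *length, uint64_t blocksize)
{
  uint64_t start = align_up (*offset, blocksize);
  uint64_t end = align_down (*offset + *length, blocksize);

  if (end <= start)
    return false;
  *offset = start;
  *length = end - start;
  return true;
}
```

With a 64K block size, an extent covering bytes 10000 to 80000 grows to offset 0, length 131072, but shrinks to nothing because it contains no whole 64K block. Choosing which of the two to apply to each extent, while keeping neighbouring extents consistent, is the part described above as very difficult.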

 Unfortunately not only is this difficult to implement, but the theory
 is wrong.  Read requests are actually split up into smaller
 subrequests on write (look for "copy_subcommand" in
 multi-thread-copying.c).  So this doesn't really solve the problem.

 So I think in my second version I will look at adjusting the NBD
 destination driver (nbd-ops.c) directly so that it turns unaligned
 writes into buffered read/modify/write operations (which can be
 optimized quite a lot because we understand the write pattern and know
 that the main program doesn't go backwards within blocks).

This seems very similar to a problem I faced a decade ago, in the
parallel decompressor of lbzip2. The core of the idea is this: use a
priority queue data structure (there are multiple data structures good
for that; a red-black tree is one). The key (for sorting; aka the
"priority") is the offset at which the block starts. Additionally,
maintain the "next offset to write" in a variable (basically the file
size that has been written out thus far, contiguously).
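A minimal sketch of that bookkeeping in C, assuming a binary min-heap in place of the red-black tree (any priority queue keyed by offset would do). The names (struct pending, heap_push, heap_front, heap_pop, struct reassembly, front_is_ready) and the fixed capacity HEAP_MAX are hypothetical, and the data buffers are omitted so that only the keys are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_MAX 64             /* hypothetical fixed capacity */

struct pending {
  uint64_t offset;              /* priority: where this input block starts */
  size_t len;
};

struct offset_heap {
  struct pending a[HEAP_MAX];   /* binary min-heap ordered by offset */
  size_t n;
};

static void
swap (struct pending *x, struct pending *y)
{
  struct pending t = *x;
  *x = *y;
  *y = t;
}

/* Insert a block by offset; returns -1 if the heap is full. */
int
heap_push (struct offset_heap *h, struct pending p)
{
  if (h->n == HEAP_MAX)
    return -1;
  size_t i = h->n++;
  h->a[i] = p;
  while (i > 0 && h->a[(i - 1) / 2].offset > h->a[i].offset) {
    swap (&h->a[(i - 1) / 2], &h->a[i]);
    i = (i - 1) / 2;
  }
  return 0;
}

/* The "front block": the pending block with the lowest offset. */
struct pending *
heap_front (struct offset_heap *h)
{
  return h->n > 0 ? &h->a[0] : NULL;
}

/* Remove the front block and restore the heap order. */
void
heap_pop (struct offset_heap *h)
{
  assert (h->n > 0);
  h->a[0] = h->a[--h->n];
  for (size_t i = 0;;) {
    size_t l = 2 * i + 1, r = l + 1, m = i;
    if (l < h->n && h->a[l].offset < h->a[m].offset) m = l;
    if (r < h->n && h->a[r].offset < h->a[m].offset) m = r;
    if (m == i)
      break;
    swap (&h->a[i], &h->a[m]);
    i = m;
  }
}

/* The queue plus the "next offset to write". */
struct reassembly {
  struct offset_heap q;
  uint64_t next_offset;         /* output written contiguously so far */
};

/* True when the lowest-offset pending block continues the output. */
bool
front_is_ready (struct reassembly *s)
{
  struct pending *f = heap_front (&s->q);
  return f != NULL && f->offset == s->next_offset;
}
```

Because every other pending block starts at or after the end of the front block, moving the front block's key forward in place (after writing out part of it) never violates the heap order, so no re-heapify is needed in that case.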

Whenever a new block arrives, do the following:

- place it in the priority queue, using its offset as key

- if the offset of the "front block" of the priority queue equals the
"next offset to write": loop through the initial (= front) contiguous
sequence of blocks in the queue, and every time the offset range reaches
or exceeds the output block size, write out (and pop) the affected full
blocks, and update "next offset to write" as well. Any partially
written-out input block (= "tail block") will still have the lowest
key (= offset) in the queue, so its key (= where to start writing out the
tail the next time) can be adjusted in place (it will remain at the front).
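The steps above can be sketched in C as follows. To keep the example short, a small array kept sorted by offset stands in for the priority queue, and the destination is a memory buffer rather than an NBD server. All names (struct reorder, reorder_add, drain, write_block, MAX_PENDING, MAX_BLOCKSIZE) are hypothetical, input blocks are assumed not to overlap, and input buffers are assumed to stay valid until consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING   64        /* hypothetical queue capacity */
#define MAX_BLOCKSIZE 65536     /* hypothetical upper bound on blocksize */

struct in_block {
  uint64_t offset;              /* key: where the remaining data starts */
  size_t len;                   /* bytes not yet written out */
  const char *data;             /* remaining data, owned by the caller */
};

struct reorder {
  struct in_block q[MAX_PENDING]; /* sorted by offset, front = q[0] */
  size_t n;
  uint64_t next_offset;         /* output written contiguously so far */
  size_t blocksize;             /* destination block size */
  char outbuf[MAX_BLOCKSIZE];   /* gathers one full output block */
  char *dest;                   /* stand-in for the destination */
  size_t writes;                /* number of block writes issued */
};

/* Stand-in for an aligned, full-block write to the destination. */
static void
write_block (struct reorder *r, uint64_t offset, const char *buf, size_t len)
{
  assert (offset % r->blocksize == 0 && len == r->blocksize);
  memcpy (r->dest + offset, buf, len);
  r->writes++;
}

/* Length of the contiguous run at the front, starting at next_offset. */
static size_t
contiguous_bytes (const struct reorder *r)
{
  uint64_t pos = r->next_offset;
  size_t total = 0;

  for (size_t i = 0; i < r->n && r->q[i].offset == pos; i++) {
    total += r->q[i].len;
    pos += r->q[i].len;
  }
  return total;
}

/* Write out every full output block available at the front. */
static void
drain (struct reorder *r)
{
  while (contiguous_bytes (r) >= r->blocksize) {
    size_t filled = 0;
    while (filled < r->blocksize) {
      struct in_block *b = &r->q[0];
      size_t want = r->blocksize - filled;
      size_t take = b->len < want ? b->len : want;

      memcpy (r->outbuf + filled, b->data, take);
      filled += take;
      /* Adjust the front key in place; it stays at the front. */
      b->offset += take;
      b->data += take;
      b->len -= take;
      if (b->len == 0) {        /* fully consumed: pop it */
        memmove (&r->q[0], &r->q[1], (r->n - 1) * sizeof r->q[0]);
        r->n--;
      }
    }
    write_block (r, r->next_offset, r->outbuf, r->blocksize);
    r->next_offset += r->blocksize;
  }
}

/* Queue a new input block, then drain.  Returns -1 if it can't be queued. */
int
reorder_add (struct reorder *r, uint64_t offset, const char *data, size_t len)
{
  assert (r->blocksize > 0 && r->blocksize <= MAX_BLOCKSIZE);
  if (r->n == MAX_PENDING || len == 0)
    return -1;

  size_t i = r->n;
  while (i > 0 && r->q[i - 1].offset > offset) {
    r->q[i] = r->q[i - 1];
    i--;
  }
  r->q[i] = (struct in_block) { offset, len, data };
  r->n++;
  drain (r);
  return 0;
}
```

A block that arrives early simply waits in the queue until the gap before it is filled, and every write issued is one whole, aligned output block. The final partial block at the end of the image is not handled here.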

If we're really sure that input blocks will never arrive out of order,
then a simple FIFO should work (with the same logic); priorities are not
needed, just a check (and an abort) in case an input block does arrive
out of order (either going backwards, or going forward and creating a gap).
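For the in-order case, the check could look like the following C sketch; the FIFO itself would reuse the same full-block draining logic as the priority-queue version. The names (struct fifo_check, enum arrival, fifo_accept, fifo_accept_or_abort) are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical FIFO variant: blocks must arrive in order, so instead of
 * a priority queue we only track where the next input block must start. */
struct fifo_check {
  uint64_t expected;            /* end offset of the last accepted block */
};

enum arrival { ARRIVAL_OK, ARRIVAL_BACKWARDS, ARRIVAL_GAP };

/* Classify a newly arrived block; only an in-order block is accepted. */
enum arrival
fifo_accept (struct fifo_check *f, uint64_t offset, uint64_t len)
{
  assert (len > 0);
  if (offset < f->expected)
    return ARRIVAL_BACKWARDS;
  if (offset > f->expected)
    return ARRIVAL_GAP;
  f->expected = offset + len;
  return ARRIVAL_OK;
}

/* The "check (and abort)" from the text. */
void
fifo_accept_or_abort (struct fifo_check *f, uint64_t offset, uint64_t len)
{
  enum arrival a = fifo_accept (f, offset, len);

  if (a != ARRIVAL_OK) {
    fprintf (stderr, "input block at offset %" PRIu64 " is out of order (%s)\n",
             offset, a == ARRIVAL_BACKWARDS ? "going backwards" : "gap");
    abort ();
  }
}
```

Returning a status from fifo_accept and aborting in a separate wrapper keeps the classification testable on its own.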

With this in mind, I'm unsure whether RMW is needed; I think buffered
writes alone should suffice.

Thanks
Laszlo
