On Wed, May 31, 2023 at 01:29:30PM +0200, Laszlo Ersek wrote:
>>> Putting aside alignment even, I don't understand why reducing "count" to
>>> uint16_t would be reasonable. With the current 32-bit-only block
>>> descriptor, we already need to write loops in libnbd clients, because we
>>> can't cover the entire remote image in one API call [*]. If I understood
>>> Eric right earlier, the 64-bit extensions were supposed to remedy that
>>> -- but as it stands, clients will still need loops ("chunking") around
>>> block status fetching; is that right?
>>
>> While the larger extents reduce the need for looping, it does not
>> entirely eliminate it. For example, just because the server can now
>> tell you that an image is entirely data in just one reply does not
>> mean that it will actually do so - qemu in particular limits block
>> status of a qcow2 file to reporting just one cluster at a time for
>> consistency reasons, where even if you use the maximum size of 2M
>> clusters, you can never get more than (2M/16)*2M = 256G status
>> reported in a single request.
>
> I don't understand the calculation. I can imagine the following
> interpretation:
>
> - QEMU never sends more than 128K block descriptors, and each descriptor
> covers one 2MB sized cluster --> 256 GB of the disk covered in one go.
>
> But I don't understand where the (2M/16) division comes from, even
> though the quotient is 128K.
Ah, I need to provide more backstory on the qcow2 format. A qcow2
image has a fixed cluster size, chosen between 512 bytes and 2M
bytes. A smaller cluster size wastes less space for small images,
but uses more overhead. Each guest cluster's mapping is stored in an
L2 table (reached through the L1 table), where each L2 table is
itself one cluster in length, with 16 bytes per map entry. So if you
pick a cluster size of 512, you get 512/16 or 32 entries per L2
table; if you pick a cluster size of 2M, you get 2M/16 or 128k
entries per L2 table. When reporting block status, qemu reads at
most one L2 table to then say how each cluster referenced from that
table is mapped.
https://gitlab.com/qemu-project/qemu/-/blob/master/docs/interop/qcow2.txt...
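The arithmetic above is easy to double-check with a quick sketch (the
cluster-size range and the 16-byte entry size are taken from the
discussion; this is an illustration, not qemu code):

```python
# One L2 table is itself one cluster long, at 16 bytes per entry, and
# each entry maps one guest cluster.  So one table describes
# (cluster/16) * cluster bytes of the guest image.
for cluster in (512, 64 * 1024, 2 * 1024 * 1024):
    entries = cluster // 16           # L2 entries per table
    coverage = entries * cluster      # guest bytes described by one table
    print(f"cluster={cluster}: {entries} entries, covers {coverage} bytes")
```

For 2M clusters this prints 131072 entries covering 274877906944 bytes,
i.e. the (2M/16)*2M = 256G figure above.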
>
> I can connect the constant "128K", and
> <
https://github.com/NetworkBlockDevice/nbd/commit/926a51df>, to your
> paragraph [*] above, but not the division.
In this case, the qemu limit of reporting block status for at most one
L2 table at a time happens to have no relationship to the NBD
constant of limiting block status reports to no more than 1M extents
(8M bytes) in a single reply, nor the fact that qemu picked a cap of
1M bytes (128k extents) on its NBD reply regardless of whether the
underlying image is qcow2 or some other format.
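Putting the limits together, the client-side "chunking" loop mentioned
at the top of the thread looks roughly like this.  This is only a
sketch: fetch_block_status is a hypothetical stand-in for a real call
such as libnbd's nbd_block_status(), and the 256G cap below simulates a
server that stops at one 2M-cluster L2 table per reply:

```python
def fetch_block_status(offset, length):
    # Hypothetical stand-in for an NBD block-status request.  Pretend
    # the server never describes more than 256 GiB (one L2 table of 2M
    # clusters) per reply; it returns how many bytes it actually covered.
    return min(length, 256 * 1024**3)

def status_all(disk_size):
    """Fetch block status for the whole image, chunking as needed."""
    offset = 0
    requests = 0
    while offset < disk_size:
        # The server may describe less than we asked for, so advance by
        # however much the reply actually covered.
        covered = fetch_block_status(offset, disk_size - offset)
        assert covered > 0   # a conforming server must make progress
        offset += covered
        requests += 1
    return requests
```

With these assumptions a 1 TiB image still needs four round trips even
though the 64-bit extensions could, in principle, describe it in one.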