On 04/10/2018 09:40 AM, Richard W.M. Jones wrote:
> When the destination is a block device we cannot avoid zeroing since a block
> device may contain junk data (we usually get dirty empty images from our
> local xtremio server).
(Off topic for qemu-block but ...) We don't have enough information
at our end to know about any of this.
Yep, see my other email about a possible NBD protocol extension to
actually let the client learn up-front if the exported device is known
to start in an all-zero state.
>> The problem is that the NBD block driver has max_pwrite_zeroes = 32 MB,
>> so it's not that efficient after all. I'm not sure if there is a real
>> reason for this, but Eric should know.
>>
>
> We support zero with unlimited size without sending any payload to oVirt,
> so there is no reason to limit zero request by max_pwrite_zeroes. This limit
> may make sense when zero is emulated using pwrite.
Yes, this seems wrong, but I'd want Eric to comment.
The 32M cap is currently the fault of qemu-img, not nbdkit (nbdkit is
not further reducing the size of the zero requests it passes on to
oVirt); and I explained in the other email about how qemu 2.13 will fix
things to send larger zero requests (hmm, that means nbdkit really needs
to start supporting NBD_OPT_GO, as that is what qemu will be relying on
to learn the larger limits).
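
To make the cost of the cap concrete, here is a rough sketch of how a
client that honours a maximum zero-request size ends up splitting one
large zero operation. This is only an illustration, not qemu's actual
code: the function name split_zero_request and the 1 GiB example size
are assumptions made for this example; only the 32 MiB figure comes
from the discussion above.

```python
MiB = 1024 * 1024


def split_zero_request(offset, length, max_zeroes=32 * MiB):
    """Yield (offset, length) pairs, each no longer than max_zeroes.

    Hypothetical helper: it models a client that must not send a single
    zero request larger than the advertised (or assumed) limit.
    """
    if max_zeroes <= 0:
        raise ValueError("max_zeroes must be positive")
    end = offset + length
    while offset < end:
        chunk = min(max_zeroes, end - offset)
        yield offset, chunk
        offset += chunk


# Zeroing 1 GiB with a 32 MiB cap takes 32 separate requests.
requests = list(split_zero_request(0, 1024 * MiB))
print(len(requests))  # 32
```

With a larger advertised limit the same range would need proportionally
fewer round trips, which is why learning the server's real limit matters.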
>>> However, since you suggest that we could use "trim" request for these
>>> requests, it means that these requests are advisory (since trim is), and
>>> we can just ignore them if the server does not support trim.
>>
>> What qemu-img sends shouldn't be a NBD_CMD_TRIM request (which is indeed
>> advisory), but a NBD_CMD_WRITE_ZEROES request. qemu-img relies on the
>> image actually being zeroed after this.
>>
>
> So it seems that may_trim=1 is wrong, since trim cannot replace zero.
Note that the current plugin ignores may_trim. It is not used at all,
so it's not relevant to this problem.
However this flag actually corresponds to the inverse of
NBD_CMD_FLAG_NO_HOLE which is defined by the NBD spec as:
    bit 1, NBD_CMD_FLAG_NO_HOLE; valid during
    NBD_CMD_WRITE_ZEROES. SHOULD be set to 1 if the client wants to
    ensure that the server does not create a hole. The client MAY send
    NBD_CMD_FLAG_NO_HOLE even if NBD_FLAG_SEND_TRIM was not set in the
    transmission flags field. The server MUST support the use of this
    flag if it advertises NBD_FLAG_SEND_WRITE_ZEROES.
qemu-img convert uses NBD_CMD_WRITE_ZEROES and does NOT set this flag
(hence in the plugin we see may_trim=1), and I believe that qemu-img
is correct because it doesn't want to force preallocation.
Yes, the flag usage is correct, and you are also correct that the
'may_trim' flag of nbdkit is the inverse bit sense of the
NBD_CMD_FLAG_NO_HOLE of the NBD protocol; it's all a documentation game
in deciding whether having a bit be 0 or 1 in the default state made
more sense.
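
As a small sketch of that inverse relationship (an illustration only,
not nbdkit's source): the bit positions follow the NBD spec text quoted
above, while the helper name may_trim_from_flags is made up for this
example.

```python
# Command flag bits as defined by the NBD protocol.
NBD_CMD_FLAG_FUA = 1 << 0
NBD_CMD_FLAG_NO_HOLE = 1 << 1


def may_trim_from_flags(flags):
    """Return True when the server is allowed to punch a hole.

    Hypothetical helper: may_trim is simply the inverse of the
    NBD_CMD_FLAG_NO_HOLE bit in a NBD_CMD_WRITE_ZEROES request.
    """
    return (flags & NBD_CMD_FLAG_NO_HOLE) == 0


# qemu-img convert leaves NO_HOLE clear, so the plugin sees may_trim=1.
print(may_trim_from_flags(0))                     # True
print(may_trim_from_flags(NBD_CMD_FLAG_NO_HOLE))  # False
```

Either way the range must read back as zeroes afterwards; the flag only
controls whether the server may deallocate to get there.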
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:  qemu.org | libvirt.org