On 03/12/2018 07:13 AM, Nir Soffer wrote:
On Mon, Mar 12, 2018 at 12:32 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
> On Mon, Mar 12, 2018 at 07:13:52AM +0000, Nir Soffer wrote:
>> On Fri, Mar 9, 2018 at 4:25 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
>>
>>> It has to be said it would be really convenient to have a 'zero'
>>> and/or 'trim' method of some sort.
>>>
>>
>> 'trim' means discard?
>
> Yes. The 5 functions we could support are:
>
> * pread - done
> * pwrite - done
> * flush - does fdatasync(2) on the block device
>
Currently we do fsync() on every PUT request, so flush is not very
useful.
> * zero - write a range of zeroes without having to send zeroes
> * trim - punch hole, can be emulated using zero if not possible
>
trim is advisory in NBD, so it can also be emulated as a no-op while
still having correct semantics. If you want to guarantee reading back
zeroes after punching a hole, you have to use zero instead of trim.
> Also (not implemented in nbdkit today, but coming soon), pwrite, zero
> and trim can be extended with a FUA (force unit access) flag, which
> would mean that the range should be persisted to disk before
> returning. It can be emulated by calling flush after the operation.
> It wasn't clear if anything in this process flushes the content to
> disk. Is that what transfer.finalize does?
>
All PUT requests fsync() before returning. We optimize for complete image
transfer, not for random I/O.
In other words, you are already implicitly behaving as if FUA were set
on every single request. It might be less efficient than what you
could otherwise achieve, but it's fine if consistency is more important
than speed.
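
To make the FUA emulation concrete, here is a minimal sketch of "write, then flush when FUA is requested". The name pwrite_fua and its fua parameter are hypothetical, chosen for illustration only; this is not nbdkit's or imageio's actual API, and it assumes a POSIX system with a plain file descriptor.

```python
import os

# Fall back to fsync() on platforms that lack fdatasync().
_sync = getattr(os, "fdatasync", os.fsync)


def pwrite_fua(fd, data, offset, fua=False):
    """Write data at offset; if fua is set, persist it before returning.

    Hypothetical helper: emulates FUA by flushing after the write.
    """
    view = memoryview(data)
    written = 0
    # os.pwrite() may write fewer bytes than requested, so loop until done.
    while written < len(view):
        written += os.pwrite(fd, view[written:], offset + written)
    if fua:
        # Emulated FUA: flush after the operation completes.
        _sync(fd)
    return written
```

A server that syncs on every request behaves like pwrite_fua(..., fua=True) for all writes, which is the trade-off described above.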
>> I would like to support only aligned offset and size - do you think it
>> should work for qemu-img?
>
> It depends a bit on what you mean by "aligned" and what the alignment
> is. We'd probably have to work around it in the plugin so that it
> rounds the request inward, issues a zero operation for the aligned part,
> and writes zeroes at each end. There's no guarantee that qemu-img
> will be well-behaved in the future even if it is now.
qemu-img in general tries to send sector-aligned data by default (it's
unusual that qemu tries to access less than that at once). In 2.11,
qemu-io can be made to send byte-aligned requests across any NBD
connection; in 2.12, it's tightened so that NBD requests are
sector-aligned unless the server advertised support for byte-aligned
requests (nbdkit does not yet advertise this). As a client, qemu-io
will then manually write zeroes to any unaligned portion (if there is
any), and use the actual zero request for the aligned middle.
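
The head/middle/tail split described above can be sketched as follows. This is only an illustration of the arithmetic, not qemu's code: plan_zero, its op names, and the 512-byte default alignment are assumptions.

```python
def plan_zero(offset, length, align=512):
    """Split a zero request into explicit writes and one aligned zero op.

    Returns a list of (op, offset, length) tuples, where op is "write"
    (send literal zero bytes) or "zero" (use the real zero request).
    """
    end = offset + length
    first = -(-offset // align) * align  # offset rounded up to alignment
    last = end // align * align          # end rounded down to alignment
    if first >= last:
        # No full aligned block inside the range: write zeroes for all of it.
        return [("write", offset, length)] if length else []
    ops = []
    if offset < first:
        ops.append(("write", offset, first - offset))  # unaligned head
    ops.append(("zero", first, last - first))          # aligned middle
    if last < end:
        ops.append(("write", last, end - last))        # unaligned tail
    return ops
```

For example, plan_zero(100, 2000) yields a 412-byte head write at offset 100, a zero request for 1536 bytes at offset 512, and a 52-byte tail write at offset 2048. The same split would apply whether the client or the server does the emulation.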
>
Aligned for direct I/O (we use direct I/O for read/write). We can support
non-aligned ranges by doing similar emulation in the server, but I prefer
to do it only if we have such requirement. If you need to do this in the
client, we probably need to do this in the server otherwise all clients
may need to emulate this.
I think there is no reason that qemu-img will zero unaligned ranges, but
I guess Eric can have a better answer.
Yeah, for now, you are probably safe assuming that qemu-img will never
send unaligned ranges. You're also correct that not all NBD servers
support read-modify-write at unaligned boundaries, so well-behaved
clients have to implement it themselves; while conversely not all
clients are well-behaved so good NBD servers have to implement it -
which is a duplication of effort since both sides of the equation have
to worry about it when they want maximum cross-implementation
portability. But that's life.
And my pending patches for FUA support in nbdkit also add a
--filter=blocksize, which at least lets nbdkit guarantee aligned data
handed to the plugin even when the client is not well-behaved.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org