On Fri, Aug 23, 2019 at 01:58:44PM -0500, Eric Blake wrote:
> On 8/23/19 1:48 PM, Wouter Verhelst wrote:
>> On Fri, Aug 23, 2019 at 09:34:26AM -0500, Eric Blake wrote:
>>> +- bit 4, `NBD_CMD_FLAG_FAST_ZERO`; valid during
>>> +  `NBD_CMD_WRITE_ZEROES`. If set, but the server cannot perform the
>>> +  write zeroes any faster than it would for an equivalent
>>> +  `NBD_CMD_WRITE`,
>>
>> One way of fulfilling the letter of this requirement but not its spirit
>> could be to have background writes; that is, the server makes a note
>> that the zeroed region should contain zeroes, makes an error-free reply
>> to the client, and then starts updating things in the background (with
>> proper layering so that an NBD_CMD_READ would see zeroes).
>
> For writes, this should still be viable IF the server can also cancel
> that background write of zeroes in favor of a foreground request for
> actual data to be written to the same offset. In other words, as long
> as the behavior to the client is "as if" there is no duplicated I/O
> cost, the zero appears fast, even if it kicked off a long-running async
> process to actually accomplish it.

That's kind of what I was thinking of, yeah.

A background write would cause disk I/O, which *will* cause any writes
that happen concurrently with it to slow down. If we need to write a
very large amount of zeroes, then the "fast zero" will actually cause
the following writes to slow down, which could impact performance.

The cancelling should indeed happen (otherwise ordering of writes will
be wrong, as per the spec), but that doesn't negate the performance
impact.
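
To make the cancelling concrete, here is a rough sketch of what a server
could keep around (not nbd-server code; the names are made up, locking is
omitted, and a real implementation would have to split partially
overlapping extents rather than drop them wholesale):

/* Made-up sketch: track zero ranges that were acknowledged to the client
 * but not yet written out, so reads see zeroes and overlapping foreground
 * writes cancel the background work. */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct pending_zero {
    uint64_t offset, length;
    struct pending_zero *next;
};

static struct pending_zero *pending;  /* assume suitable locking elsewhere */

/* NBD_CMD_WRITE_ZEROES: acknowledge immediately, remember the range. */
static void note_zero(uint64_t offset, uint64_t length)
{
    struct pending_zero *pz = malloc(sizeof *pz);
    pz->offset = offset;
    pz->length = length;
    pz->next = pending;
    pending = pz;
}

/* NBD_CMD_READ: does this offset still read as zero from the pending list? */
static bool reads_as_zero(uint64_t offset)
{
    for (struct pending_zero *pz = pending; pz; pz = pz->next)
        if (offset >= pz->offset && offset - pz->offset < pz->length)
            return true;
    return false;
}

/* NBD_CMD_WRITE: drop any pending zero work that the foreground write
 * overlaps, so the new data is not clobbered later and the zeroing I/O is
 * not duplicated.  A real server would split partially overlapping extents
 * instead of dropping them entirely. */
static void cancel_overlapping_zeroes(uint64_t offset, uint64_t length)
{
    struct pending_zero **pp = &pending;
    while (*pp) {
        struct pending_zero *pz = *pp;
        if (offset < pz->offset + pz->length && pz->offset < offset + length) {
            *pp = pz->next;
            free(pz);
        } else {
            pp = &pz->next;
        }
    }
}

Note that both the read path and the write path have to walk this list on
every request, which is exactly the kind of extra cost I'm worried about.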

>> This could negatively impact performance after that command to the
>> effect that syncing the device would be slower rather than faster, if
>> not done right.
>
> Oh. I see - for flush requests, you're worried about the cost of the
> flush forcing the I/O for the background zero to complete before flush
> can return.
>
> Perhaps that merely means that a client using fast zero requests as a
> means of probing whether it can do a bulk pre-zero pass even though it
> will be rewriting part of that image with data later SHOULD NOT attempt
> to flush the disk until all other interesting write requests are also
> ready to queue. In the 'qemu-img convert' case which spawned this
> discussion, that's certainly the case (qemu-img does not call flush
> after the pre-zeroing, but only after all data is copied - and then it
> really DOES want to wait for any remaining backgrounded zeroing to land
> on the disk along with any normal writes when it does its final flush).

Not what I meant, but also a good point, thanks :)
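
For reference, the client-side pattern you describe would look roughly
like this (hypothetical helper names, not actual qemu-img code):

/* Hypothetical client pseudo-code; the helpers stand in for a real NBD
 * client library. */
#include <stdint.h>

struct client;
int write_zeroes(struct client *c, uint64_t off, uint64_t len, int fast_zero);
int write_data(struct client *c, uint64_t off, const void *buf, uint64_t len);
int flush(struct client *c);

int convert(struct client *c, uint64_t image_size)
{
    /* Probe: a fast-zero request that the server must either complete
     * quickly or reject, rather than falling back to slow emulation. */
    int pre_zeroed = write_zeroes(c, 0, image_size, 1) == 0;

    if (!pre_zeroed) {
        /* ... write explicit zeroes for the source's holes as we go ... */
    }
    /* ... copy the allocated parts of the source with write_data() ... */

    /* One flush, at the very end: it waits for the data writes and for any
     * zeroing the server is still doing in the background, but it is not
     * issued right after the pre-zero pass. */
    return flush(c);
}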

>> Do we want to keep that in consideration?
>
> Ideas on how best to add what I mentioned above into the specification?

Perhaps clarify that the "fast zero" flag is meant to *improve*
performance, and that it therefore should either be implemented in a way
that does in fact improve performance, or not at all?
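
Something along those lines on the server side, perhaps (a sketch only;
the backend capability flags are made up, and NBD_ENOTSUP is the existing
"operation not supported" error from the spec's error list):

#include <stdbool.h>
#include <stdint.h>

#define NBD_ENOTSUP 95  /* "operation not supported" */

/* Made-up capability description of whatever backs the export. */
struct backend {
    bool can_punch_holes;  /* e.g. a cheap hole-punching/discard path */
    bool zeroes_are_free;  /* e.g. thin provisioning, unallocated space */
};

static int handle_write_zeroes(struct backend *b, uint64_t offset,
                               uint64_t length, bool fast_zero_requested)
{
    if (fast_zero_requested && !b->can_punch_holes && !b->zeroes_are_free) {
        /* We could only emulate this with a plain write of zeroes, which
         * does not improve performance, so fail fast instead. */
        return NBD_ENOTSUP;
    }
    /* ... perform (or queue) the actual zeroing of the requested range ... */
    (void)offset;
    (void)length;
    return 0;
}
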
--
<Lo-lan-do> Home is where you have to wash the dishes.
-- #debian-devel, Freenode, 2004-09-22