On 8/27/19 7:14 AM, Richard W.M. Jones wrote:
> Is the plan to wait until NBD_CMD_FLAG_FAST_ZERO gets into the NBD
> protocol doc before doing the rest? Also I would like to release both
> libnbd 1.0 and nbdkit 1.14 before we introduce any large new features.
> Both should be released this week, in fact maybe even today or
> tomorrow.
Sure, I don't mind this being the first feature for the eventual libnbd
1.2 and nbdkit 1.16.
[...]
>> First, I had to create a scenario where falling back to writes is
>> noticeably slower than performing a zero operation, and where
>> pre-zeroing also shows an effect. My choice: let's test 'qemu-img
>> convert' on an image that is half-sparse (every other megabyte is a
>> hole) to an in-memory nbd destination. Then I use a series of nbdkit
>> filters to force the destination to behave in various manners:
>> log logfile=>(sed ...|uniq -c) (track how many normal/fast zero
>> requests the client makes)
>> nozero $params (fine-tune how zero requests behave - the parameters
>> zeromode and fastzeromode are the real drivers of my various tests)
>> blocksize maxdata=256k (allows large zero requests, but forces large
>> writes into smaller chunks, to magnify the effects of write delays and
>> allow testing to provide obvious results with a smaller image)
>> delay delay-write=20ms delay-zero=5ms (also to magnify the effects on a
>> smaller image, with writes penalized more than zeroing)
>> stats statsfile=/dev/stderr (to track overall time and a decent summary
>> of how much I/O occurred)
>> noextents (forces the entire image to report that it is allocated,
>> which eliminates any testing variability based on whether qemu-img uses
>> that to bypass a zeroing operation [1])
> I can't help thinking that a sh plugin might have been simpler ...
Maybe, but the extra cost of forking per request might also have made the
timing differences less obvious. I'm just glad that nbdkit's filtering
system was flexible enough to do what I wanted, even if I did have fun
stringing together 6 filters :)
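
For reference, stringing all of that onto one command line looks roughly
like the following. This is a sketch rather than my exact invocation: the
memory plugin stands in for the in-memory destination, the 1G size, the
log file name, and the test image name are placeholders, $params is
whichever zeromode=/fastzeromode= combination is under test, and
fastzeromode= only exists with the proposed nozero patches applied.

  # nbdkit applies filters in the order given (the first one listed is
  # closest to the client), matching the list above; each filter picks
  # out its own key=value parameters from the command line.
  nbdkit --filter=log --filter=nozero --filter=blocksize --filter=delay \
         --filter=stats --filter=noextents \
         memory size=1G \
         logfile=zero.log \
         $params \
         maxdata=256k \
         delay-write=20ms delay-zero=5ms \
         statsfile=/dev/stderr

  # Drive the destination with the half-sparse source image; -n skips
  # target creation since the NBD export already exists.
  qemu-img convert -n -O raw half-sparse.img nbd://localhost:10809

The stats output on stderr then gives the overall time and an I/O
summary, and the log file records how many normal vs. fast zero requests
the client sent.
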
>> I hope you enjoyed reading this far, and agree with my interpretation of
>> the numbers about why this feature is useful!
> Yes it seems reasonable.
> The only thought I had is whether the qemu block layer does or should
> combine requests in flight so that a write-zero (offset) followed by a
> write-data (same offset) would erase the earlier request. In some
> circumstances that might provide a performance improvement without
> needing any changes to protocols.
As in, maintain a backlog of requests that are needed but have not yet
been sent over the wire, and merge those requests (by splitting an
existing large zero request into smaller pieces) if write requests arrive
in that window, before anything is actually transmitted to the NBD
server? I know qemu has some write coalescing when servicing guest
requests; but I was testing 'qemu-img convert', which does not depend on
guest behavior and has already sent the zero request to the NBD server
before sending any data writes, so coalescing wouldn't see anything to
combine. Or are you worried about qemu as the NBD server, coalescing
incoming requests from the client? But you are right that some smarts
about I/O coalescing at various points in the data path might yield some
slight optimizations.
>> - NBD should have a way to advertise (probably via NBD_INFO_ during
>> NBD_OPT_GO) if the initial image is known to begin life with all zeroes
>> (if that is the case, qemu-img can skip the extents calls and
>> pre-zeroing pass altogether)
> Yes, I really think we should do this one as well.
Stay tuned for my next cross-project post ;) Hopefully in the next week
or so.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org