On 01/24/2017 05:38 AM, Richard W.M. Jones wrote:
On Mon, Jan 23, 2017 at 09:13:23PM -0600, Eric Blake wrote:
> Begin the language binding followups to my new .zero callback, since
> Rich was indeed correct that we want them.
>
> I'm more familiar with python and perl (at least to the point that
> I was able to modify the appropriate example files and prove to
> myself that the bindings worked), so I've started with those.
> I'm less familiar with ruby and ocaml, so I've left those for
> tomorrow (it will rely heavily on copy-and-paste); it is also a
> chance to review if my decision of requiring a return value in
> order to trigger the graceful fallback is the best approach.
The other methods (eg. trim) deal with this by having a separate
method called "can_*" (eg. can_trim). Can we have a can_zero callback
instead?
I thought about that, but there's several problems:
1) the can_trim and similar functions are required in order to alter
what the server advertises to the guest during the handshake phase; at
which point the guest is out-of-spec if it tries to use the command that
the plugin did not advertise. But while those other commands have no
real impact to the amount of wire traffic, we want to always advertise
WRITE_ZEROES support to the client (because it is so much more efficient
to pass a WRITE_ZEROES command over the wire than a regular WRITE with a
payload of zeroes), even when the plugin doesn't support it. Since
having a can_zero callback does not affect what we advertise, the only
other reason to implement it would be if can affect what we ask the
plugin to do.
2) the fallocate() system call does not have any introspectible way to
tell if it will work on a given file system, other than to try it and
see if it fails with EOPNOTSUPP. On the C side, I thought it was
elegant to let the errno value be a deciding factor on whether the
attempt was fatal or whether to gracefully fall back to calling the
pwrite callback, while still centralizing the calloc() logic in the
plugin.c file rather than making every single C implementation of .zero
repeat the logic. Since it is not easy to introspect whether
fallocate() will fail on a given file, it's much harder to write a
can_zero() function than it is to just try fallocate() and have a
graceful fallback.
3) the NBD_CMD_WRITE_ZEROES command has a flag value that affects
operation. The existing pread and pwrite callbacks do NOT have a flag
argument (arguably, a hole in those specifications if we ever come up
with a reason that we need to pass flags to those commands, but not
necessarily something to worry about today). But the write zeroes
command, per spec, has a way to state whether the server may use trim
(provided that the hole reads back as zero) or must leave things
allocated, giving the client some rudimentary control over whether the
server will create a sparse file. The spec says that the server is not
required to trim (so falling back to pwrite is always appropriate), but
the way the spec is worded, using fallocate(FALLOC_FL_PUNCH_HOLE) is
only appropriate when the may_trim flag is set in the existing C
interface (which in turn corresponds to a lack of the
NBD_CMD_FLAG_NO_HOLE flag on the wire). Having a boolean can_zero()
callback is insufficient to tell whether the plugin would behave
differently for may_trim=0 vs. may_trim=1
So here's some ideas for alternative implementations in the language
bindings:
1) give the binding a way to request graceful fallback to write (what I
implemented in this version)
2) provide a completely different interface to the language bindings,
but keeping the C interface as already committed. The language binding
would only implement a single new entry point, perhaps named
trim_reads_as_zero(). On the C side of the binding, when plugin.c calls
lang_zero(, may_trim=1), C checks if trim_reads_as_zero() is defined and
returns true, and if so calls the existing trim() binding; if
trim_reads_as-zero() is not defined, or returns false, or trim() is not
defined, then C calls the existing pwrite() binding. Thus, languages do
NOT need a zero() binding. But doing this may lose opportunities for
optimizing an all-zero write even where fallocate(FALLOC_FL_PUNCH_HOLE)
is inappropriate, as newer Linux has added
fallocate(FALLOC_FL_ZERO_RANGE) [hmm, maybe I should do a followup patch
to file.c to add the use of that option, but that's a story for another
patch]
3) adjust the C interface to match the bindings interface proposed in 2,
by making trim_reads_as_zero() part of the overall callback interface.
We haven't released yet, so now's the time to tweak the interfaces, but
like option 2, it makes it harder for an implementation to take
advantage of faster ways to write large blocks of zeroes while still
avoiding holes.
4) document that the language bindings can't use automatic fallback to
write - they must implement it themselves. But unlike C having to do a
verbose reimplementation of a call to calloc() and free(), the languages
we bind to tend to have more compact representations for quickly
creating a buffer of all zeros (perl's "\0" x count, python's
bytearray(count), etc).
I'm probably leaning towards number 4, since you expressed concern about
number 1, but would love further opinions before I spend any more time
churning out patches that aren't needed.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org