On 5/28/19 4:26 AM, Richard W.M. Jones wrote:
> On Mon, May 27, 2019 at 09:01:01PM -0500, Eric Blake wrote:
>> diff --git a/lib/rw.c b/lib/rw.c
>> index feaf468..343c340 100644
>> --- a/lib/rw.c
>> +++ b/lib/rw.c
>> @@ -234,11 +234,17 @@ int64_t
>>  nbd_unlocked_aio_pread (struct nbd_handle *h, void *buf,
>>                          size_t count, uint64_t offset, uint32_t flags)
>>  {
>> -  if (flags != 0) {
>> +  if ((flags & ~LIBNBD_CMD_FLAG_DF) != 0) {
>>      set_error (EINVAL, "invalid flag: %" PRIu32, flags);
>>      return -1;
>>    }
>>
>> +  if ((flags & LIBNBD_CMD_FLAG_DF) != 0 &&
>> +      nbd_unlocked_can_df (h) != 1) {
>> +    set_error (EINVAL, "server does not support the DF flag");
>> +    return -1;
>> +  }
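(For reference, a caller that did want the flag would use it along
these lines - a minimal fragment, assuming 'nbd' is an already
connected handle and that the synchronous nbd_pread wrapper forwards
flags to the aio call above:)

  char buf[512];
  if (nbd_pread (nbd, buf, sizeof buf, 0, LIBNBD_CMD_FLAG_DF) == -1) {
    /* Fails up front with EINVAL if the server lacks DF support (the
     * new check above), or with the server's error - likely
     * EOVERFLOW - if the reply could not be sent as a single chunk.
     */
    fprintf (stderr, "read: %s\n", nbd_get_error ());
  }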
> I'm confused why you'd want to specify this flag to pread.  From the
> caller's point of view, they shouldn't care whether the reply on the
> wire is fragmented or not?
Summarizing an IRC conversation: I doubt we will ever see serious
users request the DF flag for nbd_[aio_]pread. The only things the flag
would ever do for such readers are turn what would otherwise be a
successful read into an EOVERFLOW failure, or possibly make the
connection slower (because a single-buffer response requires more
network traffic than would otherwise be possible by sending hole
chunks). So the consensus was that while we might add support for
passing the flag, we don't have to go out of our way to document it for
nbd_pread (or maybe even document that it is pointless), and instead
focus on:
> (I can understand why you'd need this flag if we implemented a
> separate structured_pread call.)
...what exactly we want to support here. My observation is that right
now, libnbd does NOT validate server compliance on structured reads (we
don't know whether a server populated the entire buffer, or whether it
sent overlapping chunks). But having some sort of callback that gets
invoked on each chunk would let the client add such checking; we could
even provide a utility function in the library to serve as such a
callback, if our checking is sufficient for the client's purposes.
Something like:
nbd_pread_callback(nbd, buf, count, offset, opaque, callback, flags)
where we still read off the wire into buf (to avoid ping-ponging a read
into a temporary buf that the callback then has to memcpy() into the
real buf), but call the callback as soon as each chunk is received:
callback(opaque, buf, count, offset, status)
telling the user which subset of buf was just populated, where status
distinguishes data, hole, or error. The signature may still need
tweaking.
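To make that checking concrete, here is a rough sketch of what such a
client-side callback might do, assuming the signature above; the
coverage struct and status values are invented for illustration:

  #include <stdint.h>
  #include <stddef.h>

  enum chunk_status { CHUNK_DATA, CHUNK_HOLE, CHUNK_ERROR };

  struct coverage {
    uint64_t req_offset;  /* offset passed to nbd_pread_callback */
    uint8_t *seen;        /* calloc'd by the caller, one counter per
                             byte of the request */
    size_t overlaps, errors;
  };

  static int
  check_cb (void *opaque, void *subbuf, size_t count,
            uint64_t offset, int status)
  {
    struct coverage *c = opaque;
    uint64_t start = offset - c->req_offset;

    for (size_t i = 0; i < count; i++)
      if (c->seen[start + i]++ > 0)
        c->overlaps++;    /* server sent overlapping chunks */
    if (status == CHUNK_ERROR)
      c->errors++;
    return 0;
  }

  /* When the read completes, any byte with seen[] still 0 means the
     server never populated that part of the buffer. */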
Or we may even want to let the user register a set of callbacks, where a
different callback is invoked for different chunk types:
set = nbd_callback_set_create(nbd, NBD_CMD_READ, opaque, default_cb,
                              default_error_cb);
nbd_callback_set_add(nbd, set, NBD_REPLY_TYPE_OFFSET_DATA, opaque, cb);
nbd_callback_set_add(nbd, set, NBD_REPLY_TYPE_OFFSET_HOLE, opaque, cb);
nbd_pread_callback(nbd, buf, count, offset, set, flags);
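Continuing the illustration, the per-type callbacks might then be as
simple as the following (again hypothetical; whether libnbd pre-zeroes
hole regions itself or leaves that to the callback is one of the
details to settle):

  #include <string.h>

  static int
  data_cb (void *opaque, void *subbuf, size_t count, uint64_t offset)
  {
    return 0;   /* the data already landed in buf; nothing more to do */
  }

  static int
  hole_cb (void *opaque, void *subbuf, size_t count, uint64_t offset)
  {
    memset (subbuf, 0, count);  /* materialize the hole as zeroes */
    return 0;
  }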
The idea of registering a set of callbacks to handle a particular
integer command id may work well for other extensions as well;
particularly if we encounter a case where an app wants to use libnbd for
the groundwork (setting up a TLS connection handshake) but still
implement its own non-standard NBD extensions that the server will
understand (where the libnbd state machine parses the chunks off the
wire, but the client then interprets those chunks). That means we may
also need hooks for a client to inject other NBD_OPT sequences into the
state machine beyond those we know natively.
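Shape-wise, such a hook might look something like this prototype (names
entirely invented; nothing like it exists today):

  int nbd_register_opt (struct nbd_handle *h, uint32_t option,
                        const void *request, size_t request_len,
                        int (*reply_cb) (void *opaque, uint32_t reply_type,
                                         const void *reply, size_t reply_len),
                        void *opaque);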
Various ideas are still floating around; it may be a while before we
actually have code matching those ideas so we can pick what works best.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org