On Sat, Mar 23, 2019 at 11:25:06AM -0500, Eric Blake wrote:
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:
> + myfilter_extents (...)
> + {
> + size_t i;
> + struct nbdkit_extents *extents2;
> + struct nbdkit_extent e;
> +
> + extents2 = nbdkit_extents_new (offset);
> + next_ops->extents (nxdata, count, offset, flags, extents2, err);
> + for (i = 0; i < nbdkit_extents_size (extents2); ++i) {
> + e = nbdkit_get_extent (extents2, i);
> + e.offset = /* transform offset */;
> + nbdkit_add_extent (extents, e.offset, e.length, e.type);
> + }
> + nbdkit_extents_free (extents2);
> + }
> +
> +More complicated transformations may require the filter to allocate a
> +new extents list using C<nbdkit_extents_new> and free the old one
> +using C<nbdkit_extents_free>.
This sentence sounds odd after the example. You'll probably want to
reword this a bit.
Yes, that sentence was left over after I changed the example, so it
now doesn't make any sense. I'll remove it, and I'm also going to fix
the example above it so it shows how to adjust offsets in the returned
extents map.
In the rest of the email you suggested a different and a bit more
complex struct nbdkit_extents implementation. I think that we should
concentrate on getting the plugin API right, and at the moment the
only API we've defined as safe for plugins is:
extern int nbdkit_add_extent (struct nbdkit_extents *,
uint64_t offset, uint64_t length,
uint32_t type);
plus the contract that offsets must be added contiguously and cover
the range from <= offset. I think we can refine the filter API later
since we are not bound by ABI considerations for filters.
Although I do wish we could constrain plugins to only using that API
somehow ...
> + struct nbdkit_extents *nbdkit_extents_new (uint64_t start);
> +
> +Allocates and returns a new, empty extents list. The C<start>
> +parameter is the start of the range described in the list. Normally
> +you would pass in C<offset> as this parameter.
Well, the offset that the plugin should start at (which might not be
your C<offset>, if your filter is adjusting the plugin's offsets)
Very true, I've adjusted the text and the example.
> +=head2 C<.can_extents>
> +
> + int can_extents (void *handle);
> +
> +This is called during the option negotiation phase to find out if the
> +plugin supports detecting allocated (non-sparse) regions of the disk
> +with the C<.extents> callback.
> +
> +If there is an error, C<.can_extents> should call C<nbdkit_error> with
> +an error message and return C<-1>.
> +
> +This callback is not required. If omitted, then we return true iff a
> +C<.extents> callback has been defined.
Should we default this to returning true always, _because_ we have the
sane fallback of treating the entire image as allocated when .extents is
absent? (That is, a plugin has to explicitly opt out of advertising
NBD_CMD_BLOCK_STATUS support, because our default works well enough even
if the plugin didn't do anything). It would match how filters can call
next_ops->zero() even for plugins that do not have .zero (because
.can_zero defaults to true).
There is definitely a case for removing can_extents completely, and
relying on the fallback. Would this affect language bindings? I
think no because in the language bindings we can also emulate the
.extents callback.
Associated with this change would be to change the server so it always
advertises and enables base:allocation (if the client requests it).
I'm trying to think if there's any reason not to make both of those
changes, and failing to think of a reason at the moment ...
> +Extents overlapping the range
C<[offset...offset+count-1]> should be
> +returned if possible. However nbdkit ignores extents E<lt> offset so
> +the plugin may, if it is easier to implement, return all extent
> +information for the whole disk. The plugin may return extents beyond
> +the end of the range. It may also return extent information for less
> +than the whole range, but it must return at least one extent.
must return at least one extent overlapping with C<[offset]>.
(if the plugin returns all extent information prior to offset, but stops
before offset is reached, that counts as not returning any extents).
Yup, will adjust this text.
> +The extents B<must> be added in ascending order, and
B<must> be
> +contiguous.
> +
> +The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE>
> +which means that the client is only requesting information about the
> +extent overlapping C<offset>. The plugin may ignore this flag, or as
> +an optimization it may return just a single extent for C<offset>.
> +(Note that implementing this optimization correctly is difficult and
> +subtle. You must ensure that the upper limit of the single extent
> +returned is correct. If unsure then the safest option is to ignore
> +this flag.)
I'm not sure we still need this parenthetical - with your simpler
implementation, I think it is harder for plugins to get things wrong
(the NBD protocol does NOT require the upper limit to be accurate,
merely that it makes progress - although the NBD protocol recommends
that servers return maximal upper bounds, it also warns clients that
they must be prepared to coalesce short returns with identical type).
Yes, agreed, will remove.
> +++ b/include/nbdkit-filter.h
> @@ -1,5 +1,5 @@
> /* nbdkit
> - * Copyright (C) 2013-2018 Red Hat Inc.
> + * Copyright (C) 2013-2019 Red Hat Inc.
> * All rights reserved.
> *
> * Redistribution and use in source and binary forms, with or without
> @@ -44,6 +44,18 @@ extern "C" {
>
> #define NBDKIT_FILTER_API_VERSION 2
>
> +struct nbdkit_extent {
> + uint64_t offset;
> + uint64_t length;
> + uint32_t type;
> +};
> +
> +extern struct nbdkit_extents *nbdkit_extents_new (uint64_t start);
> +extern void nbdkit_extents_free (struct nbdkit_extents *);
> +extern size_t nbdkit_extents_size (const struct nbdkit_extents *);
'size' makes me wonder if this is the number of bytes covered by the
extents object, rather than the number of extent objects in the array.
Would calling it 'nbdkit_extents_count' be any easier to understand?
Yup, I'll change this.
> +++ b/server/extents.c
> @@ -0,0 +1,171 @@
> +struct nbdkit_extents {
> + struct nbdkit_extent *extents;
> + size_t nr_extents, allocated;
> + uint64_t start;
> +};
Is it also worth tracking cumulative length covered by extents?
Tracking it would give us an O(1) instead of O(n) answer to "how many
bytes did the plugin tell us about" - which MAY matter to filters that
want to append additional data about a hole after the underlying plugin
I may have misunderstood, but isn't that given by:
extents[nr_extents-1].offset + extents[nr_extents-1].length - start ?
Anyway at the moment I'm not greatly worried about the implementation
of extents or how filters are implemented, as long as we have got the
plugin API right.
Rich
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top