On 3/20/19 10:31 AM, Richard W.M. Jones wrote:
I think the extents map is just too complicated and is unnecessarily
so. How about instead we define the plugin interface to be:
int can_extents (void *handle); // as before
int extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags,
struct nbdkit_extents_list *list);
and have the extents_list be a simple list. The first extent you add
must start at offset. And you're only allowed to append monotonically
increasing adjacent extents to it. Plugins must return at least one
extent, but are not required to return more than one extent
(regardless of flags).
Well, required to return at least one for the command to succeed (if the
return 0, then we synthesize an error, perhaps EIO, to keep the
connection to the client still alive).
I take it you'd still have to use an nbdkit_extent_add(list, count,
type), so that additions to the list are still calling malloc() from
nbdkit's perspective (as you don't want the plugin to be realloc()ing
the list)?
This would be simple enough to implement for both VDDK and file. (For
VDDK the plugin needs to synthesize the holes but that's only a slight
complication).
NBDKIT_FLAG_REQ_ONE is now simple to implement (if generating the
extents, stop after generating the first one; if it's the server
processing the extents, take the first extent in the list).
The ‘end’ that we discussed before is now implicit (it's the byte of
the last extent in the list).
So a client may give us more extents than we need, but can also easily
stop giving us extents as early as they want (and as long as they gave
us at least one, the overall call is successful). The requirement on
the plugin giving us extents in monotonically increasing order is not
too onerous; all your earlier implementations were naturally already
doing that (in other words, the full flexibility of random-access extent
probing rather than linear is over-engineered).
I like the idea.
I'm trying to figure out if filters would still need a foreach visitor.
But my initial guess is no. Consider - the earlier proposal required
the offset filter to do:
+ struct nbdkit_extents_map *map2;
+
+ map2 = nbdkit_extents_new ();
+ if (map2 == NULL)
+ return -1;
+ if (next_ops->extents (nxdata, count, offs + offset,
+ flags, map2, err) == -1) {
+ nbdkit_extents_free (map2);
+ return -1;
+ }
+
+ /* Transform offsets in map2, return result in extents_map. */
+ if (nbdkit_extents_foreach (map2, subtract_offset, extents_map,
+ NBDKIT_EXTENTS_FOREACH_FLAG_RANGE,
+ offset, range) == -1) {
+ nbdkit_extents_free (map2);
+ return -1;
+ }
+ nbdkit_extents_free (map2);
because the plugin was calling:
+ if (nbdkit_extent_add (extents_map, offset, n, type) == -1)
But if we drop offset from the interface, the offset filter simplifies to:
return next_ops->extents (nxdata, count, offs + offset, flags, list, err);
because the plugin will be calling:
if (nbdkit_extent_add(list, n type) == -1)
Also an observation: qemu's nbd client only ever issues block status
requests with the req-one flag set, so perhaps we should optimize for
that case.
I hope to get to the point where future qemu doesn't send the req-one
flag. There's several threads on the qemu list about how qemu-img is
slower than it needs to be because it is throwing away useful
information, and where it is aggravated by the kernel's abyssmal lseek()
performance on tmpfs. But until qemu learns useful caching, you're
right that most existing NBD clients that request block status do so one
extent at a time (because I don't know of any other existing NBD clients
that use BLOCK_STATUS yet).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org