I'm thinking of adding one or more callbacks to nbdkit to let
plugins/filters enforce various block size alignments (for example, the
swab filter requires 2/4/8 alignment, or VDDK requires 512 alignment,
etc). The NBD protocol currently has NBD_INFO_BLOCK_SIZE which can be
sent in reply to NBD_OPT_GO to tell the client about sizing constraints;
qemu already implements it as both client and server, so we have some
reasonable testing setups (although libnbd will also need some additions
to make it easier to expose constraints to the user, and/or add new
convenience APIs to do blocksize-style read-modify-write at the libnbd
client side rather than needing the blocksize filter in the NBD server
side).
But NBD_INFO_BLOCK_SIZE is not the full picture: it only covers minimum
block size, preferred block size, and maximum data size; there has been
discussion on the NBD list about also advertising maximum action size
(discussion has mentioned trim and/or zero, but even cache could benefit
from larger buffer size than pread), which means we should be thinking
about supporting future protocol extensions in whatever we expose to
plugins.
So, I'm thinking something like the following:
New enum:
NBDKIT_BLOCK_SIZE_GET_MINIMUM
NBDKIT_BLOCK_SIZE_GET_PREFERRED
NBDKIT_BLOCK_SIZE_GET_MAX_DATA
NBDKIT_BLOCK_SIZE_GET_MAX_TRIM
NBDKIT_BLOCK_SIZE_GET_MAX_ZERO
NBDKIT_BLOCK_SIZE_GET_MAX_CACHE
along with a new callback for plugins:
int64_t block_size (void *handle, int which);
where 'which' is one of the enum values. A future nbdkit might request
an enum value not recognized at the time the plugin was compiled, so the
recommended behavior is that a plugin returns -1 on error, 0 to let
nbdkit pick a sane default (including when 'which' was unknown), or a
positive value for an actual limit. For now, nbdkit would advertise
MIN, PREFERRED, and MAX_DATA limits (to clients that use NBD_OPT_GO),
while the others are enforced only internally. The idea is that if the
plugin installs a limit, a client request that violates that limit will
fail with EINVAL for being unaligned, without even asking the plugin to
try the response. nbdkit calls the plugin callback once per connection
per reasonable 'which' (caching the results like it does for .get_size,
and skipping limits where .can_FOO fails). Returning int64_t allows us
to reuse this callback without change when we add v3 callbacks, although
values larger than 4G make no difference at present. I thought the use
of an enum was nicer than filling in a struct whose size might change,
or adding one callback per limit.
Filters can relax limits (such as blocksize turning a plugin MIN 512
into an advertised MIN 1, by doing read-modify-write as needed) or
tighten limits (the file plugin has MIN 1, but the swab filter imposes a
tighter MIN 2). Constraints between limits are as follows:
MIN: must be power of 2, between 1 and 64k; .get_size and .extents must
return results aligned to MIN (as any unaligned tail or extent
transition is inaccessible using only aligned operations). Defaults to 1.
PREFERRED: must be power of 2, between max(512, MIN) and 2M (this upper
limit is not specified by NBD spec, but matches qemu's implementation of
what it uses as the max qcow2 cluster size). Defaults to max(4k, MIN).
MAX_DATA: must be multiple of MIN and at least as large as PREFERRED;
values larger than 64M are okay but clamped by nbdkit's own internal
limits. Defaults to 64M.
MAX_TRIM, MAX_ZERO, MAX_CACHE: must be multiple of MIN, should be at
least as big as MAX_DATA. Values 4G and larger are clamped by nbdkit's
own internal limits. Defaults to 4G-MIN for TRIM/ZERO, and MAX_DATA for
CACHE.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org