On Thu, Jul 23, 2020 at 10:54:31AM -0500, Eric Blake wrote:
I'm thinking of adding one or more callbacks to nbdkit to let
plugins/filters enforce various block size alignments (for example,
the swab filter requires 2/4/8 alignment, or VDDK requires 512
alignment, etc). The NBD protocol currently has NBD_INFO_BLOCK_SIZE
which can be sent in reply to NBD_OPT_GO to tell the client about
sizing constraints; qemu already implements it as both client and
server, so we have some reasonable testing setups (although libnbd
will also need some additions to make it easier to expose
constraints to the user, and/or add new convenience APIs to do
blocksize-style read-modify-write at the libnbd client side rather
than needing the blocksize filter in the NBD server side).
But NBD_INFO_BLOCK_SIZE is not the full picture: it only covers
minimum block size, preferred block size, and maximum data size;
there has been discussion on the NBD list about also advertising
maximum action size (discussion has mentioned trim and/or zero, but
even cache could benefit from larger buffer size than pread), which
means we should be thinking about supporting future protocol
extensions in whatever we expose to plugins.
So, I'm thinking something like the following:
New enum:
NBDKIT_BLOCK_SIZE_GET_MINIMUM
NBDKIT_BLOCK_SIZE_GET_PREFERRED
NBDKIT_BLOCK_SIZE_GET_MAX_DATA
NBDKIT_BLOCK_SIZE_GET_MAX_TRIM
NBDKIT_BLOCK_SIZE_GET_MAX_ZERO
NBDKIT_BLOCK_SIZE_GET_MAX_CACHE
enum or int? I think there are ABI problems with enums, although
probably not unless we have more than 256 cases?
along with a new callback for plugins:
int64_t block_size (void *handle, int which);
where 'which' is one of the enum values. A future nbdkit might
request an enum value not recognized at the time the plugin was
compiled, so the recommended behavior is that a plugin returns -1 on
error, 0 to let nbdkit pick a sane default (including when 'which'
was unknown), or a positive value for an actual limit. For now,
nbdkit would advertise MIN, PREFERRED, and MAX_DATA limits (to
clients that use NBD_OPT_GO), while the others are enforced only
internally. The idea is that if the plugin installs a limit, a
client request that violates that limit will fail with EINVAL for
being unaligned, without even asking the plugin to try the response.
Isn't the plan that the server would try to resolve the problem -
eg. by making a RMW request or splitting a request? This would be
especially the case for internal requests, but I could also see it
having value for clients which either ignore the block size
information, or don't implement it correctly/completely.
nbdkit calls the plugin callback once per connection per reasonable
'which' (caching the results like it does for .get_size, and
skipping limits where .can_FOO fails). Returning int64_t allows us
to reuse this callback without change when we add v3 callbacks,
although values larger than 4G make no difference at present. I
thought the use of an enum was nicer than filling in a struct whose
size might change, or adding one callback per limit.
Yes the enum/int seems like a better idea than dealing with structs,
and is also easier to map into other programming languages. The
overhead of a few extra indirect function calls is negligible because
it's only once per connection.
The only other alternative I can see would be to put these into struct
nbdkit_plugin (as int64_t fields), but we have always regretted using
plain fields instead of functions in this struct, eg. thread_model,
config_help, etc.
Filters can relax limits (such as blocksize turning a plugin MIN 512
into an advertised MIN 1, by doing read-modify-write as needed) or
tighten limits (the file plugin has MIN 1, but the swab filter
imposes a tighter MIN 2). Constraints between limits are as
follows:
Can the server do this transparently between layers? Might be a lot
easier to implement it once.
MIN: must be power of 2, between 1 and 64k; .get_size and .extents
must return results aligned to MIN (as any unaligned tail or extent
transition is inaccessible using only aligned operations). Defaults
to 1.
PREFERRED: must be power of 2, between max(512, MIN) and 2M (this
upper limit is not specified by NBD spec, but matches qemu's
implementation of what it uses as the max qcow2 cluster size).
Defaults to max(4k, MIN).
MAX_DATA: must be multiple of MIN and at least as large as
PREFERRED; values larger than 64M are okay but clamped by nbdkit's
own internal limits. Defaults to 64M.
MAX_TRIM, MAX_ZERO, MAX_CACHE: must be multiple of MIN, should be at
least as big as MAX_DATA. Values 4G and larger are clamped by
nbdkit's own internal limits. Defaults to 4G-MIN for TRIM/ZERO, and
MAX_DATA for CACHE.
Sounds reasonable otherwise.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/