On 3/19/19 11:35 AM, Richard W.M. Jones wrote:
This commit implements the core NBD protocol for the "base:allocation"
Block Status replies.
---
server/internal.h | 7 +
server/protocol.h | 17 +-
server/protocol-handshake-newstyle.c | 79 ++++++++-
server/protocol.c | 248 +++++++++++++++++++++++++--
4 files changed, 335 insertions(+), 16 deletions(-)
+static int
+send_newstyle_option_reply_meta_context (struct connection *conn,
+                                         uint32_t option, uint32_t reply,
+                                         uint32_t context_id,
+                                         const char *name)
+{
+  struct fixed_new_option_reply fixed_new_option_reply;
+  struct fixed_new_option_reply_meta_context context;
+  const size_t namelen = strlen (name);
+
+  fixed_new_option_reply.magic = htobe64 (NBD_REP_MAGIC);
+  fixed_new_option_reply.option = htobe32 (option);
+  fixed_new_option_reply.reply = htobe32 (reply);
+  fixed_new_option_reply.replylen = htobe32 (sizeof context + namelen);
+  context.context_id = htobe32 (context_id);
The NBD protocol recommends that context_id for replies to
OPT_LIST_META_CONTEXT be set to 0, but requires the client to ignore the
context_id. Here, you are blindly returning a context_id of 1
regardless of _LIST or _SET.
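i.e. something like this at the call site, where NBD_REP_META_CONTEXT
is the reply type from the spec (untested sketch, indentation
approximate):

  /* Clients must ignore the context id in replies to
   * NBD_OPT_LIST_META_CONTEXT, and the spec recommends sending 0 there;
   * only NBD_OPT_SET_META_CONTEXT needs the real id (1 for the single
   * context we support).
   */
  if (send_newstyle_option_reply_meta_context
      (conn, option, NBD_REP_META_CONTEXT,
       option == NBD_OPT_SET_META_CONTEXT ? 1 : 0,
       "base:allocation") == -1)
    return -1;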
@@ -469,6 +499,16 @@ negotiate_handshake_newstyle_options (struct connection *conn)
continue;
}
+ /* Work out if the server supports base:allocation. */
+ r = backend->can_extents (backend, conn);
+ if (r == -1) {
+ if (send_newstyle_option_reply (conn, option, NBD_REP_ERR_INVALID)
+ == -1)
+ return -1;
+ continue;
+ }
+ can_extents = r;
Hmm. For NBD_CMD_WRITE_ZEROES, we decided that in the common case
.can_zero should default to true for plugins even when the plugin lacks
.zero, on the grounds that always advertising it to the client allows
for less network traffic even if nbdkit has to do the fallback (see
commits 7ac98831 and bde4fc4). We then added the nozero filter to
specifically disable the advertisement as needed.
This seems like another case where we should advertise the feature by
default (even when the plugin lacks .extents) on the grounds that a
client might benefit from knowing that an image can't report sparseness,
then provide a filter comparable to nozero that fine-tunes what we
actually advertise.
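For concreteness, the filter I have in mind would be little more than a
copy of nozero; something like this (the name and everything else here
is mine, written from memory and untested, and it assumes the
.can_extents filter callback from patch 1):

/* noextents: hypothetical filter that stops nbdkit advertising
 * base:allocation, so clients see the image as unable to report
 * sparseness.
 */
#include <config.h>

#include <nbdkit-filter.h>

static int
noextents_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
                       void *handle)
{
  return 0;   /* never advertise extents, whatever the plugin says */
}

static struct nbdkit_filter filter = {
  .name        = "noextents",
  .longname    = "nbdkit noextents filter",
  .can_extents = noextents_can_extents,
};

NBDKIT_REGISTER_FILTER(filter)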
But I don't think that fine-tuning affects this patch (which looks
right to me in just querying .can_extents); rather it affects what we
default to in patch 1 for plugins (and I also argued that having
filters default to false until we've audited all filters may be wise).
+static int
+count_extents (uint64_t offset, uint64_t length, uint32_t type,
+               void *rv)
+{
+  size_t *rp = rv;
+
+  (*rp)++;
+  return 0;
+}
If you take my suggestion of an END type, I think that foreach() should
still iterate over any extent marked END, but count_extents should not
count it. Or maybe foreach gains a flag for whether the caller is
interested in visiting END extents or just stopping right away when one
is encountered (filters would be interested, while this code is probably
not).
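In other words, roughly (NBDKIT_EXTENT_END is my made-up spelling of
that type):

static int
count_extents (uint64_t offset, uint64_t length, uint32_t type,
               void *rv)
{
  size_t *rp = rv;

  if (type != NBDKIT_EXTENT_END)   /* visit, but don't count, the sentinel */
    (*rp)++;
  return 0;
}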
+static int
+copy_extents (uint64_t offset, uint64_t length, uint32_t type,
+              void *dv)
+{
+  struct copy_extents_data *data = dv;
+  uint32_t type_flags;
+
+  assert (data->i < data->nr_blocks);
+
+  /* Because the original request is limited to a 32 bit count, length
+   * can never be > 32 bits in size.
+   */
+  assert (length <= UINT32_MAX);
If we pre-populate a map with data over the client's initial range and
with END beyond that range, it is conceivable that a plugin will slam
the end of the file to be a data extent, which will then coalesce to
something larger than 32 bits. But we can also be clever and call
nbdkit_extents_add (offset + UINT32_MAX, END) after calling into the
plugin but prior to calling foreach, to keep this assertion true, even
if we want to support the NBD protocol's ability for the last extent
(when the REQ_ONE flag is absent) to advertise a length longer than the
client's request.
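In code that might be no more than this, just before the counting
foreach (the exact signature of nbdkit_extents_add is a guess on my
part, as is the spelling of the END type):

  /* Nothing beyond offset + UINT32_MAX can then survive into a
   * descriptor, so no single length can overflow the 32-bit field.
   */
  if (nbdkit_extents_add (extents_map, offset + UINT32_MAX,
                          NBDKIT_EXTENT_END) == -1)
    return errno;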
+  /* Calculate the number of blocks we will be returning. */
+  *nr_blocks = 0;
+  if (nbdkit_extents_foreach (extents_map,
+                              count_extents, nr_blocks,
+                              foreach_flags,
+                              offset, (uint64_t) count) == -1)
+    return errno;
+  assert (!req_one || *nr_blocks == 1);
+
You probably ought to return an error if nr_blocks is 0 (possible if a
plugin registered an END type immediately at offset), as the NBD spec
requires a successful reply to make progress.
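Something as simple as this after the counting pass would do (the
choice of errno is only a suggestion):

  /* The NBD spec requires a successful reply to make progress, so an
   * empty descriptor list has to become an error instead.
   */
  if (*nr_blocks == 0) {
    nbdkit_error ("block status: no extents to send for this range");
    return EINVAL;
  }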
+  /* Allocate the final array. */
+  *blocks = malloc (sizeof (struct block_descriptor) * *nr_blocks);
Hmm. We probably ought to cap the number of extents we are willing to
send to avoid an over-long reply. Since we reject NBD_CMD_READ calls
requesting more than 64M of data, we probably also ought to guarantee a
truncated reply if it would otherwise generate more than 64M of extent
information in the reply (or maybe pick a smaller limit, maybe 1M?).
Each block descriptor in the reply is 8 bytes, so even capping at 1M
blocks would only be an 8M reply; a lower cap of 64k blocks or even 1k
blocks would still be reasonable.
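Sketching it (the constant name and value are pulled out of thin air,
and copy_extents' assert would become an early return so the copy
stops at the same point as the count):

#define MAX_BLOCK_STATUS_BLOCKS (64 * 1024)  /* 64k * 8 bytes = 512K reply */

  /* After the counting foreach: */
  if (*nr_blocks > MAX_BLOCK_STATUS_BLOCKS)
    *nr_blocks = MAX_BLOCK_STATUS_BLOCKS;

  /* ...and in copy_extents, in place of the assert: */
  if (data->i >= data->nr_blocks)
    return 0;   /* reply already full; ignore the remaining extents */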
+static int
+send_structured_reply_block_status (struct connection *conn,
+                                    uint64_t handle,
+                                    uint16_t cmd, uint16_t flags,
+                                    uint32_t count, uint64_t offset,
+                                    const struct block_descriptor *blocks,
+                                    size_t nr_blocks)
+{
Looks good.
@@ -402,6 +587,9 @@ protocol_recv_request_send_reply (struct connection *conn)
@@ -458,6 +647,13 @@ protocol_recv_request_send_reply (struct connection *conn)
}
}
+ /* Allocate the extents map for block status only. */
+ if (cmd == NBD_CMD_BLOCK_STATUS) {
+ extents_map = nbdkit_extents_new ();
May need tweaking if you like my idea of having _new() take the client's
initial bounds as parameters.
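i.e. this allocation would become something like the following,
assuming a hypothetical two-argument form of _new() and that offset
and count have already been parsed out of the request at this point:

    /* Seed the map with the client's window so that later code can
     * assume the map is already clamped to [offset, offset+count).
     */
    extents_map = nbdkit_extents_new (offset, (uint64_t) count);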
+ /* XXX There are complicated requirements for the block status
+ * reply, such as the offset, length and number of extents returned
+ * in the structured reply. To allow a simple implementation for
+ * plugins we don't make the plugins obey these requirements. This
+ * means at some point we need to filter what the plugin gives us to
+ * obey the protocol requirements. There are several places we
+ * could do that. Currently we do it here. Another possibility is
+ * to do it in server/plugins.c.
+ *
+ * Also note this only deals with base:allocation. If in future we
+ * want to describe other block status metadata then this code will
+ * require an overhaul.
Correct observations; permitting arbitrary metadata contexts will indeed
be a much more interesting addition, if we ever decide to do so.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org