On 08/31/22 16:39, Eric Blake wrote:
If a server replies to NBD_OPT_SET_META_CONTEXT first with
NBD_REP_META_CONTEXT, then with NBD_REP_ERR_*, we have already
modified h->meta_contexts, but fail to clean it up before moving on to
OPT_GO in the OPT_META_CONTEXT.CHECK_REPLY's default: handler (this
corner-case bug has been present since we got meta context support
working back in commits a97e2779 and 1b560b62). Per the NBD spec,
because the overall SET_META_CONTEXT did not succeed, we should
therefore not attempt NBD_CMD_BLOCK_STATUS; but because
h->meta_contexts was left non-empty, we mistakenly report
nbd_can_meta_context(first_string) as true, and allow
nbd_block_status() to proceed; this forms a protocol violation not
pre-filtered by our usual litany of nbd_set_strict_mode() checks, and
therefore we can trigger unspecified server behavior.
Rather than open-code a cleanup loop on all error paths, I instead fix
the problem by adding a meta_valid witness that is only set to true in
select places. The obvious place is when handling NBD_REP_ACK; but it
also makes sense if we didn't call NBD_OPT_SET_META_CONTEXT at all: if
negotiating structured replies failed (including oldstyle servers), or
if the user never called nbd_add_meta_context() and therefore doesn't
care (blindly returning 0 to nbd_can_meta_context() in those cases is
fine; our existing tests/oldstyle and tests/newstyle-limited cover
that). However, whereas we previously always returned a 0/1 answer
for nbd_can_meta_context once we are in transmission phase, this new
witness now means that in the corner case of explicit server failure
to NBD_OPT_SET_META_CONTEXT, nbd_can_meta_context() returns -1 to call
attention to the earlier server failure (although it is something that
is unlikely enough in practice that I doubt anyone will experience it
in the wild).
The change in nbd_can_meta_context() to use h->meta_valid instead of
h->eflags has an additional latent semantic difference not observable
at this time (because both h->meta_valid and h->eflags are set in
tandem by successful nbd_opt_go()/nbd_opt_info() in their current
two-command sequence), but which matters to future patches adding new
APIs. On the one hand, once we add nbd_set_request_meta_context() to
stop nbd_opt_go() from performing NBD_OPT_SET_META_CONTEXT, we want
nbd_can_meta_context() to fail during transmission phase if the
context was not set elsewhere during negotiation, rather than blindly
returning 0. On the other hand, when manually calling
nbd_opt_set_meta_context() during negotiation, we do not want it to
touch h->eflags, but do want nbd_can_meta_context() to start working
right away.
Note that the use of a witness flag also allows me to slightly change
when the list of previously-negotiated contexts is freed - instead of
doing it in nbd_internal_reset_size_and_flags(), that function now
simply marks any existing vector as invalid; and the actual wipe is
done when starting a new NBD_OPT_SET_META_CONTEXT or when closing
struct nbd_handle. There are still some oddities to be cleaned up in
later patches by moving when the flag is marked invalid (for example,
we really want nbd_set_export_name() to wipe both flags and meta
contexts, but the act of NBD_OPT_GO should not wipe contexts, and the
act of NBD_OPT_SET_META_CONTEXT should not wipe flags), but I'm
leaving those further tweaks for later patches to make this one easier
to review.
Testing this is surprisingly tricky; compliant servers don't do this
(hence I don't really have anything automatic to add to libnbd's
testsuite). However, I came up with the following temporary hack to
nbdkit to demonstrate the problem:
| diff --git i/server/protocol-handshake-newstyle.c
w/server/protocol-handshake-newstyle.c
| index fedee48f..b3f34c98 100644
| --- i/server/protocol-handshake-newstyle.c
| +++ w/server/protocol-handshake-newstyle.c
| @@ -884,7 +884,7 @@ negotiate_handshake_newstyle_options (void)
| opt_index += querylen;
| nr_queries--;
| }
| - if (send_newstyle_option_reply (option, NBD_REP_ACK) == -1)
| + if (send_newstyle_option_reply (option, NBD_REP_ERR_POLICY) == -1)
| return -1;
| }
| debug ("newstyle negotiation: %s: reply complete", optname);
| diff --git i/server/protocol.c w/server/protocol.c
| index 2ac77055..97235bda 100644
| --- i/server/protocol.c
| +++ w/server/protocol.c
| @@ -191,7 +191,7 @@ validate_request (uint16_t cmd, uint16_t flags, uint64_t offset,
uint32_t count,
|
| /* Block status allowed? */
| if (cmd == NBD_CMD_BLOCK_STATUS) {
| - if (!conn->structured_replies) {
| + if (!conn->structured_replies || 1) {
| nbdkit_error ("invalid request: "
| "%s: structured replies was not negotiated",
| name_of_nbd_cmd (cmd));
Coupled with this command line:
$ ./run /path/to/nbdkit -U- memory 1 --run 'nbdsh --base -u "$uri" -c
"
try:
print(h.can_meta_context(nbd.CONTEXT_BASE_ALLOCATION))
except nbd.Error as ex:
print(ex.string)
def f(c, o, e, er):
pass
try:
print(h.block_status(1, 0, f))
except nbd.Error as ex:
print(ex.string)
"'
Pre-patch, we see can_meta_context succeed, and that we let
NBD_CMD_BLOCK_STATUS leak through causing an nbdkit error message:
True
nbdkit: memory.0: error: invalid request: NBD_CMD_BLOCK_STATUS: structured replies was
not negotiated
nbd_block_status: block-status: command failed: Invalid argument
Post-patch, can_meta_context diagnoses the problem, and block_status
is blocked client-side with no messages needed from nbdkit:
nbd_can_meta_context: need a successful server meta context request first: Invalid
argument
nbd_block_status: did not negotiate any metadata contexts, either you did not call
nbd_add_meta_context before connecting or the server does not support it: Operation not
supported
Fixes: a97e2779 ("states: Negotiate base:allocation with the server.", v0.1)
---
lib/internal.h | 1 +
generator/API.ml | 4 +++-
generator/states-newstyle-opt-meta-context.c | 9 +++++++--
generator/states-reply-structured.c | 1 +
lib/flags.c | 14 ++++++++------
lib/handle.c | 5 +++++
lib/rw.c | 2 +-
7 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/lib/internal.h b/lib/internal.h
index d601d5d..8aaff15 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -180,6 +180,7 @@ struct nbd_handle {
bool structured_replies; /* If we negotiated NBD_OPT_STRUCTURED_REPLY */
/* Vector of negotiated metadata contexts. */
+ bool meta_valid;
meta_vector meta_contexts;
/* The socket or a wrapper if using GnuTLS. */
diff --git a/generator/API.ml b/generator/API.ml
index 3e948aa..62e2d54 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -1785,7 +1785,9 @@ "can_meta_context", {
longdesc = "\
Returns true if the server supports the given meta context
(see L<nbd_add_meta_context(3)>). Returns false if
-the server does not.
+the server does not. It is possible for this command to fail if
+meta contexts were requested but there is a missing or failed
+attempt at NBD_OPT_SET_META_CONTEXT during option negotiation.
The single parameter is the name of the metadata context,
for example C<LIBNBD_CONTEXT_BASE_ALLOCATION>.
diff --git a/generator/states-newstyle-opt-meta-context.c
b/generator/states-newstyle-opt-meta-context.c
index 8f4dee2..a6a5271 100644
--- a/generator/states-newstyle-opt-meta-context.c
+++ b/generator/states-newstyle-opt-meta-context.c
@@ -29,6 +29,9 @@ STATE_MACHINE {
*/
assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE);
nbd_internal_reset_size_and_flags (h);
+ for (i = 0; i < h->meta_contexts.len; ++i)
+ free (h->meta_contexts.ptr[i].name);
+ meta_vector_reset (&h->meta_contexts);
This is code movement out of nbd_internal_reset_size_and_flags(), OK.
if (h->opt_current == NBD_OPT_LIST_META_CONTEXT) {
assert (h->opt_mode);
assert (h->structured_replies);
@@ -44,7 +47,7 @@ STATE_MACHINE {
}
}
- assert (h->meta_contexts.len == 0);
+ assert (!h->meta_valid);
OK, this is the new predicate that nbd_internal_reset_size_and_flags()
now ensures.
/* Calculate the length of the option request data. */
len = 4 /* exportname len */ + strlen (h->export_name) + 4 /* nr queries */;
@@ -194,8 +197,10 @@ STATE_MACHINE {
CALL_CALLBACK (h->opt_cb.completion, &err);
nbd_internal_free_option (h);
}
- else
+ else {
SET_NEXT_STATE (%^OPT_GO.START);
+ h->meta_valid = true;
+ }
break;
case NBD_REP_META_CONTEXT: /* A context. */
if (len > maxpayload)
IIUC, this modifies the NBD_REP_ACK handling for
NBD_OPT_SET_META_CONTEXT. IIUC we've consumed all the mappings, so we
can now set the witness to "OK". Seems OK.
diff --git a/generator/states-reply-structured.c
b/generator/states-reply-structured.c
index aaf75b8..f18c999 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -445,6 +445,7 @@ STATE_MACHINE {
assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
assert (h->bs_entries);
assert (length >= 12);
+ assert (h->meta_valid);
/* Need to byte-swap the entries returned, but apart from that we
* don't validate them.
This seems to express that we may only have initiated a block status
request (for which we are now processing the replies --
REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES) in case we had a valid meta
context list first. Seems plausible. (A bit lower down in the context,
not quoted, we "Look up the context ID" in the meta context vector, so
yes, very sensible assertion.)
diff --git a/lib/flags.c b/lib/flags.c
index 87c20ee..91efc1a 100644
--- a/lib/flags.c
+++ b/lib/flags.c
@@ -37,9 +37,7 @@ nbd_internal_reset_size_and_flags (struct nbd_handle *h)
h->exportsize = 0;
h->eflags = 0;
- for (i = 0; i < h->meta_contexts.len; ++i)
- free (h->meta_contexts.ptr[i].name);
- meta_vector_reset (&h->meta_contexts);
Moved to NEWSTYLE.OPT_META_CONTEXT.START; OK.
(This is the only place we reset the meta context vector, pre-patch.)
+ h->meta_valid = false;
Replacing the above logic, OK.
h->block_minimum = 0;
h->block_preferred = 0;
h->block_maximum = 0;
@@ -75,6 +73,11 @@ nbd_internal_set_size_and_flags (struct nbd_handle *h,
eflags &= ~NBD_FLAG_SEND_FAST_ZERO;
}
+ if (!h->structured_replies || h->request_meta_contexts.len == 0) {
+ assert (h->meta_contexts.len == 0);
+ h->meta_valid = true;
+ }
+
h->exportsize = exportsize;
h->eflags = eflags;
return 0;
Hmmm. This gives me a run for my money...
I think the condition here describes the case when we're never going to
send an NBD_OPT_SET_META_CONTEXT -- meaning that we're never going to
reach the "meta_valid = true" assignment in the ACK handling for
NBD_OPT_SET_META_CONTEXT. Still we need to qualify our empty metacontext
vector as valid somewhere, so nbd_internal_set_size_and_flags() is being
proposed as the location for that.
The question is where nbd_internal_set_size_and_flags() is called
*from*. I find:
- NEWSTYLE.OPT_EXPORT_NAME.CHECK_REPLY
- NBD_REP_INFO | NBD_INFO_EXPORT
- OLDSTYLE.CHECK
We also have a comment on nbd_internal_set_size_and_flags() saying
/* Set the export size and eflags on the handle, validating them.
* This is called from the state machine when either the newstyle or
* oldstyle negotiation reaches the point that these are available.
*/
Aha! So, also in accordance with the commit message, I need to replace
"we're never going to reach the ACK for NBD_OPT_SET_META_CONTEXT" with
"we could never have reached the ACK for NBD_OPT_SET_META_CONTEXT".
I'm not sure if this is the *only place* (or the *best place*) to set
the witness [*], but it does look like a "good one".
[*] The commit message speaks about a "latent semantic difference" which
is currently masked by our managing meta_valid and eflags strictly in
tandem. Hopefully I'll understand that paragraph better when looking at
later patches in the series.
@@ -194,9 +197,8 @@ nbd_unlocked_can_meta_context (struct nbd_handle
*h, const char *name)
{
size_t i;
- if (h->eflags == 0) {
- set_error (EINVAL, "server has not returned export flags, "
- "you need to connect to the server first");
+ if (h->request_meta_contexts.len && !h->meta_valid) {
+ set_error (EINVAL, "need a successful server meta context request
first");
return -1;
}
This is not easy to understand at first, but I kind of convinced myself
with the following argument: the condition for successfully querying the
context list is
h->request_meta_contexts.len == 0 || h->meta_valid
i.e., we never needed to ask the server, or the server responded (and we
parsed the response successfully). The *result* list may or may not be
empty at this point (it could be empty for two reasons: we never asked
for any contexts, or we did but the server supports none); either way,
the result list is valid for searching.
OK.
diff --git a/lib/handle.c b/lib/handle.c
index 8713e18..03f45a4 100644
--- a/lib/handle.c
+++ b/lib/handle.c
@@ -115,6 +115,8 @@ nbd_create (void)
void
nbd_close (struct nbd_handle *h)
{
+ size_t i;
+
nbd_internal_set_error_context ("nbd_close");
if (h == NULL)
@@ -127,6 +129,9 @@ nbd_close (struct nbd_handle *h)
free (h->bs_entries);
nbd_internal_reset_size_and_flags (h);
+ for (i = 0; i < h->meta_contexts.len; ++i)
+ free (h->meta_contexts.ptr[i].name);
+ meta_vector_reset (&h->meta_contexts);
nbd_internal_free_option (h);
free_cmd_list (h->cmds_to_issue);
free_cmd_list (h->cmds_in_flight);
OK.
diff --git a/lib/rw.c b/lib/rw.c
index 81f8f35..2ab85f3 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -482,7 +482,7 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
return -1;
}
- if (h->meta_contexts.len == 0) {
+ if (!h->meta_valid || h->meta_contexts.len == 0) {
set_error (ENOTSUP, "did not negotiate any metadata contexts, "
"either you did not call nbd_add_meta_context before "
"connecting or the server does not support it");
Conversely, here the condition for passing is
h->meta_valid && h->meta_contexts.len > 0
meaning we asked the server, and it supports at least one of the
requested meta contexts.
I think the patch is correct, but whether it is *complete* as well is
something I can't easily evaluate. With that caveat:
Reviewed-by: Laszlo Ersek <lersek(a)redhat.com>