On 08/31/22 16:39, Eric Blake wrote:
 If a server replies to NBD_OPT_SET_META_CONTEXT first with
 NBD_REP_META_CONTEXT, then with NBD_REP_ERR_*, we have already
 modified h->meta_contexts, but fail to clean it up before moving on to
 OPT_GO in the OPT_META_CONTEXT.CHECK_REPLY's default: handler (this
 corner-case bug has been present since we got meta context support
 working back in commits a97e2779 and 1b560b62).  Per the NBD spec,
 because the overall SET_META_CONTEXT did not succeed, we should
 therefore not attempt NBD_CMD_BLOCK_STATUS; but because
 h->meta_contexts was left non-empty, we mistakenly report
 nbd_can_meta_context(first_string) as true, and allow
 nbd_block_status() to proceed; this forms a protocol violation not
 pre-filtered by our usual litany of nbd_set_strict_mode() checks, and
 therefore we can trigger unspecified server behavior.
 
 Rather than open-code a cleanup loop on all error paths, I instead fix
 the problem by adding a meta_valid witness that is only set to true in
 select places.  The obvious place is when handling NBD_REP_ACK; but it
 also makes sense if we didn't call NBD_OPT_SET_META_CONTEXT at all: if
 negotiating structured replies failed (including oldstyle servers), or
 if the user never called nbd_add_meta_context() and therefore doesn't
 care (blindly returning 0 to nbd_can_meta_context() in those cases is
 fine; our existing tests/oldstyle and tests/newstyle-limited cover
 that).  However, whereas we previously always returned a 0/1 answer
 for nbd_can_meta_context once we are in transmission phase, this new
 witness now means that in the corner case of explicit server failure
 to NBD_OPT_SET_META_CONTEXT, nbd_can_meta_context() returns -1 to call
 attention to the earlier server failure (although it is something that
 is unlikely enough in practice that I doubt anyone will experience it
 in the wild).
 
 The change in nbd_can_meta_context() to use h->meta_valid instead of
 h->eflags has an additional latent semantic difference not observable
 at this time (because both h->meta_valid and h->eflags are set in
 tandem by successful nbd_opt_go()/nbd_opt_info() in their current
 two-command sequence), but which matters to future patches adding new
 APIs.  On the one hand, once we add nbd_set_request_meta_context() to
 stop nbd_opt_go() from performing NBD_OPT_SET_META_CONTEXT, we want
 nbd_can_meta_context() to fail during transmission phase if the
 context was not set elsewhere during negotiation, rather than blindly
 returning 0.  On the other hand, when manually calling
 nbd_opt_set_meta_context() during negotiation, we do not want it to
 touch h->eflags, but do want nbd_can_meta_context() to start working
 right away.
 
 Note that the use of a witness flag also allows me to slightly change
 when the list of previously-negotiated contexts is freed - instead of
 doing it in nbd_internal_reset_size_and_flags(), that function now
 simply marks any existing vector as invalid; and the actual wipe is
 done when starting a new NBD_OPT_SET_META_CONTEXT or when closing
 struct nbd_handle.  There are still some oddities to be cleaned up in
 later patches by moving when the flag is marked invalid (for example,
 we really want nbd_set_export_name() to wipe both flags and meta
 contexts, but the act of NBD_OPT_GO should not wipe contexts, and the
 act of NBD_OPT_SET_META_CONTEXT should not wipe flags), but I'm
 leaving those further tweaks for later patches to make this one easier
 to review.
 
 Testing this is surprisingly tricky; compliant servers don't do this
 (hence I don't really have anything automatic to add to libnbd's
 testsuite).  However, I came up with the following temporary hack to
 nbdkit to demonstrate the problem:
 
 | diff --git i/server/protocol-handshake-newstyle.c
w/server/protocol-handshake-newstyle.c
 | index fedee48f..b3f34c98 100644
 | --- i/server/protocol-handshake-newstyle.c
 | +++ w/server/protocol-handshake-newstyle.c
 | @@ -884,7 +884,7 @@ negotiate_handshake_newstyle_options (void)
 |              opt_index += querylen;
 |              nr_queries--;
 |            }
 | -          if (send_newstyle_option_reply (option, NBD_REP_ACK) == -1)
 | +          if (send_newstyle_option_reply (option, NBD_REP_ERR_POLICY) == -1)
 |              return -1;
 |          }
 |          debug ("newstyle negotiation: %s: reply complete", optname);
 | diff --git i/server/protocol.c w/server/protocol.c
 | index 2ac77055..97235bda 100644
 | --- i/server/protocol.c
 | +++ w/server/protocol.c
 | @@ -191,7 +191,7 @@ validate_request (uint16_t cmd, uint16_t flags, uint64_t offset,
uint32_t count,
 |
 |    /* Block status allowed? */
 |    if (cmd == NBD_CMD_BLOCK_STATUS) {
 | -    if (!conn->structured_replies) {
 | +    if (!conn->structured_replies || 1) {
 |        nbdkit_error ("invalid request: "
 |                      "%s: structured replies was not negotiated",
 |                      name_of_nbd_cmd (cmd));
 
 Coupled with this command line:
 
 $ ./run /path/to/nbdkit -U- memory 1 --run 'nbdsh --base -u "$uri" -c
"
 try:
  print(h.can_meta_context(nbd.CONTEXT_BASE_ALLOCATION))
 except nbd.Error as ex:
  print(ex.string)
 def f(c, o, e, er):
  pass
 try:
  print(h.block_status(1, 0, f))
 except nbd.Error as ex:
  print(ex.string)
 "'
 
 Pre-patch, we see can_meta_context succeed, and that we let
 NBD_CMD_BLOCK_STATUS leak through causing an nbdkit error message:
 
 True
 nbdkit: memory.0: error: invalid request: NBD_CMD_BLOCK_STATUS: structured replies was
not negotiated
 nbd_block_status: block-status: command failed: Invalid argument
 
 Post-patch, can_meta_context diagnoses the problem, and block_status
 is blocked client-side with no messages needed from nbdkit:
 
 nbd_can_meta_context: need a successful server meta context request first: Invalid
argument
 nbd_block_status: did not negotiate any metadata contexts, either you did not call
nbd_add_meta_context before connecting or the server does not support it: Operation not
supported
 
 Fixes: a97e2779 ("states: Negotiate base:allocation with the server.", v0.1)
 ---
  lib/internal.h                               |  1 +
  generator/API.ml                             |  4 +++-
  generator/states-newstyle-opt-meta-context.c |  9 +++++++--
  generator/states-reply-structured.c          |  1 +
  lib/flags.c                                  | 14 ++++++++------
  lib/handle.c                                 |  5 +++++
  lib/rw.c                                     |  2 +-
  7 files changed, 26 insertions(+), 10 deletions(-)
 
 diff --git a/lib/internal.h b/lib/internal.h
 index d601d5d..8aaff15 100644
 --- a/lib/internal.h
 +++ b/lib/internal.h
 @@ -180,6 +180,7 @@ struct nbd_handle {
    bool structured_replies;      /* If we negotiated NBD_OPT_STRUCTURED_REPLY */
 
    /* Vector of negotiated metadata contexts. */
 +  bool meta_valid;
    meta_vector meta_contexts;
 
    /* The socket or a wrapper if using GnuTLS. */
 diff --git a/generator/API.ml b/generator/API.ml
 index 3e948aa..62e2d54 100644
 --- a/generator/API.ml
 +++ b/generator/API.ml
 @@ -1785,7 +1785,9 @@   "can_meta_context", {
      longdesc = "\
  Returns true if the server supports the given meta context
  (see L<nbd_add_meta_context(3)>).  Returns false if
 -the server does not.
 +the server does not.  It is possible for this command to fail if
 +meta contexts were requested but there is a missing or failed
 +attempt at NBD_OPT_SET_META_CONTEXT during option negotiation.
 
  The single parameter is the name of the metadata context,
  for example C<LIBNBD_CONTEXT_BASE_ALLOCATION>.
 diff --git a/generator/states-newstyle-opt-meta-context.c
b/generator/states-newstyle-opt-meta-context.c
 index 8f4dee2..a6a5271 100644
 --- a/generator/states-newstyle-opt-meta-context.c
 +++ b/generator/states-newstyle-opt-meta-context.c
 @@ -29,6 +29,9 @@ STATE_MACHINE {
     */
    assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE);
    nbd_internal_reset_size_and_flags (h);
 +  for (i = 0; i < h->meta_contexts.len; ++i)
 +    free (h->meta_contexts.ptr[i].name);
 +  meta_vector_reset (&h->meta_contexts); 
This is code movement out of nbd_internal_reset_size_and_flags(), OK.
    if (h->opt_current == NBD_OPT_LIST_META_CONTEXT) {
      assert (h->opt_mode);
      assert (h->structured_replies);
 @@ -44,7 +47,7 @@ STATE_MACHINE {
      }
    }
 
 -  assert (h->meta_contexts.len == 0);
 +  assert (!h->meta_valid); 
OK, this is the new predicate that nbd_internal_reset_size_and_flags()
now ensures.
 
    /* Calculate the length of the option request data. */
    len = 4 /* exportname len */ + strlen (h->export_name) + 4 /* nr queries */;
 @@ -194,8 +197,10 @@ STATE_MACHINE {
        CALL_CALLBACK (h->opt_cb.completion, &err);
        nbd_internal_free_option (h);
      }
 -    else
 +    else {
        SET_NEXT_STATE (%^OPT_GO.START);
 +      h->meta_valid = true;
 +    }
      break;
    case NBD_REP_META_CONTEXT:  /* A context. */
      if (len > maxpayload) 
IIUC, this modifies the NBD_REP_ACK handling for
NBD_OPT_SET_META_CONTEXT. IIUC we've consumed all the mappings, so we
can now set the witness to "OK". Seems OK.
 diff --git a/generator/states-reply-structured.c
b/generator/states-reply-structured.c
 index aaf75b8..f18c999 100644
 --- a/generator/states-reply-structured.c
 +++ b/generator/states-reply-structured.c
 @@ -445,6 +445,7 @@ STATE_MACHINE {
      assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
      assert (h->bs_entries);
      assert (length >= 12);
 +    assert (h->meta_valid);
 
      /* Need to byte-swap the entries returned, but apart from that we
       * don't validate them. 
This seems to express that we may only have initiated a block status
request (for which we are now processing the replies --
REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES) in case we had a valid meta
context list first. Seems plausible. (A bit lower down in the context,
not quoted, we "Look up the context ID" in the meta context vector, so
yes, very sensible assertion.)
 diff --git a/lib/flags.c b/lib/flags.c
 index 87c20ee..91efc1a 100644
 --- a/lib/flags.c
 +++ b/lib/flags.c
 @@ -37,9 +37,7 @@ nbd_internal_reset_size_and_flags (struct nbd_handle *h)
 
    h->exportsize = 0;
    h->eflags = 0;
 -  for (i = 0; i < h->meta_contexts.len; ++i)
 -    free (h->meta_contexts.ptr[i].name);
 -  meta_vector_reset (&h->meta_contexts); 
Moved to NEWSTYLE.OPT_META_CONTEXT.START; OK.
(This is the only place we reset the meta context vector, pre-patch.)
 +  h->meta_valid = false; 
Replacing the above logic, OK.
    h->block_minimum = 0;
    h->block_preferred = 0;
    h->block_maximum = 0;
 @@ -75,6 +73,11 @@ nbd_internal_set_size_and_flags (struct nbd_handle *h,
      eflags &= ~NBD_FLAG_SEND_FAST_ZERO;
    }
 
 +  if (!h->structured_replies || h->request_meta_contexts.len == 0) {
 +    assert (h->meta_contexts.len == 0);
 +    h->meta_valid = true;
 +  }
 +
    h->exportsize = exportsize;
    h->eflags = eflags;
    return 0; 
Hmmm. This gives me a run for my money...
I think the condition here describes the case when we're never going to
send an NBD_OPT_SET_META_CONTEXT -- meaning that we're never going to
reach the "meta_valid = true" assignment in the ACK handling for
NBD_OPT_SET_META_CONTEXT. Still we need to qualify our empty metacontext
vector as valid somewhere, so nbd_internal_set_size_and_flags() is being
proposed as the location for that.
The question is where nbd_internal_set_size_and_flags() is called
*from*. I find:
- NEWSTYLE.OPT_EXPORT_NAME.CHECK_REPLY
- NBD_REP_INFO | NBD_INFO_EXPORT
- OLDSTYLE.CHECK
We also have a comment on nbd_internal_set_size_and_flags() saying
/* Set the export size and eflags on the handle, validating them.
 * This is called from the state machine when either the newstyle or
 * oldstyle negotiation reaches the point that these are available.
 */
Aha! So, also in accordance with the commit message, I need to replace
"we're never going to reach the ACK for NBD_OPT_SET_META_CONTEXT" with
"we could never have reached the ACK for NBD_OPT_SET_META_CONTEXT".
I'm not sure if this is the *only place* (or the *best place*) to set
the witness [*], but it does look like a "good one".
[*] The commit message speaks about a "latent semantic difference" which
is currently masked by our managing meta_valid and eflags strictly in
tandem. Hopefully I'll understand that paragraph better when looking at
later patches in the series.
 @@ -194,9 +197,8 @@ nbd_unlocked_can_meta_context (struct nbd_handle
*h, const char *name)
  {
    size_t i;
 
 -  if (h->eflags == 0) {
 -    set_error (EINVAL, "server has not returned export flags, "
 -               "you need to connect to the server first");
 +  if (h->request_meta_contexts.len && !h->meta_valid) {
 +    set_error (EINVAL, "need a successful server meta context request
first");
      return -1;
    } 
This is not easy to understand at first, but I kind of convinced myself
with the following argument: the condition for successfully querying the
context list is
  h->request_meta_contexts.len == 0 || h->meta_valid
i.e., we never needed to ask the server, or the server responded (and we
parsed the response successfully). The *result* list may or may not be
empty at this point (it could be empty for two reasons: we never asked
for any contexts, or we did but the server supports none); either way,
the result list is valid for searching.
OK.
 
 diff --git a/lib/handle.c b/lib/handle.c
 index 8713e18..03f45a4 100644
 --- a/lib/handle.c
 +++ b/lib/handle.c
 @@ -115,6 +115,8 @@ nbd_create (void)
  void
  nbd_close (struct nbd_handle *h)
  {
 +  size_t i;
 +
    nbd_internal_set_error_context ("nbd_close");
 
    if (h == NULL)
 @@ -127,6 +129,9 @@ nbd_close (struct nbd_handle *h)
 
    free (h->bs_entries);
    nbd_internal_reset_size_and_flags (h);
 +  for (i = 0; i < h->meta_contexts.len; ++i)
 +    free (h->meta_contexts.ptr[i].name);
 +  meta_vector_reset (&h->meta_contexts);
    nbd_internal_free_option (h);
    free_cmd_list (h->cmds_to_issue);
    free_cmd_list (h->cmds_in_flight); 
OK.
 diff --git a/lib/rw.c b/lib/rw.c
 index 81f8f35..2ab85f3 100644
 --- a/lib/rw.c
 +++ b/lib/rw.c
 @@ -482,7 +482,7 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
        return -1;
      }
 
 -    if (h->meta_contexts.len == 0) {
 +    if (!h->meta_valid || h->meta_contexts.len == 0) {
        set_error (ENOTSUP, "did not negotiate any metadata contexts, "
                   "either you did not call nbd_add_meta_context before "
                   "connecting or the server does not support it");
  
Conversely, here the condition for passing is
  h->meta_valid && h->meta_contexts.len > 0
meaning we asked the server, and it supports at least one of the
requested meta contexts.
I think the patch is correct, but whether it is *complete* as well is
something I can't easily evaluate. With that caveat:
Reviewed-by: Laszlo Ersek <lersek(a)redhat.com>