On Thu, Sep 01, 2022 at 11:04:09AM +0200, Laszlo Ersek wrote:
...
>
> Rather than open-code a cleanup loop on all error paths, I instead fix
> the problem by adding a meta_valid witness that is only set to true in
> select places. The obvious place is when handling NBD_REP_ACK; but it
[1]
> also makes sense if we didn't call NBD_OPT_SET_META_CONTEXT
at all: if
> negotiating structured replies failed (including oldstyle servers), or
> if the user never called nbd_add_meta_context() and therefore doesn't
> care (blindly returning 0 to nbd_can_meta_context() in those cases is
> fine; our existing tests/oldstyle and tests/newstyle-limited cover
[2]
> that). However, whereas we previously always returned a 0/1
answer
> for nbd_can_meta_context once we are in transmission phase, this new
> witness now means that in the corner case of explicit server failure
> to NBD_OPT_SET_META_CONTEXT, nbd_can_meta_context() returns -1 to call
> attention to the earlier server failure (although it is something that
> is unlikely enough in practice that I doubt anyone will experience it
> in the wild).
>
> The change in nbd_can_meta_context() to use h->meta_valid instead of
> h->eflags has an additional latent semantic difference not observable
> at this time (because both h->meta_valid and h->eflags are set in
> tandem by successful nbd_opt_go()/nbd_opt_info() in their current
> two-command sequence), but which matters to future patches adding new
[3]
> APIs. On the one hand, once we add
nbd_set_request_meta_context() to
> stop nbd_opt_go() from performing NBD_OPT_SET_META_CONTEXT, we want
> nbd_can_meta_context() to fail during transmission phase if the
> context was not set elsewhere during negotiation, rather than blindly
patch 4/12 adds a test for this
> returning 0. On the other hand, when manually calling
> nbd_opt_set_meta_context() during negotiation, we do not want it to
> touch h->eflags, but do want nbd_can_meta_context() to start working
> right away.
patch 11/12 adds a test for this
>
> Note that the use of a witness flag also allows me to slightly change
> when the list of previously-negotiated contexts is freed - instead of
> doing it in nbd_internal_reset_size_and_flags(), that function now
> simply marks any existing vector as invalid; and the actual wipe is
[4]
> done when starting a new NBD_OPT_SET_META_CONTEXT or when
closing
[5]
> struct nbd_handle. There are still some oddities to be cleaned
up in
[6]
> later patches by moving when the flag is marked invalid (for
example,
> we really want nbd_set_export_name() to wipe both flags and meta
patch 9/12
> contexts, but the act of NBD_OPT_GO should not wipe contexts,
and the
patch 4/12
> act of NBD_OPT_SET_META_CONTEXT should not wipe flags), but
I'm
patch 7/12
> leaving those further tweaks for later patches to make this one
easier
> to review.
...
> +++ b/generator/states-newstyle-opt-meta-context.c
> @@ -29,6 +29,9 @@ STATE_MACHINE {
> */
> assert (h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE);
> nbd_internal_reset_size_and_flags (h);
> + for (i = 0; i < h->meta_contexts.len; ++i)
> + free (h->meta_contexts.ptr[i].name);
> + meta_vector_reset (&h->meta_contexts);
This is code movement out of nbd_internal_reset_size_and_flags(), OK.
yep, this is the wipe [5] moved here because it was taken out of [4]
> if (h->opt_current == NBD_OPT_LIST_META_CONTEXT) {
> assert (h->opt_mode);
> assert (h->structured_replies);
> @@ -44,7 +47,7 @@ STATE_MACHINE {
> }
> }
>
> - assert (h->meta_contexts.len == 0);
> + assert (!h->meta_valid);
OK, this is the new predicate that nbd_internal_reset_size_and_flags()
now ensures.
yep, matches up with [4]
>
> /* Calculate the length of the option request data. */
> len = 4 /* exportname len */ + strlen (h->export_name) + 4 /* nr queries */;
> @@ -194,8 +197,10 @@ STATE_MACHINE {
> CALL_CALLBACK (h->opt_cb.completion, &err);
> nbd_internal_free_option (h);
> }
> - else
> + else {
> SET_NEXT_STATE (%^OPT_GO.START);
> + h->meta_valid = true;
> + }
> break;
> case NBD_REP_META_CONTEXT: /* A context. */
> if (len > maxpayload)
IIUC, this modifies the NBD_REP_ACK handling for
NBD_OPT_SET_META_CONTEXT. IIUC we've consumed all the mappings, so we
can now set the witness to "OK". Seems OK.
yep, matches to [1]. All other error paths out of this state no
longer set meta_valid, which saves me the effort of having to
open-code cleanup into each of those points.
> diff --git a/generator/states-reply-structured.c
b/generator/states-reply-structured.c
> index aaf75b8..f18c999 100644
> --- a/generator/states-reply-structured.c
> +++ b/generator/states-reply-structured.c
> @@ -445,6 +445,7 @@ STATE_MACHINE {
> assert (CALLBACK_IS_NOT_NULL (cmd->cb.fn.extent));
> assert (h->bs_entries);
> assert (length >= 12);
> + assert (h->meta_valid);
>
> /* Need to byte-swap the entries returned, but apart from that we
> * don't validate them.
This seems to express that we may only have initiated a block status
request (for which we are now processing the replies --
REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES) in case we had a valid meta
context list first. Seems plausible. (A bit lower down in the context,
not quoted, we "Look up the context ID" in the meta context vector, so
yes, very sensible assertion.)
Yeah, the assertion is useful to prove we aren't reading a
half-populated h->meta_contexts vector left half-populated after a
failed NBD_OPT_SET_META_CONTEXT; and it is met because
nbd_unlocked_aio_block_status() checked before ever issuing
NBD_CMD_BLOCK_STATUS.
> diff --git a/lib/flags.c b/lib/flags.c
> index 87c20ee..91efc1a 100644
> --- a/lib/flags.c
> +++ b/lib/flags.c
> @@ -37,9 +37,7 @@ nbd_internal_reset_size_and_flags (struct nbd_handle *h)
>
> h->exportsize = 0;
> h->eflags = 0;
> - for (i = 0; i < h->meta_contexts.len; ++i)
> - free (h->meta_contexts.ptr[i].name);
> - meta_vector_reset (&h->meta_contexts);
Moved to NEWSTYLE.OPT_META_CONTEXT.START; OK.
This is [4]; it is actually moved to both
NEWSTYLE.OPT_META_CONTEXT.START [5], and to handle cleanup [6].
(This is the only place we reset the meta context vector, pre-patch.)
> + h->meta_valid = false;
Replacing the above logic, OK.
Yes. As of this patch, calling nbd_internal_reset_size_and_flags()
still wipes both size and contexts state, unchanged from status quo
before; but later patches in the series separate the wipe of contexts
state to other locations.
> h->block_minimum = 0;
> h->block_preferred = 0;
> h->block_maximum = 0;
> @@ -75,6 +73,11 @@ nbd_internal_set_size_and_flags (struct nbd_handle *h,
> eflags &= ~NBD_FLAG_SEND_FAST_ZERO;
> }
>
> + if (!h->structured_replies || h->request_meta_contexts.len == 0) {
> + assert (h->meta_contexts.len == 0);
> + h->meta_valid = true;
> + }
> +
> h->exportsize = exportsize;
> h->eflags = eflags;
> return 0;
Hmmm. This gives me a run for my money...
I think the condition here describes the case when we're never going to
send an NBD_OPT_SET_META_CONTEXT -- meaning that we're never going to
reach the "meta_valid = true" assignment in the ACK handling for
NBD_OPT_SET_META_CONTEXT. Still we need to qualify our empty metacontext
vector as valid somewhere, so nbd_internal_set_size_and_flags() is being
proposed as the location for that.
The question is where nbd_internal_set_size_and_flags() is called
*from*. I find:
- NEWSTYLE.OPT_EXPORT_NAME.CHECK_REPLY
- NBD_REP_INFO | NBD_INFO_EXPORT
- OLDSTYLE.CHECK
We also have a comment on nbd_internal_set_size_and_flags() saying
/* Set the export size and eflags on the handle, validating them.
* This is called from the state machine when either the newstyle or
* oldstyle negotiation reaches the point that these are available.
*/
Aha! So, also in accordance with the commit message, I need to replace
"we're never going to reach the ACK for NBD_OPT_SET_META_CONTEXT" with
"we could never have reached the ACK for NBD_OPT_SET_META_CONTEXT".
I'm not sure if this is the *only place* (or the *best place*) to set
the witness [*], but it does look like a "good one".
This is point [2] in the commit message. Yeah, my alternatives were
to either find common code that is reached by both oldstyle and
OPT_EXPORT_NAME code (the two cases where we can't even attempt
NBD_OPT_SET_META_CONTEXT) - which is this function, or to open-code
the same effect into both callers. I chose to put it once in the
common code, in part because...
[*] The commit message speaks about a "latent semantic difference" which
is currently masked by our managing meta_valid and eflags strictly in
tandem. Hopefully I'll understand that paragraph better when looking at
later patches in the series.
... later in the series (4/12), this gets tweaked again by further
gating the logic here to only set h->meta_valid=true when
h->request_meta is set, rather than having to tweak all three points
that transition into transmission phase (even though only one of those
three points could have even attempted NBD_OPT_SET_META_CONTEXT), per
point [3].
> @@ -194,9 +197,8 @@ nbd_unlocked_can_meta_context (struct nbd_handle *h, const char
*name)
> {
> size_t i;
>
> - if (h->eflags == 0) {
> - set_error (EINVAL, "server has not returned export flags, "
> - "you need to connect to the server first");
> + if (h->request_meta_contexts.len && !h->meta_valid) {
> + set_error (EINVAL, "need a successful server meta context request
first");
> return -1;
> }
This is not easy to understand at first, but I kind of convinced myself
with the following argument: the condition for successfully querying the
context list is
h->request_meta_contexts.len == 0 || h->meta_valid
i.e., we never needed to ask the server, or the server responded (and we
parsed the response successfully). The *result* list may or may not be
empty at this point (it could be empty for two reasons: we never asked
for any contexts, or we did but the server supports none); either way,
the result list is valid for searching.
OK.
Correct reading.
>
> diff --git a/lib/handle.c b/lib/handle.c
> index 8713e18..03f45a4 100644
> --- a/lib/handle.c
> +++ b/lib/handle.c
> @@ -115,6 +115,8 @@ nbd_create (void)
> void
> nbd_close (struct nbd_handle *h)
> {
> + size_t i;
> +
> nbd_internal_set_error_context ("nbd_close");
>
> if (h == NULL)
> @@ -127,6 +129,9 @@ nbd_close (struct nbd_handle *h)
>
> free (h->bs_entries);
> nbd_internal_reset_size_and_flags (h);
> + for (i = 0; i < h->meta_contexts.len; ++i)
> + free (h->meta_contexts.ptr[i].name);
> + meta_vector_reset (&h->meta_contexts);
> nbd_internal_free_option (h);
> free_cmd_list (h->cmds_to_issue);
> free_cmd_list (h->cmds_in_flight);
OK.
This is [6], needed because it was moved out of [4].
> diff --git a/lib/rw.c b/lib/rw.c
> index 81f8f35..2ab85f3 100644
> --- a/lib/rw.c
> +++ b/lib/rw.c
> @@ -482,7 +482,7 @@ nbd_unlocked_aio_block_status (struct nbd_handle *h,
> return -1;
> }
>
> - if (h->meta_contexts.len == 0) {
> + if (!h->meta_valid || h->meta_contexts.len == 0) {
> set_error (ENOTSUP, "did not negotiate any metadata contexts, "
> "either you did not call nbd_add_meta_context before "
> "connecting or the server does not support it");
>
Conversely, here the condition for passing is
h->meta_valid && h->meta_contexts.len > 0
meaning we asked the server, and it supports at least one of the
requested meta contexts.
I think the patch is correct, but whether it is *complete* as well is
something I can't easily evaluate. With that caveat:
Reviewed-by: Laszlo Ersek <lersek(a)redhat.com>
Thank you for a thorough review, and I'm glad my thorough commit
message helped.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:
qemu.org |
libvirt.org