Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

Friday, 14 July 2023

On 7/13/23 21:29, Eric Blake wrote:
...
 The documentation has claimed since commit 6f4dcdab that any
 completion callback will be called exactly once; but this is not
 consistent with the code: if nbd_aio_* itself returns an error, then
 nothing is queued and the user does not need to wait for a completion
 callback to know how the command failed.  We could tweak the generator
 to call completion.callback no matter what, but since the
 completion.free callback already serves that role, it's easier to fix
 the documentation to match reality.  After all, one only needs
 completion status if an aio command returned success (if it returned
 failure, we know that there is nothing that is going to complete
 later).

 However, there was one place where we indeed fail to call
 completion.callback, even though the corresponding aio call returned
 success, which can strand a user that was depending on the callback to
 know that the pending aio command failed after all.  That's when a
 call to nbd_close() interrupts a connection while commands are in
 flight.  This problem appears to have been around even before commit
 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in
 the first place.

 Beef up the closure-lifetimes unit test to more robustly check the
 various conditions guaranteed by the updated documentation, and to
 expose the previous skip of a completion callback during nbd_close.

 In summary, the behavior we want (where sequence is important) is:

 - aio command fails:
 mid-command .callback: 0 calls
 completion .callback: 0 calls
 mid-command .free: 1 call
 completion .free: 1 call

 - aio command succeeds:
 mid-command .callback: 0, 1, or multiple calls
 completion .callback: 1 call
 mid-command .free: 1 call
 completion .free: 1 call

 Reported-by: Tage Johansson <tage.j.lists(a)posteo.net&gt;
 Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle errors", v1.11.8)
 Signed-off-by: Eric Blake <eblake(a)redhat.com&gt;
 ---
  docs/libnbd.pod           | 26 ++++++++++++++++----------
  lib/handle.c              |  3 +++
  tests/closure-lifetimes.c | 14 ++++++++++++++
  3 files changed, 33 insertions(+), 10 deletions(-)

 diff --git a/docs/libnbd.pod b/docs/libnbd.pod
 index 72f74053..433479e6 100644
 --- a/docs/libnbd.pod
 +++ b/docs/libnbd.pod
 @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock.
  =head2 Completion callbacks

  All of the asychronous commands have an optional completion callback
 -function that is used right before the command is marked complete,
 -after any mid-command callbacks have finished, and before any free
 -functions.
 +function that is used if the asynchronous command succeeded, right
 +before the command is marked complete, after any mid-command callbacks
 +have finished, and before any free functions.  The completion callback
 +is not reached if the asynchronous command itself fails, while free
 +functions are reached regardless of the initial result of the
 +asynchronous command.

  When the completion callback returns C<1>, the command is
  automatically retired (there is no need to call 
I agree with this approach (i.e., with adapting the documentation to
reality), but I find the language somewhat confusing. We have three terms:

- "asynchronous command succeeds"
- "command is marked complete"
- "command is retired"

The last two are mostly interchangeable in my view, and are also *not*
confusing in the documentation. But the first term is confusing, IMO; it
can easily be mistaken for meanings #2/#3. What we mean by #1 instead is
the successful queueing (or submission) of the command. The text below
does say "queued", but the text above doesn't. So I'd suggest replacing
term#1 above with "call to asynchronous API succeeds" or "asynchronous
command is successfully submitted" or something like that.

(Side comment: I'd kinda prefer an all-or-nothing approach for async
APIs. If the API fails at once, it should not take ownership of
anything; i.e., it shouldn't call either completion or free callbacks.
And if it succeeds, then it should take complete ownership. I'm not
suggesting to rework callbacks whole-sale for this, though.)

With the language clarified a bit:

Acked-by: Laszlo Ersek <lersek(a)redhat.com&gt;

Thanks
Laszlo

...
 @@ -946,13 +949,16 @@ mid-command callback may be reached more times
than expected if the
  server is non-compliant.

  On the other hand, if a completion callback is supplied (only possible
 -with asynchronous commands), it will always be reached exactly once,
 -and the completion callback must not ignore the value pointed to by
 -C<error>.  In particular, the content of a buffer passed to
 -L<nbd_aio_pread(3)> or L<nbd_aio_pread_structured(3)> is undefined
 -if C<*error> is non-zero on entry to the completion callback.  It is
 -recommended that if you choose to use automatic command retirement
 -(where the completion callback returns C<1> to avoid needing to call
 +with asynchronous commands), it will not be called if the initial API
 +call fails (such as attempting an asynchronous command in the wrong
 +state - there is nothing to be completed since the command was not
 +queued), but will otherwise be called exactly once, and the completion
 +callback must not ignore the value pointed to by C<error>.  In
 +particular, the content of a buffer passed to L<nbd_aio_pread(3)> or
 +L<nbd_aio_pread_structured(3)> is undefined if C<*error> is non-zero
 +on entry to the completion callback.  It is recommended that if you
 +choose to use automatic command retirement (where the completion
 +callback returns C<1> to avoid needing to call
  L<nbd_aio_command_completed(3)> later), your completion function
  should return C<1> on all control paths, even when handling errors
  (note that with automatic retirement, assigning into C<error> is
 diff --git a/lib/handle.c b/lib/handle.c
 index 0f11bee5..1d66d585 100644
 --- a/lib/handle.c
 +++ b/lib/handle.c
 @@ -37,7 +37,10 @@ free_cmd_list (struct command *list)
    struct command *cmd, *cmd_next;

    for (cmd = list; cmd != NULL; cmd = cmd_next) {
 +    int error = cmd->error ? : ENOTCONN;
 +
      cmd_next = cmd->next;
 +    CALL_CALLBACK (cmd->cb.completion, &error);
      nbd_internal_retire_and_free_command (cmd);
    }
  }
 diff --git a/tests/closure-lifetimes.c b/tests/closure-lifetimes.c
 index 9bb4e120..23691631 100644
 --- a/tests/closure-lifetimes.c
 +++ b/tests/closure-lifetimes.c
 @@ -66,6 +66,7 @@ read_cb (void *opaque,
           uint64_t offset, unsigned status, int *error)
  {
    assert (!read_cb_freed);
 +  assert (!completion_cb_called);
    read_cb_called++;
    return 0;
  }
 @@ -73,6 +74,7 @@ read_cb (void *opaque,
  static void
  read_cb_free (void *opaque)
  {
 +  assert (!completion_cb_freed);
    read_cb_freed++;
  }

 @@ -81,6 +83,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t offset,
                   uint32_t *entries, size_t nr_entries, int *error)
  {
    assert (!block_status_cb_freed);
 +  assert (!completion_cb_called);
    block_status_cb_called++;
    return 0;
  }
 @@ -88,6 +91,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t offset,
  static void
  block_status_cb_free (void *opaque)
  {
 +  assert (!completion_cb_freed);
    block_status_cb_freed++;
  }

 @@ -150,6 +154,10 @@ main (int argc, char *argv[])
    cookie = nbd_aio_pread_structured (nbd, buf, sizeof buf, 0, chunk_callback,
                                       completion_callback, 0);
    if (cookie == -1) NBD_ERROR;
 +  /* read_cb_called is indeterminate at this point, as state machine
 +   * progress may vary based on task schduling and network speed factors.
 +   */
 +  assert (completion_cb_called == 0);
    assert (read_cb_freed == 0);
    assert (completion_cb_freed == 0);
    while (!nbd_aio_command_completed (nbd, cookie)) {
 @@ -179,6 +187,8 @@ main (int argc, char *argv[])
    nbd_kill_subprocess (nbd, 0);
    nbd_close (nbd);

 +  /* read_cb_called is indeterminate based on timing of kill. */
 +  assert (completion_cb_called == 1);
    assert (read_cb_freed == 1);
    assert (completion_cb_freed == 1);

 @@ -198,6 +208,8 @@ main (int argc, char *argv[])
      fprintf (stderr, "%s: Expecting block_status failure\n", argv[0]);
      exit (EXIT_FAILURE);
    }
 +  assert (block_status_cb_called == 0);
 +  assert (completion_cb_called == 0);
    assert (block_status_cb_freed == 1);
    assert (completion_cb_freed == 1);

 @@ -212,6 +224,8 @@ main (int argc, char *argv[])
      fprintf (stderr, "%s: Expecting block_status failure\n", argv[0]);
      exit (EXIT_FAILURE);
    }
 +  assert (block_status_cb_called == 0);
 +  assert (completion_cb_called == 0);
    assert (block_status_cb_freed == 1);
    assert (completion_cb_freed == 1);

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback