On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek wrote:
 On 7/13/23 21:29, Eric Blake wrote:
 > The documentation has claimed since commit 6f4dcdab that any
 > completion callback will be called exactly once; but this is not
 > consistent with the code: if nbd_aio_* itself returns an error, then
 > nothing is queued and the user does not need to wait for a completion
 > callback to know how the command failed.  We could tweak the generator
 > to call completion.callback no matter what, but since the
 > completion.free callback already serves that role, it's easier to fix
 > the documentation to match reality.  After all, one only needs
 > completion status if an aio command returned success (if it returned
 > failure, we know that there is nothing that is going to complete
 > later).
 > 
 > However, there was one place where we indeed fail to call
 > completion.callback, even though the corresponding aio call returned
 > success, which can strand a user that was depending on the callback to
 > know that the pending aio command failed after all.  That's when a
 > call to nbd_close() interrupts a connection while commands are in
 > flight.  This problem appears to have been around even before commit
 > 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in
 > the first place.
 > 
 > Beef up the closure-lifetimes unit test to more robustly check the
 > various conditions guaranteed by the updated documentation, and to
 > expose the previous skip of a completion callback during nbd_close.
 > 
 > In summary, the behavior we want (where sequence is important) is:
 > 
 > - aio command fails:
 > mid-command .callback: 0 calls
 > completion .callback: 0 calls
 > mid-command .free: 1 call
 > completion .free: 1 call
 > 
 > - aio command succeeds:
 > mid-command .callback: 0, 1, or multiple calls
 > completion .callback: 1 call
 > mid-command .free: 1 call
 > completion .free: 1 call
 > 
 > Reported-by: Tage Johansson <tage.j.lists(a)posteo.net>
 > Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle errors",
v1.11.8)
 > Signed-off-by: Eric Blake <eblake(a)redhat.com>
 > ---
 >  docs/libnbd.pod           | 26 ++++++++++++++++----------
 >  lib/handle.c              |  3 +++
 >  tests/closure-lifetimes.c | 14 ++++++++++++++
 >  3 files changed, 33 insertions(+), 10 deletions(-)
 > 
 > diff --git a/docs/libnbd.pod b/docs/libnbd.pod
 > index 72f74053..433479e6 100644
 > --- a/docs/libnbd.pod
 > +++ b/docs/libnbd.pod
 > @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock.
 >  =head2 Completion callbacks
 > 
 >  All of the asychronous commands have an optional completion callback
 > -function that is used right before the command is marked complete,
 > -after any mid-command callbacks have finished, and before any free
 > -functions.
 > +function that is used if the asynchronous command succeeded, right
 > +before the command is marked complete, after any mid-command callbacks
 > +have finished, and before any free functions.  The completion callback
 > +is not reached if the asynchronous command itself fails, while free
 > +functions are reached regardless of the initial result of the
 > +asynchronous command.
 > 
 >  When the completion callback returns C<1>, the command is
 >  automatically retired (there is no need to call
 
 I agree with this approach (i.e., with adapting the documentation to
 reality), but I find the language somewhat confusing. We have three terms:
 
 - "asynchronous command succeeds"
 - "command is marked complete"
 - "command is retired"
 
 The last two are mostly interchangeable in my view, and are also *not*
 confusing in the documentation. But the first term is confusing, IMO; it
 can easily be mistaken for meanings #2/#3. What we mean by #1 instead is
 the successful queueing (or submission) of the command. The text below
 does say "queued", but the text above doesn't. So I'd suggest
replacing
 term#1 above with "call to asynchronous API succeeds" or "asynchronous
 command is successfully submitted" or something like that.
 
 (Side comment: I'd kinda prefer an all-or-nothing approach for async
 APIs. If the API fails at once, it should not take ownership of
 anything; i.e., it shouldn't call either completion or free callbacks.
 And if it succeeds, then it should take complete ownership. I'm not
 suggesting to rework callbacks whole-sale for this, though.)
 
 With the language clarified a bit:
 
 Acked-by: Laszlo Ersek <lersek(a)redhat.com>
 
 Thanks
 Laszlo
 
Here's what I squashed in, before pushing the series as
dfa9473c..28134d9e
diff --git i/docs/libnbd.pod w/docs/libnbd.pod
index 433479e6..d8b22107 100644
--- i/docs/libnbd.pod
+++ w/docs/libnbd.pod
@@ -171,8 +171,12 @@ There are several things to note here:
 =item *
-This only starts the command.  The command is still in flight when the
-call returns.
+This only starts the command.  The command is (usually) still in
+flight when the call returns success, where you must rely on
+subsequent API calls for learning the final command outcome and
+trigger any remaining callbacks.  However, you must also be able to
+handle the case where system load allows the state machine to advance
+far enough to invoke callbacks before the asynchronous API returns.
 =item *
@@ -194,7 +198,10 @@ calls.  The cookie is unique (per libnbd handle) and E<ge> 1.
 You may register a function which is called when the command
 completes, see L</Completion callbacks> below.  In this case we have
-specified a null completion callback.
+specified a null completion callback.  If a completion callback is
+specified, it will only be called if the asynchronous command was
+sucessfully submitted (if the asynchronous API itself returns an
+error, there is nothing further to be completed).
 =back
@@ -897,19 +904,25 @@ asynchronous commands are retired.
 =head2 Callbacks and locking
-The callbacks are invoked at a point where the libnbd lock is held; as
-such, it is unsafe for the callback to call any C<nbd_*> APIs on the
-same nbd object, as it would cause deadlock.
+The callbacks are invoked at a point where the libnbd lock is held,
+typically during a call to C<nbd_aio_notify_read>,
+C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that can
+advance libnbd's state machine.  Depending on system load, it is even
+possible for a callback to reached before completion of the
+C<nbd_aio_*> call that specified the callback.  As such, it is unsafe
+for the callback to call any C<nbd_*> APIs on the same nbd object, as
+it would cause deadlock.
 =head2 Completion callbacks
 All of the asychronous commands have an optional completion callback
-function that is used if the asynchronous command succeeded, right
-before the command is marked complete, after any mid-command callbacks
+function that is used if the call to the asynchronous API reports
+success.  The completion callback is invoked when the submitted
+command is eventually marked complete, after any mid-command callbacks
 have finished, and before any free functions.  The completion callback
-is not reached if the asynchronous command itself fails, while free
-functions are reached regardless of the initial result of the
-asynchronous command.
+is not reached if the asynchronous API itself fails, while free
+callbacks are reached regardless of the result of the initial
+asynchronous API.
 When the completion callback returns C<1>, the command is
 automatically retired (there is no need to call
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  
qemu.org | 
libvirt.org