Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

Friday, 14 July 2023

On 7/14/23 16:13, Eric Blake wrote:
...
 On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek wrote:
> On 7/13/23 21:29, Eric Blake wrote:
>> The documentation has claimed since commit 6f4dcdab that any
>> completion callback will be called exactly once; but this is not
>> consistent with the code: if nbd_aio_* itself returns an error, then
>> nothing is queued and the user does not need to wait for a completion
>> callback to know how the command failed.  We could tweak the generator
>> to call completion.callback no matter what, but since the
>> completion.free callback already serves that role, it's easier to fix
>> the documentation to match reality.  After all, one only needs
>> completion status if an aio command returned success (if it returned
>> failure, we know that there is nothing that is going to complete
>> later).
>>
>> However, there was one place where we indeed fail to call
>> completion.callback, even though the corresponding aio call returned
>> success, which can strand a user that was depending on the callback to
>> know that the pending aio command failed after all.  That's when a
>> call to nbd_close() interrupts a connection while commands are in
>> flight.  This problem appears to have been around even before commit
>> 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in
>> the first place.
>>
>> Beef up the closure-lifetimes unit test to more robustly check the
>> various conditions guaranteed by the updated documentation, and to
>> expose the previous skip of a completion callback during nbd_close.
>>
>> In summary, the behavior we want (where sequence is important) is:
>>
>> - aio command fails:
>> mid-command .callback: 0 calls
>> completion .callback: 0 calls
>> mid-command .free: 1 call
>> completion .free: 1 call
>>
>> - aio command succeeds:
>> mid-command .callback: 0, 1, or multiple calls
>> completion .callback: 1 call
>> mid-command .free: 1 call
>> completion .free: 1 call
>>
>> Reported-by: Tage Johansson <tage.j.lists(a)posteo.net&gt;
>> Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle errors",
v1.11.8)
>> Signed-off-by: Eric Blake <eblake(a)redhat.com&gt;
>> ---
>>  docs/libnbd.pod           | 26 ++++++++++++++++----------
>>  lib/handle.c              |  3 +++
>>  tests/closure-lifetimes.c | 14 ++++++++++++++
>>  3 files changed, 33 insertions(+), 10 deletions(-)
>>
>> diff --git a/docs/libnbd.pod b/docs/libnbd.pod
>> index 72f74053..433479e6 100644
>> --- a/docs/libnbd.pod
>> +++ b/docs/libnbd.pod
>> @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock.
>>  =head2 Completion callbacks
>>
>>  All of the asychronous commands have an optional completion callback
>> -function that is used right before the command is marked complete,
>> -after any mid-command callbacks have finished, and before any free
>> -functions.
>> +function that is used if the asynchronous command succeeded, right
>> +before the command is marked complete, after any mid-command callbacks
>> +have finished, and before any free functions.  The completion callback
>> +is not reached if the asynchronous command itself fails, while free
>> +functions are reached regardless of the initial result of the
>> +asynchronous command.
>>
>>  When the completion callback returns C<1>, the command is
>>  automatically retired (there is no need to call
>
> I agree with this approach (i.e., with adapting the documentation to
> reality), but I find the language somewhat confusing. We have three terms:
>
> - "asynchronous command succeeds"
> - "command is marked complete"
> - "command is retired"
>
> The last two are mostly interchangeable in my view, and are also *not*
> confusing in the documentation. But the first term is confusing, IMO; it
> can easily be mistaken for meanings #2/#3. What we mean by #1 instead is
> the successful queueing (or submission) of the command. The text below
> does say "queued", but the text above doesn't. So I'd suggest
replacing
> term#1 above with "call to asynchronous API succeeds" or
"asynchronous
> command is successfully submitted" or something like that.
>
> (Side comment: I'd kinda prefer an all-or-nothing approach for async
> APIs. If the API fails at once, it should not take ownership of
> anything; i.e., it shouldn't call either completion or free callbacks.
> And if it succeeds, then it should take complete ownership. I'm not
> suggesting to rework callbacks whole-sale for this, though.)
>
> With the language clarified a bit:
>
> Acked-by: Laszlo Ersek <lersek(a)redhat.com&gt;
>
> Thanks
> Laszlo
>

 Here's what I squashed in, before pushing the series as
 dfa9473c..28134d9e

 diff --git i/docs/libnbd.pod w/docs/libnbd.pod
 index 433479e6..d8b22107 100644
 --- i/docs/libnbd.pod
 +++ w/docs/libnbd.pod
 @@ -171,8 +171,12 @@ There are several things to note here:

  =item *

 -This only starts the command.  The command is still in flight when the
 -call returns.
 +This only starts the command.  The command is (usually) still in
 +flight when the call returns success, where you must rely on
 +subsequent API calls for learning the final command outcome and
 +trigger any remaining callbacks.  However, you must also be able to
 +handle the case where system load allows the state machine to advance
 +far enough to invoke callbacks before the asynchronous API returns.

  =item *

 @@ -194,7 +198,10 @@ calls.  The cookie is unique (per libnbd handle) and E<ge> 1.

  You may register a function which is called when the command
  completes, see L</Completion callbacks> below.  In this case we have
 -specified a null completion callback.
 +specified a null completion callback.  If a completion callback is
 +specified, it will only be called if the asynchronous command was
 +sucessfully submitted (if the asynchronous API itself returns an
 +error, there is nothing further to be completed).

  =back

 @@ -897,19 +904,25 @@ asynchronous commands are retired.

  =head2 Callbacks and locking

 -The callbacks are invoked at a point where the libnbd lock is held; as
 -such, it is unsafe for the callback to call any C<nbd_*> APIs on the
 -same nbd object, as it would cause deadlock.
 +The callbacks are invoked at a point where the libnbd lock is held,
 +typically during a call to C<nbd_aio_notify_read>,
 +C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that can
 +advance libnbd's state machine.  Depending on system load, it is even
 +possible for a callback to reached before completion of the
 +C<nbd_aio_*> call that specified the callback.  As such, it is unsafe
 +for the callback to call any C<nbd_*> APIs on the same nbd object, as
 +it would cause deadlock.

  =head2 Completion callbacks

  All of the asychronous commands have an optional completion callback
 -function that is used if the asynchronous command succeeded, right
 -before the command is marked complete, after any mid-command callbacks
 +function that is used if the call to the asynchronous API reports
 +success.  The completion callback is invoked when the submitted
 +command is eventually marked complete, after any mid-command callbacks
  have finished, and before any free functions.  The completion callback
 -is not reached if the asynchronous command itself fails, while free
 -functions are reached regardless of the initial result of the
 -asynchronous command.
 +is not reached if the asynchronous API itself fails, while free
 +callbacks are reached regardless of the result of the initial
 +asynchronous API.

  When the completion callback returns C<1>, the command is
  automatically retired (there is no need to call 
This is great, thanks!
Laszlo

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback