Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback

Sunday, 16 July 2023

On 7/14/2023 4:13 PM, Eric Blake wrote:
...
 On Fri, Jul 14, 2023 at 09:13:42AM +0200, Laszlo Ersek wrote:
> On 7/13/23 21:29, Eric Blake wrote:
>> The documentation has claimed since commit 6f4dcdab that any
>> completion callback will be called exactly once; but this is not
>> consistent with the code: if nbd_aio_* itself returns an error, then
>> nothing is queued and the user does not need to wait for a completion
>> callback to know how the command failed.  We could tweak the generator
>> to call completion.callback no matter what, but since the
>> completion.free callback already serves that role, it's easier to fix
>> the documentation to match reality.  After all, one only needs
>> completion status if an aio command returned success (if it returned
>> failure, we know that there is nothing that is going to complete
>> later).
>>
>> However, there was one place where we indeed fail to call
>> completion.callback, even though the corresponding aio call returned
>> success, which can strand a user that was depending on the callback to
>> know that the pending aio command failed after all.  That's when a
>> call to nbd_close() interrupts a connection while commands are in
>> flight.  This problem appears to have been around even before commit
>> 52b9b492 (v0.9.8) when we finally settled on having .free callbacks in
>> the first place.
>>
>> Beef up the closure-lifetimes unit test to more robustly check the
>> various conditions guaranteed by the updated documentation, and to
>> expose the previous skip of a completion callback during nbd_close.
>>
>> In summary, the behavior we want (where sequence is important) is:
>>
>> - aio command fails:
>> mid-command .callback: 0 calls
>> completion .callback: 0 calls
>> mid-command .free: 1 call
>> completion .free: 1 call
>>
>> - aio command succeeds:
>> mid-command .callback: 0, 1, or multiple calls
>> completion .callback: 1 call
>> mid-command .free: 1 call
>> completion .free: 1 call
>>
>> Reported-by: Tage Johansson <tage.j.lists(a)posteo.net&gt;
>> Fixes: 6f4dcdab ("docs: Clarify how callbacks should handle errors",
v1.11.8)
>> Signed-off-by: Eric Blake <eblake(a)redhat.com&gt;
>> ---
>>   docs/libnbd.pod           | 26 ++++++++++++++++----------
>>   lib/handle.c              |  3 +++
>>   tests/closure-lifetimes.c | 14 ++++++++++++++
>>   3 files changed, 33 insertions(+), 10 deletions(-)
>>
>> diff --git a/docs/libnbd.pod b/docs/libnbd.pod
>> index 72f74053..433479e6 100644
>> --- a/docs/libnbd.pod
>> +++ b/docs/libnbd.pod
>> @@ -904,9 +904,12 @@ same nbd object, as it would cause deadlock.
>>   =head2 Completion callbacks
>>
>>   All of the asychronous commands have an optional completion callback
>> -function that is used right before the command is marked complete,
>> -after any mid-command callbacks have finished, and before any free
>> -functions.
>> +function that is used if the asynchronous command succeeded, right
>> +before the command is marked complete, after any mid-command callbacks
>> +have finished, and before any free functions.  The completion callback
>> +is not reached if the asynchronous command itself fails, while free
>> +functions are reached regardless of the initial result of the
>> +asynchronous command.
>>
>>   When the completion callback returns C<1>, the command is
>>   automatically retired (there is no need to call
> I agree with this approach (i.e., with adapting the documentation to
> reality), but I find the language somewhat confusing. We have three terms:
>
> - "asynchronous command succeeds"
> - "command is marked complete"
> - "command is retired"
>
> The last two are mostly interchangeable in my view, and are also *not*
> confusing in the documentation. But the first term is confusing, IMO; it
> can easily be mistaken for meanings #2/#3. What we mean by #1 instead is
> the successful queueing (or submission) of the command. The text below
> does say "queued", but the text above doesn't. So I'd suggest
replacing
> term#1 above with "call to asynchronous API succeeds" or
"asynchronous
> command is successfully submitted" or something like that.
>
> (Side comment: I'd kinda prefer an all-or-nothing approach for async
> APIs. If the API fails at once, it should not take ownership of
> anything; i.e., it shouldn't call either completion or free callbacks.
> And if it succeeds, then it should take complete ownership. I'm not
> suggesting to rework callbacks whole-sale for this, though.)
>
> With the language clarified a bit:
>
> Acked-by: Laszlo Ersek <lersek(a)redhat.com&gt;
>
> Thanks
> Laszlo
>
 Here's what I squashed in, before pushing the series as
 dfa9473c..28134d9e

 diff --git i/docs/libnbd.pod w/docs/libnbd.pod
 index 433479e6..d8b22107 100644
 --- i/docs/libnbd.pod
 +++ w/docs/libnbd.pod
 @@ -171,8 +171,12 @@ There are several things to note here:

   =item *

 -This only starts the command.  The command is still in flight when the
 -call returns.
 +This only starts the command.  The command is (usually) still in
 +flight when the call returns success, where you must rely on
 +subsequent API calls for learning the final command outcome and
 +trigger any remaining callbacks.  However, you must also be able to
 +handle the case where system load allows the state machine to advance
 +far enough to invoke callbacks before the asynchronous API returns.

   =item *

 @@ -194,7 +198,10 @@ calls.  The cookie is unique (per libnbd handle) and E<ge>
1.

   You may register a function which is called when the command
   completes, see L</Completion callbacks> below.  In this case we have
 -specified a null completion callback.
 +specified a null completion callback.  If a completion callback is
 +specified, it will only be called if the asynchronous command was
 +sucessfully submitted (if the asynchronous API itself returns an

Should probably be "successfully" instead of "sucessfully".

...
 +error, there is nothing further to be completed).

   =back

 @@ -897,19 +904,25 @@ asynchronous commands are retired.

   =head2 Callbacks and locking

 -The callbacks are invoked at a point where the libnbd lock is held; as
 -such, it is unsafe for the callback to call any C<nbd_*> APIs on the
 -same nbd object, as it would cause deadlock.
 +The callbacks are invoked at a point where the libnbd lock is held,
 +typically during a call to C<nbd_aio_notify_read>,
 +C<nbd_aio_notify_write>, C<nbd_aio_poll>, or other call that can
 +advance libnbd's state machine.  Depending on system load, it is even
 +possible for a callback to reached before completion of the

Shouldn't it be "to be reached" instead of "to reached"?

Best regards,

Tage

...
 +C<nbd_aio_*> call that specified the callback.  As such, it is
unsafe
 +for the callback to call any C<nbd_*> APIs on the same nbd object, as
 +it would cause deadlock.

   =head2 Completion callbacks

   All of the asychronous commands have an optional completion callback
 -function that is used if the asynchronous command succeeded, right
 -before the command is marked complete, after any mid-command callbacks
 +function that is used if the call to the asynchronous API reports
 +success.  The completion callback is invoked when the submitted
 +command is eventually marked complete, after any mid-command callbacks
   have finished, and before any free functions.  The completion callback
 -is not reached if the asynchronous command itself fails, while free
 -functions are reached regardless of the initial result of the
 -asynchronous command.
 +is not reached if the asynchronous API itself fails, while free
 +callbacks are reached regardless of the result of the initial
 +asynchronous API.

   When the completion callback returns C<1>, the command is
   automatically retired (there is no need to call

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [libnbd PATCH 1/2] api: Tighten rules on completion.callback