On 7/13/2023 5:37 PM, Richard W.M. Jones wrote:
On Thu, Jul 13, 2023 at 03:05:30PM +0000, Tage Johansson wrote:
> On 7/13/2023 4:36 PM, Eric Blake wrote:
>> On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson wrote:
>>> On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
>>>> On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
>>>>> On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
>>>>>>>> I have replaced a call to `nbd_opt_info()` with a call
to
>>>>>>>> `nbd_aio_opt_info()` and passed in a completion callback
which just
>>>>>>>> calls `exit(EXIT_FAILURE)`. So if the completion callback
is called
>>>>>>>> the test should fail, which it doesn't, at least not
on my machine.
>>>>>>> Isn't that OK? Only .free is required to be called.
>>>>>> For the context callback (for opt_set_meta), .callback may be
called
>>>>>> zero, one, or multiple times, but .free should be called exactly
once.
>>>>>> But for the completion callback, I agree that the docs state that
both
>>>>>> .callback and .free should each be called exactly once ("On
the other
>>>>>> hand, if a completion callback is supplied (only possible with
>>>>>> asynchronous commands), it will always be reached exactly once,
and
>>>>>> the completion callback must not ignore the value pointed to by
>>>>>> C<error>."); we are indeed missing the call to
.callback. I'll work
>>>>>> on a patch.
>>>>> Eww, the bug is bigger than just nbd_aio_opt* not always calling
>>>>> completion.callback exactly once. With just this diff (to be folded
>>>>> into the larger patch I'm working on), I'm getting an
assertion
>>>>> failure that we fail to call completion.callback for
>>>>> nbd_aio_pread_structured when calling nbd_close() prior to the
command
>>>>> running to completion, so I'll have to fix that too.
>>>> Just to be clear about this, are we really sure the completion
>>>> callback should always be called once? I'm not clear why that
should
>>>> be the case. (The .free callback however should be.)
>>>>
>>>> For example, if I call a function with bogus invalid parameters so
>>>> that it fails very early on (or when the handle is in an incorrect
>>>> state), should I expect the completion callback to be called? I would
>>>> expect not.
>>>>
>>>> Rich.
>>> The user needs a way to know if an error occurred. So the completion
>>> callback must be called if the asynchronous function did not fail (returned
>>> 0). If the completion callback should be called, with the error parameter
>>> set, even when the asynchronous function immediately failed with a non-zero
>>> return value is another question. I see two possibilities: Either the
>>> completion callback should always be called. Or it should be called iff the
>>> asynchronous function returned 0 (did not fail).
>> Good point.
>>
>> Our documentation currently states (docs/libnbd.pod) that the
>> completion callback is ALWAYS called, but this is inconsistent with
>> current code - you are correct that at present, the completion
>> callback is NOT called if the aio command returns non-zero (easy test:
>> call nbd_aio_block_status before nbd_connect*). I'm game to updating
>> that part of the documentation to match existing practice (changing
>> our guarantee to the completion callback will be called once iff the
>> aio command reported success), since our testsuite wasn't covering it,
>> and it is probably an easier fix than munging the generator to call
>> completion.callback even for aio failures. Meanwhile, the .free
>> callback MUST be called unconditionally, and I think our testsuite is
>> already covering that.
>>
>> Life-cycle wise, I see the following sequence as being something we
>> could usefully rely on (although we don't yet have enough testsuite
>> coverage to prove that we already have it). Note that it is not only
>> important how many times things are called, but I would like it if we
>> can guarantee the specific ordering between them (neither .free should
>> be called until all .callback opportunities are complete finished, to
>> avoid any use-after-free issues regardless of which of the two .free
>> slots the programmer actually uses):
>>
>> - if aio command fails:
>> mid-command .callback: 0 calls
>> completion .callback: 0 calls
>> mid-command .free: 1 call
>> completion .free: 1 call
>>
>> - if aio command succeeds:
>> mid-command .callback: 0, 1, or multiple times
>> completion .callback: 1 call
>> mid-command .free: 1 call
>> completion .free: 1 call
>>
>> What I'm not sure of yet (without more code inspection) is whether we
>> can sometimes call completion.callback after mid-command.free.
>>
>>> By the way, if the error parameter is set in the completion callback, how
>>> can the user retrieve the error text? Is it possible to call
>>> `get_nbd_error(3)` from inside the completion callback?
>> You mean nbd_get_error(). We currently document that it is not safe
>> to call any nbd_* from within a callback. Maybe we could relax that
>> to carve out exceptions for nbd_get_error/errno as special cases,
>> since they only inspect thread-local state and can be used even when
>> there is no current nbd_handle. The problem is that I'm not sure if
>> there is a reliable string at that point in time: part of the reason
>> the completion callback has an errnum parameter is that the point of
>> reporting the completion failure may be in a different thread than
>> when the failure was first detected, and we only preserved a numeric
>> value in cmd->error rather than also preserving a string.
>>
>> Put another way, there is no way to guarantee that nbd_get_errno()
>> will return the same value as the errnum parameter to the callback;
>> and if that is true, then even if calling nbd_get_error() doesn't
>> crash or deadlock from within a callback, it is likewise probable that
>> any such string returned is not consistent with the errnum parameter
>> passed to the callback.
>
> So is there any safe way to get some description of the error from a
> completion callback apart from a non-zero number? It isn't too
> helpful to report to the user that the read operation faild with -1.
As I recall, from the callback, no.
The error is not lost however, it will be returned by the API call
itself. eg. If you're in nbd_aio_opt_list -> callback (error) then
nbd_aio_opt_list will return -1 and at that point you can use
nbd_get_error to report the error.
I don't understand. If I call `nbd_aio_opt_list()` with a completion
callback. `nbd_aio_opt_list()` will return immediately, maybe with a
successful result. But the command is not complete until
`nbd_aio_is_connecting()` returns `false`, so the completion callback
may be invoked with an error after `nbd_aio_opt_list()` has returned.
Also, does the value of `err` (passed into a caller) has any meaning
like a known errno value or something else that can be converted to a
description? Or is it just an arbitrary non zero integer?
Rich.