On 7/13/2023 6:50 PM, Richard W.M. Jones wrote:
On Thu, Jul 13, 2023 at 05:46:56PM +0100, Richard W.M. Jones wrote:
> On Thu, Jul 13, 2023 at 04:18:03PM +0000, Tage Johansson wrote:
>> On 7/13/2023 5:37 PM, Richard W.M. Jones wrote:
>>> On Thu, Jul 13, 2023 at 03:05:30PM +0000, Tage Johansson wrote:
>>>> On 7/13/2023 4:36 PM, Eric Blake wrote:
>>>>> On Thu, Jul 13, 2023 at 01:37:49PM +0000, Tage Johansson wrote:
>>>>>> On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
>>>>>>> On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
>>>>>>>> On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake
wrote:
>>>>>>>>>>> I have replaced a call to `nbd_opt_info()`
with a call to
>>>>>>>>>>> `nbd_aio_opt_info()` and passed in a
completion callback which just
>>>>>>>>>>> calls `exit(EXIT_FAILURE)`. So if the
completion callback is called
>>>>>>>>>>> the test should fail, which it doesn't,
at least not on my machine.
>>>>>>>>>> Isn't that OK? Only .free is required to be
called.
>>>>>>>>> For the context callback (for opt_set_meta),
.callback may be called
>>>>>>>>> zero, one, or multiple times, but .free should be
called exactly once.
>>>>>>>>> But for the completion callback, I agree that the
docs state that both
>>>>>>>>> .callback and .free should each be called exactly
once ("On the other
>>>>>>>>> hand, if a completion callback is supplied (only
possible with
>>>>>>>>> asynchronous commands), it will always be reached
exactly once, and
>>>>>>>>> the completion callback must not ignore the value
pointed to by
>>>>>>>>> C<error>."); we are indeed missing the
call to .callback. I'll work
>>>>>>>>> on a patch.
>>>>>>>> Eww, the bug is bigger than just nbd_aio_opt* not always
calling
>>>>>>>> completion.callback exactly once. With just this diff
(to be folded
>>>>>>>> into the larger patch I'm working on), I'm
getting an assertion
>>>>>>>> failure that we fail to call completion.callback for
>>>>>>>> nbd_aio_pread_structured when calling nbd_close() prior
to the command
>>>>>>>> running to completion, so I'll have to fix that too.
>>>>>>> Just to be clear about this, are we really sure the
completion
>>>>>>> callback should always be called once? I'm not clear why
that should
>>>>>>> be the case. (The .free callback however should be.)
>>>>>>>
>>>>>>> For example, if I call a function with bogus invalid
parameters so
>>>>>>> that it fails very early on (or when the handle is in an
incorrect
>>>>>>> state), should I expect the completion callback to be called?
I would
>>>>>>> expect not.
>>>>>>>
>>>>>>> Rich.
>>>>>> The user needs a way to know if an error occurred. So the
completion
>>>>>> callback must be called if the asynchronous function did not fail
(returned
>>>>>> 0). If the completion callback should be called, with the error
parameter
>>>>>> set, even when the asynchronous function immediately failed with
a non-zero
>>>>>> return value is another question. I see two possibilities: Either
the
>>>>>> completion callback should always be called. Or it should be
called iff the
>>>>>> asynchronous function returned 0 (did not fail).
>>>>> Good point.
>>>>>
>>>>> Our documentation currently states (docs/libnbd.pod) that the
>>>>> completion callback is ALWAYS called, but this is inconsistent with
>>>>> current code - you are correct that at present, the completion
>>>>> callback is NOT called if the aio command returns non-zero (easy
test:
>>>>> call nbd_aio_block_status before nbd_connect*). I'm game to
updating
>>>>> that part of the documentation to match existing practice (changing
>>>>> our guarantee to the completion callback will be called once iff the
>>>>> aio command reported success), since our testsuite wasn't
covering it,
>>>>> and it is probably an easier fix than munging the generator to call
>>>>> completion.callback even for aio failures. Meanwhile, the .free
>>>>> callback MUST be called unconditionally, and I think our testsuite
is
>>>>> already covering that.
>>>>>
>>>>> Life-cycle wise, I see the following sequence as being something we
>>>>> could usefully rely on (although we don't yet have enough
testsuite
>>>>> coverage to prove that we already have it). Note that it is not
only
>>>>> important how many times things are called, but I would like it if
we
>>>>> can guarantee the specific ordering between them (neither .free
should
>>>>> be called until all .callback opportunities are complete finished,
to
>>>>> avoid any use-after-free issues regardless of which of the two .free
>>>>> slots the programmer actually uses):
>>>>>
>>>>> - if aio command fails:
>>>>> mid-command .callback: 0 calls
>>>>> completion .callback: 0 calls
>>>>> mid-command .free: 1 call
>>>>> completion .free: 1 call
>>>>>
>>>>> - if aio command succeeds:
>>>>> mid-command .callback: 0, 1, or multiple times
>>>>> completion .callback: 1 call
>>>>> mid-command .free: 1 call
>>>>> completion .free: 1 call
>>>>>
>>>>> What I'm not sure of yet (without more code inspection) is
whether we
>>>>> can sometimes call completion.callback after mid-command.free.
>>>>>
>>>>>> By the way, if the error parameter is set in the completion
callback, how
>>>>>> can the user retrieve the error text? Is it possible to call
>>>>>> `get_nbd_error(3)` from inside the completion callback?
>>>>> You mean nbd_get_error(). We currently document that it is not safe
>>>>> to call any nbd_* from within a callback. Maybe we could relax that
>>>>> to carve out exceptions for nbd_get_error/errno as special cases,
>>>>> since they only inspect thread-local state and can be used even when
>>>>> there is no current nbd_handle. The problem is that I'm not sure
if
>>>>> there is a reliable string at that point in time: part of the reason
>>>>> the completion callback has an errnum parameter is that the point of
>>>>> reporting the completion failure may be in a different thread than
>>>>> when the failure was first detected, and we only preserved a numeric
>>>>> value in cmd->error rather than also preserving a string.
>>>>>
>>>>> Put another way, there is no way to guarantee that nbd_get_errno()
>>>>> will return the same value as the errnum parameter to the callback;
>>>>> and if that is true, then even if calling nbd_get_error()
doesn't
>>>>> crash or deadlock from within a callback, it is likewise probable
that
>>>>> any such string returned is not consistent with the errnum parameter
>>>>> passed to the callback.
>>>> So is there any safe way to get some description of the error from a
>>>> completion callback apart from a non-zero number? It isn't too
>>>> helpful to report to the user that the read operation faild with -1.
>>> As I recall, from the callback, no.
>>>
>>> The error is not lost however, it will be returned by the API call
>>> itself. eg. If you're in nbd_aio_opt_list -> callback (error) then
>>> nbd_aio_opt_list will return -1 and at that point you can use
>>> nbd_get_error to report the error.
>>
>> I don't understand. If I call `nbd_aio_opt_list()` with a completion
>> callback. `nbd_aio_opt_list()` will return immediately, maybe with a
>> successful result. But the command is not complete until
>> `nbd_aio_is_connecting()` returns `false`, so the completion
>> callback may be invoked with an error after `nbd_aio_opt_list()` has
>> returned.
> It's returned by some later call, such as nbd_aio_notify_read.
>
> Basically libnbd doesn't create threads, so the only place that
> callbacks can be called is when you call into libnbd. That function
> that you call is what will return an error.
Does this mean that the call to `aio_notify_read()` or
`aio_notify_write()` which discovered the error will also return the
error? And that the completion callback was called by that call to
`aio_notify_*`, in the same thread and before the call to `aio_notify_*`
returned?
Best regards,
Tage