Re: [Libguestfs] libnbd: When are callbacks freed

Wednesday, 12 July 2023

On Wed, Jul 12, 2023 at 03:33:28PM -0500, Eric Blake wrote:
...
 > > /* Issue NBD_OPT_LIST and wait for the reply. */
 > > int
 > > nbd_unlocked_opt_list (struct nbd_handle *h, nbd_list_callback *list)
 > > {
 > >   struct list_helper s = { .list = *list };
 > >   nbd_list_callback l = { .callback = list_visitor, .user_data = &s };
 > >   nbd_completion_callback c = { .callback = list_complete, .user_data =
 > > &s };
 > > 
 > >   if (nbd_unlocked_aio_opt_list (h, &l, &c) == -1) 
We are passing a stack-local variable that points to the user's
callback...[1]

...
 > Here the list callback has not been freed.
 > >     return -1;

 True at this point, but it also has not been NULL'd.  But
 nbd_unlocked_aio_opt_list() can ONLY return -1 if fails prior to the
 SET_CALLBACK_TO_NULL lines, so we also know *list still contains the
 free callback, and the wrapper nbd_opt_list() in lib/api.c WILL call
 the callback one function up in the call stack, so there is no leak
 here.

 > > 
 > >   SET_CALLBACK_TO_NULL (*list);
 > >   if (wait_for_option (h) == -1)
 > >     return -1;

 But here, we've explicitly taken ownership of *list (that is,
 successful nbd_unlocked_aio_opt_list() transferred the callback over
 to h->opt_cb.fn.list), so if wait_for_option() does not free the
 callback, then we must do so or we indeed have a bug.  But to actually
 figure that out, I have to inspect the state machine, because
 wait_for_option() itself does not currently do anything with the
 callbacks, on success or on failure.

 /me goes and digs further...

 I _do_ see that nbd_internal_free_option() calls the appropriate
 callbacks.  So the question is if we can guarantee that h->opt_cb was
 set, do we always reach nbd_internal_free_option() prior to
 nbd_unlocked_poll() progressing the state machine to the point that we
 either succeeded or transitioned out of state_connecting.

I meant to add that a grep for nbd_intenral_free_option() shows that
all of the generator/states-newstyle-opt-*.c do indeed call this as
part of their state machine.  So once h->opt_cb is set (all of our
synchronous nbd_opt calls do so), the state machine guarantees that we
will then reach go_complete(), list_complete(), or context_complete()
(whichever function we registered in our stack-local completion
callback), which in turn calls the user's free callback before
returning on the synchronous functions.

...

 When I look over the older lib/rw.c, I remember spending quite a bit
 of time patching it to get the handling correct.  Using
 nbd_unlocked_pread_structured as the comparison: if
 nbd_unlocked_aio_pread_structured() returns -1, then either *chunk has
 not yet been set to NULL (so the api.c nbd_pread_structured will call
 the free callback), or the code ensured that the transfer semantics
 happened (and now the state machine will take care of calling the free
 callback at the right point in time on cmd->cb).  There's even code in
 nbd_internal_common_command() that frees the callback before returning
 -1 when the command could not be queued to the state machine.  I even
 remember how much time I spent writing tests/test-errors.c (which Rich
 then later subdivided into finer-grained tests) covering various
 aspects (failure when called in the wrong state, failure with
 parameter validation before contacting the server, failure returned by
 the server, ...) and that each of them didn't leak.

 But I would not be surprised if my more recent additions of lib/opt.c
 missed some steps (unlike commands, you don't have parallel options in
 flight, so the code is simpler - but maybe I simplified too much),
 where we could have a leak. 
With my deeper look and testsuite additions, I haven't been able to
prove a leak on either success or failure paths.  Again, if you have a
valgrind trace, please share it (the leak may be in your use of the
code as part of the Rust bindings, and not in the core libnbd code).

...

 At any rate, I could argue that in the synchronous nbd_unlocked_opt
 functions, we should probably be able to assert that the callback was
 already NULL'd by the successful nbd_unlocked_aio_opt counterpart on
 success, rather than manually trying to re-NULL it ourselves. 
[1] the aio function NULL'd our stack-local variable, not the user's
callback, but we did successfully transfer it to the state machine, so
this explicit NULL (rather than asserting that it was already nulled)
is correct after all.  Still, I'm not opposed to a patch adding
comments to lib/opt.c to make things clearer.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] libnbd: When are callbacks freed