On 8/12/19 1:13 PM, Richard W.M. Jones wrote:
On Mon, Aug 12, 2019 at 11:29:10AM -0500, Eric Blake wrote:
> On 8/12/19 11:08 AM, Richard W.M. Jones wrote:
>> + typedef void (*nbd_free_callback) (void *ptr, void *user_data);
>> + int nbd_add_free_callback (struct nbd_handle *h,
>> + void *ptr,
>> + nbd_free_callback cb,
>> + void *user_data);
>
> Do we want to insist on a user_data argument? Libvirt, for example,
> states that free callbacks are passed only the pointer to be freed (as
> you can already pack whatever you need into that pointer alone, rather
> than needing yet another void *user_data); if we do not take a user_data
> here, then...
Indeed, and that was my original implementation, and it would
be usable in the simple case (free).
However it doesn't work for OCaml buffers (nor for Python). In OCaml
we need to free the GC root, but that is separate from the buffer
pointer.
The generated code for an OCaml persistent buffers eventually looks
like this:
/* The function may save a reference to the Buffer, so we
* must treat it as a possible GC root.
*/
value *buf_user_data;
buf_user_data = malloc (sizeof (value));
if (buf_user_data == NULL) caml_raise_out_of_memory ();
*buf_user_data = bufv;
caml_register_generational_global_root (buf_user_data);
struct nbd_buffer *buf_buf = NBD_buffer_val (bufv);
const void *buf = buf_buf->data;
size_t count = buf_buf->len;
if (nbd_add_free_callback (h, (void *)buf,
free_root, buf_user_data) == -1)
caml_raise_out_of_memory ();
Notice that the free_root function needs the pointer to the GC root,
but you can't get to that from the buffer pointer.
But still, can't we create a single C struct that tracks both the buffer
and the GC root, and then pass the address of that C struct as the one
pointer to the free function?
In other words, I'm trying to get to the point of something like:
struct closure_wrap {
/* track anything for buffer GC root and refcounting */
value *buf;
/* track anything for the OCaml callback, if one was present */
value *callback;
};
struct closure_wrap *wrap = malloc (sizeof *wrap);
wrap->buf = malloc (sizeof (*wrap->buf));
caml_register_generational_global_root (wrap->buf);
struct nbd_buffer *buf_buf = NBD_buffer_val (bufv);
const void *buf = buf_buf->data;
size_t count = buf_buf->len;
if (in nbd_aio_pread) {
wrap->callback = NULL;
res = nbd_aio_pread_callback(h, buf, count, offset,
NULL, free_cb, wrap,
flags);
} else /* in nbd_aio_pread_callback */ {
wrap->callback = ...;
res = nbd_aio_pread_callback(h, buf, count, offset,
completion_cb, free_cb, wrap,
flags);
}
where completion_cb(wrap) calls the OCaml callback (and doesn't get used
if there is no OCaml callback), and where free_cb(wrap) gets called at
the end of the lifetime for buf, wrap->callback (if any), AND wrap
itself, so that we have:
void free_cb(void *ptr) {
struct closure_wrap *wrap = ptr;
caml_remove_generational_global_root (wrap->buf);
if (wrap->callback) {
unregister the OCaml callback...
}
free (wrap);
}
A similar problem will exist in Python, but I've not got around to
writing that code yet.
>> +C<ptr> is a pointer to the object (ie. the buffer or
>> +callback C<user_data>). C<cb (ptr)> is called when libnbd
>
> As written, your patch uses C<cb (ptr, user_data)>, so either this text
> is wrong, or you really don't need user_data.
Yes it should be cb (ptr, user_data). I'll tighten it up because
there are now two user_datas.
But again, we may be able to get by with just one pointer. (I'm still
hoping the idea of a three-tuple in C: main callback, free callback, and
user data - is sufficient to give language bindings the flexibility they
need to pack up a C struct containing everything needing cleanup at the
end of the lifetime, whether or not there is also a language callback in
play)
>> + new_callbacks = realloc (h->free_callbacks,
>> + sizeof (struct free_callback) * new_alloc);
>
> Should we start relying on reallocarray() to guarantee no multiplication
> overflow? (Not yet in POSIX, but it has been proposed:
>
http://austingroupbugs.net/view.php?id=1218)
I guess - didn't know about it. Seems like it's not available in
glibc yet?
'man realloc' on Fedora 29 lists reallocarray under _GNU_SOURCE, but
fails to say which glibc introduced it, and 'man reallocarray' is
missing (bug in the man pages).
>> +
>> + /* Need to keep the list sorted by pointer so we can use bsearch.
>> + * Insert the new entry before the i'th entry.
>> + *
>> + * Note the same pointer can be added multiple times (eg. if a
>> + * buffer is shared between commands), and that is not a bug. Which
>> + * free function is called is undefined, but they should all be
>> + * called eventually.
>
> or in other words, if the free function is actually a reference counter
> that only does something interesting when the count reaches 0, then
> things still work.
Yes this is another reason why this API is trickier to use than it
seems. But this is really necessary if you share a buffer between
several AIO calls as I discovered during implementation.
Normally, a user doesn't share the same buffer on the same handle - but
we do test that for our speed tests (where we don't care about buffer
contents so much as how many API calls can we fire off, so having
cross-contamination between API calls into the same buffer in those
tests is okay). But you are also right that it is going to be common to
read into a buffer from one handle, then write that buffer out to
another handle - so our actions on capturing the buffer during
aio_pread, and then releasing it when retiring that command, really are
centered on ref-counting whatever the language handed us, and only
free()ing the actual C struct we use to tie everything together under a
single pointer.
>> + */
>> + for (i = 0; i < h->nr_free_callbacks; ++i) {
>> + if (ptr <= h->free_callbacks[i].ptr)
>
> Comparing 2 pointers that are not allocated as part of the same array is
> undefined in C. To be safe, you're better off writing this as:
>
> if ((intptr_t)ptr <= (intptr_t)h->free_callbacks[i].ptr)
>
> or even storing intptr_t rather than void* in your list of pointers
> needing callbacks.
OK.
And if I'm right about my idea of having each Closure become a 3-tuple
in C, where we can use a common free_cb(ptr) at the time the command is
retired for both the language nbd.aio_pread and for
nbd.aio_pread_callback, then pointer comparisons is moot because we
don't need to track a list of pointers, but rather just a free_cb
alongside each user_data.
>> + break;
>> + }
>
> This is a linear search O(n) for where to insert. Why not bsearch() for
> O(log n), particularly since you document your intent to use bsearch()
> for later lookups?
Yes indeed, good idea!
>> + memmove (&h->free_callbacks[i+1], &h->free_callbacks[i],
>> + (h->nr_free_callbacks - i) * sizeof (struct free_callback));
>
> Hmm - the use of memmove() is O(n); we'd need a different data structure
> (for example red-black tree or hash table) if we wanted list
> manipulations to not be the chokepoint. So, as long as you are already
> stuck with an O(n) list manipulation, using O(n) lookup is not making
> the problem any worse.
h->nr_free_callbacks is actually reasonably small in real code
(usually <= 20). It's somewhat related to the number of commands in
flight multiplied by a small constant.
And my argument is that we are not in control of that number - a user
can write a Python script that deliberately calls nbd.aio_pread 1000
times before ever calling nbd.poll to finally advance the state machine.
>> +static int
>> +compare_free_callbacks (const void *v1, const void *v2)
>> +{
>> + const void *ptr = v1;
>> + const struct free_callback *cb = v2;
>> +
>> + if (ptr < cb->ptr) return -1;
>> + else if (ptr > cb->ptr) return 1;
>
> Again, undefined in C; comparing intptr_t is defined, but direct
> comparison of unrelated pointers could cause a compiler to mis-optimize
> (even if it happens to currently work in practice).
OK. We could store the pointers in the struct as intptr_t ?
Yes, if we have to compare them at all. But I'm hoping we don't have to.
>> + free_cb = bsearch (ptr, h->free_callbacks,
h->nr_free_callbacks,
>> + sizeof (struct free_callback),
>> + compare_free_callbacks);
>
> Here, you've got O(log n) lookup.
>
>> + if (free_cb) {
>> + assert (ptr == free_cb->ptr);
>> +
>> + free_cb->cb (ptr, free_cb->user_data);
>> +
>> + /* Remove it from the free list. */
>> + memmove (free_cb, free_cb+1,
>> + sizeof (struct free_callback) *
>> + (&h->free_callbacks[h->nr_free_callbacks] -
(free_cb+1)));
>
> but then slow it down with O(n) list manipulation.
Right, but only when we actually call the callback.
For C callers we will basically never use this. h->nr_free_callbacks
will be 0 and we will do a frequent bsearch on a zero-sized array.
It's only expected to be used from other programming languages.
Okay, so the effect to C is minimal, but the effect to other languages
may be noticeable (since there we will be registering a callback for
every aio call).
> I'm still not sold on whether this is any better than our existing
> completion callbacks coupled with VALID|FREE (where at least we didn't
> suffer from O(n) lookups); or whether we want to instead explore taking
> an explicit free function pointer as part of a C closure (that is, each
> Closure maps to a three-tuple of C arguments for main function, free
> function, and user_data). But as you said in the cover letter, having
> the whole series to review will let us choose between our options, so
> I'll see what the rest of the series is able to accomplish with this.
I think we shouldn't worry about whether the implementation is
efficient since we can always improve that based on feedback from
perf. It's really about whether this is a better API for freeing or not.
Getting rid of the VALID|FREE seems worthwhile, and _since_ that is
already an API/ABI change, we can _also_ make the API/ABI change from
Closure in C being a 2-tuple (fn, ptr) to being a 3-tuple (fn, free_fn,
ptr). The tricky part that remains is how to convince the generator
that the code for Python nbd.aio_pread AND for nbd.aio_pread_callback
both need to use the C nbd_aio_pread_callback, for the sake of the
completion's free_cb() to do the necessary refcount cleanup on the buf,
at which point we no longer need a separate registration of free callbacks.
We then have the design question of whether to make an OClosure type,
where C has two functions nbd_aio_pread and nbd_aio_pread_callback for
convenience, but where other languages have only a single nbd.aio_pread
where the callback parameter is optional (the Closure type for
pread_structured chunk and for block_status extent will still be
mandatory; it is only the completion callback that is currently causing
us twice the API because we are treating it as pseudo-optional). Or
maybe we just require C clients of nbd_aio_pread to always provide
parameters for callbacks, but document that the completion callback and
free callback pointers may be NULL.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org