On Tue, Aug 30, 2022 at 10:07:40AM +0100, Richard W.M. Jones wrote:
> On Tue, Aug 30, 2022 at 04:30:40PM +0800, Ming Lei wrote:
> On Tue, Aug 30, 2022 at 09:04:07AM +0100, Richard W.M. Jones wrote:
> > On Tue, Aug 30, 2022 at 10:32:02AM +0800, Ming Lei wrote:
> > > > Hi Jones,
> > > >
> > > > On Thu, Aug 25, 2022 at 01:10:55PM +0100, Richard W.M. Jones wrote:
> > > > > This patch adds simple support for a ublk-based NBD client.
> > > > > It is also available here:
> > > > >
> > > > >   https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk/ublk
> > > > >
> > > > > ublk is a way to write Linux block device drivers in userspace:
> > > >
> > > > Just looked at your nbdublk implementation a bit; basically it is good,
> > > > and really nice work.
> > > >
> > > > Two suggestions follow:
> > > >
> > > > 1) the io_uring context is multiplexed with ublk io command handling, so
> > > > we should avoid blocking in both ->handle_io_async() and
> > > > ->handle_event(), otherwise performance may be bad
> > >
> > > The nbd_aio_* calls don't block.
> > >
> > > However I noticed that I made a mistake with the trim and zero paths
> > > because I am using synchronous (blocking) nbd_flush / nbd_trim /
> > > nbd_zero instead of nbd_aio_flush / nbd_aio_trim / nbd_aio_zero.  I
> > > will fix this soon.
> > >
> > > Nothing in handle_event should block except for the call to
> > > pthread_mutex_lock.  This lock is necessary because new commands can
> > > be retired on the nbd_work_thread while handle_event is being called
> > > from the io_uring thread.
> >
> > Yes, I meant the mutex.
> >
> > Also, given nbd_work_thread() is required, it is reasonable to move the
> > nbd_aio_* calls into nbd_work_thread(), to avoid waiting on the mutex in
> > io_uring context; otherwise the io command may not be forwarded to
> > userspace in time.
> nbd_aio_* calls shouldn't block.  But how can we move those calls to
> nbd_work_thread()?  If we queue up commands in handle_io_async we
> would still have to use a lock on that queue.  I don't see how we can
> avoid some kind of locking in handle_io_async.

Please see aio_submitter() in my patch: the nbd_aio_* call has already been
moved to the nbd work thread.

pthread_spin_lock() isn't supposed to sleep, and it is only used for
protecting the list.

A completely lockless approach is possible too, by using a per-queue bitmap
to mark which requests are submitted & completed, plus a per-queue array to
store the results.

But per my test, nbd perf is still a bit low, so I am not sure this kind of
optimization makes a difference.
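
To make that lockless idea concrete, here is a minimal sketch (my
illustration only, not code from either patch; the names and the
64-tags-per-queue limit are just assumptions):

#include <stdatomic.h>
#include <stdint.h>

#define QD 64                           /* assume at most 64 tags per queue */

struct queue_state {
        _Atomic uint64_t submitted;     /* bit N set: tag N waits to be submitted */
        _Atomic uint64_t completed;     /* bit N set: tag N has a result ready */
        int results[QD];                /* written before the completed bit is set */
};

/* io_uring thread: publish tag N without taking any lock */
static void mark_submitted (struct queue_state *q, unsigned tag)
{
        atomic_fetch_or_explicit (&q->submitted, 1ULL << tag,
                                  memory_order_release);
        /* ... then write the eventfd to wake the nbd work thread ... */
}

/* nbd work thread: harvest every pending tag in one atomic exchange */
static uint64_t fetch_submitted (struct queue_state *q)
{
        return atomic_exchange_explicit (&q->submitted, 0,
                                         memory_order_acquire);
}

/* completion path: store the result, then publish the completed bit */
static void mark_completed (struct queue_state *q, unsigned tag, int res)
{
        q->results[tag] = res;
        atomic_fetch_or_explicit (&q->completed, 1ULL << tag,
                                  memory_order_release);
}

The io_uring side drains 'completed' with the same kind of exchange and only
afterwards reads results[], so neither direction needs a lock.
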
> > BTW what is the context for running callback of nbd_aio_*()?
> It depends on whether nbd_aio_* can complete the command entirely
> without blocking.
>
> The most common case is that nbd_aio_* calls a network function
> (eg. recv(2)) which returns EWOULDBLOCK.  Later, on nbd_work_thread,
> recv(2) is completed, the command is processed, and
> command_completed() is called.
>
> It's unlikely, but possible, that the command_completed() function
> could be called directly from handle_io_async -> nbd_aio_* ->
> command_completed, eg. if recv(2) never blocks for some reason, like
> it's talking over a Unix domain socket to a server which is very quick
> to reply.

OK, got it.  I was just trying to understand the whole flow.
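
Just to write my understanding of that flow down as code, a rough sketch
only (not the actual nbdublk source; 'struct command' and its fields are
made up here):

#include <libnbd.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical per-request state, only for this sketch */
struct command {
        void *buf;
        size_t len;
        uint64_t offset;
};

/* Usually called later, from whichever thread is driving the NBD socket
 * (the nbd work thread); but it can also run directly from the nbd_aio_*
 * call itself if the reply arrives without ever blocking. */
static int
command_completed (void *user_data, int *error)
{
        struct command *cmd = user_data;

        if (*error)
                fprintf (stderr, "command failed: %d\n", *error);
        /* ... notify the io_uring thread that cmd is done ... */
        (void) cmd;
        return 1;       /* 1 = libnbd retires the command automatically */
}

static int64_t
submit_read (struct nbd_handle *h, struct command *cmd)
{
        /* the completion callback is attached at submission time */
        return nbd_aio_pread (h, cmd->buf, cmd->len, cmd->offset,
                              (nbd_completion_callback) {
                                      .callback  = command_completed,
                                      .user_data = cmd,
                              }, 0);
}

If the callback returns 0 instead, the command has to be retired explicitly
with nbd_aio_command_completed().
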
> > > > 2) in the implementation of the nbd worker thread, there are two sleep
> > > > points (waiting for an incoming io command, and the network FD); I'd
> > > > suggest using poll to wait on either of them
> > > >
> > > > Recently I have been working to add ublksrv io offloading or aio
> > > > interfaces for this sort of case, in which io_uring can't be used,
> > > > which may simplify this area; please see the attached patch which
> > > > applies the above two points against your patch.  An obvious
> > > > improvement can be observed in my simple fio test (randread, io, 4k
> > > > bs, libaio) against a backend of 'nbdkit file'.
> > > >
> > > > But these interfaces aren't merged to the ublksrv github tree yet; you
> > > > can find them in the aio branch, and demo_event.c is one example of how
> > > > to use them:
> > > >
> > > >   https://github.com/ming1/ubdsrv/tree/aio
> > > >
> > > > Actually this interface can be improved further for the nbdublk case,
> > > > and the request allocation isn't actually needed for this direct
> > > > offloading.  But it is added to cover IOs which don't come from the
> > > > ublk driver, such as metadata, so 'struct ublksrv_aio' is allocated.
> > > > I will try my best to finalize them and merge them to the master branch.
> > >
> > > I didn't really understand what these patches to ubdsrv do when I
> > > looked at them before.  Maybe add some diagrams?
> >
> > The patches were added recently.  They are just for simplifying
> > offloading IO handling to another worker context, such as
> > nbd_work_thread.
> >
> > They use one eventfd to notify the target worker thread when new requests
> > are added to a list.  Once the worker thread is woken up, it fetches
> > requests from the list; after handling these requests, the worker thread
> > waits on both the eventfd and the target IO (network FD) via poll().
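
In other words, the handoff is roughly like the following minimal sketch
(mechanism only; the list handling is elided and the names are made up, the
real code is in the aio branch above):

#include <sys/eventfd.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

static int efd;                 /* created once with efd = eventfd (0, 0) */

/* io_uring thread side: after appending the request to the submit list
 * (under the spinlock), kick the worker by writing to the eventfd. */
static void kick_worker (void)
{
        uint64_t one = 1;

        write (efd, &one, sizeof one);  /* adds 1 to the eventfd counter,
                                           making efd readable/pollable */
}

/* nbd work thread side: after draining the submit list and handling the
 * requests, sleep on the eventfd and the NBD socket at the same time. */
static void wait_for_work (int net_fd)
{
        struct pollfd pfds[2] = {
                { .fd = efd,    .events = POLLIN },   /* new requests queued */
                { .fd = net_fd, .events = POLLIN },   /* NBD socket readable */
        };
        uint64_t cnt;

        poll (pfds, 2, -1);             /* either event ends the sleep */
        if (pfds[0].revents & POLLIN)
                read (efd, &cnt, sizeof cnt);  /* returns and resets the counter */
}

nbd_poll2 (h, aio_ctx->efd, -1) in the worker loop below plays the role of
this poll(), waiting on the NBD connection and the eventfd at the same time.
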
> In the patch, it seems there is no place which creates a pthread for
> nbd_work_thread.  Also ubdsrv aio branch does not call pthread_create.
> So I don't understand in what context nbd_work_thread is called.

The nbd work thread is created by the nbd target code just like before, but
the thread body is changed to the following loop, basically bound to one
aio_ctx:

while (!ublksrv_aio_ctx_dead(aio_ctx)) {
        struct aio_list compl;

        aio_list_init(&compl);

        // retrieve requests from the submit list, and submit each one via
        // aio_submitter(); if any one is done, add it to &compl
        ublksrv_aio_submit_worker(aio_ctx, aio_submitter, &compl);

        // add requests completed from command_completed() to &compl
        pthread_spin_lock(&c->lock);
        aio_list_splice(&c->list, &compl);
        pthread_spin_unlock(&c->lock);

        // notify the io_uring thread about the completed requests in &compl,
        // then batching complete & re-issue can be done in io_uring context
        ublksrv_aio_complete_worker(aio_ctx, &compl);

        // wait for network IO and the eventfd from io_uring at the same time,
        // so if either one is ready, nbd_poll2() will return from sleep
        if (nbd_poll2 (h, aio_ctx->efd, -1) == -1) {
                fprintf (stderr, "%s\n", nbd_get_error ());
                exit (EXIT_FAILURE);
        }
}

> I've never used eventfd before and the man page for it is very opaque.

> > In your previous implementation, the nbd work thread may wait on one
> > pthread mutex and aio_poll(); this way may not be efficient, given that
> > when waiting on one event, other events can't be handled.

> I'm not sure what "event" means in this context.  Does it mean
> "NBD command"?  Or poll(2) event?

Here "event" is generic; I meant: NBD IO ready (exactly what aio_poll()
waits for) or io_uring eventfd ready (one write done from nbd_handle_io_async()).

> There are multiple (usually 4) nbd_work threads, one for each NBD
> network connection.  Each NBD network connection can handle many
> commands in flight at once.

OK, but aio_poll() is supposed to get notified when one command is done, so
this is just an implementation detail; correct me if I am wrong.

Thanks,
Ming