On Tue, Aug 30, 2022 at 10:07:40AM +0100, Richard W.M. Jones wrote:
> On Tue, Aug 30, 2022 at 04:30:40PM +0800, Ming Lei wrote:
> On Tue, Aug 30, 2022 at 09:04:07AM +0100, Richard W.M. Jones wrote:
> > On Tue, Aug 30, 2022 at 10:32:02AM +0800, Ming Lei wrote:
> > > > Hi Jones,
> > > >
> > > > On Thu, Aug 25, 2022 at 01:10:55PM +0100, Richard W.M. Jones wrote:
> > > > > This patch adds simple support for a ublk-based NBD client.
> > > > > It is also available here:
> > > > >
> > > > >   https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk/ublk
> > > > >
> > > > > ublk is a way to write Linux block device drivers in userspace:
> > > >
> > > > Just looked at your nbdublk implementation a bit; basically it is good,
> > > > and really nice work.
> > > >
> > > > Two suggestions follow:
> > > >
> > > > 1) the io_uring context is multiplexed with ublk io command handling, so
> > > > we should avoid blocking in both ->handle_io_async() and
> > > > ->handle_event(), otherwise performance may be bad
> > >
> > > The nbd_aio_* calls don't block.
> > >
> > > However I noticed that I made a mistake with the trim and zero paths
> > > because I am using synchronous (blocking) nbd_flush / nbd_trim /
> > > nbd_zero instead of nbd_aio_flush / nbd_aio_trim / nbd_aio_zero.  I
> > > will fix this soon.
> > >
> > > Nothing in handle_event should block except for the call to
> > > pthread_mutex_lock.  This lock is necessary because new commands can
> > > be retired on the nbd_work_thread while handle_event is being called
> > > from the io_uring thread.
> >
> > Yes, I meant the mutex.
> >
> > Also, given nbd_work_thread() is required, it is reasonable to move the
> > nbd_aio_* calls into nbd_work_thread(), to avoid waiting on the mutex in
> > io_uring context; otherwise the io command may not be forwarded to
> > userspace in time.
> nbd_aio_* calls shouldn't block.  But how can we move those calls to
> nbd_work_thread()?  If we queue up commands in handle_io_async we
> would still have to use a lock on that queue.  I don't see how we can
> avoid some kind of locking in handle_io_async.

Please see aio_submitter() in my patch: the nbd_aio_* call has already been
moved to the nbd work thread.

pthread_spin_lock() isn't supposed to sleep, and it is only used for
protecting the list.

A completely lockless approach is possible too, by using a per-queue bitmap
to mark which requests are submitted & completed, plus a per-queue array to
store the results.

But per my test, nbd perf is still a bit low, so I am not sure this kind of
optimization makes a difference.
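
To make that lockless idea concrete, here is a minimal sketch (my
illustration only, not code from either patch; the names and the
64-tags-per-queue limit are just assumptions):

#include <stdatomic.h>
#include <stdint.h>

#define QD 64                           /* assume at most 64 tags per queue */

struct queue_state {
        _Atomic uint64_t submitted;     /* bit N set: tag N waits to be submitted */
        _Atomic uint64_t completed;     /* bit N set: tag N has a result ready */
        int results[QD];                /* written before the completed bit is set */
};

/* io_uring thread: publish tag N without taking any lock */
static void mark_submitted (struct queue_state *q, unsigned tag)
{
        atomic_fetch_or_explicit (&q->submitted, 1ULL << tag,
                                  memory_order_release);
        /* ... then write the eventfd to wake the nbd work thread ... */
}

/* nbd work thread: harvest every pending tag in one atomic exchange */
static uint64_t fetch_submitted (struct queue_state *q)
{
        return atomic_exchange_explicit (&q->submitted, 0,
                                         memory_order_acquire);
}

/* completion path: store the result, then publish the completed bit */
static void mark_completed (struct queue_state *q, unsigned tag, int res)
{
        q->results[tag] = res;
        atomic_fetch_or_explicit (&q->completed, 1ULL << tag,
                                  memory_order_release);
}

The io_uring side drains 'completed' with the same kind of exchange and only
afterwards reads results[], so neither direction needs a lock.
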
> > BTW what is the context for running callback of nbd_aio_*()?
> It depends on whether nbd_aio_* can complete the command entirely
> without blocking.
>
> The most common case is that nbd_aio_* calls a network function
> (eg. recv(2)) which returns EWOULDBLOCK.  Later, on nbd_work_thread,
> recv(2) is completed, the command is processed, and
> command_completed() is called.
>
> It's unlikely, but possible, that the command_completed() function
> could be called directly from handle_io_async -> nbd_aio_* ->
> command_completed, eg. if recv(2) never blocks for some reason, like
> it's talking over a Unix domain socket to a server which is very quick
> to reply.

OK, got it.  I was just trying to understand the whole flow.
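
Just to write my understanding of that flow down as code, a rough sketch
only (not the actual nbdublk source; 'struct command' and its fields are
made up here):

#include <libnbd.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical per-request state, only for this sketch */
struct command {
        void *buf;
        size_t len;
        uint64_t offset;
};

/* Usually called later, from whichever thread is driving the NBD socket
 * (the nbd work thread); but it can also run directly from the nbd_aio_*
 * call itself if the reply arrives without ever blocking. */
static int
command_completed (void *user_data, int *error)
{
        struct command *cmd = user_data;

        if (*error)
                fprintf (stderr, "command failed: %d\n", *error);
        /* ... notify the io_uring thread that cmd is done ... */
        (void) cmd;
        return 1;       /* 1 = libnbd retires the command automatically */
}

static int64_t
submit_read (struct nbd_handle *h, struct command *cmd)
{
        /* the completion callback is attached at submission time */
        return nbd_aio_pread (h, cmd->buf, cmd->len, cmd->offset,
                              (nbd_completion_callback) {
                                      .callback  = command_completed,
                                      .user_data = cmd,
                              }, 0);
}

If the callback returns 0 instead, the command has to be retired explicitly
with nbd_aio_command_completed().
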
> > > > 2) in the implementation of the nbd worker thread, there are two sleep
> > > > points (waiting for an incoming io command, and the network FD); I'd
> > > > suggest using poll to wait on either of them
> > > >
> > > > Recently I have been working to add ublksrv io offloading or aio
> > > > interfaces for this sort of case, in which io_uring can't be used,
> > > > which may simplify this area; please see the attached patch which
> > > > applies the above two points against your patch.  An obvious
> > > > improvement can be observed in my simple fio test (randread, io, 4k
> > > > bs, libaio) against a backend of 'nbdkit file'.
> > > >
> > > > But these interfaces aren't merged to the ublksrv github tree yet; you
> > > > can find them in the aio branch, and demo_event.c is one example of how
> > > > to use them:
> > > >
> > > >   https://github.com/ming1/ubdsrv/tree/aio
> > > >
> > > > Actually this interface can be improved further for the nbdublk case,
> > > > and the request allocation isn't actually needed for this direct
> > > > offloading.  But it is added to cover IOs which don't come from the
> > > > ublk driver, such as metadata, so 'struct ublksrv_aio' is allocated.
> > > > I will try my best to finalize them and merge them to the master branch.
> > >
> > > I didn't really understand what these patches to ubdsrv do when I
> > > looked at them before.  Maybe add some diagrams?
> >
> > The patches were added recently.  They are just for simplifying
> > offloading IO handling to another worker context, such as
> > nbd_work_thread.
> >
> > They use one eventfd to notify the target worker thread when new requests
> > are added to a list.  Once the worker thread is woken up, it fetches
> > requests from the list; after handling these requests, the worker thread
> > waits on both the eventfd and the target IO (network FD) via poll().
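
In other words, the handoff is roughly like the following minimal sketch
(mechanism only; the list handling is elided and the names are made up, the
real code is in the aio branch above):

#include <sys/eventfd.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

static int efd;                 /* created once with efd = eventfd (0, 0) */

/* io_uring thread side: after appending the request to the submit list
 * (under the spinlock), kick the worker by writing to the eventfd. */
static void kick_worker (void)
{
        uint64_t one = 1;

        write (efd, &one, sizeof one);  /* adds 1 to the eventfd counter,
                                           making efd readable/pollable */
}

/* nbd work thread side: after draining the submit list and handling the
 * requests, sleep on the eventfd and the NBD socket at the same time. */
static void wait_for_work (int net_fd)
{
        struct pollfd pfds[2] = {
                { .fd = efd,    .events = POLLIN },   /* new requests queued */
                { .fd = net_fd, .events = POLLIN },   /* NBD socket readable */
        };
        uint64_t cnt;

        poll (pfds, 2, -1);             /* either event ends the sleep */
        if (pfds[0].revents & POLLIN)
                read (efd, &cnt, sizeof cnt);  /* returns and resets the counter */
}

nbd_poll2 (h, aio_ctx->efd, -1) in the worker loop below plays the role of
this poll(), waiting on the NBD connection and the eventfd at the same time.
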
> In the patch, it seems there is no place which creates a pthread for
> nbd_work_thread.  Also ubdsrv aio branch does not call pthread_create.
> So I don't understand in what context nbd_work_thread is called.

The nbd work thread is created by the nbd target code just like before, but
the thread body is changed to the following loop, basically bound to one
aio_ctx:

while (!ublksrv_aio_ctx_dead(aio_ctx)) {
        struct aio_list compl;

        aio_list_init(&compl);

        // retrieve requests from the submit list, and submit each one via
        // aio_submitter(); if any one is done, add it to &compl
        ublksrv_aio_submit_worker(aio_ctx, aio_submitter, &compl);

        // add requests completed from command_completed() to &compl
        pthread_spin_lock(&c->lock);
        aio_list_splice(&c->list, &compl);
        pthread_spin_unlock(&c->lock);

        // notify the io_uring thread about the completed requests in &compl,
        // then batching complete & re-issue can be done in io_uring context
        ublksrv_aio_complete_worker(aio_ctx, &compl);

        // wait for network IO and the eventfd from io_uring at the same time,
        // so if either one is ready, nbd_poll2() will return from sleep
        if (nbd_poll2 (h, aio_ctx->efd, -1) == -1) {
                fprintf (stderr, "%s\n", nbd_get_error ());
                exit (EXIT_FAILURE);
        }
}

> I've never used eventfd before and the man page for it is very opaque.

> > In your previous implementation, the nbd work thread may wait on one
> > pthread mutex and aio_poll(); this way may not be efficient, given that
> > when waiting on one event, other events can't be handled.

> I'm not sure what "event" means in this context.  Does it mean
> "NBD command"?  Or poll(2) event?

Here "event" is generic; I meant: NBD IO ready (exactly what aio_poll()
waits for) or io_uring eventfd ready (one write done from nbd_handle_io_async()).

> There are multiple (usually 4) nbd_work threads, one for each NBD
> network connection.  Each NBD network connection can handle many
> commands in flight at once.

OK, but aio_poll() is supposed to get notified when one command is done, so
this is just an implementation detail; correct me if I am wrong.

Thanks,
Ming