This is a continuation of a discussion we were having on IRC. The
problems with IRC are that it's not recorded and that it's hard to
have deep technical conversations there. I hope this is a decent summary.
Problem simply stated: Certain NBD servers (qemu-nbd in particular)
are able to simultaneously read and write on a socket, i.e. they can
be reading a request while writing the reply to a previous request.
However, libnbd is unable to do this trick. Although multiple requests
can be in flight, at any moment libnbd is either writing to the socket
or reading from the socket, but never both.
Visualized it looks like this:
    write                    write         write
|===========|______________|=======|_______|===|  --> rq to server
_____________|============|_________|=====|_____  <-- rp from server
                  read               read
whereas an ideal libnbd which could write and read simultaneously,
coupled with a server which can do the same, would look like this:
           write
|===========||=======||===|  --> requests to server
|============||=====||=====  <-- replies from server
           read
Eric already observed through testing that the ideal client can be up
to twice as fast as libnbd, as is obvious from the diagram.
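To put a rough number on the diagrams, here is a toy model. It assumes the server adds no latency and that reading reply i can only start once request i has been fully written; the function names and the model itself are invented for illustration and are not measurements of libnbd.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption, not a measurement): w[i] is the time to
 * write request i, r[i] the time to read reply i.  Server latency is
 * ignored. */

/* Half-duplex client: every write and every read is serialized. */
static double serialized_time (const double *w, const double *r, size_t n)
{
  double t = 0;
  assert (w != NULL && r != NULL);
  for (size_t i = 0; i < n; i++)
    t += w[i] + r[i];
  return t;
}

/* Full-duplex client: writes run back to back; reading reply i starts
 * once request i is fully written and reply i-1 is fully read. */
static double overlapped_time (const double *w, const double *r, size_t n)
{
  double wdone = 0, rdone = 0;
  assert (w != NULL && r != NULL);
  for (size_t i = 0; i < n; i++) {
    wdone += w[i];
    rdone = (wdone > rdone ? wdone : rdone) + r[i];
  }
  return rdone;
}
```

With n requests that each cost one unit to write and one unit to read, this gives 2n for the serialized client against n+1 for the overlapped one. In this model the ratio approaches 2 but cannot exceed it, which is consistent with "up to twice as fast".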
The reasons why libnbd can't do this are:
Problem (a) : Only one thread of control can hold the libnbd handle
lock (h->lock), and since there can only be one thread of control
running in each handle at any time, that thread can only be reading or
writing.
Problem (b) : There is only one state machine per handle (h->state),
whereas to handle the write and read sides separately requires two
state machines. In the IRC discussion we gave these the preliminary
names h->wstate and h->rstate.
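One way to picture Problem (b): with two state machines, the direction the caller polls for would become the union of what each side wants, rather than a single value derived from h->state. A minimal sketch follows, in which everything except the names wstate and rstate (enum values, fields, the DIR_* flags, the function) is hypothetical and is not libnbd's actual handle layout:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical throughout: only the names wstate and rstate come from
 * the IRC discussion.  This is not libnbd's real struct. */
enum wstate { W_IDLE, W_SENDING };
enum rstate { R_IDLE, R_RECEIVING };

#define DIR_READ  1u
#define DIR_WRITE 2u

struct handle {
  enum wstate wstate;   /* write-side state machine */
  enum rstate rstate;   /* read-side state machine */
  unsigned to_issue;    /* commands queued but not yet sent */
  unsigned in_flight;   /* commands sent, reply not yet complete */
};

/* Each side contributes its own bit, independently of the other. */
static unsigned get_direction (const struct handle *h)
{
  unsigned dir = 0;
  assert (h != NULL);
  if (h->wstate == W_SENDING || h->to_issue > 0)
    dir |= DIR_WRITE;
  if (h->rstate == R_RECEIVING || h->in_flight > 0)
    dir |= DIR_READ;
  return dir;
}
```

The point of the sketch is only that neither side needs to know what the other is doing to decide whether it wants the socket. How the two sides would share the command queues safely is exactly Problem (a) and is not addressed here.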
----------------------------------------------------------------------
It's worth also saying how the current API works, although we might
want to change it.
You grab the underlying file descriptor using nbd_aio_get_fd, which is
what you poll on. You also have to call nbd_aio_get_direction which
returns READ, WRITE or BOTH (== READ|WRITE). You then set up some
event mechanism (eg. poll or epoll), wait until the file descriptor
is ready, and call one of nbd_aio_notify_read or
nbd_aio_notify_write.
The direction can change any time the handle state changes, which
includes whenever you issue a command (eg. nbd_aio_pread), or whenever
you call nbd_aio_notify_*. You therefore have to call
nbd_aio_get_direction frequently.
A typical loop using poll might look like:
  fd = nbd_aio_get_fd (nbd);
  for (;;) {
    /* <-- If you need to issue more commands, do that here. */
    dir = nbd_aio_get_direction (nbd);
    pollfd[0].fd = fd;
    pollfd[0].events = 0;
    if (dir & LIBNBD_AIO_DIRECTION_READ) pollfd[0].events |= POLLIN;
    if (dir & LIBNBD_AIO_DIRECTION_WRITE) pollfd[0].events |= POLLOUT;
    poll (pollfd, 1, -1);
    /* revents carries poll flags (POLLIN/POLLOUT), not direction flags. */
    if (pollfd[0].revents & POLLIN)
      nbd_aio_notify_read (nbd);
    else if (pollfd[0].revents & POLLOUT)
      nbd_aio_notify_write (nbd);
  }
----------------------------------------------------------------------
The above code is of course assuming a single thread. But to
simultaneously read and write we will need at least two threads.
It's hard for me to see a nice way to evolve the API to support
multiple threads, but I guess issues include:
- If one thread is waiting in poll(2) and another thread issues a
command, how do we get the first thread to return from poll and
reevaluate direction? (Eric suggests a pipe-to-self for this)
- If two threads are waiting in poll(2) and both are notified that
the fd is ready, how do we get one of them to read and the other to
write? I think this implies that we don't have two threads doing
poll(2), but if not how do we farm out the read/write work to two
or more threads?
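The pipe-to-self idea from the first point can be sketched with plain POSIX calls. This is only an illustration of the technique, not a proposed libnbd API: struct waker, waker_init, waker_kick, wait_or_wake and the wait_result values are all hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of libnbd. */
struct waker { int rfd, wfd; };

enum wait_result { WAIT_ERROR = -1, WAIT_TIMEOUT = 0, WAIT_KICKED, WAIT_FD_READY };

static int waker_init (struct waker *w)
{
  int p[2];
  assert (w != NULL);
  if (pipe (p) == -1) return -1;
  for (int i = 0; i < 2; i++) {   /* make both ends non-blocking */
    int fl = fcntl (p[i], F_GETFL);
    if (fl == -1 || fcntl (p[i], F_SETFL, fl | O_NONBLOCK) == -1) {
      close (p[0]); close (p[1]);
      return -1;
    }
  }
  w->rfd = p[0]; w->wfd = p[1];
  return 0;
}

/* Called by any thread right after it issues a command. */
static void waker_kick (const struct waker *w)
{
  char c = 0;
  /* If the pipe is full (EAGAIN) a wakeup is already pending, so the
   * result can be ignored. */
  ssize_t r = write (w->wfd, &c, 1);
  (void) r;
}

/* Wait for 'events' on fd, or for a kick from another thread. */
static enum wait_result
wait_or_wake (const struct waker *w, int fd, short events, int timeout_ms)
{
  struct pollfd fds[2] = {
    { .fd = fd,     .events = events },
    { .fd = w->rfd, .events = POLLIN },
  };
  int r = poll (fds, 2, timeout_ms);
  if (r == -1) return WAIT_ERROR;
  if (r == 0) return WAIT_TIMEOUT;
  if (fds[1].revents & POLLIN) {
    char buf[64];
    while (read (w->rfd, buf, sizeof buf) > 0)
      ;                           /* drain every pending kick */
    return WAIT_KICKED;           /* caller should re-read the direction */
  }
  return WAIT_FD_READY;
}
```

In the poll loop, the polling thread would call something like wait_or_wake in place of poll, and on WAIT_KICKED go back to the top of the loop and call nbd_aio_get_direction again. This addresses only the first point; it says nothing about how to split read and write work between threads.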
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones