There are several races / deadlocks which I've thought about. Let's
see if I can remember them all ...
(1) This I experienced: nbd_aio_get_fd deadlocks if there are
concurrent synchronous APIs going on. A typical case is where you set
up the concurrent writer thread before connecting, and then call a
synchronous connect function such as connect_tcp. The synchronous
function grabs h->lock, then writes something, which eventually
invokes the writer thread, which in turn calls nbd_aio_get_fd and
deadlocks on h->lock.
-> Probably the writer thread should be forbidden from using
nbd_handle.
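
To make (1) concrete, here is a rough sketch of the shape of the
problem. Everything in it is a stand-in: struct demo_handle,
get_fd_or_busy and the fd value 42 are made up for illustration and
are not the real library internals. It uses pthread_mutex_trylock
rather than a blocking lock, so that the demo can report "would have
blocked" instead of actually hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the handle; not the real struct. */
struct demo_handle {
  pthread_mutex_t lock;         /* plays the role of h->lock */
  int fd;                       /* socket the writer thread wants */
};

/* Stand-in for a call like nbd_aio_get_fd that needs the handle lock.
 * The real call would block; trylock lets us observe that instead.
 * Returns the fd, or -1 if another thread currently holds the lock.
 */
int
get_fd_or_busy (struct demo_handle *h)
{
  int r = pthread_mutex_trylock (&h->lock);
  if (r == EBUSY)
    return -1;
  assert (r == 0);
  int fd = h->fd;
  pthread_mutex_unlock (&h->lock);
  return fd;
}

struct writer_args {
  struct demo_handle *h;
  int fd_seen;                  /* what the writer thread got back */
};

void *
demo_writer_thread (void *argsv)
{
  struct writer_args *args = argsv;
  args->fd_seen = get_fd_or_busy (args->h);
  return NULL;
}

/* Models a synchronous API call: take the lock, hand work to the
 * writer thread, and wait for it while still holding the lock.
 * Returns what the writer thread saw (-1 means it would have blocked).
 */
int
demo_writer_blocked_by_sync_call (void)
{
  struct demo_handle h = { .fd = 42 };
  struct writer_args args = { .h = &h, .fd_seen = 0 };
  pthread_t thread;
  int r;

  pthread_mutex_init (&h.lock, NULL);
  pthread_mutex_lock (&h.lock);         /* the sync call grabs h->lock */
  r = pthread_create (&thread, NULL, demo_writer_thread, &args);
  assert (r == 0);
  pthread_join (thread, NULL);          /* still holding the lock */
  pthread_mutex_unlock (&h.lock);
  pthread_mutex_destroy (&h.lock);
  return args.fd_seen;
}
```

With a blocking lock in get_fd_or_busy, the join above would never
return, which is the deadlock. Handing the writer thread everything it
needs up front, so that it never touches the handle, would sidestep
it.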
(2) The writer thread calls nbd_aio_get_fd, but the fd returned might
be closed before we use it, resulting in either EBADF or, worse, in
using another fd that happens to be opened around the same time.
-> I think the solution to this would be to allow the writer callback
to signal that the socket is about to be closed (e.g. add an extra
flag parameter to the callback), which would tell the writer thread to
exit before the fd goes away.
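
As a rough sketch of what that extra flag might look like: the
callback signature, WRITER_FLAG_CLOSING and struct writer_state below
are all hypothetical, invented for this example, and are not the
current API. The point is only that once the callback has seen the
closing flag, the writer side drops its copy of the fd and refuses
further writes, so it cannot touch a closed or recycled descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit: "the socket is about to be closed". */
#define WRITER_FLAG_CLOSING 1u

/* Hypothetical per-writer state owned by the caller. */
struct writer_state {
  int fd;               /* fd cached by the writer; -1 once invalid */
  bool stopped;         /* set once the closing flag has been seen */
  size_t queued;        /* bytes accepted; stands in for a real queue */
};

/* Hypothetical callback shape: (data, buf, len) plus a flags word.
 * Returns 0 if the request was accepted, -1 if the writer has
 * already been told to stop.
 */
int
writer_cb (void *data, const void *buf, size_t len, unsigned flags)
{
  struct writer_state *w = data;

  assert (w != NULL);

  if (flags & WRITER_FLAG_CLOSING) {
    /* Forget the fd before the library closes it. */
    w->stopped = true;
    w->fd = -1;
    return 0;
  }

  if (w->stopped)
    return -1;          /* too late: the socket is gone */

  (void) buf;           /* a real writer would queue buf here */
  w->queued += len;
  return 0;
}
```

A real writer thread would also need to be woken up and joined when
the flag arrives; that part is left out here.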
(3) nbd_concurrent_writer_error could lose errors. This might happen
if the socket is closed normally without writing anything, in which
case h->writer_error would never be checked.
(4) nbd_concurrent_writer_error possibly deadlocks too since it needs
to grab h->lock. Basically the same as (1).
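
One possible direction for (3) and (4) together, purely as a sketch
and not something that exists today: store the writer error in an
atomic field so that reporting it needs no lock, and check that field
on every close path. struct err_handle, writer_report_error and
close_check are hypothetical names made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the part of the handle that records a
 * writer error.  0 means "no error", otherwise an errno value.
 */
struct err_handle {
  atomic_int writer_error;
};

/* Called from the writer thread.  Takes no lock, so it cannot
 * deadlock against a synchronous call holding h->lock.  Only the
 * first error reported is kept.
 */
void
writer_report_error (struct err_handle *h, int err)
{
  int expected = 0;

  assert (err != 0);
  atomic_compare_exchange_strong (&h->writer_error, &expected, err);
}

/* Called on every close path, including a normal close where nothing
 * was written.  Returns 0 if there was no pending error, or -1 with
 * errno set.  The pending error is cleared either way.
 */
int
close_check (struct err_handle *h)
{
  int err = atomic_exchange (&h->writer_error, 0);

  if (err != 0) {
    errno = err;
    return -1;
  }
  return 0;
}
```

Keeping only the first error is a design choice in this sketch; the
later ones are usually just consequences of the first.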
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v