On 6/4/19 6:16 AM, Richard W.M. Jones wrote:
There are several races / deadlocks which I've thought about.
Let's
see if I can remember them all ...
(1) This I experienced: nbd_aio_get_fd deadlocks if there are
concurrent synchronous APIs going on. A typical case is where you set
up the concurrent writer thread before connecting, and then call a
synchronous connect function such as connect_tcp. The synchronous
function grabs h->lock, then writes something, which eventually
invokes the writer thread which calls nbd_aio_get_fd and deadlocks on
h->lock.
-> Probably the writer thread should be forbidden from using
nbd_handle.
Or rather, forbidden from using nbd_handle in a locking manner. We
should basically document a whitelist of functions that are safe to call...
(2) The writer thread calls nbd_aio_get_fd, but the fd returned might
might be closed before we use it, resulting in either EBADF or worse
using another fd that happens to be opened around the same time.
-> I think the solution to this would be to allow the writer callback
to signal that the socket is about to be closed (eg. add an extra flag
parameter to the callback), which would kill the writer thread.
Yes, I think a flag argument to the writer callback can serve a couple
of additional purposes:
Right now, the writer callback is reached exactly once for most commands
(on data from h->request), and exactly twice for NBD_CMD_WRITE
(h->request, then the user's persistent buffer). A flag argument would
let us inform the user whether the buffer is transient (h->request is
volatile; the writer thread MUST memcpy it aside; but it is fixed in
length so the memcpy is not performance sensitive) or persistent (the
NBD_CMD_WRITE buffer was provided by the caller, and may be several
megabytes; memcpy() can negatively impact the cache and is pointless
since the original buffer is not going to be scribbled on); it also lets
us inform the writer thread on whether TCP_CORK would be worthwhile
(corking the outgoing socket for NBD_CMD_WRITE would let the OS buffer
things into a single wire transaction on both the request header and the
payload, rather than more overhead for two separate TCP packets,
particularly since we've disabled Nagle's algorithm).
Or, in pseudo-code:
nbd_aio_pread(,userbuf,) - >
write_callback(data, h->request, len, 0);
nbd_aio_pwrite(,userbuf,) - >
write_callback(data, h->request, len, LIBNBD_WRITE_CORK)
write_callback(data, userbuf, len, LIBNBD_WRITE_PERSISTENT_BUF)
nbd_aio_shutdown() - >
write_callback(data, NULL, 0, LIBNBD_WRITE_SHUTDOWN)
(3) nbd_concurrent_writer_error could lose errors. This might happen
if the socket is closed normally without writing anything, which would
never check h->writer_error.
(4) nbd_concurrent_writer_error possibly deadlocks too since it needs
to grab h->lock. Basically the same as (1).
...and this function must be in the documented safe whitelist, but that
means that instead of grabbing h->lock, we have to implement error
reporting by use of atomics.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org