Re: [Libguestfs] [PATCH libnbd v2 3/4] api: Implement concurrent writer.

Wednesday, 3 July 2019

On 6/4/19 6:16 AM, Richard W.M. Jones wrote:
...
 There are several races / deadlocks which I've thought about. 
Let's
 see if I can remember them all ... 
More thoughts on this, now that we've solved some other issues in the
meantime.

...

 (1) This I experienced: nbd_aio_get_fd deadlocks if there are
 concurrent synchronous APIs going on.  A typical case is where you set
 up the concurrent writer thread before connecting, and then call a
 synchronous connect function such as connect_tcp.  The synchronous
 function grabs h->lock, then writes something, which eventually
 invokes the writer thread which calls nbd_aio_get_fd and deadlocks on
 h->lock.

 -> Probably the writer thread should be forbidden from using
 nbd_handle. 
Still true (although we may decide to allow calling nbd_ functions that
don't require the lock, with proper documentation of which functions
that includes).  Is it reasonable to make aio_get_fd lockless?

...

 (2) The writer thread calls nbd_aio_get_fd, but the fd returned might
 might be closed before we use it, resulting in either EBADF or worse
 using another fd that happens to be opened around the same time.

 -> I think the solution to this would be to allow the writer callback
 to signal that the socket is about to be closed (eg. add an extra flag
 parameter to the callback), which would kill the writer thread. 
The extra flag is still a good idea. Also, would it help to dup() the
fd, so that even if the reader thread ends up closing the fd, the writer
thread still has the dup'd copy open?  But it also emphasizes the need
for an nbd_aio_notify_error - if the reader thread encounters an error,
we need to inform the writer thread to quit polling; conversely, if the
writer thread encounters an error, it needs to affect the state machine
so that the reader thread will quit polling.

...

 (3) nbd_concurrent_writer_error could lose errors.  This might happen
 if the socket is closed normally without writing anything, which would
 never check h->writer_error. 
We may need some more states in the state machine to perform graceful
shutdown. As it is, I've already added h->disconnect_request; that may
figure into deciding whether the server going away was early or as a
result of our request.

...

 (4) nbd_concurrent_writer_error possibly deadlocks too since it needs
 to grab h->lock.  Basically the same as (1). 
Unless we make it lockless, to merely set a flag that later attempts to
call nbd_aio_notify_read or nbd_aio_FOO then notice early enough in
their advancement of the state machine.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH libnbd v2 3/4] api: Implement concurrent writer.