On 3/27/20 6:28 PM, Richard W.M. Jones wrote:
>
> But I can at least reuse the same mechanism we have for waking up the
> poll() loop when first sending a command to the server. That is,
> since we already have a pipe-to-self in addition to reading from the
> server, it's trivial to argue that closing the pipe-to-self will
> guarantee that the reader thread sees something interesting to break
> out of its poll() loop, regardless of whether it also sees something
> interesting from the server after having sent NBD_CMD_DISC, and
> regardless of whether I need to add in more gnutls_bye() calls to
> either nbdkit or libnbd.
>
> Fixes: ab7760fc
> Signed-off-by: Eric Blake <eblake(a)redhat.com>
> ---
>
> May be incomplete: I might also need to break out of the reader loop
> when read() returns 0.

After some soak time, and after reverting my two commits that reordered
testsuite cleanup, I was able to reproduce the hangs fairly reliably
without this patch by running test-nbd-tls{,-psk}.sh in a loop 100
times; with this patch applied I could not reproduce them. But as
mentioned on the other thread, I also finally saw what the real problem
was (often, things look so much simpler in hindsight!): nbd_shutdown is
synchronous, so calling it leaves two threads competing on poll() on
the same fd, which is never a good idea. Switching to nbd_aio_disconnect
removes the competition, and it also passed my stress test of 100 cycles
without hitting the hang, so v2 of this patch will be along those lines
instead.
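
For anyone following along, here is a minimal sketch of the pipe-to-self
wakeup that the quoted patch description relies on. It is not the nbdkit
code: the struct, the function names, and the socketpair standing in for
the server connection are all hypothetical, and it only shows that
closing the write end of the pipe is enough to break a reader thread out
of poll() even when the server fd stays silent.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical state shared with the reader thread (not nbdkit's). */
struct reader_state {
  int server_fd;     /* stand-in for the socket to the NBD server */
  int wake_fd;       /* read end of the pipe-to-self */
  int woke_by_pipe;  /* set to 1 if the pipe ended the poll() */
};

static void *
reader (void *arg)
{
  struct reader_state *s = arg;
  struct pollfd fds[2] = {
    { .fd = s->server_fd, .events = POLLIN },
    { .fd = s->wake_fd,   .events = POLLIN },
  };

  /* Block with no timeout until either fd reports something. */
  if (poll (fds, 2, -1) == -1)
    return NULL;

  /* Once the write end is closed, the read end reports POLLHUP
   * (some platforms also set POLLIN, with read() returning 0). */
  if (fds[1].revents & (POLLIN | POLLHUP))
    s->woke_by_pipe = 1;
  return NULL;
}

/* Returns 1 if closing the pipe woke the reader, 0 if something else
 * did, -1 on setup failure. */
int
wakeup_demo (void)
{
  int sv[2], pipefd[2];
  pthread_t tid;
  struct reader_state s;

  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) == -1)
    return -1;
  if (pipe (pipefd) == -1)
    return -1;

  s = (struct reader_state) { .server_fd = sv[0], .wake_fd = pipefd[0] };
  if (pthread_create (&tid, NULL, reader, &s) != 0)
    return -1;

  /* The "server" side (sv[1]) never sends anything; closing the
   * write end of the pipe is the only event the reader can see. */
  close (pipefd[1]);
  pthread_join (tid, NULL);

  close (pipefd[0]);
  close (sv[0]);
  close (sv[1]);
  return s.woke_by_pipe;
}
```

Calling wakeup_demo() from a small main() should return 1. Note that
this only covers the wakeup half of the story; it does nothing about
two threads calling poll() on the same fd, which is the part that the
switch to nbd_aio_disconnect addresses.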
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org