On 7/25/19 1:28 PM, Eric Blake wrote:
Note that this patch is trying to speed up a slow waitpid() issued
_after_ we call h->sock->ops->close(). You'd think that after we
close() our end, that the remote end doing a read() would see an
immediate EOF, and that would be enough for the server to shut down
quickly instead of waiting for the delay to expire and finally notice
during its write() that we are no longer reading. However, POSIX says
that when sockets are involved, a close() on our end does not trigger an
immediate EOF on the other side's read end, but rather may start an
asynchronous timer to try to finish the TCP connection gracefully to
flush any pending traffic, where both directions have to agree that the
connection is dead before either direction sees EOF/EPIPE. And that's
where the shutdown(2) API comes in handy: it forces the kernel to
realize that one or both directions of traffic are no longer needed,
making this asynchronous handling in close() much faster. At least
nbdkit is known to react favorably to a shutdown() request - see nbdkit
commit 430f8141.
We may want to add h->sock->ops->shutdown() (especially if we want to
handle SHUT_WR through adding more states for more graceful shutdown
paths, including when TLS is involved), but even in the short-term,
something as simple as adding shutdown(h->fd, SHUT_RDWR) immediately
before the s->sock->ops->close() would be helpful.
That effect of wanting to use shutdown() is still real, but may not be
the only thing at play here. As we discussed on IRC, adding shutdown()
alone was not enough to make this specific nbdkit invocation get to
exit() any faster; the waitpid() was still stalling without a kill().
So we've identified a second issue: nbdkit is refusing to exit() until
the plugin and all filters unload, but the filter cannot unload until it
has no pending transactions, but the pending transaction is stuck in a
"non-interruptible" usleep() (since all the signal handlers installed by
nbdkit use SA_RESTART, the usleep() doesn't exit early due to EINTR; it
takes the external SIGHUP to force nbdkit to die without waiting for the
filter to clean up). So the thought here is that maybe nbdkit should be
patched to add an nbdkit_usleep() function, which returns 0 if the full
sleep happened, or returns -1 if the sleep is interrupted early in one
thread due to another thread noticing EOF on the read()s from the
client. The delay filter may be the only code to make use of
nbdkit_usleep, but having an easy way to recognize that the client has
gone away and thus waiting out the rest of the sleep is pointless will
make nbdkit much more responsive to a client close()ing the connection.
I'll work on the nbdkit patch.
Finally, calling kill() unconditionally in nbd_close() may or may not be
the nicest, so we have the idea of adding a new API (maybe named
nbd_kill) which the user can request which signal to send to the
underlying command pid, and control how long it waits before trying a
second signal attempt, all prior to calling nbd_close(). Rich will
tackle something along those lines.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org