Checking for poll(POLLIN) only wakes us up when the server sends more
bytes over the wire. But gnutls is implemented to read as many bytes
as it can from the wire without blocking, and only then parse where
the encoded message boundaries lie within those bytes. As a result,
we may already have the server's next reply sitting in memory without
needing to block on the server sending yet more bytes (especially if
the server employs tricks like corking to batch things into fewer
packets over the network).
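
(As a minimal illustration outside this patch: gnutls exposes the
buffered state via gnutls_record_check_pending(), so a caller can skip
poll(2) whenever decrypted bytes are already waiting locally. The
helper name below is hypothetical, just a sketch of the idea.)

  #include <poll.h>
  #include <gnutls/gnutls.h>

  /* Read more data, but only block in poll(2) if gnutls has nothing
   * already decrypted and buffered for this session. */
  static ssize_t
  recv_without_wasted_poll (gnutls_session_t session, int fd,
                            void *buf, size_t len)
  {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (gnutls_record_check_pending (session) == 0)
      poll (&pfd, 1, -1);
    /* This may return bytes that arrived in the same TLS record as an
     * earlier reply, for which poll(2) would never fire again. */
    return gnutls_record_recv (session, buf, len);
  }
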
If we are using the synchronous API, this is not a problem (we never
have more than one request in flight at a time, which implies there
are no outstanding replies from the server when we send a request, so
we always get the needed POLLIN when the server's next reply begins);
but if we are using the async API with more than one request in
flight, it is conceivable that the server flushes its complete reply
queue (two or more replies) within a single encrypted network packet,
while our state machine reads only the first reply and then waits for
a POLLIN kick before reading again. Fortunately, I could not think of
a scenario where this turns into a direct deadlock in our state
machine alone (although I also haven't definitively ruled out that
possibility): in the common case, the only way the server can send
more than one reply at once is if it supports parallel in-flight
requests, which means we can be sure that the next async request we
send will be received by the server, and its reply to that request
will give us the POLLIN kick we need to finally process the reply
that had been stuck (possibly for a long time) in the local buffer.

But I _did_ come up with a problematic scenario: if the server's
final two replies before our NBD_CMD_DISC arrive in the same packet,
we could end up closing the socket without ever handling the second
reply, losing that reply's data. It is also a potential indirect
deadlock: the NBD spec recommends that a client not send NBD_CMD_DISC
until it has received all responses to its outstanding in-flight
requests, so a client that has finished its work, does not plan to
make any more aio requests other than NBD_CMD_DISC, and wants to
block until all server replies are in will hang waiting for a POLLIN
that never comes, because we already have the reply buffered locally
and are not sending any more requests.
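
(Illustrative only, using today's libnbd API names rather than anything
in this patch: nbd_aio_in_flight, nbd_poll and nbd_shutdown are the
modern spellings, and the drain_and_disconnect helper is hypothetical.
Without the pending check, a loop like this can hang, because the last
reply is already sitting decrypted in gnutls's buffer and poll(2)
inside nbd_poll() will never report POLLIN for it.)

  #include <stdio.h>
  #include <stdlib.h>
  #include <libnbd.h>

  /* Wait for every outstanding reply, then disconnect. */
  static void
  drain_and_disconnect (struct nbd_handle *nbd)
  {
    while (nbd_aio_in_flight (nbd) > 0) {
      if (nbd_poll (nbd, -1) == -1) {     /* can block forever here */
        fprintf (stderr, "%s\n", nbd_get_error ());
        exit (EXIT_FAILURE);
      }
    }
    nbd_shutdown (nbd, 0);                /* sends NBD_CMD_DISC */
  }
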
Testing shows that this patch gives a slight performance improvement
(to be expected, since we are eliminating an artificial latency on
starting the REPLY sequence in the state machine): on my machine,
tests/aio-parallel-load-tls went from 11522.9 IOPS pre-patch to
12558.1 IOPS post-patch.
---
I'm pushing this because of the performance improvement, but I also
wanted to post it because it was an interesting problem. (Sad when my
writeup is three times longer than the patch itself...)

 generator/states.c | 5 +++++
 lib/crypto.c       | 7 +++++++
 lib/internal.h     | 1 +
 3 files changed, 13 insertions(+)

diff --git a/generator/states.c b/generator/states.c
index bce4f85..145e8c1 100644
--- a/generator/states.c
+++ b/generator/states.c
@@ -113,6 +113,11 @@ send_from_wbuf (struct nbd_handle *h)
  READY:
   if (h->cmds_to_issue)
     SET_NEXT_STATE (%ISSUE_COMMAND.START);
+  else {
+    assert (h->sock);
+    if (h->sock->ops->pending && h->sock->ops->pending (h->sock))
+      SET_NEXT_STATE (%REPLY.START);
+  }
   return 0;
 
  DEAD:
diff --git a/lib/crypto.c b/lib/crypto.c
index 3f7c744..e0f173f 100644
--- a/lib/crypto.c
+++ b/lib/crypto.c
@@ -181,6 +181,12 @@ tls_send (struct nbd_handle *h,
   return r;
 }
 
+static bool
+tls_pending (struct socket *sock)
+{
+  return gnutls_record_check_pending (sock->u.tls.session) > 0;
+}
+
 static int
 tls_get_fd (struct socket *sock)
 {
@@ -209,6 +215,7 @@ tls_close (struct socket *sock)
 static struct socket_ops crypto_ops = {
   .recv = tls_recv,
   .send = tls_send,
+  .pending = tls_pending,
   .get_fd = tls_get_fd,
   .close = tls_close,
 };
diff --git a/lib/internal.h b/lib/internal.h
index 5b6152b..61ddbde 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -192,6 +192,7 @@ struct socket_ops {
                    struct socket *sock, void *buf, size_t len);
   ssize_t (*send) (struct nbd_handle *h,
                    struct socket *sock, const void *buf, size_t len);
+  bool (*pending) (struct socket *sock);
   int (*get_fd) (struct socket *sock);
   int (*close) (struct socket *sock);
 };
--
2.20.1