On 6/6/19 5:48 PM, Eric Blake wrote:
As mentioned in the previous patch, sending small packets as soon as
possible leads to network packet overhead. Grouping related writes
under corking appears to help everything but unencrypted Unix plain
sockets. I tested with appropriate combinations from:
$ nbdkit {-p 10810,-U -} \
{--tls=require --tls-verify-peer --tls-psk=./keys.psk,} memory size=64m \
--run './aio-parallel-load{-tls,} {$unixsocket,nbd://localhost:10810}'
with the following IOPS measurements:
pre post
unix plain: 81003.5 78538.8
unix tls: 11678.0 12256.1
tcp plain: 65702.6 68010.9
tcp tls: 11058.5 11762.0
I intentionally did not bother with using corking during handshake
phase - that part of the protocol may benefit, but it is not the hot
spot in overall execution.
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
I'm not sure why Unix sockets degraded...
I also probably out to repeat my timings more than a single run...
I've rerun things 10 times and tossed out the first run (using it to
warm up the cache); this time my numbers were:
pre post gain: avg worst best
unix plain: 809534.6 812175.4 0.3% -0.6% 1.9%
unix tls: 124926.4 122546.2 -1.9% -4.6% 1.4%
tcp plain: 663690.0 679575.8 2.4% -0.4% 4.1%
tcp tls: 114133.0 117745.4 3.2% -2.3% 7.9%
So, the numbers vary, and there's quite a bit of noise (perhaps from the
random number sequence). In general, it seems that TCP benefits more
(the effect may be even more pronounced across the network, rather than
when using loopback on localhost). I confirmed that for plain Unix
sockets, the new cork code is NOT called (so the numbers should be in
the same ballback, giving us an estimate of how much noise the other
columns have).
At any rate, I'm leaning towards declaring that the idea makes sense
even if it is nowhere near as dramatic as the Nagle's algorithm speedups.
---
server/connections.c | 2 ++
server/protocol.c | 52 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
diff --git a/server/connections.c b/server/connections.c
index 9b0b75c..8c92dc5 100644
--- a/server/connections.c
+++ b/server/connections.c
@@ -356,6 +356,7 @@ raw_send (struct connection *conn, const void *vbuf, size_t len)
return 0;
}
+#ifdef TCP_CORK
/* Change the cork status to batch a group of send calls, and either succeed
* completely (returns 0) or fail (returns -1).
*/
@@ -368,6 +369,7 @@ raw_cork (struct connection *conn, bool cork)
setsockopt (conn->sockout, IPPROTO_TCP, TCP_CORK, &opt, sizeof opt);
return 0;
}
+#endif /* TCP_CORK */
Whoops - this part belongs in patch 1.
/* Read buffer from conn->sockin and either succeed completely
* (returns > 0), read an EOF (returns 0), or fail (returns -1).
diff --git a/server/protocol.c b/server/protocol.c
index 6d519e7..0057aed 100644
--- a/server/protocol.c
+++ b/server/protocol.c
@@ -398,6 +398,11 @@ send_simple_reply (struct connection *conn,
reply.handle = handle;
reply.error = htobe32 (nbd_errno (error, false));
+ if (conn->cork) {
+ r = conn->cork (conn, true);
+ assert (r == 0); /* For now, only uncorking can fail */
+ }
One possibility might be to always install a conn->cork (and write a
no-op version for Unix/-s use), rather than checking if the cork
callback is NULL (although my gut feel is a null check beats a no-op
function indirection).
+
r = conn->send (conn, &reply, sizeof reply);
if (r == -1) {
nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd));
@@ -413,6 +418,14 @@ send_simple_reply (struct connection *conn,
}
}
+ if (conn->cork) {
+ r = conn->cork (conn, false);
+ if (r == -1) {
+ nbdkit_error ("write uncork: %s: %m", name_of_nbd_cmd (cmd));
+ return connection_set_status (conn, -1);
+ }
+ }
+
I probably also ought to document in the commit message that on any
failure, I left the socket in corked mode - but I assumed that's okay,
since once we fail, we're not sending anything else anyway. On the
other hand, I just re-read docs: for TCP sockets, the Linux man page
says a cork lasts at most 200ms, then auto-uncorks; but for gnutls, if
you don't manually uncork, we could strand data in the gnutls buffers,
and thereby possibly interfere with gnutls_bye, so maybe I should stick
an uncork in connection_set_status() to catch all error paths. Still,
it's probably nicer to have a generic uncork in the cleanup paths rather
than worrying about a compiler-assisted attribute cleanup function to
uncork on all early exit paths.
For libnbd as it currently stands, only NBD_OPT (where I argued that
handshake is not the hot path) and NBD_CMD_WRITE would benefit from cork
code (but the benefit for writes may still be worth it)
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org