On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
Kernel waits for more data and avoids transmission of small packets.
Without TLS this is barely noticeable, but with TLS this really shows.
Booting a VM via qemu-nbd on localhost (with tls) takes more than
2 minutes on my system. tcpdump shows frequent wait periods, where no
packets get sent for a 40ms period.
Thank you for this analysis.
Add explicit (un)corking when processing (and responding to) requests.
"TCP_CORK, &zero" after earlier "CORK, &one" will flush
pending data.
VM Boot time:
main: no tls: 23s, with tls: 2m45s
patched: no tls: 14s, with tls: 15s
VM Boot time, qemu-nbd via network (same lan):
main: no tls: 18s, with tls: 1m50s
patched: no tls: 17s, with tls: 18s
And the timings bear proof that it matters.
Future optimization: if we could detect if there is another pending
request we could defer the uncork operation because more data would be
appended.
nbdkit and libnbd do this with the MSG_MORE flag (plaintext) and TLS
corking (tls); when building up a message to the other side, a flag is
set any time we know we are likely to send more data very shortly.
nbdkit wraps it under a flag SEND_MORE, which applies to both plaintext:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/connections.c#L415
and to TLS connections:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/crypto.c#L396
while libnbd uses MSG_MORE a bit more directly for the same purpose
for plaintext, but isn't (yet) doing TLS corking:
https://gitlab.com/nbdkit/libnbd/-/blob/master/generator/states-issue-com...
https://gitlab.com/nbdkit/libnbd/-/blob/master/lib/internal.h#L57
I would love to see a future patch to qio_channel code to support
MSG_MORE in the same way as nbdkit is using its SEND_MORE flag, as the
caller often has more info on whether it is sending a short prefix or
is done with a conceptual message and ready to uncork, and where the
use of a flag can be more efficient than separate passes through
cork/uncork calls. But even your initial work at properly corking is
a good step in the right direction.
And surprisingly, qemu IS using corking on the client side:
https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L525
just not on the server side, before your patch.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
nbd/server.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/nbd/server.c b/nbd/server.c
index a4750e41880a..848836d41405 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2667,6 +2667,8 @@ static coroutine_fn void nbd_trip(void *opaque)
goto disconnect;
}
+ qio_channel_set_cork(client->ioc, true);
+
if (ret < 0) {
/* It wasn't -EIO, so, according to nbd_co_receive_request()
* semantics, we should return the error to the client. */
@@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
goto disconnect;
}
+ qio_channel_set_cork(client->ioc, false);
Reviewed-by: Eric Blake <eblake(a)redhat.com>
done:
nbd_request_put(req);
nbd_client_put(client);
--
2.39.2
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:
qemu.org |
libvirt.org