On 5/29/19 10:13 PM, Eric Blake wrote:
Benchmark-wise, using the same setup as in commit e897ed70, I see
an order-of-magnitude slowdown:
Pre-patch:  the runs averaged 1.266s  (1.30E+08 bits/s)
Post-patch: the runs averaged 11.663s (1.41E+07 bits/s)
This will need further profiling to determine how much is nbdkit's
fault and how much is libnbd's. I suspect we are holding some locks
for too long, serializing work that could otherwise proceed in
parallel. Also, the standalone code was able to run the read of
command 1 in parallel with the write of command 2 via separate
threads, whereas libnbd's state machine serializes everything
(whether or not the state machine spreads the I/O out, doing writes
from the thread calling nbd_aio_FOO and reads from the reader thread,
the two never run at once).
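To make the contrast concrete, here is a rough, self-contained sketch
(not the actual nbd-standalone.c code; the socketpair stands in for
the real NBD connection and main() plays a trivial echo "server") of
the two-thread pattern where the write of one command overlaps the
read of another command's reply on the same blocking socket:

/* Illustration only - not the actual nbd-standalone.c code.  One
 * thread keeps write()ing requests while a second thread concurrently
 * read()s replies from the same blocking socket.  Error checking and
 * short reads/writes are ignored for brevity. */
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

#define NREQ 3

static void *
writer (void *arg)                 /* client thread: send requests back-to-back */
{
  int fd = *(int *) arg;
  for (int i = 0; i < NREQ; i++)
    write (fd, "request", 8);      /* stand-in for an NBD request */
  return NULL;
}

static void *
reader (void *arg)                 /* client thread: drain replies as they arrive */
{
  int fd = *(int *) arg;
  char reply[8];
  for (int i = 0; i < NREQ; i++)
    read (fd, reply, sizeof reply);
  return NULL;
}

int
main (void)
{
  int fds[2];                      /* fds[0] = client end, fds[1] = "server" end */
  pthread_t w, r;
  char buf[8];

  socketpair (AF_UNIX, SOCK_STREAM, 0, fds);
  pthread_create (&w, NULL, writer, &fds[0]);
  pthread_create (&r, NULL, reader, &fds[0]);

  /* Trivial stand-in server: consume each request, send back a reply. */
  for (int i = 0; i < NREQ; i++) {
    read (fds[1], buf, sizeof buf);
    write (fds[1], "reply..", 8);
  }

  pthread_join (w, NULL);
  pthread_join (r, NULL);
  close (fds[0]);
  close (fds[1]);
  return 0;
}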
Rich identified and fixed the culprit - libnbd was not setting
TCP_NODELAY (disabling Nagle's algorithm) the way nbd-standalone.c
did, which meant that any request large enough to be split across
multiple TCP packets waits to send the trailing packet until the
server's ACK for the earlier data has arrived. While disabling
Nagle's algorithm increases network overhead (you send more short
packets, each with its own header, instead of coalescing data into
larger packets during the wait for the ACK), the delay it removes
here is purely artificial (the server can't process anything until
the packets arrive anyway). libnbd 0.1.2 now disables Nagle's
algorithm, and repeating the tests with that change improved the
average to 2.180s. The remaining gap is more easily explained by the
serialization noted above: libnbd never read()s from one thread at
the same time another thread is write()ing.
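For reference, disabling Nagle's algorithm at the socket level is a
one-line setsockopt on the connected socket. The following is only an
illustrative client-side sketch (the helper name and structure are
mine, not libnbd's internal code):

/* Illustration only - not libnbd's internal code.  Connect to
 * host:port over TCP and turn off Nagle's algorithm so small writes
 * go out immediately instead of waiting for the previous segment's
 * ACK. */
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static int
connect_tcp_nodelay (const char *host, const char *port)
{
  struct addrinfo hints = { .ai_family = AF_UNSPEC,
                            .ai_socktype = SOCK_STREAM };
  struct addrinfo *result, *rp;
  const int optval = 1;
  int fd = -1;

  if (getaddrinfo (host, port, &hints, &result) != 0)
    return -1;

  for (rp = result; rp != NULL; rp = rp->ai_next) {
    fd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol);
    if (fd == -1)
      continue;
    if (connect (fd, rp->ai_addr, rp->ai_addrlen) == 0)
      break;                       /* connected */
    close (fd);
    fd = -1;
  }
  freeaddrinfo (result);
  if (fd == -1)
    return -1;

  if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &optval,
                  sizeof optval) == -1) {
    perror ("setsockopt TCP_NODELAY");
    close (fd);
    return -1;
  }
  return fd;
}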
I'd have to intentionally cripple nbd-standalone.c to impose that same
serialization in order to see whether libnbd actually offers an
overall performance gain once I/O is serialized. That may not be as
easy as it sounds: I was relying on blocking I/O from separate
threads, and merely serializing it so that no thread does I/O
concurrently with another risks deadlock when a client sends a large
NBD_CMD_READ followed by an NBD_CMD_WRITE to a server that responds
strictly serially - the server blocks waiting for the client to drain
the read's response, while the client blocks waiting for the server to
consume the write's request. libnbd works around that by using
non-blocking sockets and prioritizing reads over writes rather than
insisting on completing one transaction at a time.
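As a rough sketch of that last point (this is not libnbd's actual
state machine; want_to_send(), process_reply_bytes() and
send_more_request_bytes() are hypothetical helpers), a non-blocking
event loop that prefers reads looks something like:

/* Illustration only - not libnbd's state machine.  The socket is
 * assumed to be O_NONBLOCK.  Replies are always drained before more
 * request bytes are pushed out, so the server never wedges on a full
 * send buffer while the client is still writing a large request. */
#include <poll.h>
#include <stdbool.h>

extern bool want_to_send (void);              /* hypothetical: request bytes queued? */
extern int process_reply_bytes (int fd);      /* hypothetical: non-blocking read + parse */
extern int send_more_request_bytes (int fd);  /* hypothetical: non-blocking partial write */

static int
event_loop (int fd)
{
  for (;;) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (want_to_send ())
      pfd.events |= POLLOUT;
    if (poll (&pfd, 1, -1) == -1)
      return -1;

    /* Reads take priority: drain whatever the server has sent ... */
    if (pfd.revents & POLLIN) {
      if (process_reply_bytes (fd) == -1)
        return -1;
    }
    /* ... then write only as much of the next request as fits right
     * now; a partial write simply means waiting for POLLOUT again. */
    if (pfd.revents & POLLOUT) {
      if (send_more_request_bytes (fd) == -1)
        return -1;
    }
  }
}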
-/* Connect to a TCP socket, returning the fd on success */
-static int
-nbdplug_connect_tcp (void)
-{
- struct addrinfo hints = { .ai_family = AF_UNSPEC,
- .ai_socktype = SOCK_STREAM, };
- struct addrinfo *result, *rp;
- int r;
- const int optval = 1;
- int fd;
-
-
- if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &optval,
- sizeof (int)) == -1) {
- nbdkit_error ("cannot set TCP_NODELAY option: %m");
- close (fd);
- return -1;
- }
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org