On Thu, Mar 26, 2020 at 02:34:41PM -0500, Eric Blake wrote:
We're still seeing sporadic failures of 'nbdkit nbd tls=', and I'm
still trying to come up with a root cause fix (it may involve smarter
use of gnutls_bye() in libnbd). In the meantime, here's what we know:
when the hang/failure happens, the 'nbdkit nbd tls=' client process is
stuck in a poll() waiting to see EOF from the server, while the
'nbdkit example1' server process is stuck in a read waiting to see if
the client will do a clean shutdown of the gnutls session. Sending
SIGTERM to the client is not going to break the poll, but if we
instead kill the server, that will cause the client to respond
(perhaps with an error message that we ignore, but better than
hanging).
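The shutdown argument above can be made concrete with a small bash analogy. This is an illustrative sketch, not nbdkit or libnbd code: an anonymous pipe stands in for the connection, `cat` plays the client that blocks until it sees EOF, and an idle `sleep` holding the write end plays the server that never closes its side. It assumes bash 4.4 or later, where `$!` reports the PID of a process substitution.

```shell
#!/usr/bin/env bash
# Analogy only (not nbdkit or libnbd code): an anonymous pipe stands in for
# the connection, "cat" for the client waiting to see EOF, and an idle
# "sleep" holding the write end for the server that never closes its side.
# Assumes bash 4.4+ so that $! reports the PID of a process substitution.
set -euo pipefail

# "Server": owns the only write end of the pipe and never writes or closes it.
exec {conn}< <(exec sleep 1000)
server=$!

# "Client": reads until EOF, which cannot arrive while the server is alive.
cat <&"$conn" > /dev/null &
client=$!
exec {conn}<&-    # the script itself keeps no copy of the read end

sleep 1
kill -0 "$client"    # fails (aborting under set -e) if cat had already exited
echo "client is still blocked waiting for EOF"

# Killing the server closes the write end, so the client's read returns EOF.
kill -TERM "$server"
wait "$client"
echo "client saw EOF and exited once the server was killed"
```

The only point carried over is that terminating the peer that holds the connection open lets the blocked side observe EOF. Unlike the nbdkit client described above, `cat` would also exit on SIGTERM, and the TLS shutdown exchange itself is not modelled.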
So, by rearranging the order in which we call our start_nbdkit
function, we change the order in which we send SIGTERM to the two
processes. And in turn, this becomes the first testsuite coverage of
the 'nbd retry=' parameter, added back in commit 0bb76bc7.
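The `retry=5` setting asks the nbd plugin to retry connecting to the server when an attempt fails. As a rough, generic analogue in bash (not the plugin's implementation), a retry loop with a fixed pause looks like the following; `retry_cmd`, `fake_connect`, and the one-second delay are all assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not nbdkit code): run a command once, then retry it
# up to N more times, pausing one second between failed attempts.
retry_cmd ()
{
    local retries=$1 attempt
    shift
    for (( attempt = 0; attempt <= retries; attempt++ )); do
        if "$@"; then
            return 0
        fi
        # Only pause if another attempt is still to come.
        if (( attempt < retries )); then
            sleep 1
        fi
    done
    return 1
}

# Hypothetical stand-in for connecting to a server that is not listening
# yet: fails on the first two calls and succeeds from the third onwards.
attempts=0
fake_connect ()
{
    attempts=$(( attempts + 1 ))
    (( attempts >= 3 ))
}

if retry_cmd 5 fake_connect; then
    echo "connected after $attempts attempts"
else
    echo "gave up after $attempts attempts" >&2
fi
```

With `retry_cmd 5`, up to six attempts are made in total; here the third one succeeds and the script prints `connected after 3 attempts`.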
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
My current setup does not seem to be hitting the testsuite
hang/failure as frequently as Rich's setup, so for now I'm posting
this in the hopes that we can see if it reduces the rate of testsuite
failures.
tests/test-nbd-tls-psk.sh | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tests/test-nbd-tls-psk.sh b/tests/test-nbd-tls-psk.sh
index 7a477da9..547064ab 100755
--- a/tests/test-nbd-tls-psk.sh
+++ b/tests/test-nbd-tls-psk.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash
# nbdkit
-# Copyright (C) 2019 Red Hat Inc.
+# Copyright (C) 2019-2020 Red Hat Inc.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
@@ -64,14 +64,14 @@ files="$sock1 $sock2 $pid1 $pid2 nbd-tls-psk.out"
rm -f $files
cleanup_fn rm -f $files
+# Run nbd plugin as intermediary; also test our retry code
+start_nbdkit -P "$pid2" -U "$sock2" --tls=off nbd retry=5 \
+ tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1"
+
# Run encrypted server
start_nbdkit -P "$pid1" -U "$sock1" \
--tls=require --tls-psk=keys.psk example1
-# Run nbd plugin as intermediary
-start_nbdkit -P "$pid2" -U "$sock2" --tls=off \
- nbd tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1"
-
# Run unencrypted client
qemu-img info --output=json -f raw "nbd+unix:///?socket=$sock2" > nbd-tls-psk.out
--
It's worth a try, so ACK.
It might be nice to add a comment into the test so we know why it's
running the nbdkit instances in an unexpected order.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW