In tests/test-eval.sh we have a test which looks something like this:
nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd'
if ! grep 'closed' file; then fail; fi
However there was a race condition here. nbdkit exits when the --run
command exits without waiting for the captive nbdkit subprocess. Thus
we couldn't be sure that the final 'closed' message got written to the
file. It worked most of the time, but on slow machines the test
failed.
This indicates that we ought to wait for the captive nbdkit to exit.
One reason is so that plugin cleanup is done before we continue after
the captive nbdkit. That should make shell scripts using captive
nbdkit + any plugin that does significant cleanup on exit more
predictable.
This adds the appropriate call to waitpid but we still ignore the exit
status of the captive nbdkit subprocess (in fact it is most likely to
fail since it will sent a SIGTERM signal).
---
server/captive.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/server/captive.c b/server/captive.c
index 102a77ea..0a243d2b 100644
--- a/server/captive.c
+++ b/server/captive.c
@@ -171,12 +171,14 @@ run_command (void)
r = EXIT_FAILURE;
break;
case 0:
- /* Captive nbdkit still running; kill it, but no need to wait
- * for it, as the --run program's exit status is good enough (if
- * the captive nbdkit fails to exit after SIGTERM, we have a
- * bigger bug to fix).
+ /* Captive nbdkit is still running; kill it. We want to wait
+ * for nbdkit to exit since that ensures all cleanup is done in
+ * the plugin before we return. However we don't care if nbdkit
+ * returns an error, the exit code we return always comes from
+ * the --run command.
*/
kill (pid, SIGTERM);
+ waitpid (pid, NULL, 0);
break;
default:
/* Captive nbdkit exited unexpectedly; update the exit status. */
--
2.25.0