[PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit. - Libguestfs - Libguestfs List Archives

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

[PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

[PATCH] lib: Autodetect backing...

[PATCH v2 0/4] daemon: Translate...

Richard W.M. Jones

Thursday, 27 February 2020 Thu, 27 Feb '20

9:12 a.m.

I'd like to propose we backport this to 1.18 and some earlier stable branches too. Rich.

Reply

Show replies by date

Richard W.M. Jones

Thursday, 27 February Thu, 27 Feb

9:12 a.m.

In tests/test-eval.sh we have a test which looks something like this: nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd' if ! grep 'closed' file; then fail; fi However there was a race condition here. nbdkit exits when the --run command exits without waiting for the captive nbdkit subprocess. Thus we couldn't be sure that the final 'closed' message got written to the file. It worked most of the time, but on slow machines the test failed. This indicates that we ought to wait for the captive nbdkit to exit. One reason is so that plugin cleanup is done before we continue after the captive nbdkit. That should make shell scripts using captive nbdkit + any plugin that does significant cleanup on exit more predictable. This adds the appropriate call to waitpid but we still ignore the exit status of the captive nbdkit subprocess (in fact it is most likely to fail since it will sent a SIGTERM signal). --- server/captive.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/server/captive.c b/server/captive.c index 102a77ea..0a243d2b 100644 --- a/server/captive.c +++ b/server/captive.c @@ -171,12 +171,14 @@ run_command (void) r = EXIT_FAILURE; break; case 0: - /* Captive nbdkit still running; kill it, but no need to wait - * for it, as the --run program's exit status is good enough (if - * the captive nbdkit fails to exit after SIGTERM, we have a - * bigger bug to fix). + /* Captive nbdkit is still running; kill it. We want to wait + * for nbdkit to exit since that ensures all cleanup is done in + * the plugin before we return. However we don't care if nbdkit + * returns an error, the exit code we return always comes from + * the --run command. */ kill (pid, SIGTERM); + waitpid (pid, NULL, 0); break; default: /* Captive nbdkit exited unexpectedly; update the exit status. */ -- 2.25.0

Reply

Eric Blake

9:39 a.m.

On 2/27/20 9:12 AM, Richard W.M. Jones wrote:

In tests/test-eval.sh we have a test which looks something like this: nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd' if ! grep 'closed' file; then fail; fi However there was a race condition here. nbdkit exits when the --run command exits without waiting for the captive nbdkit subprocess. Thus we couldn't be sure that the final 'closed' message got written to the file. It worked most of the time, but on slow machines the test failed. This indicates that we ought to wait for the captive nbdkit to exit. One reason is so that plugin cleanup is done before we continue after the captive nbdkit. That should make shell scripts using captive nbdkit + any plugin that does significant cleanup on exit more predictable. This adds the appropriate call to waitpid but we still ignore the exit status of the captive nbdkit subprocess (in fact it is most likely to fail since it will sent a SIGTERM signal). --- server/captive.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)

ACK. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org

Reply

Richard W.M. Jones

Saturday, 7 March Sat, 7 Mar

5:18 a.m.

On Thu, Feb 27, 2020 at 03:12:11PM +0000, Richard W.M. Jones wrote:

In tests/test-eval.sh we have a test which looks something like this: nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd' if ! grep 'closed' file; then fail; fi However there was a race condition here. nbdkit exits when the --run command exits without waiting for the captive nbdkit subprocess. Thus we couldn't be sure that the final 'closed' message got written to the file. It worked most of the time, but on slow machines the test failed. This indicates that we ought to wait for the captive nbdkit to exit. One reason is so that plugin cleanup is done before we continue after the captive nbdkit. That should make shell scripts using captive nbdkit + any plugin that does significant cleanup on exit more predictable. This adds the appropriate call to waitpid but we still ignore the exit status of the captive nbdkit subprocess (in fact it is most likely to fail since it will sent a SIGTERM signal). --- server/captive.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/server/captive.c b/server/captive.c index 102a77ea..0a243d2b 100644 --- a/server/captive.c +++ b/server/captive.c @@ -171,12 +171,14 @@ run_command (void) r = EXIT_FAILURE; break; case 0: - /* Captive nbdkit still running; kill it, but no need to wait - * for it, as the --run program's exit status is good enough (if - * the captive nbdkit fails to exit after SIGTERM, we have a - * bigger bug to fix). + /* Captive nbdkit is still running; kill it. We want to wait + * for nbdkit to exit since that ensures all cleanup is done in + * the plugin before we return. However we don't care if nbdkit + * returns an error, the exit code we return always comes from + * the --run command. */ kill (pid, SIGTERM); + waitpid (pid, NULL, 0); break; default: /* Captive nbdkit exited unexpectedly; update the exit status. */

I *thought* we'd fixed this, but it failed on a recent ppc64le build (of 1.19.2, so including this patch). At the moment I can't think of a plausible reason for why this is still failing. https://github.com/libguestfs/nbdkit/blob/be44e94a7d0494ba562cb4c5c08bc58... Rich. FAIL: test-eval.sh ================== + requires qemu-img --version + files='eval.out eval.missing' + rm -f eval.out eval.missing + cleanup_fn rm -f eval.out eval.missing + _cleanup_hook[${#_cleanup_hook[@]}]='rm -f eval.out eval.missing' + nbdkit -U - eval 'get_size=echo 64M' 'pread=dd if=/dev/zero count=$3 iflag=count_bytes' 'missing=echo "in missing: $@" >> eval.missing; exit 2' unload= --run 'qemu-img info $nbd' + cat eval.out image: nbd+unix://?socket=/tmp/nbdkitExyrix/socket file format: raw virtual size: 64 MiB (67108864 bytes) disk size: unavailable + grep '67108864 bytes' eval.out virtual size: 64 MiB (67108864 bytes) + cat eval.missing in missing: config_complete in missing: thread_model in missing: get_ready in missing: thread_model in missing: preconnect false in missing: open false in missing: can_write in missing: can_flush in missing: is_rotational in missing: can_multi_conn in missing: can_cache in missing: can_extents + grep 'in missing' eval.missing in missing: config_complete in missing: thread_model in missing: get_ready in missing: thread_model in missing: preconnect false in missing: open false in missing: can_write in missing: can_flush in missing: is_rotational in missing: can_multi_conn in missing: can_cache in missing: can_extents + grep 'in missing: config_complete' eval.missing in missing: config_complete + grep 'in missing: thread_model' eval.missing in missing: thread_model in missing: thread_model + grep 'in missing: can_write' eval.missing in missing: can_write + grep 'in missing: is_rotational' eval.missing in missing: is_rotational + grep 'in missing: get_ready' eval.missing in missing: get_ready + grep 'in missing: preconnect' eval.missing in missing: preconnect false + grep 'in missing: open' eval.missing in missing: open false + grep 'in missing: close' eval.missing ++ _run_cleanup_hooks ++ status=1 ++ set +e ++ trap '' INT QUIT TERM EXIT ERR ++ echo ./test-eval.sh: run cleanup hooks: exit code 1 ./test-eval.sh: run cleanup hooks: exit code 1 ++ (( i = 0 )) ++ (( i < 1 )) ++ rm -f eval.out eval.missing ++ (( ++i )) ++ (( i < 1 )) ++ exit 1 FAIL test-eval.sh (exit status: 1) -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org

Reply

1959

days inactive

1968

days old

guestfs@lists.libguestfs.org

Manage subscription

3 comments

2 participants

tags (0)

participants (2)

Eric Blake
Richard W.M. Jones