When running captive, we were blindly calling kill(pid) of the captive
nbdkit child process and ignoring failures, even if that process has
already exited unexpectedly (most likely, from an assertion failure).
Thankfully, because we did not wait on the process, we are guaranteed
that the nbdkit child process is either still running or in zombie
state, so no other process can recycle the pid yet.
Merely sending SIGTERM should be enough to cause a properly-working
nbdkit to exit with status 0. We could make the code more complex to
sleep for a bit and follow up with SIGKILL, but it's not worth
blocking the code right now (and if we do exit before the child, the
child becomes an orphan process auto-reaped by init).
While at it, fix the fact that system() can fail with a value that is
not appropriate to hand to WIFEXITED() if the child was not even
spawned, but cannot fail with WIFSTOPPED. Prefer EXIT_FAILURE over
hard-coded 1. Use exit() instead of _exit() to allow I/O to flush and
any atexit handlers to run. Also, reflect death from signal to a
status > 128 rather than 1 (we could be even fancier and also re-raise
the signal so that we die from the same thing, but again, it's not
obvious we need that much work...).
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
server/captive.c | 47 ++++++++++++++++++++++++++++++++++++-----------
1 file changed, 36 insertions(+), 11 deletions(-)
diff --git a/server/captive.c b/server/captive.c
index c4cec238..4bb738fc 100644
--- a/server/captive.c
+++ b/server/captive.c
@@ -54,7 +54,7 @@ run_command (void)
FILE *fp;
char *cmd = NULL;
size_t len = 0;
- int r;
+ int r, status;
pid_t pid;
if (!run)
@@ -149,22 +149,47 @@ run_command (void)
if (pid > 0) { /* Parent process is the run command. */
r = system (cmd);
- if (WIFEXITED (r))
+ if (r == -1) {
+ nbdkit_error ("failure to execute external command: %m");
+ r = EXIT_FAILURE;
+ }
+ else if (WIFEXITED (r))
r = WEXITSTATUS (r);
- else if (WIFSIGNALED (r)) {
+ else {
+ assert (WIFSIGNALED (r));
fprintf (stderr, "%s: external command was killed by signal %d\n",
program_name, WTERMSIG (r));
- r = 1;
- }
- else if (WIFSTOPPED (r)) {
- fprintf (stderr, "%s: external command was stopped by signal %d\n",
- program_name, WSTOPSIG (r));
- r = 1;
+ r = WTERMSIG (r) + 128;
}
- kill (pid, SIGTERM); /* Kill captive nbdkit. */
+ switch (waitpid (pid, &status, WNOHANG)) {
+ case -1:
+ nbdkit_error ("waitpid: %m");
+ r = EXIT_FAILURE;
+ break;
+ case 0:
+ /* Captive nbdkit still running; kill it, but no need to wait
+ * for it, as the captive program's exit status is good enough
+ * (if the captive fails to exit after SIGTERM, we have a bigger
+ * bug to fix).
+ */
+ kill (pid, SIGTERM);
+ break;
+ default:
+ /* Captive nbdkit exited unexpectedly; update the exit status. */
+ if (WIFEXITED (status)) {
+ if (r == 0)
+ r = WEXITSTATUS (status);
+ }
+ else {
+ assert (WIFSIGNALED (status));
+ fprintf (stderr, "%s: nbdkit command was killed by signal %d\n",
+ program_name, WTERMSIG (status));
+ r = WTERMSIG (status) + 128;
+ }
+ }
- _exit (r);
+ exit (r);
}
free (cmd);
--
2.21.0