On 8/27/19 6:47 AM, Richard W.M. Jones wrote:
On Fri, Aug 02, 2019 at 02:26:15PM -0500, Eric Blake wrote:
> + /* Ensure that stdin/out/err of the current process were not empty
> + * before we started creating pipes (otherwise, the close and dup2
> + * calls below become more complex to juggle fds around correctly).
> + */
> + assert (in_fd[0] > STDERR_FILENO && in_fd[1] > STDERR_FILENO
&&
> + out_fd[0] > STDERR_FILENO && out_fd[1] > STDERR_FILENO
&&
> + err_fd[0] > STDERR_FILENO && err_fd[1] > STDERR_FILENO);
This assert is now being thrown whenever we use nbdkit + sh plugin +
nbdkit -s option. It's easy to demonstrate using a libnbd one-liner:
$ nbdsh -c 'h.connect_command (["nbdkit", "sh",
"/tmp/min.sh", "-s", "--exit-with-parent"])' \
-c 'print ("%r" % h.get_size())'
1048576
nbdkit: call.c:155: call3: Assertion `in_fd[0] > STDERR_FILENO && in_fd[1]
> STDERR_FILENO && out_fd[0] > STDERR_FILENO && out_fd[1] >
STDERR_FILENO && err_fd[0] > STDERR_FILENO && err_fd[1] >
STDERR_FILENO' failed.
(This is triggered frequently by tests in the libnbd test suite which
is where I first observed it.)
It happens during the shutdown path where there is one final call to
sh_close in the plugin:
Hmm, so we've closed stdin/out because the client connection is no
longer around, but still give the shell script one more callback. Yeah,
the failure is definitely explainable, and worth fixing.
We can easily paper over this by ignoring the assert on the close
path, but I foresee two problems:
(1) I don't believe the assert is generally correct. While it's not
ideal for nbdkit to be run with stdin/out/err closed (they obviously
should be connected to /dev/null) it's also not impossible for that to
happen. We don't cope well with this situation.
(2) The assert is protecting us from some minimal checks in this code,
but I think the right answer is to just add those checks.
It's also possible to open /dev/null in a loop until getting one larger
than STDERR_FILENO (so you'd have up to four open() calls), then do
pipe() with the assurance of no collisions, then close the scratch fds.
But that's probably more syscall overhead and not worth doing, so your
attempt at doing the fd shuffling correctly is better.
Anyway, a possible patch for this is attached, but I'm not very
confident it is correct.
Not quite. I don't think POSIX guarantees that fd[0] < fd[1] when
pipe() succeeds. But in the situation we're hitting, fd 0 and 1 are
closed when we start pipe()ing, so our first pipe will claim 0 and 1 in
either order, then the first check will either be a no-op (because fd[0]
is 0 and doesn't need moving) or else fd[0] will be 1 and needs to
overwrite stdin. Overwriting is fine; but if fd[0] is 0, we HAVE to
clear FD_CLOEXEC, otherwise stdin will be closed by exec().
But it's close; so I'll use it as the starting point and push the
corrected version soon.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org