On Tue, Nov 13, 2018 at 10:42:20AM +0000, Richard W.M. Jones wrote:
On Tue, Nov 13, 2018 at 10:04:33AM +0000, Daniel P. Berrangé wrote:
> On Fri, Nov 09, 2018 at 12:19:30PM +0000, Richard W.M. Jones wrote:
> > Peter Dimitrov and myself were debugging a very peculiar bug when
> > libguestfs is run as a plugin from collectd:
> >
> >
https://www.redhat.com/archives/libguestfs/2018-November/thread.html#00023
> >
> > The long story short is that collectd leaks SIGCHLD == SIG_IGN setting
> > into plugins:
> >
> >
https://www.redhat.com/archives/libguestfs/2018-November/msg00095.html
> >
> > This means that any plugin that does the usual pattern of:
> >
> > pid = fork ();
> > ...
> > if (waitpid (pid, NULL, 0) == -1) {
> > perror ("waitpid");
> > exit (EXIT_FAILURE);
> > }
> >
> > will fail, because the forked subprocess is automatically reaped
> > before waitpid is called, resulting in the wait failing with errno ==
> > ECHILD.
> >
> > It is possible to work around this by adding:
> >
> > signal (SIGCHLD, SIG_DFL);
> >
> > to the plugin. However I believe this is a bug in collectd, and it
> > should sanitize signals (and maybe other things) before running
> > plugins.
>
> Yes, I'm inclined to agree. If an app or library is spawning external
> processes it should take care to restore signal setup to a "normal"
state.
> This means not only setting SIG_DFL for all signals, but also just as
> importantly, the signal mask. It should be set to all ones before calling
> fork, and set back to all zeroes immedaitely before execve, as illustrated
> in libvirt to avoid races:
>
>
https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/vircommand.c;h=d...
>
>
https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/vircommand.c;h=d...
This is absolutely right. What's interesting and new here is that
collectd is actually using dlopen to call external plugins (so not
forking itself). See collectd.git/src/daemon/plugin.c
However it's still not sanitizing the environment sufficiently around
these calls, so it's the same sort of problem, but in a different
context that I'd not seen/understood before.
Hmm, in the dlopen() plugin case I think it is less clear cut.
I'm fuzzy on the collectd architecture, but if it is multi-threaded,
then it is not practical to set SIGCHLD to a particular value around
the plugin call, because signal handlers are process wide, not per
thread.
So from that POV, I think the plugin would have to be written to cope
with whatever setup it receives. If you really need SIGCHLD to work,
then the plugin would need to immediately fork and have the parent do
a blocking waitpid. Then the real work can be done in the fork'd child,
which can safely reset SIGCLD and spawn the real external processes as
needed.
Regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|