Re: [Libguestfs] [libnbd PATCH v3 01/18] api: Fix crashes on nbd_connect_command with bad argv

Tuesday, 27 September 2022

On Tue, Sep 27, 2022 at 12:14:28PM +0100, Richard W.M. Jones wrote:
...
 On Mon, Sep 26, 2022 at 05:05:43PM -0500, Eric Blake wrote:
 > nbd_connect_command (h, (char **) { NULL }) triggers SIGABRT when
 > preparing to exec a NULL command name (during
 > enter_STATE_CONNECT_COMMAND_START in v1.0).
 > 
 > nbd_connect_command (h, NULL) in newer releases triggers SIGSEGV by
 > trying to dereference NULL (with LIBNBD_DEBUG=1, during
 > nbd_internal_printable_string_list in v1.4; otherwise, during
 > nbd_internal_set_argv in v1.6).  In older releases, it behaves the
 > same as (char **) { NULL } and triggers SIGABRT.
 > 
 > Both crashes are corner cases that have existed for a long time, and
 > unlikely to hit in real code.  Still, we shouldn't crash in a library.

 Is this an actual rule? 
More of a rule of thumb.  Letting a library kill an entire program
with SIGSEGV because of a bug in the outer program is not nice.  If we
document it well, we can say it's the program's own fault; but right
now, we don't document that a StringList parameter must be non-NULL,
either in our prose, or with __attribute__((nonnull)) in our public .h
files.

However, note that the use of the compiler attribute has its own
drawbacks - gcc treats it as license to optimize the implementation to
skip the NULL check, but does not always warn about callers that
violate the invariant and actually pass NULL in - if you use the
attribute, you actually lose out on the ability to write a library
that tries to document that passing NULL is dumb but which still
sanely handles NULL - because the sane handling is optimized away.
Libvirt tried doing this a while back, and ended up with their current
practice of enabling attribute((nonnull)) for Coverity scanning, but
disabling it for gcc compilation.

...

 I checked some libc functions and it seems like

   printf(NULL) => errno EINVAL
   puts(NULL) => segfault

 Both functions _do_ have nonnull attributes on the parameters, so GCC
 will warn (if it can statically analyze the situation). 
Yes, these are cases where the documentation is explicit that passing
in NULL leads to undefined behavior (part of the C standard).  Whether
gcc actually allows the glibc implementation to actually check for
NULL and turn it into errno EINVAL, or optimized it out and results in
a SIGSEGV, is then a question for how the library itself was compiled
while those .h attributes are present.

...

 > Forbid a NULL StringList in general (similar to how we already forbid
 > a NULL String); which also means a StringList parameter implies
 > may_set_error=true.  Refactor nbd_internal_set_argv() to split out the
 > vector population into a new helper function that can only fail with
 > ENOMEM (this function will also be useful in later patches that want
 > to copy a StringList, but can tolerate 0 elements), as well as to set
 > errors itself (all 2 callers updated) since detecting NULL for argv[0]
 > is unique to argv.
 > 
 > Tests are added in the next patch, to make it easier to review by
 > temporarily swapping patch order.
 > 
 > Changes from:
 > 
 > $ nbdsh -c 'h.connect_command([])'
 > nbdsh: generator/states-connect.c:247: enter_STATE_CONNECT_COMMAND_START: Assertion
`h->argv.ptr[0]' failed.
 > Aborted (core dumped)

 I do agree that this is wrong.  If I'm following the Python bindings
 correctly what is happening is that we're calling

   nbd_connect_command (h, { NULL });

 which is causing the first case above. 
Correct.  The python bindings always pass a non-NULL array; but when
the array is empty, there is no argv[0].  This fix is worth having no
matter what.

The other fix is whether we should special case passing in NULL
instead of an empty array (the python bindings can't trigger it).  We
already do this sort of NULL checking for String arguments (again,
where python bindings can't trigger that, either), so we do have
precedence; but we could also take alternatives of better documenting
that C code should not pass in NULL (it's no longer our fault if the
program didn't obey the documentation invariants and gets a crash
because they passed in NULL) and/or try to decorate our public .h with
attribute((nonnull)) to allow compilers that are capable of warning
users about bad inputs about their mistake (I'm not yet sure if gcc
has quite reached that point; it wasn't there a few years ago when
libvirt decided NOT to use attribute((nonnull)) in the public
headers).

...

 > to:
 > 
 > nbdsh: command line script failed: nbd_connect_command: missing command name in argv
list: Invalid argument

 This is definitely an improvement for Python code (which really should
 never crash).

 So I'm not sure about the total fix here.  Definitely we should be
 returning an error if a zero length list is passed to nbd_connect_*
 functions.

 But one thing I especially don't like about libvirt is that it turns
 various virFoo (NULL) calls into errors instead of segfaults, which in
 turn makes it much easier to ignore serious errors in the calling
 code.

 While I won't necessarily push too hard against this patch I don't
 feel it's the right direction unless someone can persuade me
 otherwise. 
I can split this into two parts; the obvious avoidance of the missing
argv[0] (we want that no matter what), and the less-obvious behavior
of what to do when NULL is passed to StringList (where we have several
options of what we want it to do, and where a segv + static compiler
flags that will warn users about suspect coding may indeed be nicer
than blindly failing with EFAULT, for forcing the user to not write
bad code - but I'm not yet sure if we have reliable compiler setups
that can be used to get that environment).

...
 > +++ b/generator/C.ml
 > @@ -570,7 +570,7 @@ let
 >           need_out_label := true
 >        | Flags (n, flags) ->
 >           print_flags_check n flags None
 > -      | String n ->
 > +      | String n | StringList n ->
 >           let value = match errcode with
 >             | Some value -> value
 >             | None -> assert false in 
This is the hunk that enforces StringList must not be NULL (the one
where attribute((nonnull)) may be worth exploring)...

...
 > +++ b/lib/utils.c 
...
 > 
 > +/* Store argv into h, or diagnose an error on failure. */
 > +int
 > +nbd_internal_set_argv (struct nbd_handle *h, char **argv)
 > +{
 > +  assert (argv);
 > +
 > +  if (argv[0] == NULL) {
 > +    set_error (EINVAL, "missing command name in argv list");
 > +    return -1;
 > +  }
 > +  if (nbd_internal_copy_string_list (&h->argv, argv) == -1) { 
While this hunk (and its helper function and updates to all callsites)
is where diagnosing empty arrays is the noncontroversial fix.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [libnbd PATCH v3 01/18] api: Fix crashes on nbd_connect_command with bad argv