Re: [Libguestfs] [PATCH nbdkit] server: public: Add nbdkit_parse_* functions for safely parsing integers.

Monday, 23 September 2019

On Mon, Sep 23, 2019 at 12:05:11PM -0500, Eric Blake wrote:
...
 > + int nbdkit_parse_long (const char *what, const char *str, long
*r);
 > + int nbdkit_parse_unsigned_long (const char *what,
 > +                                 const char *str, unsigned long *r);

 Do we really want to encourage the use of parse_long and
 parse_unsigned_long?  Those differ between 32- and 64-bit platforms, at
 which point, it's often better to explicitly call out the intended size. 
So I'll say why I added these.  It may not be a good reason, but here
goes :-)

There are two external APIs we use which need long types, libcurl
(curl_easy_setopt) and libssh (ssh_options_set).  The existing
parameters had type long (I don't think unsigned long is actually
used) which we passed directly to one of these APIs.  To use another
type would involve a slightly awkward check + cast, with the danger
that the code would break on the rarely tested 32 bit platform.

...
...
 > + int nbdkit_parse_int16_t (const char *what,
 > +                           const char *str, int16_t *r);
 > + int nbdkit_parse_uint16_t (const char *what,
 > +                            const char *str, uint16_t *r);

 I guess we can add [u]int8_t variants later if a need arises?  Or should
 we do it now? 
We certainly could.  I couldn't see an immediate reason, except maybe
for the ’mbr-id’ parameter.

...
 > + int nbdkit_parse_int32_t (const char *what,
 > +                           const char *str, int32_t *r);
 > + int nbdkit_parse_uint32_t (const char *what,
 > +                            const char *str, uint32_t *r);
 > + int nbdkit_parse_int64_t (const char *what,
 > +                           const char *str, int64_t *r);
 > + int nbdkit_parse_uint64_t (const char *what,
 > +                            const char *str, uint64_t *r);
 > + int nbdkit_parse_ssize_t (const char *what,
 > +                           const char *str, ssize_t *r);
 > + int nbdkit_parse_size_t (const char *what,
 > +                          const char *str, size_t *r);

 [s]size_t is another one that can differ between 32- and 64-bit
 platforms; but I'm fine with having these function, especially since
 there are still efforts being considered about an alternative kernel ABI
 with 64-bit integers but 32-bit size_t. 
Yes I think if we had to drop long or size_t, I'd rather keep the
size_t ones.  Those are actually meaningful -- eg. if used for sizing
an internal array.

...
 > +Parse string C<str> into an integer of various types. 
These functions
 > +parse a decimal, hexadecimal (C<"0x...">) or octal
(C<"0...">) number.
 > +
 > +On success the functions return C<0> and set C<*r> to the parsed value
 > +(unless C<*r == NULL> in which case the result is discarded).  On
 > +error, C<nbdkit_error> is called and the functions return C<-1>.
 > +
 > +The C<what> parameter is printed in error messages to provide context.
 > +It should usually be a short descriptive string of what you are trying
 > +to parse, eg:
 > +
 > + if (nbdkit_parse_int ("random seed", argv[1], &seed) == -1)
 > +   return -1;
 > +
 > +might print an error:
 > +
 > + random seed: could not parse number: "lalala"
 > +

 Do we want to make it part of our documented contract that *r is
 unchanged if we return -1? 
Yes I think we can document this.  It's useful to have that behaviour
defined because it makes error recovery simpler.  I will add it to my
copy.

...
 > @@ -129,22 +129,20 @@ cache_config (nbdkit_next_config *next,
void *nxdata,
 >      return 0;
 >    }
 >    else if (strcmp (key, "cache-high-threshold") == 0) {
 > -    if (sscanf (value, "%d", &hi_thresh) != 1) {
 > -      nbdkit_error ("invalid cache-high-threshold parameter: %s",
value);
 > +    if (nbdkit_parse_unsigned ("cache-high-threshold",
 > +                               value, &hi_thresh) == -1)

 Semantic change; previously "010" was '10' and "08" was
'8', now "010"
 is '8' and "08" is a failure (because we used "%d" instead of
"%i").
 (Multiple spots in this patch).  Not a big deal to me, but we may want
 to call it out in the commit message as intentional. 
I guess %i (not %d) is the one which parses octal and hex.  I think
this change is for the better however since it improves consistency,
so I'll mention it in the commit message.

...
 > +++ b/include/nbdkit-common.h
 > @@ -84,6 +84,28 @@ extern void nbdkit_vdebug (const char *msg, va_list args);
 >  extern char *nbdkit_absolute_path (const char *path);
 >  extern int64_t nbdkit_parse_size (const char *str);
 >  extern int nbdkit_parse_bool (const char *str);
 > +extern int nbdkit_parse_int (const char *what, const char *str,
 > +                             int *r);

 Should we mark 'what' and 'str' as being non-null parameters?  But you
 definitely document 'r' as valid when NULL. 
I think we decided to avoid using __attribute__((nonnull)) on the
external API, because it was ... unsafe?  I don't recall now if we
avoided it deliberately or not, but we definitely don't use it except
on internal functions!

Yes it would make sense as long as it's not unsafe.

...
 > +++ b/plugins/curl/curl.c
 > @@ -204,7 +204,9 @@ curl_config (const char *key, const char *value)
 >    }
 >  
 >    else if (strcmp (key, "timeout") == 0) {
 > -    if (sscanf (value, "%ld", &timeout) != 1 || timeout < 0) {
 > +    if (nbdkit_parse_long ("timeout", value, &timeout) == -1)
 > +      return -1;
 > +    if (timeout < 0) {
 >        nbdkit_error ("'timeout' must be 0 or a positive timeout in
seconds");
 >        return -1;

 Why not change to 'unsigned timeout'?  'long' is rather oversized on
 64-bit platforms, and keeping things signed doesn't buy much compared to
 just going unsigned.  (Yep, that's me trying to get rid of a need for
 parse_long). 
See above about curl_easy_setopt.

...
 > +++ b/plugins/partitioning/partitioning.c
 > @@ -211,10 +211,8 @@ partitioning_config (const char *key, const char *value)
 >    else if (strcmp (key, "mbr-id") == 0) {
 >      if (strcasecmp (value, "default") == 0)
 >        mbr_id = DEFAULT_MBR_ID;
 > -    else if (sscanf (value, "%i", &mbr_id) != 1) {
 > -      nbdkit_error ("could not parse mbr-id: %s", value);
 > +    else if (nbdkit_parse_int ("mbr-id", value, &mbr_id) == -1)
 >        return -1;

 Is 'int' the best type for this? 
Probably not, see above.

...
 > +++ b/plugins/ssh/ssh.c
 > @@ -59,7 +59,7 @@ static bool verify_remote_host = true;
 >  static const char *known_hosts = NULL;
 >  static const char **identity = NULL;
 >  static size_t nr_identities = 0;
 > -static long timeout = 0;
 > +static unsigned long timeout = 0;

 Again, a 32-bit timeout regardless of 32- or 64-bit platform is probably
 saner. 
See above about ssh_options_set.

...
 > +++ b/plugins/vddk/vddk.c
 > @@ -271,10 +271,8 @@ vddk_config (const char *key, const char *value)
 >        return -1;
 >    }
 >    else if (strcmp (key, "nfchostport") == 0) {
 > -    if (sscanf (value, "%d", &nfc_host_port) != 1) {
 > -      nbdkit_error ("cannot parse nfchostport: %s", value);
 > +    if (nbdkit_parse_int ("nfchostport", value, &nfc_host_port) ==
-1)
 >        return -1;
 > -    }

 Is 'int' the right type here? 
Actually this is a bug twice over.  The VDDK struct wants uint32_t for
both of the port fields.  Obviously since they are port numbers
something like uint16_t would be more suitable.  (0 has the special
meaning of "default port").

I think I'll change this to uint16_t because the implicit cast up to
uint32_t when we assign it to the VDDK struct shouldn't be a problem,
and it's useful to have bounds checking.

...
 > +/* Functions for parsing signed integers. */
 > +int
 > +nbdkit_parse_int (const char *what, const char *str, int *rp)
 > +{
 > +  long r;
 > +  char *end;
 > +
 > +  errno = 0;
 > +  r = strtol (str, &end, 0);
 > +#if INT_MAX != LONG_MAX
 > +  if (r < INT_MIN || r > INT_MAX)
 > +    errno = ERANGE;
 > +#endif
 > +  PARSE_COMMON_TAIL;
 > +}

 Looks correct. 
I actually compiled this and ran the tests with -m32 and -m64.  It
took several rounds to get it right.

...
 > +#endif
 > +  if (r < SSIZE_MIN || r > SSIZE_MAX)
 > +    errno = ERANGE;
 > +#endif
 > +  PARSE_COMMON_TAIL;
 > +}
 > +
 > +/* Functions for parsing unsigned integers. */
 > +
 > +/* strtou* functions have surprising behaviour if the first character
 > + * (after whitespace) is '-', so deny this.
 > + */
 > +#define PARSE_ERROR_IF_NEGATIVE                                         \
 > +  do {                                                                  \
 > +    const char *p = str;                                                \
 > +    while (isspace (*p))                                                \
 > +      p++;                                                              \
 > +    if (*p == '-') {                                                    \
 > +      nbdkit_error ("%s: negative numbers are not allowed", what);     
\
 > +      return -1;                                                        \
 > +    }                                                                   \
 > +  } while (0)
 > +

 Agree.  However, why not modify str in-place, so that the subsequent
 strtoul() does not have to repeat our prefix scan? 
Yes, that's a good idea, I didn't think it through.

...
 > +  PARSE_COMMON_TAIL;
 > +}
 > +
 > +int
 > +nbdkit_parse_size_t (const char *what, const char *str, size_t *rp)
 > +{
 > +  unsigned long long r;
 > +  char *end;
 > +
 > +  PARSE_ERROR_IF_NEGATIVE;
 > +  errno = 0;
 > +  r = strtoull (str, &end, 0);
 > +#if SIZE_MAX != ULONGLONG_MAX
 > +  if (r > SIZE_MAX)
 > +    errno = ERANGE;
 > +#endif
 > +  PARSE_COMMON_TAIL;
 > +}
 > +
 >  /* Parse a string as a size with possible scaling suffix, or return -1
 >   * after reporting the error.
 >   */

 Hmm, seeing this comment, do we really need raw [s]size_t parsing, or
 can we get away with scaled parsing anywhere that sizes are intended?
 But the code looks correct if you want to keep it. 
nbdkit_parse_ssize_t is not used.  nbdkit_parse_size_t is used for a
single case, ‘xz-max-depth’ which is genuinely an array length, so
size_t is (albeit marginally) useful here.

It's not worth dying on this particular hill, but I think there is
some use for all of these functions.

...
 > diff --git a/server/socket-activation.c
b/server/socket-activation.c
 > index b9db67c..50df4ca 100644
 > --- a/server/socket-activation.c
 > +++ b/server/socket-activation.c
 > @@ -59,11 +59,8 @@ get_socket_activation (void)
 >    s = getenv ("LISTEN_PID");
 >    if (s == NULL)
 >      return 0;
 > -  if (sscanf (s, "%u", &pid) != 1) {
 > -    fprintf (stderr, "%s: malformed %s environment variable
(ignored)\n",
 > -             program_name, "LISTEN_PID");
 > +  if (nbdkit_parse_unsigned ("LISTEN_PID", s, &pid) == -1)
 >      return 0;
 > -  }

 Possible semantic change: on parse failure, are we still printing
 something to stderr, or is nbdkit_error() not effective this early? 
Yes nbdkit_error works right from the start.  We have already used it
for parsing server argv[].

...
 > +++ b/server/test-public.c

 > +static bool
 > +test_nbdkit_parse_ints (void)
 > +{
 > +  bool pass = true;
 > +  char s[64];
 > +
 > +#define PARSE(TYPE, FORMAT, TEST, RET, EXPECTED)                        \
 > +  do {                                                                  \
 > +    error_flagged = false;                                              \
 > +    TYPE i = 0;                                                         \
 > +    int r = nbdkit_parse_##TYPE ("test", TEST, &i);                  
  \
 > +    if (r != RET || i != EXPECTED) {                                    \
 > +      fprintf (stderr,                                                  \
 > +               "%s: %d: wrong parse for %s: r=%d i=" FORMAT
"\n",       \
 > +               __FILE__, __LINE__, TEST, r, i);                         \
 > +      pass = false;                                                     \
 > +    }                                                                   \
 > +    if ((r == -1) != error_flagged) {                                   \
 > +      fprintf (stderr,                                                  \
 > +               "%s: %d: wrong error message handling for %s\n",        
\
 > +               __FILE__, __LINE__, TEST);                               \
 > +      pass = false;                                                     \
 > +    }                                                                   \
 > +  } while (0)
 > +#define OK 0
 > +#define BAD -1

 Nice. 
But I discovered you cannot do:

  #define BAD -1, 0

which is a shame :-(

...
 > +
 > +  /* Test the basic parsing of decimals, hexadecimal, octal and
 > +   * negative numbers.
 > +   */
 > +  PARSE (int, "%d", "0",    OK, 0);
 > +  PARSE (int, "%d", "1",    OK, 1);
 > +  PARSE (int, "%d", "99",   OK, 99);
 > +  PARSE (int, "%d", "0x1",  OK, 1);
 > +  PARSE (int, "%d", "0xf",  OK, 15);
 > +  PARSE (int, "%d", "0x10", OK, 16);
 > +  PARSE (int, "%d", "0xff", OK, 255);
 > +  PARSE (int, "%d", "01",   OK, 1);
 > +  PARSE (int, "%d", "07",   OK, 7);
 > +  PARSE (int, "%d", "010",  OK, 8);
 > +  PARSE (int, "%d", "+0",   OK, 0);
 > +  PARSE (int, "%d", "+99",  OK, 99);
 > +  PARSE (int, "%d", "+0xf", OK, 15);
 > +  PARSE (int, "%d", "+010", OK, 8);
 > +  PARSE (int, "%d", "-0",   OK, 0);
 > +  PARSE (int, "%d", "-99",  OK, -99);
 > +  PARSE (int, "%d", "-0xf", OK, -15);

 I'd test upper case at least once, such as -0XF here.  Testing leading
 whitespace would also be nice. 
Ah yes, good idea.

...
 > +  PARSE (int, "%d", "-010", OK, -8);
 > +  PARSE (int, "%d", "2147483647", OK, 2147483647); /* INT_MAX
*/
 > +  PARSE (int, "%d", "-2147483648", OK, -2147483648); /* INT_MIN
*/
 > +  PARSE (int, "%d", "0x7fffffff", OK, 0x7fffffff);
 > +  PARSE (int, "%d", "-0x80000000", OK, -0x80000000);

 Matches our earlier assumptions of twos-complement targets.

 > +
 > +  /* Test basic error handling. */
 > +  PARSE (int, "%d", "",        BAD, 0);
 > +  PARSE (int, "%d", "-",       BAD, 0);

 I might also test "- 0" as BAD. 
OK.

...
 > +  PARSE (int, "%d", "+",       BAD, 0);
 > +  PARSE (int, "%d", "++",      BAD, 0);
 > +  PARSE (int, "%d", "++0",     BAD, 0);
 > +  PARSE (int, "%d", "--0",     BAD, 0);
 > +  PARSE (int, "%d", "0x",      BAD, 0);

 I'd also test "08" as BAD. 
OK.

...
 > +  PARSE (int, "%d", "42x",     BAD, 0);
 > +  PARSE (int, "%d", "42-",     BAD, 0);
 > +  PARSE (int, "%d", "garbage", BAD, 0);
 > +  PARSE (int, "%d", "2147483648", BAD, 0); /* INT_MAX + 1 */
 > +  PARSE (int, "%d", "-2147483649", BAD, 0); /* INT_MIN - 1 */
 > +  PARSE (int, "%d", "999999999999999999999999", BAD, 0);
 > +  PARSE (int, "%d", "-999999999999999999999999", BAD, 0);

 It might also be worth testing "0x1p1" as invalid (and not an accidental
 hex-float).  Other possible non-integer strings to be sure we don't
 accidentally parse include "inf", "nan", "0.0". 
OK.

...
 > +
 > +  /* Test nbdkit_parse_unsigned. */
 > +  PARSE (unsigned, "%u", "0",    OK, 0);
 > +  PARSE (unsigned, "%u", "1",    OK, 1);
 > +  PARSE (unsigned, "%u", "99",   OK, 99);
 > +  PARSE (unsigned, "%u", "0x1",  OK, 1);
 > +  PARSE (unsigned, "%u", "0xf",  OK, 15);
 > +  PARSE (unsigned, "%u", "0x10", OK, 16);
 > +  PARSE (unsigned, "%u", "0xff", OK, 255);
 > +  PARSE (unsigned, "%u", "01",   OK, 1);
 > +  PARSE (unsigned, "%u", "07",   OK, 7);
 > +  PARSE (unsigned, "%u", "010",  OK, 8);
 > +  PARSE (unsigned, "%u", "+0",   OK, 0);
 > +  PARSE (unsigned, "%u", "+99",  OK, 99);
 > +  PARSE (unsigned, "%u", "+0xf", OK, 15);
 > +  PARSE (unsigned, "%u", "+010", OK, 8);
 > +  PARSE (unsigned, "%u", "-0",   BAD, 0);
 > +  PARSE (unsigned, "%u", "-99",  BAD, 0);
 > +  PARSE (unsigned, "%u", "-0xf", BAD, 0);
 > +  PARSE (unsigned, "%u", "-010", BAD, 0);

 I'd also test at least " 0" and " -0" (or similar), to prove we
handled
 leading spaces correctly. 
OK.

...
 > +  PARSE (unsigned, "%u", "2147483647", OK,
2147483647); /* INT_MAX */
 > +  PARSE (unsigned, "%u", "-2147483648", BAD, 0); /* INT_MIN */
 > +  PARSE (unsigned, "%u", "0x7fffffff", OK, 0x7fffffff);
 > +  PARSE (unsigned, "%u", "-0x80000000", BAD, 0);
 > +
 > +  /* Test nbdkit_parse_long. */
 > +  PARSE (long, "%ld", "0", OK, 0);
 > +  assert (snprintf (s, sizeof s, "%" PRIi64, (int64_t) LONG_MAX) != -1);
 > +  PARSE (long, "%ld", s, OK, LONG_MAX);
 > +  assert (snprintf (s, sizeof s, "%" PRIi64, (int64_t) LONG_MIN) != -1);
 > +  PARSE (long, "%ld", s, OK, LONG_MIN);
 > +  PARSE (long, "%ld", "999999999999999999999999", BAD, 0);
 > +  PARSE (long, "%ld", "-999999999999999999999999", BAD, 0);

 snprintf() side effects inside assert(). Tsk tsk - the test will now
 fail under -DNDEBUG.  And that's part of why I don't think we want to
 expose parse_long. 
We should probably loudly break or skip the tests if they
are compiled with -DNDEBUG.  We're using assert for testing
in a couple of places.

...
 > +++ b/server/usergroup.c
 > @@ -97,7 +97,7 @@ parseuser (const char *id)
 >  
 >      saved_errno = errno;
 >  
 > -    if (sscanf (id, "%d", &val) == 1)
 > +    if (nbdkit_parse_int ("parseuser", id, &val) == 0)

 Is 'int' still the right type for this? 
Hopefully it will warn us if uid_t stops being int.  (Also
we're assuming pid_t == int).

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH nbdkit] server: public: Add nbdkit_parse_* functions for safely parsing integers.