Re: [Libguestfs] [PATCH nbdkit v2 1/6] Add new public nbdkit_parse_probability function

Thursday, 18 May 2023

On Wed, May 17, 2023 at 04:05:23PM -0500, Eric Blake wrote:
...
 On Wed, May 17, 2023 at 11:06:54AM +0100, Richard W.M. Jones wrote:
 > In nbdkit-error-filter we need to parse parameters as probabilities.
 > This is useful enough to add to nbdkit, since we will use it in
 > another filter in future.
 > ---
 >  docs/nbdkit-plugin.pod                  | 19 +++++++
 >  plugins/python/nbdkit-python-plugin.pod |  6 ++
 >  include/nbdkit-common.h                 |  2 +
 >  server/nbdkit.syms                      |  1 +
 >  server/public.c                         | 37 +++++++++++++
 >  server/test-public.c                    | 73 +++++++++++++++++++++++++
 >  plugins/ocaml/NBDKit.mli                | 11 ++--
 >  plugins/ocaml/NBDKit.ml                 |  2 +
 >  plugins/ocaml/bindings.c                | 16 ++++++
 >  plugins/python/modfunctions.c           | 21 +++++++
 >  tests/test-python-plugin.py             | 12 ++++
 >  tests/test_ocaml_plugin.ml              |  1 +
 >  12 files changed, 196 insertions(+), 5 deletions(-)
 > 
 > diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod
 > index 860c5cecb..e8d30a98e 100644
 > --- a/docs/nbdkit-plugin.pod
 > +++ b/docs/nbdkit-plugin.pod
 > @@ -1433,6 +1433,25 @@ C<str> can be a string containing a case-insensitive
form of various
 >  common toggle values.  The function returns 0 or 1 if the parse was
 >  successful.  If there was an error, it returns C<-1>.
 >  
 > +=head2 Parsing probabilities
 > +
 > +Use the C<nbdkit_parse_probability> utility function to parse
 > +probabilities.  Common formats understood include: C<"0.1">,
C<"10%">
 > +or C<"1:10">, which all mean a probability of 1 in 10.
 > +
 > + int nbdkit_parse_probability (const char *what, const char *str,
 > +                               double *ret);
 > +
 > +The C<what> parameter is printed in error messages to provide context.
 > +The C<str> parameter is the probability string.
 > +
 > +On success the function returns C<0> and sets C<*ret>.  B<Note>
that
 > +the probability returned may be outside the range S<[ 0.0..1.0 ]>, for
 > +example if C<str == "200%">.  If you want to clamp the result you
must
 > +check that yourself.

 Should we at least guarantee that the return value is non-negative
 (other than potentially -0.0 which compares equal to 0.0) and finite?
 I don't see how a negative or infinite probability will ever be
 useful (with apologies to Douglas Adams' infinite improbability drive). 
Looking at this in the cold light of day I think my confusion is
around probabilities versus percentages/fractions.

Actual probabilities can't be outside the range [0..1].  But it's
usual to have a percentage above 100%, eg. you could have a base rate
and then express 200% (2 times) this rate.

We can use this function to parse percentages as well as probabilties,
eg in an imaginary rate filter:

  bandwidth=10000 burst=200%

or:

  bandwidth=10000 burst=2:1

But I don't think negative percentages make sense.

So yes we can limit this to returning only numbers >= 0
(and also non NaN, not infinite).

...
 > +++ b/server/public.c
 > @@ -421,6 +421,43 @@ nbdkit_parse_size (const char *str)
 >    return size * scale;
 >  }
 >  
 > +NBDKIT_DLL_PUBLIC int
 > +nbdkit_parse_probability (const char *what, const char *str,
 > +                          double *retp)
 > +{
 > +  double d, d2;
 > +  char c;
 > +  int n;
 > +
 > +  if (sscanf (str, "%lg%[:/]%lg%n", &d, &c, &d2, &n) == 3
&&
 > +      strcmp (&str[n], "") == 0) { /* N:M or N/M */
 > +    if (d == 0 && d2 == 0)         /* 0/0 is OK */

 I'd write this 'if (d == 0.0 && d2 == 0.0)' to make it obvious that
we
 know we are doing floating point comparisons here (semantics are the
 same either way, though, because of how integer 0 promotes under
 standard arithmetic promotion). 
OK.

...
 > +      ;
 > +    else if (d2 == 0)              /* N/0 is bad */
 > +      goto bad_parse;
 > +    else
 > +      d /= d2;
 > +  }
 > +  else if (sscanf (str, "%lg%n", &d, &n) == 1) {
 > +    if (strcmp (&str[n], "%") == 0) /* percentage */
 > +      d /= 100.0;
 > +    else if (strcmp (&str[n], "") == 0) /* probability */
 > +      ;
 > +    else
 > +      goto bad_parse;
 > +  }
 > +  else
 > +    goto bad_parse;

 Here's where it might be worth adding:

 if (d < 0.0 || !isfinite (d))
   goto bad_parse;

 to filter out the cases where the user passed in "inf", "nan", or
even
 some form of large/small that results in overflow.

 You caould also use 'signbit (d)' instead of 'd < 0.0' if you want to
 prevent -0.0 from causing surprises (while IEEE 754 and therefore the
 compiler treats '0.0 == -0.0' as true despite being different bit
 patterns, x/0.0 and x/-0.0 for finite x differ in behavior). 
OK, but added to nbdkit_parse_probability itself.

...
 > +
 > +  if (retp)
 > +    *retp = d;
 > +  return 0;
 > +
 > + bad_parse:
 > +  nbdkit_error ("%s: could not parse '%s'", what, str);

 Should this say "could not parse '%s' as a probability" to help the
 user search the documntation for what forms can be parsed? 
Yes, it would also be good to mention the function name.

...
 > +  return -1;

 If you get here, *retp was unchanged. [1]

 > +}
 > +
 >  /* Parse a string as a boolean, or return -1 after reporting the error.
 >   */
 >  NBDKIT_DLL_PUBLIC int
 > diff --git a/server/test-public.c b/server/test-public.c
 > index 676411290..0d84abdd2 100644
 > --- a/server/test-public.c
 > +++ b/server/test-public.c
 > @@ -200,6 +200,78 @@ test_nbdkit_parse_size (void)
 >    return pass;
 >  }
 >  
 > +static bool
 > +test_nbdkit_parse_probability (void)
 > +{
 > +  size_t i;
 > +  bool pass = true;
 > +  struct pair {
 > +    const char *str;
 > +    int result;
 > +    double expected;
 > +  } tests[] = {
 > +    /* Bogus strings */
 > +    { "", -1 },
 > +    { "garbage", -1 },
 > +    { "0garbage", -1 },
 > +    { "1X", -1 },
 > +    { "1%%", -1 },
 > +    { "1:", -1 },
 > +    { "1:1:1", -1 },
 > +    { "1:0", -1 }, /* format is valid but divide by zero is not allowed
*/
 > +    { "1/", -1 },
 > +    { "1/2/3", -1 },

 If we add my proposed check above for filtering out negatives and
 non-finite values, we could also test that things like "-1", "inf",
 "nan", "1e200/1e-200" are rejected.  Likewise worth adding a test
for
 "-0" (whether you decide to permit or reject it). 
OK.

...
 > +
 > +    /* Numbers. */
 > +    { "0", 0, 0 },
 > +    { "1", 0, 1 },
 > +    { "2", 0, 2 }, /* values outside [0..1] range are allowed */
 > +    { "0.1", 0, 0.1 },
 > +    { "0.5", 0, 0.5 },
 > +    { "0.9", 0, 0.9 },
 > +    { "1.0000", 0, 1 },
 > +
 > +    /* Percentages. */
 > +    { "0%", 0, 0 },
 > +    { "50%", 0, 0.5 },
 > +    { "100%", 0, 1 },
 > +    { "90.25%", 0, 0.9025 },
 > +
 > +    /* N in M */
 > +    { "1:1000", 0, 0.001 },
 > +    { "1/1000", 0, 0.001 },
 > +    { "2:99", 0, 2.0/99 },
 > +    { "2/99", 0, 2.0/99 },
 > +    { "0:1000000", 0, 0 },

 Since you document in patch 5 that we have a default percentage of
 "1e-8", it is worth testing that as an explicitly supported string. 
OK.

...
 > +  };
 > +
 > +  for (i = 0; i < ARRAY_SIZE (tests); i++) {
 > +    int r;
 > +    double d;

 Uninitialized...

 > +
 > +    error_flagged = false;
 > +    r = nbdkit_parse_probability ("test", tests[i].str, &d);

 ...and per [1] above, there are code paths where d is not assigned...

 > +    if (r != tests[i].result) {
 > +      fprintf (stderr,
 > +               "Wrong return value for %s, got %d, expected %d\n",
 > +               tests[i].str, r, tests[i].result);
 > +      pass = false;
 > +    }
 > +    if (r == 0 && d != tests[i].expected) {
 > +      fprintf (stderr,
 > +               "Wrong result for %s, got %g, expected %g\n",
 > +               tests[i].str, d, tests[i].expected);

 ...so this can end up printing garbage.  You may want to consider
 having nbdkit_parse_probability assign into d on all code paths, not
 just success. 
I don't think I understand this.  Surely if r == 0 then d has been
assigned, and if r == -1 we don't print d?

...
 > +      pass = false;
 > +    }
 > +    if ((r == -1) != error_flagged) {
 > +      fprintf (stderr, "Wrong error message handling for %s\n",
tests[i].str);
 > +      pass = false;
 > +    }
 > +  }
 > +
 > +  return pass;
 > +}
 > +
 >  static bool
 >  test_nbdkit_parse_ints (void)
 >  {
 > @@ -503,6 +575,7 @@ main (int argc, char *argv[])
 >  {
 >    bool pass = true;
 >    pass &= test_nbdkit_parse_size ();
 > +  pass &= test_nbdkit_parse_probability ();
 >    pass &= test_nbdkit_parse_ints ();
 >    pass &= test_nbdkit_read_password ();
 >    /* nbdkit_absolute_path and nbdkit_nanosleep not unit-tested here, but

 > +++ b/plugins/ocaml/NBDKit.ml
 > @@ -160,6 +160,8 @@ let register_plugin ~name
 >  (* Bindings to nbdkit server functions. *)
 >  external set_error : Unix.error -> unit = "ocaml_nbdkit_set_error"
[@@noalloc]
 >  external parse_size : string -> int64 = "ocaml_nbdkit_parse_size"
 > +external parse_probability : string -> string -> float =
 > +  "ocaml_nbdkit_parse_probability"

 Is OCaml 'float' required to be any specific floating point (such as
 IEEE 754 binary64), or is it some nebulous hardware-specific floating
 point (possibly 32-bit instead of 64-bit, or even permitting VAX
 instead of IEEE)?  But how OCaml maps floating point is not a
 show-stopper to this patch, since we already state OCaml bindings
 don't have ABI stability like C bindings.  (I understand why modern
 languages have picked 'float' to mean floating-point while still
 favoring the IEEE 754 binary64 representations, where C is the odd one
 out that picked 'float' as binary32 and is stuck with the type name
 'double' which has no resemblance to floating-point but rather to its
 size difference.) 
I believe it's the same as a C double.  There's no standard for OCaml
so it's just whatever the current implementation does.

...
 [Side note: if you really want a trip, read the 2023 SIGBOVIK
article
 on "GradIEEEnt half decent" about 16-bit floating point values being
 exploited for their non-linear rounding properties as a way to create
 non-monotonic functions that can in turn form the basis of a Turing
 complete system capable of running a 36-second solution of Mario level
 1-1 in 19k minutes of wall time using only half-precision
 floating-point operations... https://sigbovik.org/2023/,
 http://tom7.org/grad/murphy2023grad.pdf] 
Will do :-/

...
 > +++ b/plugins/python/modfunctions.c
 > @@ -122,11 +122,32 @@ parse_size (PyObject *self, PyObject *args)
 >    return PyLong_FromSize_t ((size_t)size);
 >  }
 >  
 > +/* nbdkit.parse_probability */
 > +static PyObject *
 > +parse_probability (PyObject *self, PyObject *args)
 > +{
 > +  const char *what, *str;
 > +  double d;
 > +
 > +  if (!PyArg_ParseTuple (args, "ss:parse_probability", &what,
&str))
 > +    return NULL;
 > +
 > +  if (nbdkit_parse_probability (what, str, &d) == -1) {
 > +    PyErr_SetString (PyExc_ValueError,
 > +                     "Unable to parse string as probability");
 > +    return NULL;
 > +  }
 > +
 > +  return PyFloat_FromDouble (d);

 Another language binding that does not directly guarantee that 'Float'
 is an IEEE 64-bit value, but where we are probably safe:

https://stackoverflow.com/questions/70184494/on-what-systems-does-python-...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH nbdkit v2 1/6] Add new public nbdkit_parse_probability function