On 5/16/23 22:22, Richard W.M. Jones wrote:
On Tue, May 16, 2023 at 02:35:38PM -0500, Eric Blake wrote:
> Our pre-existing error filter is already doing percentage
calculations
> via floating point, so on the grounds that this patch is consolidating
> common code patterns to avoid reimplementation, we're good. But as you
> say, representing things as a ratio between two integers may be more
> maintainable than floating point where accuracy matters, or this may
> be a case where we are explicit that use of the floating point
> fractions as a percentage is intentionally liable to rounding issues.
> Still, avoiding undefined behavior when the division can overflow
> (whether that be the overflow to floating point infinity or even the
> overflow beyond UINT64_MAX), does make it worth questioning if
> floating point is our best representation, or at a minimum if we want
> to be stricter in what we parse before passing a substring on to
> strtod() so that we have more control over the values (part of my 19
> patches to qemu was explicitly truncating any string at 'e' or 'E'
so
> that strtod() could not do an exponent parse that would inadverently
> hit overflow to INFINITY).
I'm not really sure we reached a conclusion so far, but I did want to
say that I changed the evil filter impl so that it now treats any
0 <= P < 1e-12 as effectively 0. (P == evil-probability; P < 0 and P > 1
are still rejected at config time as before).
With P == 1e-12:
nbdkit: debug: evil: mode: stuck-bits, P: 1e-12
nbdkit: debug: evil: block_size: 17592186044416 (2**44)
nbdkit: debug: evil: expected bits per block: 140.737
(The large block size isn't an issue here as nothing is allocated.
However it would be a problem if the block_size calculation overflows,
and I believe that limiting P to larger values avoids this.)
Right, this is about the direction I could agree with -- and again I
want to emphasize that you don't *need* my agreement to proceed with the
floating point approach!
I think enforcing a minimum for P could be useful too, for wherever we
divide by P.
Also I'm not sure that handling probabilities as a numerator and
denominator actually helps here? Both still need to be stored as
floats, so it just pushes the same problem to the plugin.
No, the point with N/D is that they're both positive integers (well, N
may be zero as well), and that they *together* give you a single
rational probability, namely N/D. They'd be stored as separate unsigned
integer fields everywhere, and the division would only be performed (in
unsigned integer) wherever directly needed, such as in (a * N) / D.
I don't know if this is compatible with existent code. If we already
have code that parses floats (and Eric's comments imply we do), then
that code couldn't be rebased on this N/D representation, compatibly.
But, I'd consider introducing a new (in a way, "divergent") N/D
representation a fair price, for not adding more floating point!
Refactoring (extracting common code) is only useful if we can fix the
extracted code later on, and that's a tough challenge with floating point.
Laszlo