Re: [Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback

Thursday, 18 November 2021

(... long email)

On 11/17/21 19:07, Richard W.M. Jones wrote:
...
 On Mon, Nov 15, 2021 at 02:11:01PM +0100, Laszlo Ersek wrote:
> (2) Should nbdkit continue building on RHEL7? In the OCaml upgrade
> thread
> <https://listman.redhat.com/archives/libguestfs/2021-November/msg00095.htm...
> I thought we practically abandoned RHEL7 for the v2v projects.

 If I sound schizophrenic on RHEL 7 that'll be because I'm in two minds
 still about supporting it.

 In this particular case I was trying to test the OCaml changes on the
 oldest version of OCaml I have easy access to and found that there
 were various problems compiling on RHEL 7.  I fixed the other
 problems, but this fix was a bit more controversial.

> (3) Red Hat Developer Toolset 10 (and 11 Beta) are available for
> RHEL7:
>
> https://developers.redhat.com/products/developertoolset/overview
>
>
https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/1...
>
>
https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/1...
>
> providing gcc 10 and 11 respectively.
>
> We could say that building nbdkit on RHEL7 requires devtoolset.

 This doesn't sound very convenient, but there's a stronger reason to
 keep stuff working with GCC 3/4 and that's because OpenBSD sticks with
 GCC 4.2 as it was the last GPLv2+ version.  I think the reason OpenBSD
 does this is wrong, but that's their choice.  (Of course they have
 Clang).

> (1) I think that APIs that lie are wrong. I could contribute standard
> C implementations if we really think nbdkit should continue building
> on RHEL7. The difference with the built-ins should only be in
> performance, not in functionality.

 I agree.  If you want to have a go at standard C versions that would
 be great.  Even unsigned addition looks very hard.

 I found this lengthy paper on the subject:
 https://www.cs.utah.edu/~regehr/papers/overflow12.pdf 
I've not checked the paper, but I can say two things at once:

- "~regehr" means it's going to be great :)

- signed integers are a huge complication, while at the same time, their
  use (or usefulness) is extremely limited for dealing with memory
  allocations, sizes, indexing, file offsets, and so on.

I wouldn't try to do this in a type-transparent manner. I'd first revert
commit b31859402d14 ("common/include/checked-overflow.h: Simplify",
2021-11-11), and then add separate functions for uint64_t and size_t.

If we don't want to be 100% standard C, just make it work for gcc-4.2,
then a type-generic(-ish) solution could exist using "typeof" and
(perhaps) statement expressions. (Both are GCC extensions that exist in
4.2:
- <https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Statement-Exprs.html>
- <https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Typeof.html>.)

To go into some details:

(1) The idea is that we want the following C expression to evaluate to
true, without OP overflowing first:

  a OP b <= max

where "a", "b", "max" are all of the same unsigned integer
type, and OP
is either "+" or "*".

(2) In case OP is "+":

  a + b <= max

we can always subtract "b" from both sides:

  a <= max - b

This condition is checkable in C as-is, because "max - b" cannot
underflow. If the condition evaluates to true, then "a + b" won't
overflow.

(3) In case OP is "*":

  a * b <= max

we need to see if "b" is zero, at first. If "b" is zero, the
multiplication is safe. If "b" is nonzero, then we can divide both sides
by it:

  a <= max / b

This condition is checkable in C as-is, because "max / b" cannot wrap,
or involve a division by zero.

However, unlike in the addition/subtraction case (2), the *equivalence*
of this "rearrangement as a division" with the original

  a * b <= max

is not immediately clear, given that the "/" operator in C (between
integer operands) is not an exact division, but division with a
remainder. As the standard says, "the result ... is the algebraic
quotient with any fractional part discarded".

The usual way to prove "equivalence" between two logical formulae is to
prove "implication" in both directions. That is, assuming "b" is
nonzero, let's prove both of the following implications between
C-language expressions:

  (a <= max / b)  IMPLIES  (a * b <= max)   [1]

  (a * b <= max)  IMPLIES  (a <= max / b)   [2]

(5) At first, let's express what the

  max / b == q                              [3]

expression in C means, expressed *mathematically*. That way, we can
investigate [1] and [2] in pure math.

The C expression [3] means, mathematically speaking:

  max = q * b + r                           [4]

where "q" is a nonnegative integer, and "r" is a nonnegative integer
strictly smaller than "b". And, this decomposition is unique, given any
pair of (max, b). (And we posited earlier that "b" is nonzero, of
course.)

(6) Now, going back to [1] and [2]:

- wherever we have a (max / b) C-language expression, substitute "q"
(from [3];

- wherever we have a standalone "max" expression, substitute "q * b +
r"
(from [4]).

We get the following *mathematical* (not C!) expressions:

  (a <= q)              IMPLIES  (a * b <= q * b + r)   [5]

  (a * b <= q * b + r)  IMPLIES  (a <= q)               [6]

(7) Implication [5] is trivially true, as we multiply both sides of the
premise inequality (a <= q) with the positive integer "b" at first:

  a * b <= q * b

and then add the nonnegative integer "r" to the right hand side only:

   a * b <= q * b + r

(8) Implication [6] is slightly more tricky. First, let's divide both
sides of the premise inequality

  (a * b <= q * b + r)

with the positive integer "b":

  a * b    q * b + r
  ----- <= ---------
    b          b

That simplifies to:

  a <= q + (r/b)                                        [7]

Here's where we consider the restriction on "r" from [4]: "r" is
a
nonnegative integer *strictly smaller* than "b". That gives us, for the
(r/b) addend in separation:

  0 <= (r / b) < 1

Which means that we can tack another (strict) inequality to the right
side of [7]:

  a <= q + (r/b) < q + 1                                [8]
                 ^^^^^^^

(Because, adding (r/b) to "q" produces a smaller number than adding 1 to
"q" does.)

Considering the two far sides of [8] together, we get

  a < q + 1

Then, as both "a" and "q" are integers, this is equivalent to

  a <= q

which is what we wanted to reach in [6].

(9) And so the C-language functions are just:

bool
add_size_t_overflow (size_t a, size_t b, size_t *result)
{
  if (a <= SIZE_MAX - b) {
    *result = a + b;
    return false;
  }
  return true;
}

bool
mul_size_t_overflow (size_t a, size_t b, size_t *result)
{
  if (b == 0 || a <= SIZE_MAX / b) {
    *result = a * b;
    return false;
  }
  return true;
}

(10) The (unsigned) type-generic macros could look something like this.
First, we'd need to check if the types of "a", "b" and
"result" were (i)
integers, (ii) unsigned, and (iii) had identical range. If not, we
should get a compilation failure (static assert, of sorts). Let's assume
at least that none of our integer types have padding bits, and that
signed integer types use two's complement. (Although the signed
representation may be irrelevant in the discussion below.)

I don't consider bit-fields at all because they are pure evil anyway. :)

(i) The "integer" check for any arithmetic type can be done by dividing
((type)1) with 2. Every floating point type can represent 1/2 exactly
(so the result compares uneqal to zero), and for every integer type, the
result is zero.

(ii) The "unsigned" check can be done by seeing if ((type)-1) is
positive. This works even for _Bool, plus -1 is safely expressible by
any signed integer type (IOW even if the check fails, it is safe to
evaluate).

(iii) "identical range" can be expressed by ((type1)-1 == (type2)-1),
with the additional -- perhaps redundant -- restriction (sizeof a ==
sizeof b). This relies on our assumption about integer representation
(no padding bits).

The static assert can be done by intentionally violating a C constraint
(which in turn requires a diagnostic to be emitted), dependent on an
integer constant expression that combines (i), (ii) and (iii). A common
trick is to create an array typedef, where the size of the array is -1
(if the condition we want to enforce fails) vs. +1 (otherwise).

(11) Then we could do this, for example:

#include <stdbool.h>
#include <string.h>

/* This macro does not evaluate any one of its arguments, unless the argument
 * has variably modified type.
 */
#define HAVE_SAME_UINT_TYPES(a, b)     \
  ((typeof (a))1 / 2 == 0 &&           \
   (typeof (b))1 / 2 == 0 &&           \
   (typeof (a))-1 > 0 &&               \
   (typeof (b))-1 > 0 &&               \
   (typeof (a))-1 == (typeof (b))-1 && \
   sizeof a == sizeof b)

/* This macro evaluates each of "a", "b" and "r" exactly
once, assuming neither
 * has variably modified type.
 */
#define ADD_UINT_OVERFLOW(a, b, r)                                      \
  ({                                                                    \
    typedef char static_assert_1[-1 + 2 * HAVE_SAME_UINT_TYPES(a, b)];  \
    typedef char static_assert_2[-1 + 2 * HAVE_SAME_UINT_TYPES(a, *r)]; \
    typeof (a) a2 = a, b2 = b, max = -1, result;                        \
    void *r2 = r;                                                       \
    bool overflow = true;                                               \
                                                                        \
    if (a2 <= max - b2) {                                               \
      result = a2 + b2;                                                 \
      memcpy(r2, &result, sizeof result);                               \
      overflow = false;                                                 \
    }                                                                   \
    overflow;                                                           \
  })

Note that using this would require a bit more pedantry than usual at the
*call sites*; for example, adding constants "1" and "2u" would not
work.

TBH, I think the macros I created above are hideous. I like call sites
to be explicit about types, and so I'd prefer the much simpler;
type-specific add_size_t_overflow() and mul_size_t_overflow() functions.

(Beyond reverting b31859402d14, I'd even replace the type-specific
ADD_UINT64_T_OVERFLOW, MUL_UINT64_T_OVERFLOW, ADD_SIZE_T_OVERFLOW,
MUL_SIZE_T_OVERFLOW function-like macros with actual functions. Just let
the compiler inline whatever it likes.)

Comments?

Thanks
Laszlo

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH nbdkit 2/2] common/include/checked-overflow.h: Provide fallback