On 9/5/23 16:57, Eric Blake wrote:
On Tue, Sep 05, 2023 at 11:09:02AM +0100, Richard W.M. Jones wrote:
>>> +static inline int64_t
>>> +human_size_parse (const char *str,
>>> + const char **error, const char **pstr)
>>> +{
>>> + int64_t size;
>>> + char *end;
>>> + uint64_t scale = 1;
>>> +
>>> + /* XXX Should we also parse things like '1.5M'? */
>>> + /* XXX Should we allow hex? If so, hex cannot use scaling suffixes,
>>> + * because some of them are valid hex digits.
>>> + */
>>> + errno = 0;
>>> + size = strtoimax (str, &end, 10);
>>
>> (1) A further improvement here (likely best done in a separate patch)
>> could be to change the type of "size" to "intmax_t", from "int64_t".
>> That way, the assignment will be safe even theoretically, *and* the
>> overflow check at the bottom of the function (with the division &
>> comparison of the quotient against INT_MAX) will work just the same.
>
> I'm always very unsure how this all works. In particular I seem to
> recall that intmax_t is no longer really the maximum possible int
> (because of int128) and so it's always 64 bit on platforms we care
> about. Can Eric comment?
intmax_t was supposed to be whatever the compiler supports as its
largest integer type; but you are right that when gcc added
__int128_t, they did NOT change intmax_t at the time (arguably by
using the weasel-words that, as an implementation-defined type, it was
not an integer type merely because you can't write integer literals of
that type, even though it behaves like an integer in every other
respect).
We're kind of in a frozen state where making intmax_t larger than 64
bits will break more programs than expected because it has ABI
implications:
https://stackoverflow.com/questions/21265462/why-in-g-stdintmax-t-is-not-...
The fact (which I just learned today) that ABIs are unable to keep up
with intmax_t is a disaster, and entirely defeats the purpose of intmax_t.
My personal preference is to avoid intmax_t, as it has too much
baggage (the risk of future widening, vs. NOT being the widest type
after all), similar to how size_t already has baggage. In short,
having something that is not platform specific is easier to reason
about (in the same way that using size_t gives me more grief than
directly using int32_t or int64_t; even though size_t is such a
naturally occurring type,
it trickier to work with).
There is a very painful disconnect between platforms (which prefer
fixed-size integers) and the C standard (which defines all its rules
in terms of standard integer types).
>>> +
>>> + if (INT64_MAX / scale < size) {
>>> + *error = "could not parse size: size * scale overflows";
>>> + *pstr = str;
>>> + return -1;
>>> + }
And thus I prefer that this comparison stay pegged to INT64_MAX, and
not INT_MAX.
Side comment: I never suggested INT64_MAX be replaced with INTMAX_MAX
here, and that was deliberate on my part. I only suggested changing the
type of "size" from "int64_t" to "intmax_t", so that
"size"'s type would
literally match the retval type of strtoimax().
And after such a type change, the expression (INT64_MAX / scale < size)
would remain exactly right:
- All the participating values remain nonnegative, so whatever usual
arithmetic conversions are taken, the values will never change.
- Keeping the INT64_MAX limit continues to make sure that the final
(size * scale) product, although its type *might* change, will produce
the same safe value, for returning through an int64_t. (We'd not want
the retval type to change!)
... Ahhh! I now see where the confusion comes from. My mistake! I wrote:
That way, the assignment will be safe even theoretically, *and* the
overflow check at the bottom of the function (with the division &
comparison of the quotient against INT_MAX) will work just the same.
That's a terrible typo! I meant to write INT64_MAX!
Laszlo