It has long been known that the C specification of *scanf() leaves
behavior undefined for things like:

    int i;
    sscanf("9999999999999999", "%i", &i);

C11 7.21.6.2 P12
"Matches an optionally signed integer, whose format is the same as
expected for the subject sequence of the strtol function with the value
0 for the base argument."
C11 7.21.6.2 P10
"If this object does not have an appropriate type, or if the result of
the conversion cannot be represented in the object, the behavior is
undefined."
as there is an overflow when consuming the input which matches the
strtol subject sequence but does not fit in the width of an int. On my
Linux system, 'man sscanf' mentions that ERANGE might be set in such a
case, but neither C nor POSIX actually requires this behavior; other
likely behaviors are storing the value mod 2^32 into i, or storing
INT_MAX into i, or ...
This is annoying - the only safe way to parse integers from
untrustworthy sources, where overflow MUST be detected, is to manually
open-code strtol() calls, which can get quite lengthy in comparison to
the concise representations possible with *scanf.
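
For comparison, here is a minimal sketch of what the open-coded strtol()
approach looks like for a single int. The helper name parse_int_checked
and its return convention are made up for illustration; it also rejects
trailing characters, which a bare "%i" would not do.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of any libc): parse all of str as an
 * int, with base auto-detection like "%i". Returns 0 on success and -1
 * on any failure, including overflow. */
static int parse_int_checked(const char *str, int *out)
{
    char *end;
    long val;

    errno = 0;                   /* strtol only sets errno on failure */
    val = strtol(str, &end, 0);  /* base 0: decimal, octal, or hex */
    if (end == str)              /* no digits were consumed */
        return -1;
    if (*end != '\0')            /* trailing characters after the number */
        return -1;
    if (errno == ERANGE)         /* value does not fit in a long */
        return -1;
    if (val < INT_MIN || val > INT_MAX)  /* fits in long, but not in int */
        return -1;
    *out = (int)val;
    return 0;
}
```

That is roughly twenty lines to replace one sscanf() call, and every
additional field in the input needs the same set of checks again.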
Would glibc be willing to consider a GNU extension to add an optional
flag character between '%' and the various numeric conversion specifiers
(both integral based on strto*l, and floating point based on strtod),
where we could force *scanf to treat numeric overflow as a matching
failure, rather than undefined behavior? Or even a second flag to
request that *scanf stop consuming characters if the next character in
input would cause overflow in the current specifier, leaving that
character to instead be matched to the remainder of the format string?
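
To make the second idea concrete, here is a rough sketch of the
"stop before the digit that would overflow" semantics, written as a
standalone helper rather than as anything inside *scanf. The name
scan_int_bounded is hypothetical, and for brevity it handles only
decimal input into an int, not the full "%i" base detection.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: consume an optionally signed decimal integer
 * from s, stopping before any digit that would make the value overflow
 * an int. Returns the number of characters consumed; 0 means a matching
 * failure, in which case *out is left untouched. */
static size_t scan_int_bounded(const char *s, int *out)
{
    size_t i = 0, digits = 0;
    int neg = 0;
    int val = 0;  /* accumulated as a non-positive value so INT_MIN fits */

    if (s[i] == '+' || s[i] == '-') {
        neg = (s[i] == '-');
        i++;
    }
    while (isdigit((unsigned char)s[i])) {
        int d = s[i] - '0';
        int limit = neg ? INT_MIN : -INT_MAX;

        /* val * 10 - d would fall below limit: leave this digit unread.
         * C division truncates toward zero, which rounds the negative
         * quotient up, giving exactly the bound we need. */
        if (val < (limit + d) / 10)
            break;
        val = val * 10 - d;
        i++;
        digits++;
    }
    if (digits == 0)
        return 0;
    *out = neg ? val : -val;
    return i;
}
```

With a 32-bit int, calling this on "9999999999999999" consumes nine
characters and stores 999999999, leaving the remaining digits for
whatever follows in the format string.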
Let's suppose for the sake of argument that we add '^' as a request to
force overflow to be a matching error. Then
sscanf("9999999999999999", "%^i", &i) would be well-specified to return
0, rather than returning 1 with an unknown value assigned into i, or
whatever else other libcs do with the undefined behavior when the ^ is
not present.
And if glibc likes the idea of such an extension, and we see an uptick
in applications actually using it, I'd also be happy to champion the
addition of such an extension in POSIX (but the POSIX folks will
definitely want to see existing practice first - both an implementation
and applications that use that implementation). The libguestfs suite of
programs is willing to be an early adopter, if glibc is willing to
pursue adding such a safety valve.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org