On 11/09/21 21:54, Richard W.M. Jones wrote:
On Tue, Nov 09, 2021 at 07:27:12PM +0000, Richard W.M. Jones wrote:
> On Tue, Nov 09, 2021 at 08:53:28PM +0200, Nir Soffer wrote:
>> I would try to extract the code to compute the new capacity into a helper
>> function:
>>
>> if (next_capacity(v-cap, n, itemsize, &newcap))
>> return -1;
>>
>> This function can return early instead of jumping around or fail
>> if we cannot reserve n items. In the worst case this function will
>> only hide the overflow macros.
>
> OK
While I think this is a good idea, when I tried to make it work the
function wasn't very elegant. The problem is trying to make the
output "atomic", ie. only updating *newcap once.
We could always use a temporary (local) variable for that, and only
assign it to the output parameter at the very end. (Either way, if the
helper function fails, it's OK to have the output param(s) with
indeterminate value. Assuming we document that.) The compiler can still
keep the local variable in a register.
What was worse, reading the disassembly Clang managed to produce
something that was less efficient, even though it inlined the function.
I think clarity / safety around integer overflows beats "performance of
generated assembly", even in a hot path (which I understand this
function *not* to be in). The C source code we end up including here
should remain undisturbed for a long time; the generated assembly will
change as compiler versions come and go. The patch does use compiler
builtins, so we've done the expected (= trusted the compiler with
generating the best possible assembly).
(Trying to steer assembly output via C code manipulation (let alone
OCaml code manipulation) is something I don't understand in general. For
how many architectures are we willing to eyeball and tweak the generated
assembly? How stable is the generated assembly over different versions
of the same compiler? If the assembly code is so important, we should
*code* the logic in assembly; perhaps by involving arch-specific
assemblers in the makefiles, or by using inline assembly with #ifdefs.
(Some projects do the former regularly, for example OpenSSL and edk2.))
Thanks,
Laszlo