Sorry for replying a second time. After having slept on it (not much,
but some), some thoughts about my own attitude are emerging / being
distilled.

On 2/21/23 19:04, Daniel P. Berrangé wrote:
> On Tue, Feb 21, 2023 at 06:53:39PM +0100, Laszlo Ersek wrote:
>>
>> More in general, this lesson tells me that POSIX is effectively
>> irrelevant -- which is quite sad in itself; the bigger problem however
>> is that *nothing replaces it*. If the one formal standard we have for
>> portability does not reflect reality closely enough, and we need to rely
>> on personal experience with various platforms, then we're back to where
>> we were *before* POSIX. That is, having to check several separate
>> documentation sets, and testing each API on every relevant platform in
>> *each project* where the API is used. The idea is "ignore POSIX, care
>> about Linux / modern systems only", but then it turns out those modern
>> systems *do* differ sufficiently that extracting a common programming
>> base *would* be useful. It's just that POSIX is not that common base;
>> more precisely, there is no formalized, explicit common base. I guess
>> "whatever passes CI" is the common base. That's... terrible, and it
>> makes me seriously question if I want to program userspace in C at all.
>
> FWIW, I wouldn't say that POSIX is irrelevant in general. If you
> are trying to maximise portability it is worth paying attention to.
> Rather I'd say that maintainers of projects may be opinionated about
> which platforms they wish to support, to eliminate the burden of caring
> about platforms which have few if any users in the modern world.
>
> In the libvirt and QEMU context we set explicit platform support targets:
>
>   https://libvirt.org/platforms.html
>   https://www.qemu.org/docs/master/about/build-platforms.html
>
> which effectively limit us to only caring about actively developed
> OSes from the last ~4 years, and even then only fairly mainstream
> stuff. We don't care about a hobbyist/toy UNIX OS. The burden is
> on other OSes to attain compatibility with mainstream modern OSes,
> not on apps to adapt to obscure, feature-poor platforms.
>
> With this attitude, we don't care about compliance with countless
> obsolete vendors' UNIX platforms, and thus many of the edge cases
> that POSIX worries about can be ignored. This frees the project
> maintainers' time to focus on work that benefits a broader set of
> users.
>
> From this, libvirt/QEMU could both explicitly decide not to care
> about any C compilers other than CLang/GCC. Vendor compilers, and
> most especially MSVC, are out of scope. CLang/GCC are able to support
> any of the OS platforms we target. This frees us from caring about
> ISO C standards, letting us use GNU extensions.

The attitude you describe above and my attitude are largely driven by
the same goal: target development as narrowly as possible.
Portability is essential in both cases; the big difference is in the
workflow chosen for achieving portability.

Approach #1: A number of OSes and a number of tools (compilers, etc.) are
hand-picked, based on "practical" factors. This set of components taken
as a whole does not have uniform, central documentation. Therefore,
development is driven by (a) continuously consulting multiple -- often
conflicting -- sets of documentation, and by (b) trial-and-error. By
"trial-and-error" I mean that a "CI pass" is taken as strong evidence of
absence of bugs, including portability bugs. The workflow relies heavily
on CI to root out portability bugs.

The advantage of this approach is that it deduces -- through
documentation reconciliation, trial-and-error, and compiler / OS / libc
source code investigation -- a "common denominator" that is fairly
likely the *greatest* common denominator. Therefore, less and simpler
code has to be written and maintained for feature and bugfix delivery.

The disadvantage is that there is no single source of truth; the
workflow is centered on reconciling incomplete and/or conflicting
documentation sets, and "happens to work in CI" is taken as the final
argument. CI is costly in computing time (energy), developer time
(waiting, bad presentation of results), and money (minutes are
expensive), and locally testing all targeted platforms is a huge chore.
CI development/management in itself consumes immense human effort.

Approach #2: target a published technical standard, as a single source
of truth. Still employ CI, but not as a guiding tool -- more like "just
in case". CI failures originating from portability issues are not
expected in general. CI success is not taken as the primary evidence of
the absence of portability bugs.

The advantage of this approach is that developers can focus on a single
source of truth, for driving development -- POSIX. Patching up
portability problems may occasionally be necessary, but that should be
the exception.

The disadvantage of this approach is that POSIX, while arguably a common
denominator, is almost certainly not the *greatest* common denominator.
Therefore, more code needs to be written and maintained; plus, recent
developments that "eventually" appear in all of the targeted platforms /
tools are not consumable until they become centrally standardized.

So, here's the thing: at a personal level, I can entirely identify with
approach #2, and I'm unable to identify with approach #1, as the
development workflow that I am supposed to follow and practice. To me,
being torn to pieces between 3-4 conflicting documentation sets, and
writing code with the *primary metric* being "let me see if this passes
CI -- let me throw code at the wall and see what sticks", is unbearable.
Having to submit several tweaks in succession, and to wait tens of
minutes for CI to finish every time, rules out software development as
a profession for me. (CI remains relevant anyway, but not for dictating
or driving portability decisions.) Having to manually test interfaces
that are supposed to be standard, in order to determine and exploit
their *accidental* greatest common denominator, again rules out
software development as a profession for me.

Such work *is* valuable, but it's called standardization / standards
development, not software development. I don't mind participating in
standards development, but the *output* of that activity needs to be a
*central formal standard* that programmers can rely upon in the future,
not some implicit understanding that gets encoded in / dispersed over a
bunch of disparate applications and libraries -- such as "we can call
execvp() here because our particular fork() version lets us" -- that
merely happen to target the same arbitrary set of platforms.

QEMU actually gets this *quite* right, with "devel/style.rst". It still
doesn't say anything about fork()/execvp() though, for example.

On the same note, I honestly think that the conflict between the Linux
manual pages and the GNU manual, regarding the restrictions on the child
process after fork(), is *unforgivable*.
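
Just to make the disputed pattern concrete, here is a minimal,
self-contained sketch of the call sequence in question (my own
illustration, not code from this series). POSIX restricts the child of
a multi-threaded process to async-signal-safe operations until exec,
and execvp(), which performs a PATH search, is not on POSIX's list of
async-signal-safe functions; whether calling it here is nevertheless
safe is exactly what the conflicting documents answer differently.

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int
  main (void)
  {
    pid_t pid;

    pid = fork ();
    if (pid == -1) {
      perror ("fork");
      return EXIT_FAILURE;
    }

    if (pid == 0) {
      /* Child. If the parent has other threads, POSIX only guarantees
       * async-signal-safe operations from here until exec; the PATH
       * search inside execvp() is the part whose safety is disputed.
       */
      char *child_argv[] = { (char *) "true", NULL };

      execvp ("true", child_argv);
      _exit (127); /* exec failed; _exit() is safe here, exit() is not */
    }

    /* Parent: reap the child. */
    if (waitpid (pid, NULL, 0) == -1) {
      perror ("waitpid");
      return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
  }
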
Note that I'm not trying to assert an objective truth here. All I'm
saying is that I'm personally incompatible with approach #1. To me, *how*
I work is generally more important than *what* I achieve for users. Under
that umbrella, the justification for introducing our own
async-signal-safe execvpe in this patch is simply the fact that the
official pieces of documentation (plural) available on Linux are
*inconsistent* about fork()+execvp(). The fact that it "happens" to work
in practice is just happenstance. If you will, call this my denial of
practical reality.
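
For completeness, the general shape of a PATH-walking exec helper that
restricts itself to the async-signal-safe toolbox looks roughly like
the following. This is only a hand-waved sketch of the idea, not the
code in the patch; the function name, the "PATH captured by the caller
before fork()" interface, and the fixed PATH_MAX buffer are all
assumptions of the sketch, not something the series prescribes.

  #include <errno.h>
  #include <limits.h>
  #include <string.h>
  #include <unistd.h>

  /* Sketch only: try "file" against each element of "path" (captured by
   * the caller before fork()), avoiding dynamic allocation: no malloc(),
   * no getenv(), no stdio.
   */
  static int
  execvpe_sketch (const char *file, char *const argv[], char *const envp[],
                  const char *path)
  {
    /* A name containing '/' bypasses the PATH search, as with execvp(). */
    if (strchr (file, '/') != NULL) {
      execve (file, argv, envp);
      return -1;
    }

    while (*path != '\0') {
      size_t dir_len = strcspn (path, ":");
      size_t file_len = strlen (file);
      char candidate[PATH_MAX];

      /* Build "<dir>/<file>" in a fixed stack buffer. Empty PATH
       * elements (which stand for ".") and oversized directories are
       * simply skipped in this sketch.
       */
      if (dir_len > 0 && dir_len + 1 + file_len + 1 <= sizeof candidate) {
        memcpy (candidate, path, dir_len);
        candidate[dir_len] = '/';
        memcpy (candidate + dir_len + 1, file, file_len + 1);
        execve (candidate, argv, envp);
        /* On failure, fall through to the next PATH element. */
      }

      path += dir_len;
      if (*path == ':')
        path++;
    }

    errno = ENOENT;
    return -1;
  }

The point of taking "path" as a parameter is that the caller can fetch
it with getenv() before fork(), so the child does not have to; and
whether each str*()/mem*() helper used above is formally on the
async-signal-safe list is, again, exactly the kind of question I'd
prefer a single authoritative document to settle.
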
So: if the libnbd project can tolerate my attitude (approach #2), then
I'd like to proceed with this series (full scope), with me addressing
the v3 review feedback in v4, and so on. If not, then I'll abandon the
series, and try to make myself useful with something else -- where my
basic stance, towards whatever documentation I read, need not be *distrust*.

Laszlo