Re: [Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation

Sunday, 5 February 2023

On 2/1/23 17:17, Laszlo Ersek wrote:
...
 On 1/31/23 18:19, Eric Blake wrote: 
...
> The alternative to relying on execvp() to scan PATH is to
pre-scan
> PATH ourselves before fork().  I wish there were a helper function in
> glibc that would quickly return the absolute path that execvp() would
> otherwise utilize.

 We use execvp() in another spot -- "generator/states-connect.c".

 I'll try to come up with some new nbd_internal_fork_safe_*() APIs, for
 preparing and then "running" execvp. 
There's a problem.

getenv() is not (generally) thread-safe -- even if we called
getenv("PATH") before forking.

  https://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html

An application may call a libnbd interface from one of its threads, and
call getenv() -- or setenv(), or unsetenv(), or putenv() -- from another
one of its threads. Because there is no synchronization around
*existent* getenv() calls in libnbd (LOGNAME, HOME, LIBNBD_DEBUG), this
would lead to undefined behavior.

We can't robustly pre-fetch these environment variables in a library
constructor funcion, like errors_init(), for two reasons:

- Client code that manipulates these variables intentionally, would
  break, even if single-threaded. Example: "tests/debug-environment.c".

- dlopen() and dlsym() are thread-safe functions (all POSIX functions
  are, unless explicitly noted otherwise

<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#...>).
  If an application were to dlopen() libnbd, it could expect dlopen() to
  be thread-safe, and so no libnbd constructor function within would be
  permitted to call thread-unsafe functions.

A robust solution would call for an explicit libnbd initialization
function (where libnbd would snapshot the relevant environment variables
into global variables, and the application would be told, by way of
documentation, that this would have to be done before any threads are
created). But that breaks compatibility with existent libnbd
applications.

Another robust solution would be for libnbd to:

- expose a global mutex called "env_mutex" (possibly wrapped into an
  nbd_ accessor API, such as "nbd_lock_environment" or something),

- acquire the mutex around all getenv() and similar calls,

- document for any client application that any environment accesses
  anywhere in the application must be protected by this lock. In
  particular, because getenv() my *modify* internal state, its return
  value must be copied before releasing the mutex.

This is per POSIX:

...
 The returned string pointer might be invalidated or [...] the string
 content might be overwritten by a subsequent call to getenv 
Linux/glibc is a bit more helpful:
<https://man7.org/linux/man-pages/man3/getenv.3.html> calls getenv()
"MT-Safe env", and that is explained as follows
<https://man7.org/linux/man-pages/man7/attributes.7.html>;:

...
        env    Functions marked with env as an MT-Safety issue access
               the environment with getenv(3) or similar, without any
               guards to ensure safety in the presence of concurrent
               modifications.

               We do not mark these functions as MT-Unsafe, however,
               because functions that modify the environment are all
               marked with const:env and regarded as unsafe.  Being
               unsafe, the latter are not to be called when multiple
               threads are running or asynchronous signals are enabled,
               and so the environment can be considered effectively
               constant in these contexts, which makes the former safe. 
IOW the argument on Linux/glibc is that "getenv() will not break
getenv(), because it does not modify anything".

It still needs to be protected from writes: setenv() is marked as
"MT-Unsafe const:env"
<https://man7.org/linux/man-pages/man3/setenv.3.html>;, and see the
following excerpt about const:env", again from attributes(7):

...
        const  Functions marked with const as an MT-Safety issue non-
               atomically modify internal objects that are better
               regarded as constant, because a substantial portion of the
               GNU C Library accesses them without synchronization.
               Unlike race, which causes both readers and writers of
               internal objects to be regarded as MT-Unsafe, this mark is
               applied to writers only.  Writers remain MT-Unsafe to
               call, but the then-mandatory constness of objects they
               modify enables readers to be regarded as MT-Safe (as long
               as no other reasons for them to be unsafe remain), since
               the lack of synchronization is not a problem when the
               objects are effectively constant.

               The identifier that follows the const mark will appear by
               itself as a safety note in readers.  Programs that wish to
               work around this safety issue, so as to call writers, may
               use a non-recursive read-write lock associated with the
               identifier, and guard all calls to functions marked with
               const followed by the identifier with a write lock, and
               all calls to functions marked with the identifier by
               itself with a read lock. 
I don't like it:

- it requires the same amount of work in the source code (both libnbd
  and client apps),

- readers excluding readers is safer per POSIX, where getenv() may be
  entirely MT-Unsafe,

- the environment is not a highly contended resource, so the performance
  impact of getenv() excluding getenv() is likely negligible.

Summary:
- ignore the whole thing?
- update the documentation, add locking APIs, implemented with a mutex?
- update the documentation, add locking APIs, implemented with an
  rwlock?

Please comment,
Laszlo

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation