On 2/22/23 09:09, Richard W.M. Jones wrote:
> On Tue, Feb 21, 2023 at 05:03:23PM +0100, Laszlo Ersek wrote:
>> Rich mentioned that libnbd had actually encountered a bug of this kind,
>> just not specifically in exec*p*().
>
> Probably this one?
>
> https://github.com/libguestfs/libguestfs/commit/e1c9bbb3d1d5ef81490977060...
Yes, that's the one you mentioned before! Here:
http://mid.mail-archive.com/20230131130753.GA7636@redhat.com
> glibc was pretty tolerant of this code bug, and the error only
> manifested itself when we used glibc.malloc.check=1
Thank you for pointing me to the same commit again.
Yesterday, after Daniel explained that malloc() was -- in practice -- safe to call in a
child process forked from a multi-threaded process, I wrote the following test program:
(The program starts 8 threads calling malloc+free in a busy loop, then the main thread
enters an infinite loop, forking and reaping a child process in each iteration, and
printing a dot for each child reaped. The child process, forked from the multi-threaded
parent process, calls a single malloc+free pair, and then exits.)
     1  #define _XOPEN_SOURCE 700
     2
     3  #include <pthread.h>
     4  #include <stdlib.h>
     5  #include <sys/wait.h>
     6  #include <unistd.h>
     7
     8  static const size_t size = 16 * 1024 * 1024;
     9
    10  static void *
    11  threadfn (void *arg)
    12  {
    13    while (1)
    14      free (malloc (size));
    15  }
    16
    17  int
    18  main (void)
    19  {
    20    unsigned i;
    21
    22    for (i = 0; i < 8; ++i) {
    23      pthread_t thread;
    24
    25      if (pthread_create (&thread, NULL, threadfn, NULL) != 0)
    26        _exit (EXIT_FAILURE);
    27    }
    28
    29    while (1) {
    30      pid_t pid;
    31
    32      pid = fork ();
    33      switch (pid) {
    34      case -1:
    35        _exit (EXIT_FAILURE);
    36
    37      case 0:
    38        /* child */
    39        free (malloc (size));
    40        _exit (EXIT_SUCCESS);
    41
    42      default:
    43        /* parent */
    44        if (waitpid (pid, NULL, 0) == -1 ||
    45            write (STDOUT_FILENO, ".", 1) == -1)
    46          _exit (EXIT_FAILURE);
    47      }
    48    }
    49  }
To my shock, the program ran totally fine (on RHEL-9.1), producing a constant stream of
dots on standard output, proving Daniel *right*.
However, the commit message you now reference highlights "GLIBC_TUNABLES
glibc.malloc.check=1". Doing a web search for that, I'm led to
https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tuna...
which states that this tunable actually depends on pre-loading
"libc_malloc_debug".
So, if I re-run the program like this:
  $ LD_PRELOAD=/usr/lib64/libc_malloc_debug.so.0 \
    ./malloc-test
then it continues running; if I re-run it like this:
  $ GLIBC_TUNABLES=glibc.malloc.check=1 \
    ./malloc-test
then it continues running; but if I re-run it like *this*:
  $ LD_PRELOAD=/usr/lib64/libc_malloc_debug.so.0 \
    GLIBC_TUNABLES=glibc.malloc.check=1 \
    ./malloc-test
then it *instantly* deadlocks; it doesn't print a single dot.
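A hang like this is awkward to catch in an unattended run, because the parent simply sits in waitpid(). One possible variation, only a sketch: arm alarm() in the child before the malloc+free pair, so that a stuck child is terminated by SIGALRM and the parent can tell the two outcomes apart. The name probe_child_malloc() and its parameters are made up for illustration; they are not part of malloc-test.c. The 8 busy threads are left out here and would have to be started by the caller as in the program above.

```c
#define _XOPEN_SOURCE 700

#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of the original test program.
 *
 * Forks one child that arms alarm() and then performs a single
 * malloc+free pair of "size" bytes.
 *
 * Returns  0 if the child exited with EXIT_SUCCESS,
 *          1 if the child was killed by SIGALRM (it did not finish
 *            within "timeout" seconds),
 *         -1 on any other outcome.
 */
int
probe_child_malloc (unsigned timeout, size_t size)
{
  pid_t pid;
  int status;

  pid = fork ();
  switch (pid) {
  case -1:
    return -1;

  case 0:
    /* child: the default action of SIGALRM terminates the process, so a
     * child blocked inside malloc() dies instead of hanging forever */
    alarm (timeout);
    free (malloc (size));
    _exit (EXIT_SUCCESS);

  default:
    /* parent */
    if (waitpid (pid, &status, 0) == -1)
      return -1;
    if (WIFSIGNALED (status) && WTERMSIG (status) == SIGALRM)
      return 1;
    if (WIFEXITED (status) && WEXITSTATUS (status) == EXIT_SUCCESS)
      return 0;
    return -1;
  }
}
```

This assumes SIGALRM is neither ignored nor blocked in the parent at fork() time, since the child inherits both settings. As with the original program, it is probably best built without optimization, because a compiler may elide a malloc+free pair whose result is never used.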
According to gdb, Thread 1 of the parent process is blocked in waitpid() on line 44, the
other threads of the parent process are executing threadfn() (I can see that on my CPU
load indicator too), and the child process is deadlocked in malloc():
#0 0x00007ffbae0934fb in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007ffbae218f38 in malloc_check () from /usr/lib64/libc_malloc_debug.so.0
#2 0x00007ffbae219c05 in malloc () from /usr/lib64/libc_malloc_debug.so.0
#3 0x000000000040121a in main () at malloc-test.c:39
(Yes, I understand that libc_malloc_debug is not meant for production use; still...)
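The backtrace looks like the usual fork() hazard: a lock that some other thread holds at the moment of fork() is copied into the child in its locked state, and the thread that would release it does not exist in the child. Which lock malloc_check() is waiting for is not established above, so treating this as the cause is an assumption. The general mechanism can be shown with an ordinary mutex, though. In the sketch below, all names (holder, child_sees_locked_mutex, lock, locked) are made up for illustration, and the child uses pthread_mutex_trylock() so that it reports the state instead of hanging.

```c
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t locked;    /* posted once the helper thread owns "lock" */

/* Helper thread: take the lock and never release it. */
static void *
holder (void *arg)
{
  (void)arg;
  pthread_mutex_lock (&lock);
  sem_post (&locked);
  while (1)
    pause ();
}

/* Illustration only; call at most once per process, because the helper
 * thread keeps "lock" for the rest of the process lifetime.
 *
 * Returns 1 if a child forked while another thread holds "lock" finds the
 * mutex busy, 0 if the child could acquire it, -1 on error.
 */
int
child_sees_locked_mutex (void)
{
  pthread_t thread;
  pid_t pid;
  int status;

  if (sem_init (&locked, 0, 0) == -1 ||
      pthread_create (&thread, NULL, holder, NULL) != 0)
    return -1;

  /* wait until the helper thread really holds the mutex */
  while (sem_wait (&locked) == -1)
    if (errno != EINTR)
      return -1;

  pid = fork ();
  switch (pid) {
  case -1:
    return -1;

  case 0:
    /* child: only the forking thread exists here, "holder" does not.
     * pthread_mutex_trylock() is not async-signal-safe, so strictly
     * speaking this call is itself outside what POSIX permits after
     * fork() in a multi-threaded process; it is used only to inspect
     * the copied lock state without blocking. */
    _exit (pthread_mutex_trylock (&lock) == EBUSY ? 1 : 0);

  default:
    /* parent */
    if (waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
      return -1;
    return WEXITSTATUS (status);
  }
}
```

Built with -pthread, a call to child_sees_locked_mutex () from main() is expected to return 1 on glibc: the child sees the mutex as busy even though no thread in the child could ever unlock it. Replacing the trylock with a plain pthread_mutex_lock() would block the child forever, much like the malloc() call on line 39.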
Laszlo