On 5/16/23 15:26, Richard W.M. Jones wrote:
> How many of the tests fail for you? Just a small number or all of
> them?
Almost all of them fail.
I think I've figured out why.
First, as I mention up-thread, there's upstream glibc bug
<https://sourceware.org/bugzilla/show_bug.cgi?id=28256>, reported by
you, fixed in 2.35; but the upstream fix has never been backported to
RHEL-9.
However, there's another piece to the puzzle. In the upstream glibc bug
report, you wrote:
"I'm getting this when I run any program under valgrind with glibc
tunables"
Keyword being "glibc tunables". I don't set them myself -- so why does
my (unfixed) glibc nonetheless trigger the valgrind false positive?
The answer to that is the following: I build all libguestfs projects
from source.
I keep referring to the following dependency graph (constructed earlier
with your help):
  libvirt-ocaml ---------
                         \
  libnbd <--> nbdkit---   \
                       \   \
  hivex ----> libguestfs --------------------> virt-v2v -> virt-p2v
           /             \                  /
  supermin --             -> guestfs-tools -
(It does not show augeas, which I cannot be bothered to build locally.)
Accordingly, I have a good number of shell scripts that are called:
  r-guestfs-tools
  r-libguestfs
  r-nbdkit
  r-virt-p2v
  r-virt-v2v
For example, "r-guestfs-tools" looks like this:
  #!/bin/bash
  # enable dependencies needed by guestfs-tools
  SUPERMIN=$HOME/src/v2v/supermin/src/supermin \
    $HOME/src/v2v/hivex/run \
    $HOME/src/v2v/libguestfs/run \
    "$@"
and, for example, "r-virt-v2v" is:
  #!/bin/bash
  # enable dependencies needed by virt-v2v
  SUPERMIN=$HOME/src/v2v/supermin/src/supermin \
    $HOME/src/v2v/hivex/run \
    $HOME/src/v2v/libguestfs/run \
    $HOME/src/v2v/libnbd/run \
    $HOME/src/v2v/libvirt-ocaml/run \
    $HOME/src/v2v/guestfs-tools/run \
    "$@"
And when I build virt-v2v locally, I do:
  r-virt-v2v autoreconf -i
  r-virt-v2v ./configure CFLAGS=-fPIC --enable-werror=yes --prefix=/usr
  r-virt-v2v make -j6
  r-virt-v2v make -j6 check
  r-virt-v2v make -j6 check-valgrind
In effect this chains the "run" scripts from all the other local build
trees that virt-v2v depends upon, for building.
Note that virt-v2v's own run script is not chained by my own
"r-virt-v2v" script -- that run script is only needed if someone wants
to run (i.e., not build) virt-v2v locally. (In fact, when I'm gearing up
to autoreconf & configure the virt-v2v tree, the "run" script doesn't
even exist in that tree (only configure will generate it!), so I
couldn't even run it from "r-virt-v2v"!)
Therefore, whenever I also run virt-v2v locally, I spell out the local,
now-existent, "./run" in addition, from the virt-v2v project root:
  r-virt-v2v ./run virt-v2v ...
In effect this chains virt-v2v's own run script in addition to the run
scripts of its dependencies.
Now here's the problem. Consider virt-v2v's own "run.in" script:
  # This is a cheap way to find some use-after-free and uninitialized
  # read problems when using glibc. But if we are valgrinding then
  # don't use this because it can stop valgrind from working.
  if [ -z "$VG" ]; then
      random_val="$(@AWK@ 'BEGIN{srand(); print 1+int(255*rand())}' < /dev/null)"
      LD_PRELOAD="${LD_PRELOAD:+"$LD_PRELOAD:"}libc_malloc_debug.so.0"
      GLIBC_TUNABLES=glibc.malloc.check=1:glibc.malloc.perturb=$random_val
      export LD_PRELOAD GLIBC_TUNABLES
  fi
Ouch. GLIBC_TUNABLES are known to break valgrind. Splendid.
It turns out however that virt-v2v's own "run" script does not
participate in "make check-valgrind". I verified that by adding an
"else" branch above, printing an error message, and exiting with status
1. It does not fire. So this is all fine: the above safety check is for
running virt-v2v *manually* under valgrind. "make check-valgrind" sets
VG, but it does not call virt-v2v's own "run", so the VG check in that
script is never even reached during "make check-valgrind". And the
check handles manual VG settings properly.
But... remember "r-virt-v2v" again:
  #!/bin/bash
  # enable dependencies needed by virt-v2v
  SUPERMIN=$HOME/src/v2v/supermin/src/supermin \
    $HOME/src/v2v/hivex/run \
    $HOME/src/v2v/libguestfs/run \
    $HOME/src/v2v/libnbd/run \
    $HOME/src/v2v/libvirt-ocaml/run \
    $HOME/src/v2v/guestfs-tools/run \
    "$@"
It turns out that "guestfs-tools/run" has the exact same logic for
setting GLIBC_TUNABLES! So when I execute
  r-virt-v2v make -j6 check-valgrind
then the environment for "make -j6 check-valgrind" will *inherit* a
GLIBC_TUNABLES variable, from (at least!) "guestfs-tools/run". The VG
variable will only be set internally to "make check-valgrind", which is
too late; it does not prevent "guestfs-tools" from setting
GLIBC_TUNABLES. I've verified this in the output of
  r-virt-v2v env
which does show GLIBC_TUNABLES.
And that way I hit glibc bug
<https://sourceware.org/bugzilla/show_bug.cgi?id=28256>.
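
The inheritance described above can be reproduced without any of the
real build trees. What follows is only a minimal sketch: "fake-run" and
"fake-check" are hypothetical stand-ins that I made up for illustration
(they are not the real "run" script or the real "make check-valgrind"),
and the tunable value is simplified.

```shell
#!/bin/bash
# Hypothetical stand-ins, created in a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Stand-in for a dependency's "run" script: sets GLIBC_TUNABLES unless
# VG is already set, then executes the rest of the command line.
cat > "$tmp/fake-run" <<'EOF'
if [ -z "$VG" ]; then
  GLIBC_TUNABLES=glibc.malloc.check=1
  export GLIBC_TUNABLES
fi
exec "$@"
EOF

# Stand-in for "make check-valgrind": sets VG only internally, then
# reports the environment that a test would see.
cat > "$tmp/fake-check" <<'EOF'
VG=valgrind
export VG
echo "VG=$VG GLIBC_TUNABLES=${GLIBC_TUNABLES-<unset>}"
EOF

# Start from a clean slate, then chain the two stand-ins.
leak_demo() {
  env -u VG -u GLIBC_TUNABLES \
    bash "$tmp/fake-run" bash "$tmp/fake-check"
}
leak_demo
```

This prints "VG=valgrind GLIBC_TUNABLES=glibc.malloc.check=1": by the
time VG is set, the wrapper has already exported GLIBC_TUNABLES.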
Now here's another interesting difference:
- the "run" scripts in guestfs-tools, virt-p2v, and virt-v2v (1) don't
touch GLIBC_TUNABLES when valgrinding, and (2) set GLIBC_TUNABLES when
not valgrinding,
- whereas the "run" script in libnbd (which I also chain in
"r-virt-v2v", at an earlier stage, see above) (1) *unsets*
GLIBC_TUNABLES when valgrinding, and (2) doesn't touch GLIBC_TUNABLES
when not valgrinding. See libnbd commit 2eeb0c693ce1 ("tests: Remove
GLIBC_TUNABLES when running under valgrind", 2021-08-26) -- it even
references the same glibc bug.
Either way, if I use "env -u" to unset LD_PRELOAD and GLIBC_TUNABLES
between the "chain of run scripts" and "make check-valgrind", as in:
  r-virt-v2v \
    env -u LD_PRELOAD -u GLIBC_TUNABLES \
    make -j6 check-valgrind TESTS=test-v2v-fedora-luks-on-lvm-conversion.sh
then the test (test-v2v-fedora-luks-on-lvm-conversion.sh) passes;
valgrind doesn't complain.
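
As a stopgap, the "env -u" step could live in one more chainable
wrapper. This is only a sketch; the name "r-clean" (and the function
"r_clean") is hypothetical, nothing like it exists in my script
collection or in the projects.

```shell
#!/bin/bash
# r-clean (hypothetical name): drop the two variables that the chained
# "run" scripts may have exported, then run the rest of the command line.
r_clean() {
  env -u LD_PRELOAD -u GLIBC_TUNABLES "$@"
}

# Act as a chainable wrapper only when a command was actually given.
if [ "$#" -gt 0 ]; then
  r_clean "$@"
fi
```

Assuming the script is saved as "r-clean" somewhere on PATH, it would go
last in the chain, for example "r-virt-v2v r-clean make -j6
check-valgrind". It has to come after the run scripts; placed before
them, they would simply set the variables again.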
Therefore, IMO, this is a bug in how the "run" scripts compose.
Arguably, it should be possible to chain any number of those run scripts
(it's a valid use case for a user to depend on all of the build trees at
the same time), and they should all agree about GLIBC_TUNABLES. Namely,
the run scripts should neither set, nor unset, GLIBC_TUNABLES and
LD_PRELOAD; those variables should be ignored altogether in the "run"
scripts.
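
To make "ignored altogether" checkable, here is a rough sketch of a
test for that property. The helper name "check_neutral" is hypothetical;
it only compares the two variables before and after an arbitrary chain
of wrappers, and assumes each wrapper executes the rest of its command
line the way the run scripts do.

```shell
#!/bin/bash
# check_neutral (hypothetical helper): succeeds if the given chain of
# wrapper commands leaves LD_PRELOAD and GLIBC_TUNABLES exactly as it
# found them.
check_neutral() {
  local before after
  # Settings of the two variables in the current environment.
  before=$(env | grep -E '^(LD_PRELOAD|GLIBC_TUNABLES)=' | sort)
  # Same, but as seen by a command run at the end of the chain.
  after=$("$@" env | grep -E '^(LD_PRELOAD|GLIBC_TUNABLES)=' | sort)
  [ "$before" = "$after" ]
}
```

For example, "check_neutral $HOME/src/v2v/libnbd/run
$HOME/src/v2v/guestfs-tools/run" should succeed both with VG unset and
with VG set, if the run scripts compose the way I argue they should.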
Should I submit patches to remove the LD_PRELOAD and GLIBC_TUNABLES
tweaking from all the run scripts?
Thanks!
Laszlo