On Fri, Jun 16, 2023 at 11:17:21AM +0900, Vincent MAILHOL wrote:
Hi Richard,
On Fri. 16 Jun. 2023 at 03:08, Richard W.M. Jones <rjones(a)redhat.com> wrote:
> On Thu, Jun 15, 2023 at 09:18:38PM +0900, Vincent Mailhol wrote:
> > Hello,
> >
> > I am using libguestfs in a Bazel's linux-sandbox environment[1].
> >
> > When executing in that sandbox environment, I got frequent crashes.
> >
> > Please find attached below the results of libguestfs-test-tool when
> > run in that linux-sandbox environment. The most relevant part seems
> > to be:
> >
> > [ 0.797233] ldmtool[164]: segfault at 0 ip 0000564a892506a5 sp 00007fff8ee5b900 error 4 in ldmtool[564a8924e000+3000]
> > [ 0.798117] Code: 18 64 48 33 1c 25 28 00 00 00 75 5e 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f 84 00 00 00 00 00 e8 db fd ff ff <4c> 8b 20 48 89 44 24 08 4c 89 e7 e8 0b e1 ff ff 45 31 c0 4c 89 e1
> > /init: line 154: 164 Segmentation fault ldmtool create all
> >
> > So the root cause seems to be around libldm. This mailing list seems
> > to cover both libguestfs and libldm, so hopefully, I am at the right
> > place to ask :)
> >
> > Needless to say, when run outside of the sandbox environment, no
> > crashes were observed.
> >
> > [1] linux-sandbox.cc
> > Link: https://github.com/bazelbuild/bazel/blob/master/src/main/tools/linux-sand...
> >
> > ---
> ...
> > supermin: picked /sys/block/sdb/dev (8:16) as root device
> > supermin: creating /dev/root as block special 8:16
> > supermin: mounting new root on /root
> > [ 0.678248] EXT4-fs (sdb): mounting ext2 file system using the ext4 subsystem
> > [ 0.679832] EXT4-fs (sdb): mounted filesystem without journal. Opts: . Quota mode: none.
> > supermin: deleting initramfs files
> > supermin: chroot
> > Starting /init script ...
> > mount: only root can use "--types" option (effective UID is 65534)
> > /init: line 38: /proc/cmdline: No such file or directory
> > mount: only root can use "--types" option (effective UID is 65534)
> > mount: only root can use "--options" option (effective UID is 65534)
> > mount: only root can use "--types" option (effective UID is 65534)
> > mount: only root can use "--types" option (effective UID is 65534)
> > mount: only root can use "--options" option (effective UID is 65534)
>
> It really goes wrong from here, where apparently it's not running as
> root (instead UID 65534), even though we're supposed to be running
> inside a Linux appliance virtual machine.
>
> Any idea why that would be?
>
> I looked at the sandbox and that would run the qemu process as UID
> "nobody" (which might be 65534). However I don't understand why that
> would affect anything running on the new kernel inside the appliance.
And you were right. I did get a crash inside the sandbox and none
outside of it, so I jumped to the conclusion that the root cause was
linked to the sandbox.
I continued the analysis and looked at all the differences between a
successful libguestfs-test-tool log and the failed one. It turned out
that the sandbox was not the cause. The culprit was the first line of
the log: TMPDIR=/tmp.
If I force TMPDIR=/var/tmp, the problem disappears!
This gave me a minimal reproducer:
TMPDIR=/tmp/ libguestfs-test-tool
That one crashed outside the sandbox. Next, my attention went to this line:
libguestfs: checking for previously cached test results of
/usr/bin/qemu-system-x86_64, in /tmp/.guestfs-1001
I did a:
rm -rf /tmp/.guestfs-1001
and that solved my issue \o/
I still do not understand how I ended up running as UID 65534 instead
of root in the first place. I had been doing other qemu experiments,
so I am not sure how, but I somehow ended up with a corrupted
environment under /tmp/.guestfs-1001.
We cache the appliance under $TMPDIR/.guestfs-$UID/ (e.g. have a
look at appliance/root in that directory).
We rebuild it if the distro changes, so most of the time we don't have
to rebuild it when launching libguestfs (although there was a
long-standing bug which I fixed recently:
https://github.com/libguestfs/supermin/commit/8c38641042e274a713a18daf7fc...).
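
As a rough illustration of the cache location described above, the following shell sketch prints the directory that would be used for the current user. The helper name guestfs_cache_dir is hypothetical and not part of libguestfs. The fallback order (LIBGUESTFS_CACHEDIR, then TMPDIR, then /var/tmp) is an assumption based on the guestfs(3) manual page rather than on anything stated in this thread, so check the result before deleting anything.

```shell
#!/bin/sh
# Sketch: print the directory where libguestfs is expected to cache the
# appliance. guestfs_cache_dir is a hypothetical helper, not a libguestfs tool.
guestfs_cache_dir() {
    # Assumption: LIBGUESTFS_CACHEDIR overrides TMPDIR, and /var/tmp is the
    # fallback when neither is set (as described in guestfs(3)).
    base="${LIBGUESTFS_CACHEDIR:-${TMPDIR:-/var/tmp}}"
    # Drop one trailing slash so TMPDIR=/tmp/ and TMPDIR=/tmp give the same path.
    base="${base%/}"
    printf '%s/.guestfs-%s\n' "$base" "$(id -u)"
}

# Show the directory; inspect it before removing it.
guestfs_cache_dir
# To force the cached appliance to be rebuilt, one could then run:
#   rm -rf "$(guestfs_cache_dir)" && libguestfs-test-tool
```

With TMPDIR=/tmp/ and UID 1001 this should print /tmp/.guestfs-1001, the directory that was removed earlier in this thread.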
One last thing: the segfault in ldmtool [1] still seems to be a valid
issue. Even though I now have a workaround for my problem, that
segfault might be worth a bit more investigation.
Yes, that does look like a real problem. Does it crash if you just run
ldmtool as a normal command, nothing to do with libguestfs? Might be
a good idea to try to get a stack trace of the crash.
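
One possible way to collect such a stack trace is to run the command under gdb in batch mode. This is only a sketch: backtrace_cmd is a hypothetical helper, the GDB variable is an illustrative override, and it assumes gdb (ideally with debug symbols for libldm) is installed on the machine where ldmtool crashes.

```shell
#!/bin/sh
# Sketch: run a command under gdb in batch mode and print a backtrace once
# the program stops (for example on SIGSEGV).
# backtrace_cmd is a hypothetical helper; GDB can be set to use another
# debugger binary (assumption for illustration, defaults to "gdb").
backtrace_cmd() {
    # -batch: exit after the given commands; -ex: gdb command to execute;
    # --args: everything after it is the program and its arguments.
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@"
}

# Usage, on a host where the same disks are visible:
#   backtrace_cmd ldmtool create all
```

If the program exits normally, gdb has no stack to show and the bt step only reports that.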
Rich.
Regardless, thanks a lot for your quick answer; it helped me continue
troubleshooting.
[1] ldmtool line 164
Link:
https://github.com/mdbooth/libldm/blob/master/src/ldmtool.c#L164
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html