On 6/19/23 13:18, Vincent MAILHOL wrote:
On Fri. 16 June 2023 at 16:34, Richard W.M. Jones
<rjones(a)redhat.com> wrote:
(...)
>> Last thing, the segfault on ldmtool [1] still seems a valid issue.
>> Even if I now do have a workaround for my problem, that segfault might
>> be worth a bit more investigation.
>
> Yes that does look like a real problem. Does it crash if you just run
> ldmtool as a normal command, nothing to do with libguestfs? Might be
> a good idea to try to get a stack trace of the crash.
The fact is that it only crashes with the UID 65534 in the qemu VM. I
am not sure what command line is passed to ldmtool for this crash to
occur.
I can help to gather information, but my biggest issue is that I do
not know how to interact with the VM under /tmp/.guestfs-1001/
[ 0.777352] ldmtool[164]: segfault at 0 ip 0000563a225cd6a5 sp
00007ffe54965a60 error 4 in ldmtool[563a225cb000+3000]
^^^^ ^^^^^^^^^^^^^^^^^^^
This smells like a NULL pointer dereference.
... Hey, this is actually my line from an email I started writing earlier
today :), but I then decided not to send it.
It certainly looks like a null pointer dereference, and if you
disassemble the instruction byte stream dump (the "Code:" line from the
kernel log) with (e.g.) ndisasm, that confirms it. You get something like:
  00000025  E8DBFDFFFF        call 0xfffffffffffffe05
  0000002A  4C8B20            mov r12,[rax]             <---- crash
  0000002D  4889442408        mov [rsp+0x8],rax
  00000032  4C89E7            mov rdi,r12
  00000035  E80BE1FFFF        call 0xffffffffffffe145
with the "mov r12,[rax]" instruction faulting (with the previously
called function presumably having returned 0 in rax). See the "<4c> 8b
20" substring in the "Code:" line -- the angle brackets point at the
first byte of the crashing instruction.
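
For illustration, here is a minimal Python sketch of how the marker in a "Code:" line can be located and the marked instruction decoded by hand. The byte string is an assumption: it is rebuilt only from the five instructions listed above, not copied from the real kernel log (which is longer). The helper names are hypothetical, and the decoder handles just the one encoding that appears here (REX.W + 8B with a plain register-indirect operand).

```python
# Illustrative "Code:" fragment rebuilt from the five instructions listed
# above; the real kernel log line is longer and is not reproduced here.
CODE_FRAGMENT = "e8 db fd ff ff <4c> 8b 20 48 89 44 24 08 4c 89 e7 e8 0b e1 ff ff"

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def split_code_line(text):
    """Return (raw bytes, index of the byte marked with <..>)."""
    data = bytearray()
    fault_index = None
    for token in text.split():
        if token.startswith("<") and token.endswith(">"):
            # The marked byte is the first byte of the faulting instruction.
            fault_index = len(data)
            token = token[1:-1]
        data.append(int(token, 16))
    return bytes(data), fault_index


def decode_mov_load(insn):
    """Decode only 'REX.W 8B /r' with mod=00, i.e. mov reg64,[reg64]."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if (rex & 0xF8) != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W-prefixed 8B mov")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):
        # rm=4 needs a SIB byte, rm=5 is RIP-relative: out of scope here.
        raise ValueError("addressing form not handled by this sketch")
    dst = REG64[reg | (((rex >> 2) & 1) << 3)]  # REX.R extends the reg field
    src = REG64[rm | ((rex & 1) << 3)]          # REX.B extends the rm field
    return f"mov {dst},[{src}]"


data, fault_index = split_code_line(CODE_FRAGMENT)
print(fault_index)                         # 5
print(decode_mov_load(data[fault_index:]))  # mov r12,[rax]
```

The load reads through rax, so a rax of 0 matches the "segfault at 0" in the log.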
I didn't send the email ultimately because your email included a link
[1] pointing at a particular line number:
https://github.com/mdbooth/libldm/blob/master/src/ldmtool.c#L164
and so I assumed you actually traced the crash to that line.
Is that the case?
Or did you perhaps mistake *PID* 164 (from the kernel log) for the line
number?
The instruction pointer being 563a225cd6a5, I installed
libguestfs-tools-dbgsym and tried:
addr2line -e /usr/bin/ldmtool 564a892506a5
Results:
??:0
Without conviction, I also tried in GDB:
$ gdb /usr/bin/ldmtool
(...)
Reading symbols from /usr/bin/ldmtool...
Reading symbols from
/usr/lib/debug/.build-id/21/37b4a64903ebe427c242be08b8d496ba570583.debug...
(gdb) info line *0x564a892506a5
No line number information available for address 0x564a892506a5
The debug symbols are correctly installed, but it is impossible to convert
that instruction pointer into a line number. It is as if the ldmtool on my
host and the ldmtool in the qemu VM were from a different build. I
tried to mount /tmp/.guestfs-1001/appliance.d/root but that disk image
did not contain ldmtool.
I am not sure how to generate a stack trace or a core dump within that
qemu VM. If you can tell me how to get an interactive prompt (or any
other guidance) I can try to collect more information.
The IP where the crash occurs is 0000563a225cd6a5. The ldmtool binary
(as opposed to a shared object / library) is mapped into the process's
address space at 563a225cb000, for a length of 0x3000 bytes. So the
offending instruction is supposed to be 0000563a225cd6a5 - 563a225cb000
= 26A5.
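
The same subtraction can be scripted so that it works for any such log line. This is only a sketch: the regular expression is an assumption based on the shape of the single segfault line quoted in this thread, and the function name is hypothetical.

```python
import re

# The kernel message quoted earlier in this thread, joined onto one line.
LOG_LINE = ("ldmtool[164]: segfault at 0 ip 0000563a225cd6a5 sp "
            "00007ffe54965a60 error 4 in ldmtool[563a225cb000+3000]")

# Pattern modelled on the line above only; other kernels may format it
# differently.
PATTERN = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract PID, faulting address and IP offset within the mapping."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not look like a segfault message")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("instruction pointer lies outside the mapping")
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),          # a process ID, not a source line
        "fault_addr": int(m["addr"], 16),
        "offset": offset,
    }


info = parse_segfault(LOG_LINE)
print(info["pid"], hex(info["offset"]))  # 164 0x26a5
```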
With the debug symbols installed, can you attach the output of
objdump --headers --wide -S /usr/bin/ldmtool
?
Can you try
addr2line -p -i -f -e /usr/bin/ldmtool 26A5
?
(This still may not be good enough; we might have to offset the
difference 0x26A5 with some address related to the .text section... The
objdump output should help us experiment.)
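
One way to read the caveat above, as a sketch only: the kernel reports the start of the mapping that contains the faulting instruction, and in a position-independent executable that mapping need not begin at ELF virtual address 0. Assuming the mapping is the executable LOAD segment, the address to try with addr2line would be that segment's page-aligned virtual address plus 0x26A5. The segment address used below is a made-up placeholder, not a value read from the real ldmtool binary; the real one would have to come from the objdump output.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def candidate_vaddr(offset_in_mapping, segment_vaddr, page_size=PAGE_SIZE):
    """Translate an offset inside a mapping into an ELF virtual address.

    segment_vaddr is the virtual address of the LOAD segment assumed to
    back the mapping; it is rounded down to a page boundary because
    mappings start on page boundaries.
    """
    return (segment_vaddr & ~(page_size - 1)) + offset_in_mapping


OFFSET = 0x26A5                      # from the subtraction above
HYPOTHETICAL_SEGMENT_VADDR = 0x2000  # placeholder, NOT from the real binary

print(hex(candidate_vaddr(OFFSET, HYPOTHETICAL_SEGMENT_VADDR)))  # 0x46a5
```

If the executable segment turned out to start at virtual address 0, the result would simply be 0x26A5 again.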
Laszlo