Hello Laszlo,
Thank you for the rundown. I set the additional
LIBGUESTFS_BACKEND_SETTINGS variable, and I have attached the follow-up
libguestfs-test-tool output.
I also checked out my CPU settings (cat /proc/cpuinfo output attached), and
the host does appear to support PCLMULQDQ (AMD Ryzen 7 5700X). I also
checked the cpuinfo in one of the guests I have created (Ubuntu 18.04,
unstable due to intermittent kernel panics), and the cpuinfo indicates that
this feature seems to be passed down to my guest as well.
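For reference, the check I ran on the host and in the guest amounts to something like the following. This is only a sketch: has_cpu_flag is a helper name I made up, and it assumes the x86 Linux /proc/cpuinfo layout, where the feature shows up as the flag "pclmulqdq" on the "flags" lines.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears as a whole word on a "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo (x86 Linux layout assumed).
has_cpu_flag() {
    grep '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

if has_cpu_flag pclmulqdq; then
    echo "pclmulqdq: advertised"
else
    echo "pclmulqdq: not advertised"
fi
```

Note that this only shows what the CPU advertises to the kernel; it says nothing about whether the instruction behaves correctly under KVM.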
I noticed that libguestfs-test-tool didn't seem to like the qemu
settings it tried to boot with. So, I went back to basics: I built a qcow2
disk with qemu-img and used qemu-system-x86_64 to do the base install
(Ubuntu 18.04). The resulting image boots, and I imported it with
virt-install. However, the GUI/console tends to lock up shortly after boot
when I am using virt-tools. The guest seems more stable when I boot it
directly with `qemu-system`, and this may be my workaround for now.
In virt-tools, I can consistently trigger a panic on the guest by trying to
enable the qemu-guest-agent: `systemctl enable qemu-guest-agent`.
Unfortunately, I cannot get the full output from that panic (what I have is
attached). It would seem that this problem goes beyond libguestfs-tools. Is
there a KVM mailing list that this might be more appropriate for?
Sincerely,
On Mon, Mar 20, 2023 at 1:31 AM Laszlo Ersek <lersek(a)redhat.com> wrote:
On 3/17/23 16:10, Justin Churchey wrote:
> Hello Everyone,
>
> I was having some difficulties converting OVA images yesterday. At
> first, I thought it may have been a compatibility issue with
> VirtualBox 7.0. However, when I went to run libguestfs-test-tool, it
> began failing with the exact same error as the conversions, which
> leads me to believe the issue may lie with libguestfs and not the
> images themselves.
>
> To test further, I created a fresh install of Ubuntu 22.04, and the
> libguestfs-test-tool seems to fail with the same error, even on a
> fresh install. I am attaching the libguestfs-test-tool output for
> reference.
>
> Ubuntu 22.04 is running libguestfs-tools 1.46.2-10ubuntu3
>
> If anybody has any insight into the issue, or if you feel a bug report
> needs to be filed, please let me know.
Your appliance kernel crashes.
Here's my theory on why this might happen, based on your log.
The guestfish appliance runs with KVM acceleration.
The crash happens after/while inserting the modules crc32-pclmul.ko and
crct10dif-pclmul.ko.
The "pclmul" in the names of those modules indicates that these modules
calculate CRC checksums (CRC32 and CRC-T10DIF) with the PCLMULQDQ
instruction. As far as I know, PCLMULQDQ (carry-less multiplication) is an
optional instruction set extension, and not all CPUs support it.
Your appliance guest is started with "-cpu max" on the QEMU command line
(from libguestfs commit 30f74f38bd6e, "appliance: Use -cpu max.",
2021-01-28). This is probably why the appliance kernel thinks PCLMULQDQ
is available.
I think the PCLMULQDQ instruction may cause an issue here. I don't know
why it misbehaves under KVM, but that's my suspicion anyway.
Note that the kernel crash log provides the following instruction
(assembly binary) dump:
46 70 48 8b 56 68 48 03 97 90 01 00 00 48 c1 e0 06 48 03 46 20 48 89 97
08 02 00 00 48 be ab aa aa aa aa aa aa aa 48 8b 48 10 <48> 89 0a 48 8b
50 20 48 8b 8f 08 02 00 00 48 89 d0 48 f7 e6 48 c1
with the instruction starting at <48> causing the page fault, as the
direct symptom. Now, we can disassemble this:
$ printf \
    '%b' \
    '\x46\x70\x48\x8b\x56\x68\x48\x03\x97\x90\x01\x00\x00\x48\xc1\xe0\x06\x48\x03\x46\x20\x48\x89\x97\x08\x02\x00\x00\x48\xbe\xab\xaa\xaa\xaa\xaa\xaa\xaa\xaa\x48\x8b\x48\x10\x48\x89\x0a\x48\x8b\x50\x20\x48\x8b\x8f\x08\x02\x00\x00\x48\x89\xd0\x48\xf7\xe6\x48\xc1' \
    > bin
$ ndisasm -b64 bin
00000000 467048 jo 0x4b
00000003 8B5668 mov edx,[rsi+0x68]
00000006 48039790010000 add rdx,[rdi+0x190]
0000000D 48C1E006 shl rax,byte 0x6
00000011 48034620 add rax,[rsi+0x20]
00000015 48899708020000 mov [rdi+0x208],rdx
0000001C 48BEABAAAAAAAAAA mov rsi,0xaaaaaaaaaaaaaaab
-AAAA
00000026 488B4810 mov rcx,[rax+0x10]
0000002A 48890A mov [rdx],rcx <----------- crash
0000002D 488B5020 mov rdx,[rax+0x20]
00000031 488B8F08020000 mov rcx,[rdi+0x208]
00000038 4889D0 mov rax,rdx
0000003B 48F7E6 mul rsi
0000003E 48 rex.w
0000003F C1 db 0xc1
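The escaped string passed to printf can also be generated from the hex dump instead of being typed by hand. A minimal sketch, assuming a POSIX shell; code_to_bin is a made-up helper name, not part of any existing tool:

```shell
# code_to_bin "HEX BYTES..."
# Write the bytes of a kernel oops hex dump to stdout as raw binary.
code_to_bin() {
    # Remove the <> that mark the faulting byte, then split on whitespace.
    for byte in $(printf '%s' "$1" | tr -d '<>'); do
        # Turn each hex byte into a 3-digit octal escape, which POSIX
        # printf understands in its format string (e.g. 48 -> \110).
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

# Usage (paste the whole dump, the <48> marker may stay in place):
#   code_to_bin "46 70 48 8b ... <48> 89 0a ... 48 c1" > bin
#   ndisasm -b64 bin
```

The octal detour is only there because the \xHH escape is not guaranteed by POSIX printf.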
Note the constant 0xaaaaaaaaaaaaaaab; that seems very special. We can
search the kernel tree for it (I'm not bothering about checking out the
particular ubuntu kernel version for now):
$ git grep -i aaaaaaaaaaaaaaab
arch/x86/math-emu/poly_atan.c:/* 0xaaaaaaaaaaaaaaabLL, transferred to fixedpterm[] */
arch/x86/math-emu/poly_sin.c: 0xaaaaaaaaaaaaaaabLL,
arch/x86/math-emu/poly_tan.c:static const unsigned long long twothirds = 0xaaaaaaaaaaaaaaabLL;
In particular, the last file (poly_tan.c) contains a snippet like
mul64_Xsig(&accum, &twothirds);
which seems vaguely related to
0000001C 48BEABAAAAAAAAAA mov rsi,0xaaaaaaaaaaaaaaab
-AAAA
...
0000003B 48F7E6 mul rsi
Now this does not seem connected to PCLMULQDQ, but it does somehow look
connected to multiplication.
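One other possible reading, which is an assumption and not something the log establishes: 0xaaaaaaaaaaaaaaab equals (2^65 + 1) / 3, and compilers commonly turn an unsigned 64-bit division by 3 into a multiplication by that constant followed by a right shift. So the mov/mul pair might just be a compiled division by a multiple of 3 in ordinary kernel code. The full 64-bit product needs 128 bits, so the sketch below checks the 32-bit analogue of the same trick, 0xAAAAAAAB = (2^33 + 1) / 3, using plain shell arithmetic (64-bit shell integers assumed; div3_magic is a made-up name):

```shell
# div3_magic N
# Divide N (0 <= N < 2^31) by 3 without a division instruction:
# multiply by the reciprocal constant, then shift right by 33.
# 0xAAAAAAAB == (2^33 + 1) / 3, the 32-bit sibling of 0xaaaaaaaaaaaaaaab.
div3_magic() {
    echo $(( ($1 * 0xAAAAAAAB) >> 33 ))
}

# Compare against ordinary integer division for a range of inputs.
i=0
while [ "$i" -lt 1000 ]; do
    [ "$(div3_magic "$i")" -eq $(( i / 3 )) ] || echo "mismatch at $i"
    i=$(( i + 1 ))
done
```

If that reading is right, the constant by itself does not point at math-emu or at PCLMULQDQ.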
I don't really know where to go with this, except for asking KVM experts.
For now, can you try:
export LIBGUESTFS_BACKEND_SETTINGS=force_tcg
from <https://libguestfs.org/guestfs.3.html#backend-settings>, and see
if that makes a difference?
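If you want to compare a KVM run and a TCG run side by side, the setting can be scoped to a single command instead of being exported. A small sketch; run_with_tcg is a made-up helper name and the log file name is arbitrary:

```shell
# run_with_tcg COMMAND [ARGS...]
# Run one command with force_tcg set, leaving the calling shell's
# environment untouched.
run_with_tcg() {
    LIBGUESTFS_BACKEND_SETTINGS=force_tcg "$@"
}

# Usage:
#   run_with_tcg libguestfs-test-tool > test-tool-tcg.log 2>&1
```

Expect the TCG run to be noticeably slower, since it uses software emulation instead of hardware virtualization.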
Laszlo