On Mon, Oct 12, 2020 at 9:05 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
Hi all,
On Mon, Oct 12, 2020 at 5:17 AM <jenkins(a)jenkins.phx.ovirt.org> wrote:
>
> Project:
https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/
> Build:
https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
Above failed with:
https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
vdsm.log has:
https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_night...
2020-10-11 22:05:14,695-0400 INFO (jsonrpc/1) [api.host] FINISH
getJobs return={'jobs': {'05eaea44-7e4c-4442-9926-2bcb696520f1':
{'id': '05eaea44-7e4c-4442-9926-2bcb696520f1', 'status':
'failed',
'description': 'sparsify_volume', 'job_type': 'storage',
'error':
{'code': 100, 'message': 'General Exception: (\'Command
[\\\'/usr/bin/virt-sparsify\\\', \\\'--machine-readable\\\',
\\\'--in-place\\\',
\\\'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/8b292c13-fd8a-4a7c-903c-5724ec742c10/images/a367c179-2ac9-4930-abeb-848229f81c97/515fcf06-8743-45d1-9af8-61a0c48e8c67\\\']
failed with rc=1 out=b\\\'3/12\\\\n{ "message": "libguestfs error:
guestfs_launch failed.\\\\\\\\nThis usually means the libguestfs
appliance failed to start or crashed.\\\\\\\\nDo:\\\\\\\\n export
LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\\\\\\\\nand run the command
again. For further information, read:\\\\\\\\n
http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\\\\\\\nYou
can also run \\\\\\\'libguestfs-test-tool\\\\\\\' and post the
*complete* output\\\\\\\\ninto a bug report or message to the
libguestfs mailing list.", "timestamp":
"2020-10-11T22:05:08.397538670-04:00", "type": "error"
}\\\\n\\\'
err=b"virt-sparsify: error: libguestfs error: guestfs_launch
failed.\\\\nThis usually means the libguestfs appliance failed to
start or crashed.\\\\nDo:\\\\n export LIBGUESTFS_DEBUG=1
LIBGUESTFS_TRACE=1\\\\nand run the command again. For further
information, read:\\\\n
http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\\\nYou
can also run \\\'libguestfs-test-tool\\\' and post the *complete*
output\\\\ninto a bug report or message to the libguestfs mailing
list.\\\\n\\\\nIf reporting bugs, run virt-sparsify with debugging
enabled and include the \\\\ncomplete output:\\\\n\\\\n virt-sparsify
-v -x [...]\\\\n"\',)'}}}, 'status': {'code': 0,
'message': 'Done'}}
from=::ffff:192.168.201.4,43318,
flow_id=365642f4-2fe2-45df-937a-f4ca435eea38 (api:54)
2020-10-11 22:05:14,695-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
Return 'Host.getJobs' in bridge with
{'05eaea44-7e4c-4442-9926-2bcb696520f1': {'id':
'05eaea44-7e4c-4442-9926-2bcb696520f1', 'status': 'failed',
'description': 'sparsify_volume', 'job_type': 'storage',
'error':
{'code': 100, 'message': 'General Exception: (\'Command
[\\\'/usr/bin/virt-sparsify\\\', \\\'--machine-readable\\\',
\\\'--in-place\\\',
\\\'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/8b292c13-fd8a-4a7c-903c-5724ec742c10/images/a367c179-2ac9-4930-abeb-848229f81c97/515fcf06-8743-45d1-9af8-61a0c48e8c67\\\']
failed with rc=1 out=b\\\'3/12\\\\n{ "message": "libguestfs error:
guestfs_launch failed.\\\\\\\\nThis usually means the libguestfs
appliance failed to start or crashed.\\\\\\\\nDo:\\\\\\\\n export
LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\\\\\\\\nand run the command
again. For further information, read:\\\\\\\\n
http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\\\\\\\nYou
can also run \\\\\\\'libguestfs-test-tool\\\\\\\' and post the
*complete* output\\\\\\\\ninto a bug report or message to the
libguestfs mailing list.", "timestamp":
"2020-10-11T22:05:08.397538670-04:00", "type": "error"
}\\\\n\\\'
err=b"virt-sparsify: error: libguestfs error: guestfs_launch
failed.\\\\nThis usually means the libguestfs appliance failed to
start or crashed.\\\\nDo:\\\\n export LIBGUESTFS_DEBUG=1
LIBGUESTFS_TRACE=1\\\\nand run the command again. For further
information, read:\\\\n
http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\\\nYou
can also run \\\'libguestfs-test-tool\\\' and post the *complete*
output\\\\ninto a bug report or message to the libguestfs mailing
list.\\\\n\\\\nIf reporting bugs, run virt-sparsify with debugging
enabled and include the \\\\ncomplete output:\\\\n\\\\n virt-sparsify
-v -x [...]\\\\n"\',)'}}} (__init__:356)
/var/log/messages has:
Oct 11 22:04:51 lago-basic-suite-master-host-0 kvm[80601]: 1 guest now active
Oct 11 22:05:06 lago-basic-suite-master-host-0 journal[80557]: Domain
id=1 name='guestfs-hl0ntvn92rtkk2u0'
uuid=05ea5a53-562f-49f8-a8ca-76b45c5325b4 is tainted: custom-argv
Oct 11 22:05:06 lago-basic-suite-master-host-0 journal[80557]: Domain
id=1 name='guestfs-hl0ntvn92rtkk2u0'
uuid=05ea5a53-562f-49f8-a8ca-76b45c5325b4 is tainted: host-cpu
Oct 11 22:05:06 lago-basic-suite-master-host-0 kvm[80801]: 2 guests now active
Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]:
internal error: End of file from qemu monitor
Oct 11 22:05:08 lago-basic-suite-master-host-0 kvm[80807]: 1 guest now active
Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot
resolve symlink /tmp/libguestfseTG8xF/console.sock: No such file or
directoryKN<F3>L^?
Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot
resolve symlink /tmp/libguestfseTG8xF/guestfsd.sock: No such file or
directoryKN<F3>L^?
Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot
lookup default selinux label for /tmp/libguestfs1WkcF7/overlay1.qcow2
Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot
lookup default selinux label for
/var/tmp/.guestfs-36/appliance.d/kernel
Oct 11 22:05:08 lago-basic-suite-master-host-0 journal[80557]: cannot
lookup default selinux label for
/var/tmp/.guestfs-36/appliance.d/initrd
Oct 11 22:05:08 lago-basic-suite-master-host-0 vdsm[74096]: ERROR Job
'05eaea44-7e4c-4442-9926-2bcb696520f1' failed#012Traceback (most
recent call last):#012 File
"/usr/lib/python3.6/site-packages/vdsm/jobs.py", line 159, in run#012
self._run()#012 File
"/usr/lib/python3.6/site-packages/vdsm/storage/sdm/api/sparsify_volume.py",
line 57, in _run#012
virtsparsify.sparsify_inplace(self._vol_info.path)#012 File
"/usr/lib/python3.6/site-packages/vdsm/virtsparsify.py", line 40, in
sparsify_inplace#012 commands.run(cmd)#012 File
"/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line 101,
in run#012 raise cmdutils.Error(args, p.returncode, out,
err)#012vdsm.common.cmdutils.Error: Command ['/usr/bin/virt-sparsify',
'--machine-readable', '--in-place',
'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/8b292c13-fd8a-4a7c-903c-5724ec742c10/images/a367c179-2ac9-4930-abeb-848229f81c97/515fcf06-8743-45d1-9af8-61a0c48e8c67']
failed with rc=1 out=b'3/12\n{ "message": "libguestfs error:
guestfs_launch failed.\\nThis usually means the libguestfs appliance
failed to start or crashed.\\nDo:\\n export LIBGUESTFS_DEBUG=1
LIBGUESTFS_TRACE=1\\nand run the command again. For further
information, read:\\n
http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\\nYou
can also run \'libguestfs-test-tool\' and post the *complete*
output\\ninto a bug report or message to the libguestfs mailing
list.", "timestamp": "2020-10-11T22:05:08.397538670-04:00",
"type":
"error" }\n' err=b"virt-sparsify: error: libguestfs error:
guestfs_launch failed.\nThis usually means the libguestfs appliance
failed to start or crashed.\nDo:\n export LIBGUESTFS_DEBUG=1
LIBGUESTFS_TRACE=1\nand run the command again. For further
information, read:\n
http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\nYou can
also run 'libguestfs-test-tool' and post the *complete* output\ninto a
bug report or message to the libguestfs mailing list.\n\nIf reporting
bugs, run virt-sparsify with debugging enabled and include the
\ncomplete output:\n\n virt-sparsify -v -x [...]\n"
The next run of the job (480) did finish successfully. No idea if it
was already fixed by a patch, or is simply a random/env issue.
I think this is env issue, we run on overloaded vms with small amount of memory.
I have seen such radnom failures before.
Is it possible to pass LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 without
patching vdsm? Not sure. In any case, if this does not cause too much
excess debug logging, perhaps better always pass it, to help
retroactively analyze such failures.
Unfortunately using LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 creates
huge amount of logs so it is not possible to include these log for every run.
Maybe we can show the logs only if the command failed, but vdsm infrastructure
make this hard to do since we always log commnad stderr, even if the command
was successful.
Also I'm not sure this is a good idea to always run virt-sparsify in
debug level.
I think I posted a patch to run libguestfs-test-tool after virt-something errors
for collecting data, but I cannot find it. Maybe this can be useful,
so after every
virt-something error we will have enough info about the issue in vdsm log.
Or, patch
virt-sparsify/libguestfs/whatever to always log at least enough
information on failure even without passing these.
I think this is the right solution - when virt-something tool fails,
it should log the
reason for the failure - the error that caused the tool to fail. I'm
not sure this is
easy to do as the failing code run inside a special VM. Maybe the code running
in the VM should log the output in a machine readable way, so once an error
is detected virt-something can report the error as the reason, without running
in debug mode.
Nir