On Tue, Nov 19, 2019 at 02:14:32PM +0000, Richard W.M. Jones wrote:
On Tue, Nov 19, 2019 at 02:36:36PM +0100, Martin Kletzander wrote:
> nbdkit: python[1]: error: /var/tmp/rhvupload.jngN1W/rhv-upload-plugin.py: close:
> error: ['Traceback (most recent call last):\n', ' File
> "/var/tmp/rhvupload.jngN1W/rhv-upload-plugin.py", line 362, in close\n',
> "FileNotFoundError: [Errno 2] No such file or directory:
> '/var/tmp/rhvupload.jngN1W/diskid.0'\n"]
> nbdkit: debug: python: unload plugin
>
> So it might be because virt-v2v already removed that directory and
> did not wait for nbdkit to exit completely. I'm testing with an older
> commit of virt-v2v now.
This is very likely.
Shutdown on error is complicated. Virt-v2v starts one or more nbdkit
processes in the background and then simply runs “qemu-img convert”.
If nbdkit notices an error then it returns an error over NBD to
qemu-img. If qemu-img exits with an error then virt-v2v exits.
Before virt-v2v exits, it runs any exit handlers. In particular if
you're using OCaml's at_exit, C's atexit(3) or wrappers like
Tools_utils.unlink_on_exit or Tools_utils.rmdir_on_exit, then those
have already run before nbdkit starts to shut down.
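To make the ordering concrete, here is a minimal Python sketch of the same race. It is not the virt-v2v or nbdkit code: PARENT and CHILD are hypothetical stand-ins for virt-v2v and nbdkit, and the file name diskid.0 is only borrowed from the traceback above. The parent registers an exit handler that removes its temporary directory, starts a background child, and exits straight away; the child outlives it and then tries to write into the directory.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for nbdkit: wait until the directory has gone,
# then try to write a file inside it, as a late close() callback might.
CHILD = textwrap.dedent("""
    import os, sys, time
    tmpdir = sys.argv[1]
    deadline = time.monotonic() + 5
    while os.path.isdir(tmpdir) and time.monotonic() < deadline:
        time.sleep(0.01)
    try:
        with open(os.path.join(tmpdir, "diskid.0"), "w") as fp:
            fp.write("example")
        print("wrote file")
    except FileNotFoundError:
        print("FileNotFoundError")
""")

# Hypothetical stand-in for virt-v2v: create a temporary directory,
# register its removal as an exit handler, start the child in the
# background and exit without waiting for it.
PARENT = textwrap.dedent("""
    import atexit, shutil, subprocess, sys, tempfile
    tmpdir = tempfile.mkdtemp(prefix="rhvupload.")
    atexit.register(shutil.rmtree, tmpdir, ignore_errors=True)
    subprocess.Popen([sys.executable, "-c", sys.argv[1], tmpdir])
    # Falling off the end runs the exit handler; the child lives on.
""")


def run_demo():
    """Run the parent and return what the orphaned child printed."""
    result = subprocess.run(
        [sys.executable, "-c", PARENT, CHILD],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_demo())
```

The child should report FileNotFoundError, because the parent's exit handler has already removed the directory by the time the child touches it. That is the same shape of failure as the close error in the log.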
Nbdkit should receive a signal from the kernel when its parent process
(virt-v2v) goes away, because we're using prctl + PR_SET_PDEATHSIG +
SIGTERM (via ‘nbdkit --exit-with-parent’). Note this happens *after*
virt-v2v has fully exited.
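For reference, the kernel mechanism can be sketched from Python with ctypes. This is only an illustration of prctl(2) with PR_SET_PDEATHSIG on Linux, not how nbdkit implements --exit-with-parent (nbdkit does this itself, in C). The helper names set_pdeathsig and get_pdeathsig are made up for the example; the two constants are the values from <linux/prctl.h>.

```python
import ctypes
import os
import signal
import sys

PR_SET_PDEATHSIG = 1  # values from <linux/prctl.h>
PR_GET_PDEATHSIG = 2


def _prctl(option, arg2):
    """Call prctl(2) through libc and raise OSError on failure."""
    if not sys.platform.startswith("linux"):
        raise NotImplementedError("prctl(2) is Linux-specific")
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    if libc.prctl(ctypes.c_int(option), arg2, zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def set_pdeathsig(sig=signal.SIGTERM):
    """Ask the kernel to send `sig` to this process when its parent dies."""
    _prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig)))


def get_pdeathsig():
    """Return the currently configured parent-death signal (0 if unset)."""
    value = ctypes.c_int(0)
    _prctl(PR_GET_PDEATHSIG, ctypes.byref(value))
    return value.value
```

A launcher could pass set_pdeathsig as preexec_fn to subprocess.Popen so that the setting is applied in the child just before exec. Either way the signal is only delivered once the parent has gone, so it cannot be used to order the child's shutdown before the parent's exit handlers.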
Hence my comment that the above is very likely, since
rmdir_on_exit "/var/tmp/rhvupload.XXXXXX" is being called from
virt-v2v on exit:
https://github.com/libguestfs/virt-v2v/blob/b8b9dcc90dbd91aec4b6bb82dd511...
To further complicate things, in nbdkit < 1.16 the shutdown path from
a signal was pretty racy. nbdkit 1.16 attempts to fix the shutdown
path so that we now properly wait for all threads to exit before
exiting nbdkit. The upstream commit is:
https://github.com/libguestfs/nbdkit/commit/07806d6d5511bb5da2dfae2bf0009...
nbdkit 1.16 is available in Fedora 31+ and RHEL 8.2 AV (out of brew at
the moment), and while it probably won't make any difference here, if
possible you should upgrade to it. It's fully backwards compatible.
Oh, and finally, if we're running in a systemd unit, then systemd might
try to kill everything when virt-v2v exits (but before nbdkit exits)
and it's anyone's guess what happens then. Good luck! Probably best
to try to make the code as bulletproof as possible so it doesn't
depend on cleanups always running correctly.
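One way to read "bulletproof" here is a close path that treats a vanished directory as a non-fatal condition. The sketch below assumes the close callback writes a small file into the temporary directory; that is a guess based on the path in the traceback, not something confirmed in this thread. write_disk_id and demo are hypothetical names, not part of rhv-upload-plugin.py.

```python
import os
import tempfile


def write_disk_id(path, disk_id):
    """Best-effort write: return True if written, False if the
    directory has already been removed by someone else's cleanup."""
    try:
        with open(path, "w") as fp:
            fp.write(disk_id)
        return True
    except FileNotFoundError:
        # The parent's exit handler got there first; nothing is left
        # to record into, so report it instead of raising from close().
        return False


def demo():
    """Write once while the directory exists and once after removal."""
    tmpdir = tempfile.mkdtemp(prefix="rhvupload.")
    path = os.path.join(tmpdir, "diskid.0")
    first = write_disk_id(path, "example-id")
    os.remove(path)
    os.rmdir(tmpdir)
    second = write_disk_id(path, "example-id")
    return first, second
```

Whether silently skipping the write is acceptable depends on who reads the file afterwards. On an error path where the reader has already exited it is probably harmless, but on the success path a missing file should still be treated as a failure.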
I am running nbdkit from current master there, so that should be fine. But
since it is run by virt-v2v-wrapper on a Fedora VM inside oVirt, it is running
under a systemd unit.
I should say this is not the main issue, it's just something that happens on a
clean-up path after another error has happened.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones