On Sat, May 31, 2014 at 01:25:04AM +0800, Qin Zhao wrote:
> Hi all,
>
> When I ran the Icehouse code, I encountered a strange problem: the
> nova-compute service gets stuck when I boot instances. I reported this
> bug at https://bugs.launchpad.net/nova/+bug/1313477.
>
> After thinking about it for several days, I believe I know the root
> cause. This bug appears to be a deadlock caused by a leaked pipe fd.
> I drew a diagram to illustrate the problem:
> https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhY...
>
> However, I have not found a good solution to prevent this deadlock.
> The problem involves the Python runtime, libguestfs, and eventlet, so
> the situation is a little complicated. Is there any expert who can
> help me find a solution? I would appreciate your help!

Thanks for the useful diagram. libguestfs itself is very careful to
open all file descriptors with O_CLOEXEC (atomically if the OS
supports that), so I'm fairly confident that the bug is in Python 2,
not in libguestfs.
Also note that g.shutdown() sends SIGKILL (kill -9) to the
subprocess. You can also obtain the qemu PID with g.get_pid() and
send that process any signal you want.
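
As a rough illustration of the second option, here is a minimal sketch.
It assumes `g` is an already-launched handle from the libguestfs Python
binding; the helper name `signal_appliance` is made up for this example
and is not part of libguestfs.

```python
import os
import signal


def signal_appliance(g, sig=signal.SIGTERM):
    """Send `sig` to the hypervisor (qemu) process behind handle `g`.

    Assumption: `g` exposes get_pid(), as the libguestfs Python binding
    does for a launched handle. Returns the PID that was signalled.
    """
    pid = g.get_pid()
    os.kill(pid, sig)
    return pid


# Hypothetical usage with a real handle (not run here):
#   import guestfs
#   g = guestfs.GuestFS(python_return_dict=True)
#   ... add drives, g.launch() ...
#   signal_appliance(g, signal.SIGKILL)
```

Whether killing qemu is enough to break the deadlock depends on which
process is holding the leaked pipe end open, so treat this as a
diagnostic aid rather than a fix.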
I wonder if a simpler way to fix this wouldn't be something like
adding a tiny C extension to the Python code to use pipe2 to open the
Python pipe with O_CLOEXEC atomically? Are we allowed Python
extensions in OpenStack?
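
If a compiled extension is not acceptable, one alternative (swapped in
here for the C extension idea, and only a sketch) is to call pipe2
through ctypes. This assumes Linux with a libc that exports pipe2; the
fallback constant 0o2000000 is the O_CLOEXEC value on x86 and ARM Linux
and differs on some other architectures. The name `pipe_cloexec` is
invented for this example.

```python
import ctypes
import ctypes.util
import fcntl
import os

# Assumption: Linux. Python 2 has no os.O_CLOEXEC, so fall back to the
# x86/ARM Linux value.
O_CLOEXEC = getattr(os, "O_CLOEXEC", 0o2000000)

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def pipe_cloexec():
    """Return (read_fd, write_fd) with FD_CLOEXEC set atomically.

    Unlike os.pipe() followed by fcntl(F_SETFD), there is no window in
    which another thread can fork and exec with the fds still inheritable.
    """
    fds = (ctypes.c_int * 2)()
    if _libc.pipe2(fds, O_CLOEXEC) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fds[0], fds[1]


if __name__ == "__main__":
    r, w = pipe_cloexec()
    # Both ends should report the close-on-exec flag.
    print(bool(fcntl.fcntl(r, fcntl.F_GETFD) & fcntl.FD_CLOEXEC))
    print(bool(fcntl.fcntl(w, fcntl.F_GETFD) & fcntl.FD_CLOEXEC))
    os.close(r)
    os.close(w)
```

On Python 3.3 and later none of this is needed: os.pipe2() is available
directly, and since Python 3.4 file descriptors created by os.pipe() are
non-inheritable by default.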
BTW do feel free to CC libguestfs@redhat.com on any libguestfs
problems you have. You don't need to subscribe to the list.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v