Hi Rich,

I spent several days on this issue but still cannot figure out the root cause. Sometimes guestfs_launch() gets stuck; sometimes it works fine.
I reproduced this issue using guestfish:
-bash-4.2# guestfish 

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs> add /dev/xvdc
><fs> run
| 75% [#####################################################################################################################################################################--------------------------------------------------------] 00:15

It has been stuck for about 10 minutes. I can see it is in guestfs_launch():
-bash-4.2# gdb -p `pidof guestfish` -ex "set confirm off" -ex "bt" -ex "q"
0x00007fe6f6b5a680 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007fe6f6b5a680 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fe6f795eea4 in poll (__timeout=-1, __nfds=2, __fds=0x7ffe4c0537b0) at /usr/include/bits/poll2.h:41
#2  read_data (g=0x7fe6f8e04bd0, connv=0x7fe6f8df9100, bufv=<optimized out>, len=4) at conn-socket.c:162
#3  0x00007fe6f7983f0c in recv_from_daemon (buf_rtn=0x7ffe4c053908, buf_rtn@entry=0x7ffe4c053658, size_rtn=0x7ffe4c0538e4, size_rtn@entry=0x7ffe4c053634, g=0x7fe6f8e04bd0, g@entry=0x7ffe4c053840) at proto.c:515
#4  guestfs_int_recv_from_daemon (g=g@entry=0x7fe6f8e04bd0, size_rtn=size_rtn@entry=0x7ffe4c0538e4, buf_rtn=buf_rtn@entry=0x7ffe4c053908) at proto.c:611
#5  0x00007fe6f797dae0 in launch_libvirt (g=0x7fe6f8e04bd0, datav=0x7fe6f8e04d90, libvirt_uri=<optimized out>) at launch-libvirt.c:573
#6  0x00007fe6f7973b2b in guestfs_impl_launch (g=g@entry=0x7fe6f8e04bd0) at launch.c:93
#7  0x00007fe6f790fb4d in guestfs_launch (g=0x7fe6f8e04bd0) at actions-3.c:142
#8  0x00007fe6f86e7eda in run_launch (cmd=<optimized out>, argc=<optimized out>, argv=<optimized out>) at cmds.c:13411
#9  0x00007fe6f870276c in issue_command (cmd=0x7fe6f8df5170 "run", argv=argv@entry=0x7ffe4c053f98, pipecmd=0x0, rc_exit_on_error_flag=<optimized out>) at fish.c:1181
#10 0x00007fe6f870334e in script (prompt=prompt@entry=1) at fish.c:733
#11 0x00007fe6f86c7262 in interactive () at fish.c:630
#12 main (argc=1, argv=0x7ffe4c0543f8) at fish.c:577



I also tried export LIBGUESTFS_BACKEND=direct, and it gets stuck for more than 10 minutes as well. Here is the call stack:
0x00007f2dd9ac6ee0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
#0  0x00007f2dd9ac6ee0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f2dd9ac6d94 in __sleep (seconds=0, seconds@entry=2) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f2dda912089 in launch_direct (g=0x7f2ddc527bd0, datav=<optimized out>, arg=<optimized out>) at launch-direct.c:814
#3  0x00007f2dda90eb2b in guestfs_impl_launch (g=g@entry=0x7f2ddc527bd0) at launch.c:93
#4  0x00007f2dda8aab4d in guestfs_launch (g=0x7f2ddc527bd0) at actions-3.c:142
#5  0x00007f2ddb682eda in run_launch (cmd=<optimized out>, argc=<optimized out>, argv=<optimized out>) at cmds.c:13411
#6  0x00007f2ddb69d76c in issue_command (cmd=0x7f2ddc518170 "run", argv=argv@entry=0x7ffd9f69fe28, pipecmd=0x0, rc_exit_on_error_flag=<optimized out>) at fish.c:1181
#7  0x00007f2ddb69e34e in script (prompt=prompt@entry=1) at fish.c:733
#8  0x00007f2ddb662262 in interactive () at fish.c:630
#9  main (argc=1, argv=0x7ffd9f6a0288) at fish.c:577


Has anyone seen this?

Thanks. 
Allen


2016-08-30 7:11 GMT+08:00 Baochuan Wu <wildpointercs@gmail.com>:
Thanks Rich.
I have used libguestfs for several months. It worked perfectly before; the issue only appeared recently. I am trying guestfs_set_backend (g, "direct").

Thanks,
Allen

2016-08-29 23:31 GMT+08:00 Richard W.M. Jones <rjones@redhat.com>:
On Mon, Aug 29, 2016 at 11:19:04PM +0800, Baochuan Wu wrote:
> Thanks Rich for your quick reply. I enabled logs and the program got stuck
> again; here is the call stack and log:
> Thread 1 (Thread 0x7fac58edc8c0 (LWP 1271)):
> #0  0x00007fac578fac20 in __poll_nocancel () from /lib64/libc.so.6
> #1  0x00007fac56df3c5a in virNetClientIOEventLoop () from
> /lib64/libvirt.so.0
> #2  0x00007fac56df441b in virNetClientSendInternal () from
> /lib64/libvirt.so.0
> #3  0x00007fac56df5843 in virNetClientSendWithReply () from
> /lib64/libvirt.so.0
> #4  0x00007fac56df6052 in virNetClientProgramCall () from
> /lib64/libvirt.so.0
> #5  0x00007fac56dcbfe2 in callFull.isra.2 () from /lib64/libvirt.so.0
> #6  0x00007fac56de213d in remoteDomainCreateXML () from /lib64/libvirt.so.0
> #7  0x00007fac56d82151 in virDomainCreateXML () from /lib64/libvirt.so.0
> #8  0x00007fac58acca50 in launch_libvirt () from /lib64/libguestfs.so.0
> #9  0x00007fac58ac2b2b in guestfs_impl_launch () from /lib64/libguestfs.so.0
> #10 0x00007fac58a5eba5 in guestfs_launch () from /lib64/libguestfs.so.0
> #11 0x00000000004117ca in main ()
...
> libguestfs: [62900ms] launch libvirt guest

The error is happening in libvirt's virDomainCreateXML call, called
from libguestfs here:

https://github.com/libguestfs/libguestfs/blob/master/src/launch-libvirt.c#L600

Unfortunately libvirt isn't a simple C library.  It will launch and
talk to a daemon (usually a ``libvirtd --timeout=30'' process, if you are
not running as root).  Debugging into libvirtd is described here:

http://libguestfs.org/guestfs-faq.1.html#debugging-libvirt
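
A minimal sketch of one way to turn on libvirtd logging, assuming the
log_level and log_outputs settings of libvirtd.conf.  The function name
and the log path are placeholders, not something from this thread.  As
root the file would normally be /etc/libvirt/libvirtd.conf; for a
session libvirtd it would be ~/.config/libvirt/libvirtd.conf:

```shell
# Append debug-logging settings to a libvirtd.conf file.
# $1 = path of the config file, $2 = path of the log file to write.
# Both paths are supplied by the caller; nothing is assumed about them.
write_libvirtd_debug_conf() {
    conf=$1
    log=$2
    mkdir -p "$(dirname "$conf")"
    cat >>"$conf" <<EOF
log_level = 1
log_outputs = "1:file:$log"
EOF
}

# Example (not run here):
#   write_libvirtd_debug_conf /etc/libvirt/libvirtd.conf /tmp/libvirtd.log
```

libvirtd has to be restarted (or the session libvirtd killed) before
the new settings take effect.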

A workaround is to set LIBGUESTFS_BACKEND=direct [or use the
equivalent call ``guestfs_set_backend (g, "direct")''] which will
cause libguestfs to run qemu directly instead of going through
libvirt.

http://libguestfs.org/guestfs.3.html#backend
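
As a rough sketch, the environment-variable form of the workaround
could be wrapped like this.  LIBGUESTFS_DEBUG and LIBGUESTFS_TRACE are
added here only to get more output if the launch hangs again, and the
use of timeout(1) from GNU coreutils is an assumption of this example,
so that a stuck launch is killed instead of waiting forever.  The
function name run_direct is made up for illustration:

```shell
# Run a command with the direct backend and debug output enabled,
# killing it if it has not finished after $1 seconds.
run_direct() {
    limit=$1
    shift
    LIBGUESTFS_BACKEND=direct \
    LIBGUESTFS_DEBUG=1 \
    LIBGUESTFS_TRACE=1 \
    timeout "$limit" "$@"
}

# Example (not run here):
#   run_direct 300 guestfish -a /dev/xvdc run
# An exit status of 124 means timeout(1) had to kill the command.
```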

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/