On Wednesday, 3 January 2018 08:06:38 CET Nikolay Ivanets wrote:
From IRC channel:
<StenaviN> Can someone confirm cat/test-virt-tail.sh works in 'master'?
<StenaviN> I get
https://pastebin.com/GBkg7Vtw
<rwmjones> StenaviN: yes it works for me; the error is not very helpful,
you'll need to set LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1
<StenaviN>
https://pastebin.com/yABgCHwV
<rwmjones> I think the error is:
<rwmjones> libguestfs: trace: statns "/tail"
<rwmjones> guestfsd: => mount_options (0x4a) took 0.00 secs
<rwmjones> guestfsd: <= statns (0x1a5) request length 52 bytes
<rwmjones> [ 0.930738] EXT2-fs (sda1): error: ext2_lookup: deleted inode
referenced: 12
<rwmjones> guestfsd: error: /tail: Input/output error
<rwmjones> but I don't know exactly why
<StenaviN> Yes, I see. Trying to figure out...
<rwmjones> actually no, that's not the problem
<rwmjones> for some reason two instances of ‘guestfish --remote exit’ run
at the same time, but according to the test script only one should run
<rwmjones> notice how the cleanup() function is called twice
<rwmjones> afaik that should never happen
<StenaviN> and I saw two qemu/guestfish processes running. Continue
investigating...
And here is what I found:
The second copy of the 'guestfish --listen' process is a child "recovery
process" (
https://github.com/libguestfs/libguestfs/blob/master/lib/launch-direct.c#...)
and that is OK.
'cleanup' was called twice because:
1. The call to virt-tail returns a non-zero exit code (due to the
Input/output error; more about this later), and we trap ERR, which runs
'cleanup' once.
2. In 'cleanup' we do 'exit $statuscode', and since we also trap EXIT,
'cleanup' is called once again.
It might look confusing, but it is not the end of the world. At least there
is an explanation, unless I missed something.
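The double call can be reproduced outside the test suite. The following is
only a minimal sketch, not the real test script: the handler body and the
'false' command (standing in for the failing virt-tail call) are
assumptions made for illustration.

```shell
# Hypothetical stand-in for the test script; run under bash because the
# ERR trap is a bash feature.
demo_script='
cleanup ()
{
    status=$?
    echo "cleanup called (status=$status)"
    exit $status
}
trap cleanup INT QUIT TERM EXIT ERR
false                 # stands in for the failing virt-tail call
echo "not reached"
'

# "|| true" keeps the outer shell going even though the inner one exits 1.
output=$(bash -c "$demo_script" || true)
echo "$output"
```

With bash, the handler message should appear twice: once from the ERR trap
and once more from the EXIT trap triggered by the 'exit' inside the handler.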
Now about the failing virt-tail test case.
Jumping ahead: adding an extra 'guestfish --remote sync' after 'guestfish
--remote rm /tail' in 'cat/test-virt-tail.sh' fixes the test case.
virt-tail re-creates the overlay image each time it tries to access the
file and calls guestfs_statns for the file(s) it is watching.
guestfs_statns in turn returns NULL, indicating an error, with errno EIO
instead of ENOENT: 'EXT2-fs (sda1): error: ext2_lookup: deleted inode
referenced: 12' (see the pastebin posted in the discussion above).
So I suspect that the changes made to the original disk by the 'guestfish
--remote rm /tail' call were not fully flushed, which the proposed patch
confirms.
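In shell terms, the proposed change amounts to something like the sketch
below. The function name 'remove_and_flush' is hypothetical and only wraps
the two commands for illustration; in the test script they would simply
appear one after the other. It assumes a 'guestfish --listen' session is
already running and GUESTFISH_PID is exported.

```shell
# Hypothetical helper: remove a file in the guest, then flush the change
# to the underlying disk image before virt-tail looks at it again.
remove_and_flush ()
{
    guestfish --remote rm "$1"
    guestfish --remote sync
}
```

It would be called as 'remove_and_flush /tail'.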
It is hard to explain why it works on your system, but it might be because
of a number of factors:
1. A different QEMU caching policy
2. A different caching policy in the underlying OS/filesystem
3. etc.
I agree with your investigation (which is good, thanks for doing it).
Indeed, the other changes done via `guestfish --remote` in that test are
flushed via a 'sync' command, so it makes sense for the 'rm /tail' one
to be flushed in a similar way.
Could you please add a few more details to the commit message itself,
so it is easier to check later on why the change was made?
Thanks,
--
Pino Toscano