This patch series attempts to fix the well-known problem in nbdcopy
that it ignores the preferred block size. It only attempts to fix it
for writing (which is the case we care about for virt-v2v).
Unfortunately it's subtly incorrect, in a way that I can't see how
to fix right now. Hence I'm posting it so I can return to it later.
[None of the problem description below will make any sense until you
look at the patches.]
The problem with the second patch is that the assembled blocks are
written synchronously on handle[0]. However the handle can be in use
at the same time by multi-thread-copying.c, resulting in two poll(2)
loops running at the same time trying to consume events.
It's very hard to reproduce this problem -- all the tests run fine --
but the following command sometimes demonstrates it:
$ nbdkit -U - --filter=checkwrite --filter=offset data '1 @0x100000000 1' offset=1 \
    --run './run nbdcopy "$uri" "$uri"'
It will either run normally, deadlock, or give a weird error from the
state machine which is characteristic of the two poll loops competing
for events:
nbd+unix://?socket=/tmp/nbdkitVOF5BR/socket: nbd_shutdown: nothing to
poll for in state REPLY.START: Invalid argument
One way to fix this would be to open an extra handle to the
destination NBD server for sending these completed blocks. However
that breaks various assumptions, and wouldn't work for !multi-conn
servers.
Another way would be some kind of lock around handle[0], but that
seems hard to do given that we're doing asynchronous operations.
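For what it's worth, here is a minimal sketch of the locking idea for the purely synchronous case only. Everything in it is hypothetical: struct shared_handle, locked_sync_write and run_workers are made-up stand-ins, not libnbd or nbdcopy code, and the "write" is just a counter. It only shows that a mutex keeps a second thread out of the poll loop; it does not address commands that are already in flight asynchronously on the same handle, which is the hard part.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the shared destination handle (handle[0]).
 * Nothing here is libnbd or nbdcopy API. */
struct shared_handle {
  pthread_mutex_t lock;   /* serializes every use of the handle */
  int pollers;            /* threads currently inside the "poll loop" */
  int max_pollers;        /* highest value of pollers ever observed */
  int writes;             /* completed synchronous writes */
};

/* Stand-in for a synchronous write that runs its own poll loop. */
static void
locked_sync_write (struct shared_handle *h)
{
  pthread_mutex_lock (&h->lock);
  assert (h->pollers == 0);  /* nobody else may be polling this handle */
  h->pollers++;
  if (h->pollers > h->max_pollers)
    h->max_pollers = h->pollers;
  h->writes++;               /* the real write + poll loop would go here */
  h->pollers--;
  pthread_mutex_unlock (&h->lock);
}

#define NR_THREADS 4
#define NR_WRITES 1000

static void *
worker (void *arg)
{
  struct shared_handle *h = arg;
  int i;

  for (i = 0; i < NR_WRITES; ++i)
    locked_sync_write (h);
  return NULL;
}

/* Start the workers and wait for them.  Returns the maximum number of
 * concurrent pollers seen, or -1 if a thread could not be created. */
static int
run_workers (struct shared_handle *h)
{
  pthread_t threads[NR_THREADS];
  int i, started = 0;

  for (i = 0; i < NR_THREADS; ++i) {
    if (pthread_create (&threads[i], NULL, worker, h) != 0)
      break;
    started++;
  }
  for (i = 0; i < started; ++i)
    pthread_join (threads[i], NULL);
  return started == NR_THREADS ? h->max_pollers : -1;
}
```

Initializing the struct with PTHREAD_MUTEX_INITIALIZER and zeroed counters and then calling run_workers should report at most one poller at any time. In nbdcopy the equivalent lock would also have to cover the asynchronous commands issued from multi-thread-copying.c, and this sketch makes no attempt at that.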
Rich.