I have an update on the networking issue:
- After the customer's security team did a deep dive into the firewall logs, it turns out that although there were some disconnections, the timestamps do not match the failures we saw.
 This means that we were disconnected by something else (ESXi or the conversion host, perhaps).
- As we briefly mentioned in the chat, there could be general keep-alive issues on both the RHEL (conversion host) side and the ESXi side.
 We changed the keep-alive settings in RHEL, but have not yet found the equivalent in VMware.
- I found in a few places that there are some VDDK (vixDiskLib.nfc*) settings which can configure NFC keep-alives and timeouts, but I do not understand them deeply enough to tell whether any of them would help.
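
For reference, this is roughly what TCP keep-alive tuning looks like at the level of a single Linux socket. It is only a minimal sketch: the function name and the 60/10/5 values are illustrative assumptions, not the settings we actually changed on the conversion host, and it says nothing about how VDDK or ESXi configure their own connections.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keep-alive probes for one socket (hypothetical helper).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The three tuning options below are Linux-specific, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keep-alive on:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

With an idle time shorter than whatever idle timeout sits in the path, the probes should keep an otherwise quiet control connection from looking dead.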

Whatever the cause may be, a retry filter would most likely solve the problem.
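
To make the idea concrete, here is a minimal sketch of the retry-and-reopen pattern. It is not the nbdkit-retry-filter implementation (that is being written separately); `read_with_retry`, `open_conn` and `do_read` are hypothetical names, and I am assuming a dropped connection surfaces as a `ConnectionError`.

```python
import time


def read_with_retry(open_conn, do_read, retries=5, delay=2.0, sleep=time.sleep):
    """Run do_read(conn), reopening the connection after each failure.

    open_conn: callable returning a fresh connection object with close()
    do_read:   callable performing the actual request on that connection
    retries:   how many times to retry after the first failed attempt
    delay:     seconds to wait before reopening
    """
    for attempt in range(retries + 1):
        conn = open_conn()  # a fresh connection for every attempt
        try:
            return do_read(conn)
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries, report the failure to the caller
            sleep(delay)
        finally:
            conn.close()  # always release the (possibly dead) connection
```

The important part is that every retry starts from a newly opened connection, so it does not matter which side closed the old one.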

Since we are fairly certain that we would hit another failure with VDDK as the situation stands now, we are trying the SSH transport to see how that goes.

Cheers,

Nenad Perić

PRINCIPAL SOFTWARE ENGINEER

Red Hat - Migration Engineering

nenad@redhat.com



On Thu, Sep 19, 2019 at 11:50 AM Richard W.M. Jones <rjones@redhat.com> wrote:
On Wed, Sep 18, 2019 at 01:59:01PM +0100, Richard W.M. Jones wrote:
> We have a running problem with the nbdkit VDDK plugin where the VDDK
> side apparently disconnects or the network connection is interrupted.
> During a virt-v2v conversion this causes the entire operation to fail,
> and since v2v conversions take many hours that's not a happy outcome.
>
> (Aside: I should say that we see many cases where it's claimed that
> the connection was dropped, but often when we examine them in detail
> the cause is something else.  But it seems like this disconnection
> thing does happen sometimes.)

It turns out in the customer case that led us to talk about this, a
Checkpoint firewall was forcing the VDDK control connection to be
closed after an idle period.  (The VDDK connection as a whole was not
actually idle because data was being copied over the separate data
port, but the firewall did not associate the two ports).  I believe
nbdkit-retry-filter would have helped in this case because reopening
the VDDK connection will reestablish the control/metadata connection,
and therefore I am looking at an implementation now.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top