On 03/31/22 16:44, Richard W.M. Jones wrote:
On Thu, Mar 31, 2022 at 03:38:40PM +0200, Laszlo Ersek wrote:
> Thanks! BTW: do you have any comments about the following serial log
> message (see the cover letter):
[
https://listman.redhat.com/archives/libguestfs/2022-March/028498.html]
>> nbdkit: file[1]: error: reading initial client flags: conn->recv:
>> Connection reset by peer
What's happening is nbdkit is sending the initial handshake and
expecting client flags back, but received a RST instead. I couldn't
quite work out exactly why, but it seems to be a side-effect of the
way we test that the NBD server is up:
https://github.com/libguestfs/virt-p2v/blob/c1a4bd83a8bba5959470ef1f6deb0...
I believe after your simplifications that code is not necessary at
all, since with systemd socket activation the sockets passed to the
server are always listening (even if the server isn't ready to accept
a connection at that moment).
Oh I think understand it now. In wait_for_nbd_server_to_start(), we
connect to our nbdkit server child process like any other nbdkit client
would, and even read "NBDMAGIC" (common to both old and new style
negotiation). The server (the child process) most likely places the next
protocol words in its Send-Q as well (IHAVEOPT and handshake flags), and
then attempts to read back the client flags.
But the "client", wait_for_nbd_server_to_start(), doesn't even attempt
reading past byte#7 from its Recv-Q:
char magic[8]; /* NBDMAGIC */
size_t bytes_read = 0;
ssize_t recvd;
...
do {
recvd = recv (sockfd, magic, sizeof magic - bytes_read, 0);
...
}
...
close (sockfd);
so when we reach the close(), one of two cases can happen:
- the client's Recv-Q has some bytes from the server's IHAVEOPT, and the
handshake flags, and throws them away, or
- the client's Recv-Q is now empty (it didn't have any bytes beyond
NBDMAGIC), but the server's Send-Q is not empty.
I think that, in either case, the close() will result in an RST for
whatever the server tries to do next with the socket. (Much more likely
in the second case, i.e. when the server's Send-Q was not empty at the
time of the client's close(), but I think the same happens in the first
case too, when the client simply throws away data from its Recv-Q.)
Now the question is if we are bothered by this "RST" in the conversion
log, as it's expected.
I think wait_for_nbd_server_to_start() does add a little bit of safety.
Namely, if the child goes away unexpectedly (because the execlp() fails,
or the new image exits for whatever reason very soon), we find out about
that (with a timeout in the worst case), and don't start up "ssh -R".
With wait_for_nbd_server_to_start() removed, this "child failure" will
only be visible by virt-v2v attempting to connect back via "ssh -R", and
then (with ssh on the p2v box getting an RST), witnessing a similarly
abrubpt EOF / RST from sshd on the conversion server. Ultimately
virt-v2v will fail, and this will be visible on virt-p2v, but that's a
detour.
I agree that functionally, wait_for_nbd_server_to_start() is not
necessary, as long as the child does not vanish. With the child present,
there is always at least one file descriptor pointing to the file
description pointing to the listening socket, so the p2v side of "ssh
-R" will always have a socket to connect to.
I'll append a patch anyway to the series, just for discussion's sake.
Does it work OK if you simply remove wait_for_nbd_server_to_start
and
the place where that function is called?
Yes, the message disappears then.
Thanks
Laszlo
(CC-d Eric)
Rich.