On Wed, May 22, 2019 at 09:21:30PM -0500, Eric Blake wrote:
> On 5/22/19 4:50 AM, Richard W.M. Jones wrote:
> > If not using multi-conn then obviously the synchronous connection
> > calls ‘nbd_connect_unix’, ‘nbd_connect_tcp’ and ‘nbd_connect_command’
> > should only return when the (one) connection object is connected.
> >
> > In the multi-conn case it's not very clear what these synchronous
> > calls should do. Previously I had it so that they would return as
> > soon as at least one connection was connected. However this is a
> > problem if you are using them as a convenient way to set up a
> > multi-threaded main loop, because it can be that some of them have not
> > finished connecting, but then you issue commands on those connections
> > and that will fail. The failure is pernicious because most of the
> > time you won't see it, only if one connection is slow. So it's
> > (probably) better that the synchronous ‘nbd_connect_unix’ and
> > ‘nbd_connect_tcp’ should connect every connection object before
> > returning.
> >
> > For ‘nbd_connect_command’, it essentially ignored multi-conn anyway,
> > and I changed it so that it waits for conn[0] to get connected and
> > returns, the other connections (if they exist) being ignored. It
> > should probably be an error for the user to enable multi-conn on the
> > handle and then use nbd_connect_command, but I did not make that
> > change yet.
> Ouch - I just realized that we created a different problem. If the
> server does not add multi-conn because it does NOT permit parallel
> connections (such as 'nbdkit --filter=noparallel memory 1m
> serialize=connections'), then a client requesting
> nbd_set_multi_conn(nbd, 2) will now hang because we will never be able
> to move all our connections through handshake. While we can still try to
> fire off multiple connections at once, we need to add some logic in
> nbd_connect_unix() and nbd_connect_tcp() to check can_multi_conn after
> the FIRST connection completes (whichever one wins), and if
> !can_multi_conn(), it would be nice if we could automatically kill off
> the losers, swap the winning connection into conns[0], and then behave
> as if we implicitly called set_multi_conn(nbd, 1), so that the
> synchronous connect command will succeed after all. In such a case, the
> user can learn after the fact via nbd_can_multi_conn and
> nbd_get_multi_conn that things were changed because multi_conn was not
> supported after all.
It's somewhat unavoidable given the way that the NBD protocol itself
works. While it's a pain, below is the logic I came up with in the
forthcoming nbdtool. nr_multi_conn is a command-line parameter
selecting the number of connections, with the desired behaviour being
to fall back to a single connection if multi-conn is not supported.
Also this code will become shorter and simpler when we get URL support:
static void
connect_to (struct nbd_handle *nbd, char *arg)
{
  char *p = strchr (arg, ':');
  int r;

  if (p) {
    *p = '\0';
    r = nbd_connect_tcp (nbd, arg, p+1);
  }
  else
    r = nbd_connect_unix (nbd, arg);
  if (r == -1) NBD_ERROR_EXIT;

  /* Check if the connected handle supports multi-conn, and enable or
   * disable it.
   */
  if (nr_multi_conn == 1)
    return;
  if (nbd_can_multi_conn (nbd) == 1) {
    if (nbd_set_multi_conn (nbd, nr_multi_conn) == -1)
      NBD_ERROR_EXIT;
    /* Now make the other connections on the handle. */
    if (p)
      r = nbd_connect_tcp (nbd, arg, p+1);
    else
      r = nbd_connect_unix (nbd, arg);
    if (r == -1) NBD_ERROR_EXIT;
  }
  else
    nr_multi_conn = 1;
}
(In fact it was this code which provoked the error that required this
fix:
https://www.redhat.com/archives/libguestfs/2019-May/msg00134.html )
However that still leaves this problem:
> Or maybe we want both 'set_multi_conn' (connect hard-fails if the FIRST
> connection reports !can_multi_conn) vs. 'request_multi_conn' (connect
> downgrades gracefully), similar to how request_meta_context downgrades
> gracefully if the server lacks SR or a particular meta context.
IOW:
* What if the user sets multi-conn > 1, does NOT check can_multi_conn
after connecting, and the server allows multiple connections but
does not give the true multi-conn guarantees?
That results in a data corruption issue. So I think that connections
should fail if we get to the point where we have read the eflags,
multi-conn is not set, but h->multi_conn > 1. I've put this on my
to-do list.
Requesting multi-conn optionally is harder: We could adjust
h->multi_conn == 1, swap the winning connection object into
h->conns[0] and close the others, but I think that's very tricksy.
> On IRC we were chatting about how multi-conn > 1 makes no sense with
> connect_command() (where we are limited to exactly one channel over
> stdin/stdout), so it may also be worth adding some logic to have
> connect_command fail if set_multi_conn is > 1,
Also added to my to-do list.
> and to likewise have
> set_multi_conn(2) fail if the existing connection is already started in
> command mode (do we even track which mode a connection started in, and/or
> insist that all connections used the same nbd_connect_*? Or can you
> mix-and-match a TCP and a Unix socket into a multi-conn struct nbd_handle?)
We don't track any of this, and yes it's possible.
> On the other hand, a user that is aware of multi-conn being optional may
> manually try to make one connection, check can_multi_conn, and THEN call
> set_multi_conn(2), with some expectation of being able to then connect
> the remaining slots (again, if we stored the nbd_connect_* used by the
> FIRST connection, then all the remaining connections requested could
> automatically be fired off to start connecting to the same server - and
> you'd need a synchronous way to wait for them all to reach stable
> state). Perhaps that's even a reason to have two separate APIs - one
> that is callable only while you are in created state (a request of what
> to try if the server supports it), and the other callable only after
> your single connection has completed handshake (to then expand to
> actually get the additional connections now that it is known to be
> safe). At any rate, I don't think we're done with the design in this area.
That's what the code does above.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top