On Tue, Aug 03, 2021 at 10:36:47PM +0300, Nir Soffer wrote:
> On Tue, Aug 3, 2021 at 10:26 PM Eric Blake <eblake@redhat.com> wrote:
> >
> > On Mon, Aug 02, 2021 at 08:41:20AM +0100, Richard W.M. Jones wrote:
> > > ---
> > >  v2v/rhv-upload-plugin.py | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py
> > > index a3d578176..51e9b33f7 100644
> > > --- a/v2v/rhv-upload-plugin.py
> > > +++ b/v2v/rhv-upload-plugin.py
> > > @@ -169,6 +169,10 @@ def open(readonly):
> > >     }
> > >
> > >
> > > +def can_multi_conn(h):
> > > +    return True
> >
> > Should this be h['can_flush'] instead of True?
> >
> > Does imageio guarantee a consistent image across all other http
> > connections when a flush is received on a single http connection?
> imageio keeps multiple connections to qemu-nbd open, and passes all
> requests through to qemu-nbd.  No caching is done in imageio.
>
> But we don't do any synchronization, so we may have one flush command
> in the middle of other write/write_zeroes commands.  Does qemu-nbd
> wait until the write/write_zeroes commands complete before flushing?

The rule on the client side is that if you want action A to be
visible to action B, you have to wait until the server's response to
action A has been received, then send the flush, then wait for the
server's response to the flush, and only then send action B.  If the
server supports multi-conn, those rules apply regardless of which
connection sends action A, the flush, or action B; if multi-conn
cannot be relied on, then all of A, the flush, and B must be sent on
the same connection.
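
Expressed with libnbd's synchronous Python API (just a sketch; the
URI, offsets, and data are made up), the single-connection case looks
like:

import nbd

h = nbd.NBD()
h.connect_uri("nbd://localhost/")  # placeholder URI

h.pwrite(b"A" * 512, 0)    # returns only after the server replied to A
h.flush()                  # returns only after the flush reply
h.pwrite(b"B" * 512, 512)  # B can no longer be reordered before A
h.shutdown()

With multi-conn advertised, A and B could just as well be sent on two
different handles connected to the same export, provided the flush
still happens between the two replies.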

qemu-nbd is built on top of the qemu block layer, which has its own
internal serialization guaranteeing that any flush acted on by the
block layer will stabilize any earlier writes and finish before any
later writes - but that does NOT take into account the possibility
that parallel clients can race with out-of-order transmission
effects.  Even with a single client connection, where the client
batch-sends A/flush/B over the wire, the qemu-nbd coroutine code is
such that, depending on the order in which the OS wakes up semaphores
as data becomes available to read from the socket, the calls into the
block layer could still be rearranged as B/flush/A.  That is, if an
NBD client issues action A but does not wait for A's response before
issuing the flush, or issues action B before getting the response to
the flush, it cannot ensure that qemu-nbd processes those commands in
that order.  So in general, the client has to actually wait for the
response to a write before sending the flush, if the write is
supposed to be covered by the flush.
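
For contrast, here is the unsafe batched pattern using libnbd's
asynchronous API (again a sketch with a placeholder URI): nothing
stops qemu-nbd from acting on the flush before the write.

import nbd

h = nbd.NBD()
h.connect_uri("nbd://localhost/")  # placeholder URI

buf = nbd.Buffer.from_bytearray(bytearray(b"A" * 512))
h.aio_pwrite(buf, 0)  # A is merely in flight at this point...
h.aio_flush()         # ...so this flush may be processed before A

To get the guarantee, reap A's completion first (for example, loop on
h.poll(-1) until h.aio_in_flight() drops to zero) and only then issue
the flush.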

But that's only when you care about the effects of action A
definitely being visible to action B.  There is nothing stopping a
client from issuing a flush in parallel with in-flight write/zero
commands; BUT that client must also be prepared for the possibility
that the writes still in flight are not covered by that flush.  In
the case of nbdcopy, we know that our writes are non-overlapping and
at a large enough granularity to avoid read-modify-write sharding, so
we really don't care whether action A is visible before starting
action B; all we REALLY care about is that after ALL write actions
are done, we flush the disk at least once before disconnecting, so
that there is no data loss due to disconnecting too early.  In this
style of operation, you aren't going to issue the flush until all
other commands have completed, so there are no in-flight commands in
parallel with the flush to worry about in the first place.
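
In libnbd terms the nbdcopy style looks roughly like this (a sketch
with made-up sizes and a placeholder URI):

import nbd

h = nbd.NBD()
h.connect_uri("nbd://localhost/")  # placeholder URI

# Issue non-overlapping writes with no intermediate flushes; nothing
# here depends on one write being visible before another.
for i in range(4):
    buf = nbd.Buffer.from_bytearray(bytearray(b"\0" * 65536))
    h.aio_pwrite(buf, i * 65536)

# Wait until every write has completed...
while h.aio_in_flight() > 0:
    h.poll(-1)

# ...then flush exactly once before disconnecting.
h.flush()
h.shutdown()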

Having a flush in parallel with other in-flight operations tends to
happen more with actual guest use of a disk, when the guest OS really
does want to ensure that action A is visible to action B (say, two
cooperating processes manipulating a common file in the file system),
but does not care about action C being performed in parallel by an
independent process (managing a different file in the same file
system).
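
Back to the original question: a minimal sketch of the safer variant
suggested above, assuming open() already records imageio's flush
capability under 'can_flush' in the handle dict:

def can_multi_conn(h):
    # Only advertise NBD_FLAG_CAN_MULTI_CONN when imageio can flush;
    # the flag promises cross-connection flush visibility, which we
    # cannot deliver if flushing is unsupported.
    return h['can_flush']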

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3266
Virtualization: qemu.org | libvirt.org