On Tue, Jan 26, 2021 at 08:51:59AM -0600, Eric Blake wrote:
> However, that's for the general case (when multiple writers are
> sending requests that can overlap one another, and when you have to
> worry about read consistency). But when copying an image, you have a
> special case: each writer is only touching distinct subsets of the
> overall image, and you are not reading from the image. As long as
> those subsets do not overlap, and you do not trigger
> read-modify-write actions, you really don't care whether flushes are
> consistent between writers, because there are no consistency issues
> in the first place.
>
> So we _might_ be able to optimize nbdcopy for the case of parallel
> writes even when the NBD server does not advertise CAN_MULTI_CONN; the
> writers can either use FLAG_FUA for every write, or _each_ write without
> flushes and perform a final NBD_CMD_FLUSH; as long as EACH writer
> performs a final flush before disconnecting (rather than trying to
> optimize to just one writer flushing), you will be guaranteed that the
> entire copied image is now consistent, even though you could not
> guarantee flush consistency during the parallel writes.
It happens that nbdcopy does call nbd_flush on each handle (a case
of being "right but for the wrong reasons"). However, we don't use
multi-conn unless the server advertises it, and in light of what you
say above perhaps we should.
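To make the flush argument concrete, here is a toy model in Python. It
is only a sketch under stated assumptions: ToyServer and copy_image are
hypothetical names, not part of libnbd or nbdcopy, and the server is
assumed to keep a private write-back cache per connection, which is the
behaviour a server without CAN_MULTI_CONN is permitted to have.

```python
# Toy model only: this is not libnbd and not a real NBD server.
class ToyServer:
    """Backing store plus one private write-back cache per connection."""

    def __init__(self, size):
        self.disk = bytearray(size)
        self.caches = {}  # connection id -> {offset: data}

    def write(self, conn, offset, data):
        # Without multi-conn, a write may sit in this connection's cache.
        self.caches.setdefault(conn, {})[offset] = bytes(data)

    def flush(self, conn):
        # A flush is only guaranteed to cover this connection's own writes.
        for offset, data in self.caches.pop(conn, {}).items():
            self.disk[offset:offset + len(data)] = data


def copy_image(src, nr_conns, flush_every_writer):
    """Copy src using nr_conns writers, each owning a disjoint extent."""
    server = ToyServer(len(src))
    chunk = len(src) // nr_conns
    for conn in range(nr_conns):
        start = conn * chunk
        end = len(src) if conn == nr_conns - 1 else start + chunk
        server.write(conn, start, src[start:end])
    if flush_every_writer:
        for conn in range(nr_conns):
            server.flush(conn)
    else:
        server.flush(0)  # the tempting "one writer flushes" shortcut
    return bytes(server.disk)


src = bytes(range(1, 65))
print(copy_image(src, 4, flush_every_writer=True) == src)   # True
print(copy_image(src, 4, flush_every_writer=False) == src)  # False
```

In this model the copy is complete only when every writer flushes its
own connection; a single flush commits just that writer's extent.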
> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
>
> "bit 8, NBD_FLAG_CAN_MULTI_CONN: Indicates that the server operates
> entirely without cache, or that the cache it uses is shared among
> all connections to the given device. In particular, if this flag is
> present, then the effects of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA MUST
> be visible across all connections when the server sends its reply to
> that command to the client. In the absence of this flag, clients
> SHOULD NOT multiplex their commands over more than one connection to
> the export."
Although the text only mentions flush & FUA, I wonder if there's
another case where multi-conn should not be advertised or used: that
is, where the block or sector size is larger than the size of the
writes, so each write is turned into a read-modify-write (r/m/w)
cycle. Multiple adjacent (even non-overlapping) writes could then
interfere with each other.
This is vanishingly unlikely to affect nbdcopy, which uses a huge block
size, but would it be a problem in the NBD protocol itself?
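The interference I have in mind can be sketched with another toy model.
The assumptions are mine: BLOCK and SECTOR are made-up sizes, rmw_read
and rmw_write are hypothetical helpers, and the server is assumed not to
serialize r/m/w cycles across connections. I don't know whether any real
server behaves this way.

```python
BLOCK = 4096   # hypothetical server block size
SECTOR = 512   # hypothetical client write size, smaller than BLOCK


def rmw_read(disk, offset):
    """Step 1 of r/m/w: read the whole block containing offset."""
    base = offset - offset % BLOCK
    return base, bytearray(disk[base:base + BLOCK])


def rmw_write(disk, base, block, offset, data):
    """Steps 2 and 3: patch the private copy, write the whole block back."""
    block[offset - base:offset - base + len(data)] = data
    disk[base:base + BLOCK] = block


disk = bytearray(BLOCK)
a = b"A" * SECTOR   # writer A targets bytes 0..511
b = b"B" * SECTOR   # writer B targets bytes 512..1023, no overlap with A

# Unlucky interleaving: both connections read before either writes back.
base_a, copy_a = rmw_read(disk, 0)
base_b, copy_b = rmw_read(disk, SECTOR)
rmw_write(disk, base_a, copy_a, 0, a)
rmw_write(disk, base_b, copy_b, SECTOR, b)

print(disk[:SECTOR] == a)            # False: A's write was overwritten
print(disk[SECTOR:2 * SECTOR] == b)  # True
```

Writer B's stale copy of the block wipes out writer A's sector even
though the two requests never overlapped, and no amount of flushing
afterwards brings A's data back.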
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones