On Mon, Feb 22, 2021 at 03:19:18PM -0600, Eric Blake wrote:
On 11/24/20 7:04 AM, Richard W.M. Jones wrote:
> This implements these interconnected options:
>
> --allocated
> --destination-is-zero (alias: --target-is-zero)
> --no-extents
> ---
> TODO | 6 +-
Late review inspired by Nir's recent work, oh well.
> +++ b/copy/copy-sparse-allocated.sh
> @@ -0,0 +1,92 @@
> +
> +requires nbdkit --version
> +requires nbdkit --exit-with-parent --version
Do we really need the first line, since the second covers the same?
> +++ b/copy/copy-sparse-no-extents.sh
> +requires nbdkit --version
> +requires nbdkit --exit-with-parent --version
Multiple instances.
It's not a big deal either way, but the thinking behind this is:
either nbdkit might not be installed (which would be caught by the
first line) or nbdkit might not have the --exit-with-parent option
because it's running on an OS which doesn't support that like Windows.
It doesn't really matter for practical purposes either way.
> +++ b/copy/copy-sparse.sh
> @@ -0,0 +1,97 @@
> +# Check the output matches expected.
> +if [ "$(cat $out)" != "pwrite 1 4294967296
> +pwrite 32768 0
> +pwrite 32768 1073709056
> +pwrite 32768 4294934528
> +trim 134184960 32768
Sorry I didn't notice this earlier, but Nir is on the correct track, and
updates this test to expect calls to zero instead (as we cannot
guarantee that read-after-trim reads zero, at least not without a
further NBD spec extension).
Yup, Nir has now fixed this, thanks :-)
> +++ b/copy/file-ops.c
> @@ -24,7 +24,16 @@
> +
> +/* This is done synchronously, but that's fine because commands from
> + * the previous work range in flight continue to run, it's difficult
> + * to (sanely) start new work until we have the full list of extents,
> + * and in almost every case the remote NBD server can answer our
> + * request for extents in a single round trip.
> + */
> +static void
> +nbd_get_extents (struct rw *rw, uintptr_t index,
> + uint64_t offset, uint64_t count,
> + extent_list *ret)
> +{
> + extent_list exts = empty_vector;
> +
> + /* Copy the extents returned into the final list (ret). This is
> + * complicated because the extents returned by the server may
> + * begin earlier and begin or end later than the requested size.
> + */
Can the extents returned by the server actually begin earlier? They can
definitely end earlier or later than the reequested size, but it seems
like the beginning is always fixed. (nbdkit allows plugins to return
early data, but then munges that return to pass the client something
that starts from the client's requested offset)
Yes they cannot begin earlier, because the NBD_REPLY_TYPE_BLOCK_STATUS
message descriptors only contain a length field. We could therefore
simplify this.
> + for (i = 0; i < exts.size; ++i) {
> + uint64_t d;
> +
> + if (exts.ptr[i].offset + exts.ptr[i].length <= offset)
> + continue;
which makes me wonder if this 'if' was dead code.
I think so, yes.
> + if (exts.ptr[i].offset < offset) {
> + d = offset - exts.ptr[i].offset;
> + exts.ptr[i].offset += d;
> + exts.ptr[i].length -= d;
> + assert (exts.ptr[i].offset == offset);
> + }
> + if (exts.ptr[i].offset + exts.ptr[i].length > offset + count) {
> + d = offset + count - exts.ptr[i].offset - exts.ptr[i].length;
> + exts.ptr[i].length -= d;
> + assert (exts.ptr[i].length == offset + count);
> + }
> + if (extent_list_append (ret, exts.ptr[i]) == -1) {
> + perror ("realloc");
> + exit (EXIT_FAILURE);
> + }
> +
> + offset += exts.ptr[i].length;
> + count -= exts.ptr[i].length;
> + }
> + }
> +
> + free (exts.ptr);
> +}
> +
> +++ b/copy/nbdcopy.pod
> @@ -4,7 +4,9 @@ nbdcopy - copy to and from an NBD server
>
> =head1 SYNOPSIS
>
> - nbdcopy [-C N|--connections=N] [--flush] [-p|--progress]
> + nbdcopy [--allocated] [-C N|--connections=N]
> + [--destination-is-zero|--target-is-zero]
> + [--flush] [--no-extents] [-p|--progress]
> [-R N|--requests=N] [--synchronous]
> [-T N|--threads=N]
> SOURCE DESTINATION
> @@ -74,6 +76,15 @@ formats use C<qemu-img convert>, see L<qemu-img(1)>.
>
> Display brief command line help and exit.
>
> +=item B<--allocated>
> +
> +Normally nbdcopy tries to create a sparse output (with holes), if the
> +destination supports that. It does this in two ways: either using
> +extent informtation from the source to copy holes (see
information
Will fix.
> +I<--no-extents>), or by detecting runs of zeroes (see
I<-S>). If you
> +use I<--allocated> then nbdcopy creates a fully allocated, non-sparse
> +output on the destination.
> +
> =item B<-C> N
>
> =item B<--connections=>N
> @@ -82,11 +93,30 @@ Set the maximum number of NBD connections
("multi-conn"). By default
> nbdcopy will try to use multi-conn with up to 4 connections if the NBD
> server supports it.
>
> +=item B<--destination-is-zero>
> +
> +=item B<--target-is-zero>
> +
> +Assume the destination is already zeroed. This allows nbdcopy to skip
> +copying blocks of zeroes from the source to the destination. This is
> +not safe unless the destination device is already zeroed.
> +(I<--target-is-zero> is provided for compatibility with
> +L<qemu-img(1)>.)
> +
> =item B<--flush>
>
> Flush writes to ensure that everything is written to persistent
> storage before nbdcopy exits.
>
> +=item B<--no-extents>
> +
> +Normally nbdcopy uses extent metadata to skip over parts of the source
> +disk which contain holes. If you use this flag, nbdcopy ignores
> +extent information and reads everything, which is usually slower. You
> +might use this flag in two situations: the source NBD server has
> +incorrect metadata information; or the source has very slow extent
> +querying so it's faster to simply read all of the data.
> +
I'd still love to teach nbdkit how to send NBD_CMD_READ replies with
zero chunks (right now, only qemu-nbd does that); even with
--no-extents, proper handling of zero chunks can still let us proceed
faster than blindly passing full runs of zeros over the network.
Structured replies?
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/