On Tue, Apr 18, 2023 at 02:33:41PM +0200, Wouter Verhelst wrote:
On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
> Rather than requiring all servers and clients to have a 32-bit limit
> on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
> support for a 64-bit single I/O transaction now.
> NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
> NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
>
> By standardizing this, all clients must be prepared to support both
> types of hole type replies, even though most server implementations of
> extended replies are likely to only send one hole type.
I think it's probably a better idea to push this patch to a separate
"extension-*" branch, and link to that from proto.md on master. Those
are documented as "we standardized this, but no first implementor exists
yet".
If someone actually comes up with a reason for 64-bit transactions, we
can then see if the spec matches the need and merge it to master.
Otherwise this feels too much like a solution in search of a problem to
me.
I'd like to push back a bit on this. Firstly Eric does have two
complete implementations. It's true however that they not upstream in
either case.
But we also need this because there are relatively serious issues
observed, particularly around trimming/zeroing, and extents. The
trimming problem can be demonstrated very easily in fact:
$ nbdkit -U - --filter=stats memory 1P statsfile=/dev/stdout --run ' time
guestfish add "" protocol:nbd server:unix:$unixsocket discard:enable format:raw
: run : mkfs xfs /dev/sda '
real 4m17.531s
user 0m0.032s
sys 0m0.040s
total: 1066328 ops, 257.558068 s, 1048578.04 GiB, 4071.23 GiB/s
read: 4356 ops, 0.003335 s, 14.61 MiB, 4.28 GiB/s op, 58.08 KiB/s total
Request size and alignment breakdown:
12 bits: 50.8% (2215 reqs, 8.65 MiB total)
12 bit aligned: 100.0% (2215)
13 bit aligned: 51.6% (1143)
14 bit aligned: 26.9% (595)
15 bit aligned: 14.6% (323)
16 bit aligned: 8.4% (185)
17+ bit-aligned: 4.9% (109)
9 bits: 47.4% (2064 reqs, 1.01 MiB total)
9 bit aligned: 100.0% (2064)
10+ bit-aligned: 0.6% (13)
other sizes: 1.8% (77 reqs, 14.61 MiB total)
write: 13325 ops, 0.046782 s, 31.29 MiB, 668.91 MiB/s op, 124.41 KiB/s total
Request size and alignment breakdown:
12 bits: 53.8% (7170 reqs, 28.01 MiB total)
12 bit aligned: 100.0% (7170)
13 bit aligned: 50.0% (3585)
14 bit aligned: 25.0% (1793)
15 bit aligned: 12.5% (896)
16 bit aligned: 6.2% (448)
17+ bit-aligned: 3.1% (224)
9 bits: 46.2% (6150 reqs, 3.00 MiB total)
9 bit aligned: 100.0% (6150)
10 bit aligned: 33.4% (2054)
12 bit aligned: 16.7% (1029)
13 bit aligned: 8.4% (515)
14+ bit-aligned: 4.2% (259)
other sizes: 0.0% (5 reqs, 31.29 MiB total)
trim: 1048576 ops, 306.059735 s, 1048576.00 GiB, 3426.05 GiB/s op, 4071.22 GiB/s
total
Request size and alignment breakdown:
30 bits: 100.0% (1048576 reqs, 1048576.00 GiB total)
30 bit aligned: 100.0% (1048576)
31 bit aligned: 50.0% (524288)
32 bit aligned: 25.0% (262144)
33 bit aligned: 12.5% (131072)
34 bit aligned: 6.2% (65536)
35+ bit-aligned: 3.1% (32768)
zero: 64 ops, 0.003912 s, 1.99 GiB, 508.75 GiB/s op, 7.91 MiB/s total
Request size and alignment breakdown:
25 bits: 98.4% (63 reqs, 1.97 GiB total)
13 bit aligned: 100.0% (63)
other sizes: 1.6% (1 reqs, 1.99 GiB total)
flush: 7 ops, 0.000001 s, 0 bytes, 0 bytes/s op, 0 bytes/s total
Note how trim takes a million operations and most of the time. That
should be done in one operation. If you stop advertising discard
support on the disk ("discard:disable") it takes only a fraction of
the time.
The extents one is harder to demonstrate, but it makes our code
considerably more complex that we cannot just grab the extent map for
a whole disk larger than 4GB in a single command. (The complexity
won't go away, but the querying will be faster with fewer round trips
with this change.)
Nevertheless I'm not opposed to keeping this as an extension until the
implementations are upstream and bedded in.
Rich.
With that said, for the things I didn't reply to, you can add:
Reviewed-By: Wouter Verhelst <w(a)uter.be>
--
w(a)uter.{be,co.za}
wouter(a){grep.be,fosdem.org,debian.org}
I will have a Tin-Actinium-Potassium mixture, thanks.
_______________________________________________
Libguestfs mailing list
Libguestfs(a)redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v