On Tue, Jun 14, 2022 at 08:30:15PM +0100, Nikolaus Rath wrote:
> On Jun 14 2022, "Richard W.M. Jones" <rjones(a)redhat.com> wrote:
> > I think we should set logical_block_size == physical_block_size ==
> > MAX (512, NBD minimum block size constraint).
> Why the lower bound of 512?
I suspect the kernel can't handle sector sizes smaller than 512 bytes.
By default the NBD protocol advises advertising a minimum size of 1
byte, and I'm almost certain setting logical_block_size == 1 would
break everything.
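A minimal sketch of the proposed clamping, with the server's advertised minimum held in a variable (`nbd_min` is a hypothetical name, not anything nbd-client actually exposes):

```shell
# Hypothetical: nbd_min holds the NBD server's advertised minimum block size.
nbd_min=1
# Proposed logical/physical block size: MAX (512, NBD minimum constraint).
lbs=$(( nbd_min > 512 ? nbd_min : 512 ))
echo "logical_block_size = $lbs"
```

With the protocol-default minimum of 1, this yields 512; a server advertising, say, 4096 would push the result up to 4096.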
> > What should happen to the nbd-client -b option?
> Perhaps it should become the lower bound (instead of the hardcoded 512)?
> That's assuming there is a reason for having a client-specified lower
> bound.
Right, I don't think there's a reason to continue with the -b option.
I only use it to set -b 512 to work around the annoying default in
older versions (which was 1024).
> > (4) Kernel blk_queue_max_hw_sectors: This is documented as: "set max
> > sectors for a request ... Enables a low level driver to set a hard
> > upper limit, max_hw_sectors, on the size of requests."
> >
> > Current behaviour of nbd.ko is that we set this to 65536 (sectors?
> > blocks?), which for 512b sectors is 32M.
> FWIW, on my 5.16 kernel, the default is 65 kB (according to
> /sys/block/nbdX/queue/max_sectors_kb x 512b).
I have:
$ cat /sys/devices/virtual/block/nbd0/queue/max_hw_sectors_kb
32768
(ie. 32 MB) which I think comes from the nbd module setting:
blk_queue_max_hw_sectors(disk->queue, 65536);
multiplied by 512b sectors.
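As a sanity check on that arithmetic (65536 sectors of 512 bytes each, converted to the KB units the sysfs file uses):

```shell
# 65536 sectors * 512 bytes/sector, expressed in KB and MB.
sectors=65536
kb=$(( sectors * 512 / 1024 ))
mb=$(( kb / 1024 ))
echo "${kb} KB = ${mb} MB"
```

which matches the 32768 shown by max_hw_sectors_kb above.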
> > I think we could set this to MIN (32M, NBD maximum block size
> > constraint), converting the result to sectors.
> I don't think that's right. Rather, it should be NBD's preferred block
> size.
> Setting this to the preferred block size means that NBD requests will be
> this large whenever there are enough sequential dirty pages, and that no
> requests will ever be larger than this. I think this is exactly what the
> NBD server would like to have.
This kernel setting limits the maximum request size on the queue.
In my testing reading and writing files with the default [above] the
kernel never got anywhere near sending multi-megabyte requests. In
fact the largest request it sent was 128K, even when I did stuff like:
# dd if=/dev/zero of=/tmp/mnt/zero bs=100M count=10
128K happens to be 2 x blk_queue_io_opt, but I need to do more testing
to see if that relationship always holds.
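For reference, here is how the MIN (32M, NBD maximum block size constraint) calculation I proposed works out in sectors; the 4M server maximum is a made-up example value, not something a particular server advertises:

```shell
# Hypothetical: nbd_max is the NBD server's advertised maximum block size.
nbd_max=$(( 4 * 1024 * 1024 ))   # example value: 4M
cap=$(( 32 * 1024 * 1024 ))      # nbd.ko's current 32M limit
bytes=$(( nbd_max < cap ? nbd_max : cap ))
echo "max_hw_sectors = $(( bytes / 512 )) sectors"
```

A server advertising no maximum (or one above 32M) would simply keep the current 65536-sector limit.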
> Setting this to the maximum block size would mean that NBD requests
> will exceed the preferred size whenever there are enough sequential
> dirty pages (while still obeying the maximum). This seems strictly
> worse.
> Unrelated to the proposed changes (all of which I think are technically
> correct), I am wondering if this will have much practical benefit. As
> far as I can tell, the kernel currently aligns NBD requests to the
> logical/physical block size rather than the size of the NBD request. Are
> there NBD servers that would benefit from the kernel honoring the
> preferred blocksize if the data is not also aligned to this blocksize?
I'm not sure I parsed this. Can you give an example?
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit