On Jun 14 2022, "Richard W.M. Jones" <rjones(a)redhat.com> wrote:
This is a follow-up to this thread:
https://listman.redhat.com/archives/libguestfs/2022-June/thread.html#29210
about getting the kernel client (nbd.ko) to obey block size
constraints sent by the NBD server:
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-...
I was sent this very interesting design document about the original
intent behind the kernel's I/O limits:
https://people.redhat.com/msnitzer/docs/io-limits.txt
There are four or five kernel block layer settings we could usefully
adjust, and there are three NBD block size constraints, and in my
opinion there's not a very clear mapping between them. But I'll have
a go at what I think we should do.
- - -
(1) Kernel physical_block_size & logical_block_size: The example given
is of a hard disk with 4K physical sectors (AF) which can nevertheless
emulate 512-byte sectors. In this case you'd set physical_block_size
= 4K, logical_block_size = 512b.
Data structures (partition tables, etc.) should be aligned to
physical_block_size to avoid unnecessary read-modify-write (RMW)
cycles. But the fundamental unit of I/O is logical_block_size.
Current behaviour of nbd.ko is that logical_block_size ==
physical_block_size == the nbd-client "-b" option (default: 512 bytes,
contradicting the documentation).
I think we should set logical_block_size == physical_block_size ==
MAX (512, NBD minimum block size constraint).
Why the lower bound of 512?
What should happen to the nbd-client -b option?
Perhaps it should become the lower bound (instead of the hardcoded 512)?
That's assuming there is a reason for having a client-specified lower
bound.
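To make that concrete, here is a minimal sketch of what nbd.ko could do
(against the 5.x block API; nbd_min_bs is a hypothetical stand-in for
the server's minimum block size constraint, however it reaches the
kernel -- the ioctl/netlink plumbing is not shown):

    /* Sketch only: nbd_min_bs stands in for the NBD minimum block
     * size constraint as delivered from userspace. */
    static void nbd_apply_block_sizes(struct nbd_device *nbd, u32 nbd_min_bs)
    {
            /* Lower-bound at 512 and use one value for both limits,
             * as proposed above. */
            u32 bs = max_t(u32, 512, nbd_min_bs);

            blk_queue_logical_block_size(nbd->disk->queue, bs);
            blk_queue_physical_block_size(nbd->disk->queue, bs);
    }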
(2) Kernel minimum_io_size: The documentation says this is the
"preferred minimum unit for random I/O".
Current behaviour of nbd.ko is this is not set.
I think NBD's preferred block size should map to minimum_io_size.
(3) Kernel optimal_io_size: The documentation says this is the
"[preferred] streaming I/O [size]".
Current behaviour of nbd.ko is this is not set.
NBD doesn't really have the concept of streaming vs random I/O, so we
could either ignore this or set it to the same value as
minimum_io_size.
I have a kernel patch allowing nbd-client to set both minimum_io_size
and optimal_io_size from userspace.
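Again as a sketch, with nbd_pref_bs a hypothetical stand-in for the
server's preferred block size constraint (not an existing field):

    /* Sketch: map the NBD preferred block size onto both kernel
     * I/O hints.  nbd_pref_bs is hypothetical plumbing from
     * userspace. */
    blk_queue_io_min(nbd->disk->queue, nbd_pref_bs);
    /* NBD has no streaming-vs-random distinction, so reuse the
     * same value rather than leaving optimal_io_size unset. */
    blk_queue_io_opt(nbd->disk->queue, nbd_pref_bs);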
(4) Kernel blk_queue_max_hw_sectors: This is documented as: "set max
sectors for a request ... Enables a low level driver to set a hard
upper limit, max_hw_sectors, on the size of requests."
Current behaviour of nbd.ko is that we set this to 65536 (sectors?
blocks?), which for 512b sectors is 32M.
FWIW, on my 5.16 kernel, the default is 65 kB (according to
/sys/block/nbdX/queue/max_sectors_kb x 512b).
I think we could set this to MIN (32M, NBD maximum block size
constraint), converting the result to sectors.
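In sketch form (nbd_max_bs again hypothetical; blk_queue_max_hw_sectors
takes 512-byte sectors, hence the shift):

    /* Sketch: cap requests at the NBD maximum block size
     * constraint, never exceeding the current 32M default.
     * ">> 9" converts bytes to 512-byte sectors. */
    blk_queue_max_hw_sectors(nbd->disk->queue,
                             min_t(u32, 32 << 20, nbd_max_bs) >> 9);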
I don't think that's right. Rather, max_hw_sectors should be derived
from NBD's preferred block size.
Setting this to the preferred block size means that NBD requests will be
this large whenever there are enough sequential dirty pages, and that no
requests will ever be larger than this. I think this is exactly what the
NBD server would like to have.
Setting this to the maximum block size would mean that NBD requests
would exceed the preferred size whenever there are enough sequential
dirty pages (while still obeying the maximum). This seems strictly
worse.
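As a sketch, the alternative I have in mind would simply be
(nbd_pref_bs hypothetical as before):

    /* Sketch: cap requests at the server's preferred block size
     * instead of its maximum, so requests grow to the preferred
     * size under load but never beyond it. */
    blk_queue_max_hw_sectors(nbd->disk->queue, nbd_pref_bs >> 9);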
Unrelated to the proposed changes (all of which I think are technically
correct), I am wondering whether this will have much practical benefit.
As far as I can tell, the kernel currently aligns NBD requests to the
logical/physical block size rather than to the size of the request
itself. Are there NBD servers that would benefit from the kernel
honouring the preferred block size if the data is not also aligned to
that block size?
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«