On Mon, Jun 13, 2022 at 01:25:39PM -0400, Josef Bacik wrote:
On Mon, Jun 13, 2022 at 6:24 AM Richard W.M. Jones
<rjones(a)redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 10:33:58AM +0100, Nikolaus Rath wrote:
> > Hello,
> >
> > I am trying to improve performance of the scenario where the kernel's
> > NBD client talks to NBDKit's S3 plugin.
> >
> > For me, the main bottleneck is currently due to the fact that the kernel
> > aligns requests to only 512 B, no matter the blocksize reported by
> > nbdkit.
> >
> > Using a 512 B object size is not feasible (due to latency and request
> > overhead). However, with a larger object size there are two conflicting
> > objectives:
> >
> > 1. To maximize parallelism (which is important to reduce the effects of
> > connection latency), it's best to limit the size of the kernel's NBD
> > requests to the object size.
> >
> > 2. To minimize un-aligned writes, it's best to allow arbitrarily large
> > NBD requests, because the larger the requests the larger the amount of
> > full blocks that are written. Unfortunately this means that all objects
> > touched by the request are written sequentially.
> >
> > I see a number of ways to address that:
> >
> > 1. Change the kernel's NBD code to honor the blocksize reported by the
> > NBD server. This would be ideal, but I don't feel up to making this
> > happen. Theoretical solution only.
>
> This would be the ideal solution. I wonder how technically
> complicated it would be actually?
>
> AIUI you'd have to modify nbd-client to query the block limits from
> the server, which is the hardest part of this, but it's all userspace
> code. Then you'd pass those down to the kernel via the ioctl (see
> drivers/block/nbd.c:__nbd_ioctl). Then inside the kernel you'd call
> blk_queue_io_min & blk_queue_io_opt with the values (I'm not sure how
> you set the max request size, or if that's possible). See
> block/blk-settings.c for details of these functions.
>
Exactly this. The kernel just does what the client tells it to do,
and the kernel can be configured for whatever blocksize.
Unfortunately there's not a way for the server to advertise to the
client what to do; you have to configure it on the client. Adding
some code to the userspace negotiation to pull the blocksize is the
right thing to do here. Then simply pass it into the configuration
stuff in nbd-client, which uses the appropriate netlink tag to set
the blocksize.
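To illustrate why the request alignment matters for the S3 case described
above, here is a rough Python sketch that counts how many fixed-size
objects a single write covers completely and how many it only covers
partially. The function name and the fixed-size object model are
assumptions made for this example, not taken from the nbdkit S3 plugin.

```python
def split_request(offset, length, object_size):
    """Return (full_objects, partial_objects) touched by one write.

    Hypothetical helper: models the backing store as consecutive
    objects of object_size bytes each.
    """
    if length <= 0:
        return (0, 0)
    end = offset + length                  # first byte past the write
    first = offset // object_size          # index of first object touched
    last = (end - 1) // object_size        # index of last object touched
    full = partial = 0
    for idx in range(first, last + 1):
        obj_start = idx * object_size
        obj_end = obj_start + object_size
        if offset <= obj_start and end >= obj_end:
            full += 1                      # object is overwritten entirely
        else:
            partial += 1                   # object is only partly covered
    return (full, partial)


MiB = 1024 * 1024
aligned = split_request(0, 4 * MiB, MiB)       # starts on an object boundary
shifted = split_request(512, 4 * MiB, MiB)     # only 512-byte aligned
```

With 1 MiB objects, the aligned 4 MiB write covers four objects completely,
while the same write shifted by 512 bytes covers three completely and two
partially.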
For context, the NBD protocol can now advertise minimum, preferred and
maximum block sizes during the initial handshake:
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-...
nbdkit (since 1.30) supports this, for example:
  $ nbdkit eval get_size='echo 256M' block_size='echo 64k 1M 32M'
  $ nbdinfo nbd://localhost
  protocol: newstyle-fixed without TLS
  export="":
          export-size: 268435456 (256M)
          uri: nbd://localhost:10809/
          contexts:
                  base:allocation
          is_rotational: false
          is_read_only: true
          can_cache: false
          can_df: true
          can_fast_zero: false
          can_flush: false
          can_fua: false
          can_multi_conn: false
          can_trim: false
          can_zero: false
          block_size_minimum: 65536      <---
          block_size_preferred: 1048576  <---
          block_size_maximum: 33554432   <---
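As a sketch of what the userspace side would have to decode: in the
protocol document the block size constraints arrive as an NBD_INFO_BLOCK_SIZE
item (information type 3) whose payload is a 16-bit type followed by three
32-bit values, all in network byte order. The Python below only parses such
a payload. The function name is made up for illustration, and this is not
how nbd-client is actually structured.

```python
import struct

NBD_INFO_BLOCK_SIZE = 3   # information type from the NBD protocol document


def parse_block_size_info(payload):
    """Decode an NBD_INFO_BLOCK_SIZE payload (hypothetical helper).

    Layout: 16-bit type, then 32-bit minimum, preferred and maximum
    block sizes, big-endian, 14 bytes in total.
    """
    if len(payload) != 14:
        raise ValueError("expected 14 bytes, got %d" % len(payload))
    info_type, minimum, preferred, maximum = struct.unpack(">HIII", payload)
    if info_type != NBD_INFO_BLOCK_SIZE:
        raise ValueError("not a block size info item: type %d" % info_type)
    return {"minimum": minimum, "preferred": preferred, "maximum": maximum}


# Build a payload with the same values as the nbdinfo output above.
example = struct.pack(">HIII", NBD_INFO_BLOCK_SIZE, 65536, 1048576, 33554432)
sizes = parse_block_size_info(example)
```

The resulting values are what would then be handed to the kernel through
the client configuration.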
Rich.
> As a quick test you could try calling blk_queue_io_* in the kernel
> driver with hard-coded values, to see if that modifies the requests
> that are seen by nbdkit. Should give you some confidence before
> making the full change.
>
> BTW I notice that the kernel NBD driver always reports that it's a
> non-rotational device, ignoring the server setting ...
That I can fix easily, I'll get that done. Thanks,
Josef
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html