Hello,
I am trying to improve the performance of the scenario where the kernel's
NBD client talks to nbdkit's S3 plugin.
For me, the main bottleneck is currently that the kernel aligns requests
to only 512 B, regardless of the block size reported by nbdkit.
Using a 512 B object size is not feasible (due to latency and request
overhead). However, with a larger object size there are two conflicting
objectives:
1. To maximize parallelism (which is important to reduce the effects of
connection latency), it's best to limit the size of the kernel's NBD
requests to the object size.
2. To minimize unaligned writes, it's best to allow arbitrarily large
NBD requests, because the larger the request, the greater the number of
full blocks that are written. Unfortunately, this means that all objects
touched by the request are written sequentially.
I see a number of ways to address that:
1. Change the kernel's NBD code to honor the blocksize reported by the
NBD server. This would be ideal, but I don't feel up to making this
happen. Theoretical solution only.
2. Change the S3 plugin to use multiple threads, so that it can upload
multiple objects in parallel even when they're part of the same NBD
request. The disadvantage is that this adds a second "layer" of
threads, in addition to those started by nbdkit itself.
3. Change nbdkit itself to split up requests *and* distribute them to
multiple threads. I believe this requires changes to the core code,
because the blocksize filter can't dispatch requests to multiple
threads.
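For option 2, the plugin's pwrite could split the buffer on object
boundaries and hand the pieces to a thread pool. A rough sketch of the
aligned case only (upload_object is a stand-in I made up for the actual
S3 PUT, which in the real plugin would be a boto3 call):

```python
from concurrent.futures import ThreadPoolExecutor

def upload_object(key, data):
    # Placeholder for the real S3 upload (e.g. boto3 put_object()).
    return (key, len(data))

def pwrite_parallel(pool, buf, offset, obj_size):
    """Split one NBD write into per-object uploads and run them in parallel."""
    # Sketch handles only the fully aligned case; partial head/tail
    # objects would need a read-modify-write step first.
    assert offset % obj_size == 0 and len(buf) % obj_size == 0
    futures = [
        pool.submit(upload_object, (offset + i) // obj_size,
                    buf[i:i + obj_size])
        for i in range(0, len(buf), obj_size)
    ]
    # Wait for all uploads; any exception propagates here.
    return [f.result() for f in futures]

with ThreadPoolExecutor(max_workers=8) as pool:
    print(pwrite_parallel(pool, b"x" * (4 * 1024), 0, 1024))
```

The pool would of course be created once at plugin start, not per
request, and its size caps the extra parallelism per NBD request.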
What do people think is the best way to proceed? Is there a fourth
option that I might be missing?
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«