On 01/19/2018 10:56 AM, Shaun McDowell wrote:
> Limitation: The kernel will (with today's default settings) typically be
> willing to send up to 128 requests of 128kB size to the driver in parallel.
> We wanted to support 128 parallel read operations on different areas of the
> disk without requiring 128 separate threads and connections for the driver.
> Right now in nbdkit that is impossible. The main loop in connection.c will
> pull an nbd request off the socket and block until that read request is
> complete before sending a response and getting the next request, blocking
> other requests on the socket unless running X connections/threads in
> parallel.

What version of nbdkit are you using? We recently added parallel reads
in 1.1.17 (some minor fixes went in later; the current version is
1.1.25), which should let a single socket serve multiple requests in
parallel, governed by nbdkit's --threads option, provided your plugin is
truly parallel (nbdkit now ships both a 'file' and an 'nbd' plugin that
are truly parallel).

> Change: We introduced an additional set of functions to the nbdkit_plugin
> struct that supports asynchronous handling of the requests and a few helper
> functions for the plugin to use to respond when it has finished the
> request. This is very similar to the fuse filesystem low level api (async
> supported) vs the high level fuse fs api (sync only). The design goal here
> is that a single connection/thread on nbdkit can support as many requests
> in parallel as the plugin allows. The nbdkit side pulls the request off the
> socket and if the async function pointer is non-null it will wrap the
> request in an op struct and use the async plugin call for read/write/etc
> capturing any buffer allocated and some op details into the op pointer. The
> plugin async_* will start the op and return to nbdkit while the plugin
> works on it in the background. Nbdkit will then go back to the socket and
> begin the next request. Our plugin uses 1 connection/nbdkit thread and 2-4
> threads internally with boost asio over sockets to service the requests to
> the cloud. We are able to achieve ~1GB/s (yes, bytes) read/write performance
> to AWS S3 from an EC2 node with 10 gigabit networking, using < 100MB of
> memory in the driver with this approach.

Definitely post patches to the list! My work to add parallel support
via --threads still spawns multiple threads (the plugin runs
concurrently on several threads), whereas yours takes a different
approach: breaking the work into smaller stages that piece together,
possibly with fewer threads.
Yes, please post patches.
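
As a rough illustration of the dispatch described above (use the async
hook when the plugin provides one, otherwise fall back to the blocking
call), a minimal sketch might look like the following. Every sketch_*
name here is hypothetical and invented for illustration; none of it is
taken from nbdkit or cbdkit, and the real op struct surely carries more
state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-request state kept by the server until a reply is sent. */
struct sketch_op {
  void *buf;        /* buffer the read result lands in */
  uint32_t count;   /* request length in bytes */
  int started;      /* set by the async hook once it has taken ownership */
  int error;        /* 0 on success, an errno value otherwise */
};

/* Hypothetical, trimmed-down plugin table: one sync and one async read hook. */
struct sketch_plugin {
  int (*pread) (void *handle, void *buf, uint32_t count, uint64_t offset);
  void (*async_pread) (void *op, void *handle, void *buf, uint32_t count,
                       uint64_t offset);
};

/* Tells the connection loop whether it must send the reply itself. */
enum sketch_when { SKETCH_REPLY_NOW, SKETCH_REPLY_LATER };

/* Example sync hook: pretends the disk is all zeroes. */
static int
sketch_sync_pread (void *handle, void *buf, uint32_t count, uint64_t offset)
{
  (void) handle; (void) offset;
  memset (buf, 0, count);
  return 0;
}

/* Example async hook: only records that it accepted the op.  A real plugin
 * would queue the op for its own workers and reply once the data is ready. */
static void
sketch_async_pread (void *op, void *handle, void *buf, uint32_t count,
                    uint64_t offset)
{
  struct sketch_op *o = op;
  (void) handle; (void) buf; (void) count; (void) offset;
  o->started = 1;
}

/* Dispatch one read request.  With an async hook the call returns at once
 * and the connection loop can go back to the socket for the next request. */
static enum sketch_when
sketch_dispatch_read (const struct sketch_plugin *p, void *handle,
                      struct sketch_op *op, uint64_t offset)
{
  assert (p != NULL && op != NULL);
  if (p->async_pread != NULL) {
    p->async_pread (op, handle, op->buf, op->count, offset);
    return SKETCH_REPLY_LATER;
  }
  op->error = 0;
  if (p->pread == NULL || p->pread (handle, op->buf, op->count, offset) == -1)
    op->error = EIO;
  return SKETCH_REPLY_NOW;
}
```

The point of the sketch is only that the decision is made per request
from a NULL check on the function pointer, so a plugin that leaves the
async hooks unset keeps today's blocking behaviour.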
>
> Here is what some of our function prototypes supporting an asynchronous
> nbdkit model look like:
>
> #define CBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS 2
> #define CBDKIT_THREAD_MODEL_PARALLEL 3
> #define CBDKIT_THREAD_MODEL_ASYNC 4
>
> struct cbdkit_plugin {
>   ...
>   int (*pread) (void *handle, void *buf, uint32_t count, uint64_t offset);
>   int (*pwrite) (void *handle, const void *buf, uint32_t count,
>                  uint64_t offset);
>   int (*flush) (void *handle);
>   int (*trim) (void *handle, uint32_t count, uint64_t offset);
>   int (*zero) (void *handle, uint32_t count, uint64_t offset, int may_trim);
>
>   int errno_is_preserved;
>
>   void (*async_pread) (void *op, void *handle, void *buf, uint32_t count,
>                        uint64_t offset);
>   void (*async_pwrite) (void *op, void *handle, const void *buf,
>                         uint32_t count, uint64_t offset, int fua);
>   void (*async_flush) (void *op, void *handle);
>   void (*async_trim) (void *op, void *handle, uint32_t count,
>                       uint64_t offset, int fua);
>   void (*async_zero) (void *op, void *handle, uint32_t count,
>                       uint64_t offset, int may_trim, int fua);
>   ...
> };
>
> Additionally there are a few helper functions for the plugin to use to
> respond back to nbdkit when the job is eventually finished. The plugin
> contract when using the async functions is that every async func guarantees
> it will call an appropriate async_reply function.
>
> /* call for completion of successful async_pwrite, async_flush,
> async_trim, or async_zero */
> extern CBDKIT_CXX_LANG_C int cbdkit_async_reply (void *op);
> /* call for completion of successful async_pread */
> extern CBDKIT_CXX_LANG_C int cbdkit_async_reply_read (void *op);
> /* call for completion of any async operation with error */
> extern CBDKIT_CXX_LANG_C int cbdkit_async_reply_error (void *op, uint32_t
> error);
>
> If there is any interest in supporting async ops in the next api version I
> am able to share the entire modified nbdkit (cbdkit) source that we use
> that supports this async op framework, fua, as well as some buffer pooling.
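
To make the contract above concrete (every async_* call must end in
exactly one reply, either success or error), here is a minimal
single-threaded sketch of the plugin side. It is an assumption-laden
toy: the demo_* names are hypothetical, the demo_async_reply_*
functions merely stand in for the cbdkit_async_reply_* helpers quoted
above, and an explicit demo_complete_pending() step replaces the
background threads a real plugin would use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_DISK_SIZE 4096
#define DEMO_MAX_PENDING 128

/* Hypothetical stand-in for the server-side op: just counts replies. */
struct demo_op {
  int replied;      /* must be exactly 1 once the request is finished */
  uint32_t error;   /* 0 on success */
};

/* Stand-ins for the reply helpers; the real ones would write to the socket. */
static int
demo_async_reply_read (void *op)
{
  struct demo_op *o = op;
  o->replied++;
  o->error = 0;
  return 0;
}

static int
demo_async_reply_error (void *op, uint32_t error)
{
  struct demo_op *o = op;
  o->replied++;
  o->error = error;
  return 0;
}

/* One queued read that has been accepted but not yet answered. */
struct demo_pending {
  void *op;
  void *buf;
  uint32_t count;
  uint64_t offset;
};

/* Hypothetical plugin handle: an in-memory disk plus a queue of pending ops. */
struct demo_handle {
  unsigned char disk[DEMO_DISK_SIZE];
  struct demo_pending queue[DEMO_MAX_PENDING];
  size_t npending;
};

/* Async read hook: validate, then queue the op and return without replying.
 * Requests that cannot be queued are answered immediately with an error. */
static void
demo_async_pread (void *op, void *handle, void *buf, uint32_t count,
                  uint64_t offset)
{
  struct demo_handle *h = handle;
  if (offset > DEMO_DISK_SIZE || count > DEMO_DISK_SIZE - offset) {
    demo_async_reply_error (op, EINVAL);
    return;
  }
  if (h->npending == DEMO_MAX_PENDING) {
    demo_async_reply_error (op, EBUSY);
    return;
  }
  h->queue[h->npending++] = (struct demo_pending) { op, buf, count, offset };
}

/* Finish every queued read and send its single reply; returns how many
 * ops were completed.  A real plugin would do this from worker threads. */
static size_t
demo_complete_pending (struct demo_handle *h)
{
  size_t done = 0;
  for (size_t i = 0; i < h->npending; i++) {
    struct demo_pending *p = &h->queue[i];
    memcpy (p->buf, h->disk + p->offset, p->count);
    demo_async_reply_read (p->op);
    done++;
  }
  h->npending = 0;
  return done;
}
```

Note that the error paths reply before returning, so no op is ever left
without an answer even when it never reaches the queue.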
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org