Our cbdkit was branched from nbdkit 1.1.12, with some patches from 1.1.13
added as well. The project has diverged significantly enough that a direct
diff or merge between the two would not be feasible. Even the structure of
the project directory and build has been changed to match our other
internal projects.
I have uploaded the entire cbdkit source to our GitHub at
The relevant files are
include/cbdkit-plugin.h
src/connections.c
src/plugins.c
Specifically, the connections.c functions
recv_request
send_reply
and the plugins.c functions
plugin_pread
plugin_pwrite
cbdkit_async_reply
cbdkit_async_reply_read
cbdkit_async_reply_error
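To show how these pieces fit together, here is a simplified sketch of the
sync/async split in plugin_pread (illustrative only, not the actual
plugins.c code; the op-first signature and the global plugin variable are
assumptions for this sketch):

  static int
  plugin_pread (void *op, void *handle,
                void *buf, uint32_t count, uint64_t offset)
  {
    if (plugin.async_pread != NULL) {
      /* Hand the op to the plugin and return immediately; the plugin
       * guarantees it will later call cbdkit_async_reply_read or
       * cbdkit_async_reply_error on this op. */
      plugin.async_pread (op, handle, buf, count, offset);
      return 0;
    }
    /* Classic blocking path. */
    return plugin.pread (handle, buf, count, offset);
  }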
On Fri, Jan 19, 2018 at 12:05 PM, Eric Blake <eblake@redhat.com> wrote:
On 01/19/2018 10:56 AM, Shaun McDowell wrote:
> Limitation: The kernel will (with today's default settings) typically be
> willing to send up to 128 requests of 128kB size to the driver in
> parallel. We wanted to support 128 parallel read operations on different
> areas of the disk without requiring 128 separate threads and connections
> for the driver. Right now in nbdkit that is impossible. The main loop in
> connection.c will pull an NBD request off the socket and block until that
> read request is complete before sending a response and getting the next
> request, blocking other requests on the socket unless running X
> connections/threads in parallel.
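(For illustration, the serialized loop described above is roughly the
following, using the connections.c function names listed earlier;
handle_request is a stand-in name, not an actual symbol:)

  for (;;) {
    if (recv_request (conn, &req) == -1)  /* pull next NBD request */
      break;
    handle_request (conn, &req);          /* blocks until the plugin finishes */
    if (send_reply (conn, &req) == -1)    /* only now can the next request start */
      break;
  }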
What version of nbdkit are you using? We recently added parallel reads in
1.1.17 (although some minor fixes went in later; the current version is
1.1.25). That should allow you to have a single socket serving multiple
requests in parallel, in response to your setting of nbdkit's --threads
option, provided your plugin is truly parallel (nbdkit now ships both a
'file' and an 'nbd' plugin that are truly parallel).
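(For reference, an upstream nbdkit plugin opts in to that parallel
behavior by declaring its thread model, e.g.:

  /* With this define, pread/pwrite/etc. may be invoked concurrently
   * from multiple threads on the same connection, up to the
   * --threads limit. */
  #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
)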
> Change: We introduced an additional set of functions to the nbdkit_plugin
> struct that supports asynchronous handling of requests, plus a few helper
> functions for the plugin to use to respond when it has finished a
> request. This is very similar to the FUSE filesystem low-level API (async
> supported) vs the high-level FUSE API (sync only). The design goal here
> is that a single connection/thread in nbdkit can support as many requests
> in parallel as the plugin allows. The nbdkit side pulls the request off
> the socket, and if the async function pointer is non-null it wraps the
> request in an op struct and uses the async plugin call for
> read/write/etc., capturing any allocated buffer and some op details into
> the op pointer. The plugin's async_* function starts the op and returns
> to nbdkit while the plugin works on it in the background. Nbdkit then
> goes back to the socket and begins the next request. Our plugin uses 1
> connection/nbdkit thread and 2-4 threads internally with Boost.Asio over
> sockets to service the requests to cloud storage. We are able to achieve
> ~1GB/s (yes, bytes) read/write performance to AWS S3 from an EC2 node
> with 10-gigabit networking using < 100MB of memory in the driver with
> this approach.
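(The async variant of the connection loop then looks roughly like this;
op_create is a hypothetical helper, and the req field names are
illustrative:)

  for (;;) {
    if (recv_request (conn, &req) == -1)
      break;
    struct op *op = op_create (conn, &req);  /* hypothetical: allocates the
                                                buffer, records reply details */
    plugin.async_pread (op, handle, op->buf, req.count, req.offset);
    /* No send_reply here; the plugin later calls
     * cbdkit_async_reply_read (op), which sends the reply on the
     * socket and frees the op. */
  }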
Definitely post patches to the list! My work to add parallel support
via --threads still spawns multiple threads (the plugin is operating
concurrently on multiple threads), while yours is a different approach:
breaking things into smaller stages that piece together, possibly with
fewer threads.
>
> Here is what some of our function prototypes look like to support an
> asynchronous nbdkit model:
>
> #define CBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS 2
> #define CBDKIT_THREAD_MODEL_PARALLEL 3
> #define CBDKIT_THREAD_MODEL_ASYNC 4
>
> struct cbdkit_plugin {
>   ...
>   int (*pread) (void *handle, void *buf, uint32_t count, uint64_t offset);
>   int (*pwrite) (void *handle, const void *buf, uint32_t count,
>                  uint64_t offset);
>   int (*flush) (void *handle);
>   int (*trim) (void *handle, uint32_t count, uint64_t offset);
>   int (*zero) (void *handle, uint32_t count, uint64_t offset, int may_trim);
>
>   int errno_is_preserved;
>
>   void (*async_pread) (void *op, void *handle, void *buf, uint32_t count,
>                        uint64_t offset);
>   void (*async_pwrite) (void *op, void *handle, const void *buf,
>                         uint32_t count, uint64_t offset, int fua);
>   void (*async_flush) (void *op, void *handle);
>   void (*async_trim) (void *op, void *handle, uint32_t count,
>                       uint64_t offset, int fua);
>   void (*async_zero) (void *op, void *handle, uint32_t count,
>                       uint64_t offset, int may_trim, int fua);
>   ...
> }
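(A hypothetical plugin registration using these async entry points; the
cloud_async_* names are illustrative, and how the thread model is
selected is an assumption:)

  #define THREAD_MODEL CBDKIT_THREAD_MODEL_ASYNC  /* assumption: chosen the
                                                     same way as nbdkit's
                                                     THREAD_MODEL define */

  static struct cbdkit_plugin plugin = {
    .name         = "cloud",            /* assumed field, as in nbdkit */
    .async_pread  = cloud_async_pread,
    .async_pwrite = cloud_async_pwrite,
    .async_flush  = cloud_async_flush,
    .async_trim   = cloud_async_trim,
    .async_zero   = cloud_async_zero,
  };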
>
> Additionally there are a few helper functions for the plugin to use to
> respond back to nbdkit when the job is eventually finished. The plugin
> contract when using the async functions is that every async function
> guarantees it will call an appropriate async_reply function.
>
> /* call for completion of successful async_pwrite, async_flush,
>    async_trim, or async_zero */
> extern CBDKIT_CXX_LANG_C int cbdkit_async_reply (void *op);
> /* call for completion of successful async_pread */
> extern CBDKIT_CXX_LANG_C int cbdkit_async_reply_read (void *op);
> /* call for completion of any async operation with error */
> extern CBDKIT_CXX_LANG_C int cbdkit_async_reply_error (void *op,
>                                                        uint32_t error);
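(On the plugin side, the contract looks roughly like this; queue_read
and on_read_done are illustrative names for internal machinery, not part
of the API:)

  static void
  cloud_async_pread (void *op, void *handle,
                     void *buf, uint32_t count, uint64_t offset)
  {
    queue_read (handle, op, buf, count, offset);  /* returns immediately */
  }

  /* Completion callback, invoked on a worker thread; exactly one reply
   * call is made per op, as the contract requires. */
  static void
  on_read_done (void *op, int err)
  {
    if (err)
      cbdkit_async_reply_error (op, err);  /* e.g. EIO */
    else
      cbdkit_async_reply_read (op);        /* sends the reply and data */
  }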
>
> If there is any interest in supporting async ops in the next API version,
> I am able to share the entire modified nbdkit (cbdkit) source that we use,
> which supports this async op framework, FUA, as well as some buffer
> pooling.
Yes, please post patches.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org