Implement the TODO item of emulating multi-connection consistency by
replicating a flush across every plugin connection, so that a client
may assume that a flush on a single connection is sufficient. This
also gives finer control over whether to advertise the multi-conn bit,
including some setups that are unsafe but may be useful in timing tests.
Testing is interesting: I used the sh plugin to implement a server
that intentionally keeps a per-connection cache.
Note that this filter assumes that multiple connections will still
share the same data (other than caching effects); behavior is not
guaranteed when mixing it with more exotic plugins, such as info,
that violate that premise.
---
I'm still working on the test; the sh plugin is good enough that it
does what I want when playing with it manually, but I still need to
write up various scenarios in test-multi-conn.sh to match what I've
played with manually.
I'm open to feedback on the set of options I've exposed via .config
(too many, not enough, better names?). Right now, they are:
multi-conn-mode=auto|plugin|disable|emulate|unsafe
multi-conn-track-dirty=fast|connection|off
.../multi-conn/nbdkit-multi-conn-filter.pod | 169 +++++++
configure.ac | 4 +-
filters/multi-conn/Makefile.am | 68 +++
tests/Makefile.am | 11 +-
filters/multi-conn/multi-conn.c | 467 ++++++++++++++++++
tests/test-multi-conn-plugin.sh | 121 +++++
tests/test-multi-conn.sh | 85 ++++
TODO | 7 -
8 files changed, 923 insertions(+), 9 deletions(-)
create mode 100644 filters/multi-conn/nbdkit-multi-conn-filter.pod
create mode 100644 filters/multi-conn/Makefile.am
create mode 100644 filters/multi-conn/multi-conn.c
create mode 100755 tests/test-multi-conn-plugin.sh
create mode 100755 tests/test-multi-conn.sh
diff --git a/filters/multi-conn/nbdkit-multi-conn-filter.pod
b/filters/multi-conn/nbdkit-multi-conn-filter.pod
new file mode 100644
index 00000000..ae2873df
--- /dev/null
+++ b/filters/multi-conn/nbdkit-multi-conn-filter.pod
@@ -0,0 +1,169 @@
+=head1 NAME
+
+nbdkit-multi-conn-filter - nbdkit multi-conn filter
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=multi-conn plugin \
+ [multi-conn-mode=MODE] [multi-conn-track-dirty=LEVEL] [plugin-args...]
+
+=head1 DESCRIPTION
+
+C<nbdkit-multi-conn-filter> is a filter that enables alterations to
+the server's advertisement of NBD_FLAG_MULTI_CONN. When a server
+permits multiple simultaneous clients, and sets this flag, a client
+may assume that all connections see a consistent view (after getting
+the server reply from a write in one connection, sending a flush
+command on a single connection and waiting for that reply then
+guarantees that all connections will then see the just-written data).
+If the flag is not advertised, a client must presume that separate
+connections may have utilized independent caches, and that a flush on
+one connection does not affect the cache of a second connection.
+
+The main use of this filter is to emulate consistent semantics across
+multiple connections when not already provided by a plugin, although
+it also has additional modes useful for evaluating performance and
+correctness of client and plugin multi-conn behaviors. This filter
+assumes that multiple connections to a plugin will eventually share
+data, other than any caching effects; it is not suitable for use with
+a plugin that produces completely independent data per connection.
+
+Additional control over the behavior of client flush commands is
+possible by combining this filter with L<nbdkit-fua-filter(1)>.
+
+=head1 PARAMETERS
+
+=over 4
+
+=item B<multi-conn-mode=auto>
+
+This filter defaults to B<auto> mode. If the plugin advertises
+multi-conn, then this filter behaves the same as B<plugin> mode;
+otherwise, this filter behaves the same as B<emulate> mode. Either
+way, this mode advertises NBD_FLAG_MULTI_CONN to the client.
+
+=item B<multi-conn-mode=emulate>
+
+When B<emulate> mode is chosen, then this filter tracks all parallel
+connections. When a client issues a flush command over any one
+connection (including a write command with the FUA (force unit access)
+flag set), the filter then replicates that flush across each
+connection to the plugin (although the number of plugin calls can be
+tuned by adjusting B<multi-conn-track-dirty>). This assumes that
+flushing each connection is enough to clear any per-connection cached
+data, in order to give each connection a consistent view of the image;
+therefore, this mode advertises NBD_FLAG_MULTI_CONN to the client.
+
+Note that in this mode, a client will be unable to connect if the
+plugin lacks support for flush, as there would be no way to emulate
+cross-connection consistency.
+
+=item B<multi-conn-mode=disable>
+
+When B<disable> mode is chosen, this filter disables advertisement of
+NBD_FLAG_MULTI_CONN to the client, even if the plugin supports it, and
+does not replicate flush commands across connections. This is useful
+for testing whether a client with multiple connections properly sends
+multiple flushes in order to overcome per-connection caching.
+
+=item B<multi-conn-mode=plugin>
+
+When B<plugin> mode is chosen, the filter does not change whether
+NBD_FLAG_MULTI_CONN is advertised by the plugin, and does not
+replicate flush commands across connections; but still honors
+B<multi-conn-track-dirty> for minimizing the number of flush commands
+passed on to the plugin.
+
+=item B<multi-conn-mode=unsafe>
+
+When B<unsafe> mode is chosen, this filter blindly advertises
+NBD_FLAG_MULTI_CONN to the client even if the plugin lacks support.
+This is dangerous, and risks data corruption if the client makes
+assumptions about data consistency that were not actually met.
+
+=item B<multi-conn-track-dirty=fast>
+
+When dirty tracking is set to B<fast>, the filter tracks whether any
+connection has caused the image to be dirty (any write, zero, or trim
+commands since the last flush, regardless of connection); if all
+connections are clean, a client flush command is ignored rather than
+sent on to the plugin. In this mode, a flush action on one connection
+marks all other connections as clean, regardless of whether the filter
+actually advertised NBD_FLAG_MULTI_CONN, which can result in less
+activity when a client sends multiple flushes rather than taking
+advantage of multi-conn semantics. This is safe with
+B<multi-conn-mode=emulate>, but potentially unsafe with
+B<multi-conn-mode=plugin> when the plugin did not advertise
+multi-conn, as it does not track whether a read may have cached stale
+data prior to a flush.
+
+=item B<multi-conn-track-dirty=connection>
+
+Dirty tracking is set to B<connection> by default, where the filter
+tracks whether a given connection is dirty (any write, zero, or trim
+commands since the last flush on the given connection, and any read
+since the last flush on any other connection); if the connection is
+clean, a flush command to that connection (whether directly from the
+client, or replicated by B<multi-conn-mode=emulate>) is ignored rather
+than sent on to the plugin. This mode may result in more flush calls
+than B<multi-conn-track-dirty=fast>, but in turn is safe to use with
+B<multi-conn-mode=plugin>.
+
+=item B<multi-conn-track-dirty=off>
+
+When dirty tracking is set to B<off>, all flush commands from the
+client are passed on to the plugin, regardless of whether the flush
+would be needed for consistency. Note that when combined with
+B<multi-conn-mode=emulate>, a client which disregards
+NBD_FLAG_MULTI_CONN by flushing on each connection itself results in a
+quadratic number of flush operations on the plugin.
+
+=back
+
+=head1 EXAMPLES
+
+Provide consistent cross-connection flush semantics on top of a plugin
+that lacks it natively:
+
+ nbdkit --filter=multi-conn split file.part1 file.part2
+
+Minimize the number of expensive flush operations performed when
+utilizing a plugin that has multi-conn consistency from a client that
+blindly flushes across every connection:
+
+ nbdkit --filter=multi-conn file multi-conn-mode=plugin \
+ multi-conn-track-dirty=fast disk.img
+
+=head1 FILES
+
+=over 4
+
+=item F<$filterdir/nbdkit-multi-conn-filter.so>
+
+The filter.
+
+Use C<nbdkit --dump-config> to find the location of C<$filterdir>.
+
+=back
+
+=head1 VERSION
+
+C<nbdkit-multi-conn-filter> first appeared in nbdkit 1.26.
+
+=head1 SEE ALSO
+
+L<nbdkit(1)>,
+L<nbdkit-file-plugin(1)>,
+L<nbdkit-filter(3)>,
+L<nbdkit-fua-filter(1)>,
+L<nbdkit-nocache-filter(1)>,
+L<nbdkit-noextents-filter(1)>,
+L<nbdkit-nozero-filter(1)>.
+
+=head1 AUTHORS
+
+Eric Blake
+
+=head1 COPYRIGHT
+
+Copyright (C) 2018-2021 Red Hat Inc.
diff --git a/configure.ac b/configure.ac
index cb18dd88..2b3e214e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,5 +1,5 @@
# nbdkit
-# Copyright (C) 2013-2020 Red Hat Inc.
+# Copyright (C) 2013-2021 Red Hat Inc.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
@@ -128,6 +128,7 @@ filters="\
ip \
limit \
log \
+ multi-conn \
nocache \
noextents \
nofilter \
@@ -1259,6 +1260,7 @@ AC_CONFIG_FILES([Makefile
filters/ip/Makefile
filters/limit/Makefile
filters/log/Makefile
+ filters/multi-conn/Makefile
filters/nocache/Makefile
filters/noextents/Makefile
filters/nofilter/Makefile
diff --git a/filters/multi-conn/Makefile.am b/filters/multi-conn/Makefile.am
new file mode 100644
index 00000000..778b8947
--- /dev/null
+++ b/filters/multi-conn/Makefile.am
@@ -0,0 +1,68 @@
+# nbdkit
+# Copyright (C) 2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+include $(top_srcdir)/common-rules.mk
+
+EXTRA_DIST = nbdkit-multi-conn-filter.pod
+
+filter_LTLIBRARIES = nbdkit-multi-conn-filter.la
+
+nbdkit_multi_conn_filter_la_SOURCES = \
+ multi-conn.c \
+ $(top_srcdir)/include/nbdkit-filter.h \
+ $(NULL)
+
+nbdkit_multi_conn_filter_la_CPPFLAGS = \
+ -I$(top_srcdir)/include \
+ -I$(top_srcdir)/common/include \
+ -I$(top_srcdir)/common/utils \
+ $(NULL)
+nbdkit_multi_conn_filter_la_CFLAGS = $(WARNINGS_CFLAGS)
+nbdkit_multi_conn_filter_la_LIBADD = \
+ $(top_builddir)/common/utils/libutils.la \
+ $(IMPORT_LIBRARY_ON_WINDOWS) \
+ $(NULL)
+nbdkit_multi_conn_filter_la_LDFLAGS = \
+ -module -avoid-version -shared $(NO_UNDEFINED_ON_WINDOWS) \
+ -Wl,--version-script=$(top_srcdir)/filters/filters.syms \
+ $(NULL)
+
+if HAVE_POD
+
+man_MANS = nbdkit-multi-conn-filter.1
+CLEANFILES += $(man_MANS)
+
+nbdkit-multi-conn-filter.1: nbdkit-multi-conn-filter.pod
+ $(PODWRAPPER) --section=1 --man $@ \
+ --html $(top_builddir)/html/$@.html \
+ $<
+
+endif HAVE_POD
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 70898f20..4b3ee65c 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1,5 +1,5 @@
# nbdkit
-# Copyright (C) 2013-2020 Red Hat Inc.
+# Copyright (C) 2013-2021 Red Hat Inc.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
@@ -1538,6 +1538,15 @@ EXTRA_DIST += \
test-log-script-info.sh \
$(NULL)
+# multi-conn filter test.
+TESTS += \
+ test-multi-conn.sh \
+ $(NULL)
+EXTRA_DIST += \
+ test-multi-conn-plugin.sh \
+ test-multi-conn.sh \
+ $(NULL)
+
# nofilter test.
TESTS += test-nofilter.sh
EXTRA_DIST += test-nofilter.sh
diff --git a/filters/multi-conn/multi-conn.c b/filters/multi-conn/multi-conn.c
new file mode 100644
index 00000000..3b244cb7
--- /dev/null
+++ b/filters/multi-conn/multi-conn.c
@@ -0,0 +1,467 @@
+/* nbdkit
+ * Copyright (C) 2021 Red Hat Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdbool.h>
+#include <assert.h>
+#include <pthread.h>
+
+#include <nbdkit-filter.h>
+
+#include "cleanup.h"
+#include "vector.h"
+
+/* Track results of .config */
+static enum MultiConnMode {
+ AUTO,
+ EMULATE,
+ PLUGIN,
+ UNSAFE,
+ DISABLE,
+} mode;
+
+static enum TrackDirtyMode {
+ CONN,
+ FAST,
+ OFF,
+} track;
+
+enum dirty {
+ WRITE = 1, /* A write may have populated a cache */
+ READ = 2, /* A read may have populated a cache */
+};
+
+/* Coordination between connections. */
+static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
+
+/* The list of handles to active connections. */
+struct handle {
+ struct nbdkit_next_ops *next_ops;
+ void *nxdata;
+ enum MultiConnMode mode; /* Runtime resolution of mode==AUTO */
+ enum dirty dirty; /* What aspects of this connection are dirty */
+};
+DEFINE_VECTOR_TYPE(conns_vector, struct handle *);
+static conns_vector conns = empty_vector;
+static bool dirty; /* True if any connection is dirty */
+
+/* Accept 'multi-conn-mode=mode' and 'multi-conn-track-dirty=level' */
+static int
+multi_conn_config (nbdkit_next_config *next, void *nxdata,
+ const char *key, const char *value)
+{
+ if (strcmp (key, "multi-conn-mode") == 0) {
+ if (strcmp (value, "auto") == 0)
+ mode = AUTO;
+ else if (strcmp (value, "emulate") == 0)
+ mode = EMULATE;
+ else if (strcmp (value, "plugin") == 0)
+ mode = PLUGIN;
+ else if (strcmp (value, "disable") == 0)
+ mode = DISABLE;
+ else if (strcmp (value, "unsafe") == 0)
+ mode = UNSAFE;
+ else {
+ nbdkit_error ("unknown multi-conn mode '%s'", value);
+ return -1;
+ }
+ return 0;
+ }
+ else if (strcmp (key, "multi-conn-track-dirty") == 0) {
+ if (strcmp (value, "connection") == 0 ||
+ strcmp (value, "conn") == 0)
+ track = CONN;
+ else if (strcmp (value, "fast") == 0)
+ track = FAST;
+ else if (strcmp (value, "off") == 0)
+ track = OFF;
+ else {
+ nbdkit_error ("unknown multi-conn track-dirty setting '%s'", value);
+ return -1;
+ }
+ return 0;
+ }
+ return next (nxdata, key, value);
+}
+
+#define multi_conn_config_help \
+ "multi-conn-mode=<MODE> 'auto' (default), 'emulate', 'plugin',\n" \
+ " 'disable', or 'unsafe'.\n" \
+ "multi-conn-track-dirty=<LEVEL> 'conn' (default), 'fast', or 'off'.\n"
+
+static void *
+multi_conn_open (nbdkit_next_open *next, void *nxdata,
+ int readonly, const char *exportname, int is_tls)
+{
+ struct handle *h;
+
+ if (next (nxdata, readonly, exportname) == -1)
+ return NULL;
+
+ /* Allocate here, but populate and insert into list in .prepare */
+ h = calloc (1, sizeof *h);
+ if (h == NULL) {
+ nbdkit_error ("calloc: %m");
+ return NULL;
+ }
+ return h;
+}
+
+static int
+multi_conn_prepare (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, int readonly)
+{
+ struct handle *h = handle;
+ int r;
+
+ h->next_ops = next_ops;
+ h->nxdata = nxdata;
+ if (mode == AUTO) {
+ r = next_ops->can_multi_conn (nxdata);
+ if (r == -1)
+ return -1;
+ if (r == 0)
+ h->mode = EMULATE;
+ else
+ h->mode = PLUGIN;
+ }
+ else
+ h->mode = mode;
+ if (h->mode == EMULATE && next_ops->can_flush (nxdata) != 1) {
+ nbdkit_error ("emulating multi-conn requires working flush");
+ return -1;
+ }
+
+ ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+ conns_vector_append (&conns, h);
+ return 0;
+}
+
+static int
+multi_conn_finalize (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle)
+{
+ struct handle *h = handle;
+
+ ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+ assert (h->next_ops == next_ops);
+ assert (h->nxdata == nxdata);
+
+ /* XXX should we add a config param to flush if the client forgot? */
+ for (size_t i = 0; i < conns.size; i++) {
+ if (conns.ptr[i] == h) {
+ conns_vector_remove (&conns, i);
+ break;
+ }
+ }
+ return 0;
+}
+
+static void
+multi_conn_close (void *handle)
+{
+ free (handle);
+}
+
+static int
+multi_conn_can_fua (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle)
+{
+ /* If the backend has native FUA support but is not multi-conn
+ * consistent, and we have to flush on every connection, then we are
+ * better off advertising emulated fua rather than native.
+ */
+ struct handle *h = handle;
+ int fua = next_ops->can_fua (nxdata);
+
+ assert (h->mode != AUTO);
+ if (fua == NBDKIT_FUA_NATIVE && h->mode == EMULATE)
+ return NBDKIT_FUA_EMULATE;
+ return fua;
+}
+
+static int
+multi_conn_can_multi_conn (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle)
+{
+ struct handle *h = handle;
+
+ switch (h->mode) {
+ case EMULATE:
+ return 1;
+ case PLUGIN:
+ return next_ops->can_multi_conn (nxdata);
+ case DISABLE:
+ return 0;
+ case UNSAFE:
+ return 1;
+ case AUTO: /* Not possible, see .prepare */
+ default:
+ abort ();
+ }
+}
+
+static void
+mark_dirty (struct handle *h, bool is_read)
+{
+ /* No need to grab lock here: the NBD spec is clear that a client
+ * must wait for the response to a flush before sending the next
+ * command that expects to see the result of that flush, so any race
+ * in accessing dirty can be traced back to the client improperly
+ * sending a flush in parallel with other live commands.
+ */
+ switch (track) {
+ case CONN:
+ h->dirty |= is_read ? READ : WRITE;
+ break;
+ case FAST:
+ if (!is_read)
+ dirty = true;
+ break;
+ case OFF:
+ break;
+ default:
+ abort ();
+ }
+}
+
+static int
+multi_conn_flush (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t flags, int *err);
+
+static int
+multi_conn_pread (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, void *buf, uint32_t count, uint64_t offs,
+ uint32_t flags, int *err)
+{
+ struct handle *h = handle;
+
+ mark_dirty (h, true);
+ return next_ops->pread (nxdata, buf, count, offs, flags, err);
+}
+
+static int
+multi_conn_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, const void *buf, uint32_t count,
+ uint64_t offs, uint32_t flags, int *err)
+{
+ struct handle *h = handle;
+ bool need_flush = false;
+
+ if (flags & NBDKIT_FLAG_FUA) {
+ if (h->mode == EMULATE) {
+ mark_dirty (h, false);
+ need_flush = true;
+ flags &= ~NBDKIT_FLAG_FUA;
+ }
+ }
+ else
+ mark_dirty (h, false);
+
+ if (next_ops->pwrite (nxdata, buf, count, offs, flags, err) == -1)
+ return -1;
+ if (need_flush)
+ return multi_conn_flush (next_ops, nxdata, h, 0, err);
+ return 0;
+}
+
+static int
+multi_conn_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+ int *err)
+{
+ struct handle *h = handle;
+ bool need_flush = false;
+
+ if (flags & NBDKIT_FLAG_FUA) {
+ if (h->mode == EMULATE) {
+ mark_dirty (h, false);
+ need_flush = true;
+ flags &= ~NBDKIT_FLAG_FUA;
+ }
+ }
+ else
+ mark_dirty (h, false);
+
+ if (next_ops->zero (nxdata, count, offs, flags, err) == -1)
+ return -1;
+ if (need_flush)
+ return multi_conn_flush (next_ops, nxdata, h, 0, err);
+ return 0;
+}
+
+static int
+multi_conn_trim (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+ int *err)
+{
+ struct handle *h = handle;
+ bool need_flush = false;
+
+ if (flags & NBDKIT_FLAG_FUA) {
+ if (h->mode == EMULATE) {
+ mark_dirty (h, false);
+ need_flush = true;
+ flags &= ~NBDKIT_FLAG_FUA;
+ }
+ }
+ else
+ mark_dirty (h, false);
+
+ if (next_ops->trim (nxdata, count, offs, flags, err) == -1)
+ return -1;
+ if (need_flush)
+ return multi_conn_flush (next_ops, nxdata, h, 0, err);
+ return 0;
+}
+
+static int
+multi_conn_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+ int *err)
+{
+ struct handle *h = handle;
+
+ mark_dirty (h, true);
+ return next_ops->cache (nxdata, count, offs, flags, err);
+}
+
+static int
+multi_conn_flush (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t flags, int *err)
+{
+ struct handle *h = handle, *h2;
+ size_t i;
+
+ if (h->mode == EMULATE) {
+ /* Optimize for the common case of a single connection: flush all
+ * writes on other connections, then flush both read and write on
+ * the current connection, then finally flush all other
+ * connections to avoid reads seeing stale data, skipping the
+ * flushes that make no difference according to dirty tracking.
+ */
+ bool updated = h->dirty & WRITE;
+
+ ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+ for (i = 0; i < conns.size; i++) {
+ h2 = conns.ptr[i];
+ if (h == h2)
+ continue;
+ if (track == OFF || dirty || h2->dirty & WRITE) {
+ if (h2->next_ops->flush (h2->nxdata, flags, err) == -1)
+ return -1;
+ h2->dirty &= ~WRITE;
+ updated = true;
+ }
+ }
+ if (track == OFF || dirty || updated) {
+ if (next_ops->flush (nxdata, flags, err) == -1)
+ return -1;
+ }
+ h->dirty = 0;
+ dirty = false;
+ for (i = 0; i < conns.size; i++) {
+ h2 = conns.ptr[i];
+ if (updated && h2->dirty & READ) {
+ assert (h != h2);
+ if (h2->next_ops->flush (h2->nxdata, flags, err) == -1)
+ return -1;
+ }
+ h2->dirty &= ~READ;
+ }
+ }
+ else {
+ /* !EMULATE: Check if the image is clean, allowing us to skip a flush. */
+ switch (track) {
+ case CONN:
+ if (!h->dirty)
+ return 0;
+ break;
+ case FAST:
+ if (!dirty)
+ return 0;
+ break;
+ case OFF:
+ break;
+ default:
+ abort ();
+ }
+ /* Perform the flush, then update dirty tracking. */
+ if (next_ops->flush (nxdata, flags, err) == -1)
+ return -1;
+ switch (track) {
+ case CONN:
+ if (next_ops->can_multi_conn (nxdata) == 1) {
+ ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+ for (i = 0; i < conns.size; i++)
+ conns.ptr[i]->dirty = 0;
+ }
+ else
+ h->dirty = 0;
+ break;
+ case FAST:
+ dirty = false;
+ break;
+ case OFF:
+ break;
+ default:
+ abort ();
+ }
+ }
+ return 0;
+}
+
+static struct nbdkit_filter filter = {
+ .name = "multi-conn",
+ .longname = "nbdkit multi-conn filter",
+ .config = multi_conn_config,
+ .config_help = multi_conn_config_help,
+ .open = multi_conn_open,
+ .prepare = multi_conn_prepare,
+ .finalize = multi_conn_finalize,
+ .close = multi_conn_close,
+ .can_fua = multi_conn_can_fua,
+ .can_multi_conn = multi_conn_can_multi_conn,
+ .pread = multi_conn_pread,
+ .pwrite = multi_conn_pwrite,
+ .trim = multi_conn_trim,
+ .zero = multi_conn_zero,
+ .cache = multi_conn_cache,
+ .flush = multi_conn_flush,
+};
+
+NBDKIT_REGISTER_FILTER(filter)
diff --git a/tests/test-multi-conn-plugin.sh b/tests/test-multi-conn-plugin.sh
new file mode 100755
index 00000000..c580b89a
--- /dev/null
+++ b/tests/test-multi-conn-plugin.sh
@@ -0,0 +1,121 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+# Test plugin used by test-multi-conn.sh.
+# This plugin purposefully maintains a per-connection cache.
+# An optional parameter strictfua=true controls whether FUA acts on
+# just the given region, or on all pending ops in the current connection.
+# Note that an earlier cached write on one connection can overwrite a later
+# FUA write on another connection - this is okay (the client is buggy if
+# it ever sends overlapping writes without coordinating flushes and still
+# expects any particular write to occur last).
+
+fill_cache() {
+ if test ! -f "$tmpdir/$1"; then
+ cp "$tmpdir/0" "$tmpdir/$1"
+ fi
+}
+do_fua() {
+ case ,$4, in
+ *,fua,*)
+ if test -f "$tmpdir/strictfua"; then
+ dd of="$tmpdir/0" if="$tmpdir/$1" skip=$3 seek=$3 count=$2 \
+ conv=notrunc iflag=count_bytes,skip_bytes oflag=seek_bytes
+ else
+ do_flush $1
+ fi ;;
+ esac
+}
+do_flush() {
+ if test -f "$tmpdir/$1-replay"; then
+ while read cnt off; do
+ dd of="$tmpdir/0" if="$tmpdir/$1" skip=$off seek=$off count=$cnt \
+ conv=notrunc iflag=count_bytes,skip_bytes oflag=seek_bytes
+ done < "$tmpdir/$1-replay"
+ fi
+ rm -f "$tmpdir/$1" "$tmpdir/$1-replay"
+}
+case "$1" in
+ config)
+ case $2 in
+ strictfua)
+ case $3 in
+ true | on | 1) touch "$tmpdir/strictfua" ;;
+ false | off | 0) ;;
+ *) echo "unknown value for strictfua $3" >&2; exit 1 ;;
+ esac ;;
+ *) echo "unknown config key $2" >&2; exit 1 ;;
+ esac
+ ;;
+ get_ready)
+ printf "%-32s" 'Initial contents' > "$tmpdir/0"
+ echo 0 > "$tmpdir/counter"
+ ;;
+ get_size)
+ echo 32
+ ;;
+ can_write | can_zero | can_trim | can_flush)
+ exit 0
+ ;;
+ can_fua | can_cache)
+ echo native
+ ;;
+ open)
+ read i < "$tmpdir/counter"
+ echo $((i+1)) | tee "$tmpdir/counter"
+ ;;
+ pread)
+ fill_cache $2
+ dd if="$tmpdir/$2" skip=$4 count=$3 iflag=count_bytes,skip_bytes
+ ;;
+ cache)
+ fill_cache $2
+ ;;
+ pwrite)
+ fill_cache $2
+ dd of="$tmpdir/$2" seek=$4 conv=notrunc oflag=seek_bytes
+ echo $3 $4 >> "$tmpdir/$2-replay"
+ do_fua $2 $3 $4 $5
+ ;;
+ zero | trim)
+ fill_cache $2
+ dd of="$tmpdir/$2" if="/dev/zero" count=$3 seek=$4 conv=notrunc iflag=count_bytes oflag=seek_bytes
+ echo $3 $4 >> "$tmpdir/$2-replay"
+ do_fua $2 $3 $4 $5
+ ;;
+ flush)
+ do_flush $2
+ ;;
+ *)
+ exit 2
+ ;;
+esac
diff --git a/tests/test-multi-conn.sh b/tests/test-multi-conn.sh
new file mode 100755
index 00000000..01efd108
--- /dev/null
+++ b/tests/test-multi-conn.sh
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+# Demonstrate various multi-conn filter behaviors.
+
+source ./functions.sh
+set -e
+set -x
+
+requires_plugin sh
+requires nbdsh -u "nbd://nosuch" --version
+requires dd iflag=count_bytes </dev/null
+
+files="test-multi-conn.out test-multi-conn.stat"
+rm -f $files
+cleanup_fn rm -f $files
+
+fail=0
+export handles preamble uri
+uri= # will be set by --run later
+handles=2
+preamble='
+import os
+
+uri = os.environ["uri"]
+handles = int(os.environ["handles"])
+h = []
+for i in range(handles):
+ h.append(nbd.NBD())
+ h[i].connect_uri(uri)
+print(h[0].can_multi_conn())
+'
+
+# Demonstrate the caching required with plugin alone
+nbdkit -vf -U - sh test-multi-conn-plugin.sh \
+ --run 'nbdsh -c "$preamble" -c "
+"' > test-multi-conn.out || fail=1
+diff -u <(cat <<\EOF
+False
+EOF
+ ) test-multi-conn.out || fail=1
+
+# Demonstrate that FUA alone does not have to sync full disk
+# TODO
+# Demonstrate multi-conn defaults
+# TODO
+# Use --filter=stats to show track-dirty effects
+nbdkit -vf -U - sh test-multi-conn-plugin.sh \
+ --filter=stats statsfile=test-multi-conn.stat \
+ --run 'nbdsh -c "$preamble" -c "
+h[0].flush()
+"' > test-multi-conn.out || fail=1
+cat test-multi-conn.stat
+grep 'flush: 1 ops' test-multi-conn.stat || fail=1
+
+exit $fail
diff --git a/TODO b/TODO
index d8dd7ef2..e41e38e8 100644
--- a/TODO
+++ b/TODO
@@ -206,13 +206,6 @@ Suggestions for filters
* masking plugin features for testing clients (see 'nozero' and 'fua'
filters for examples)
-* multi-conn filter to adjust advertisement of multi-conn bit. In
- particular, if the plugin lacks .can_multi_conn, then .open/.close
- track all open connections, and .flush and FUA flag will call
- next_ops->flush() on all of them. Conversely, if plugin supports
- multi-conn, we can cache whether the image is dirty, and avoid
- expense of next_ops->flush when it is clean.
-
* "bandwidth quota" filter which would close a connection after it
exceeded a certain amount of bandwidth up or down.
--
2.30.1