The upstream NBD protocol recently clarified that servers can
advertise block size limitations to clients that ask with
NBD_OPT_GO (although nbdkit is still some way from implementing
that); and that in the absence of that, clients should agree on
limits using out-of-band information or stick to sane defaults
(everything 512-byte-aligned, no reads or writes larger than
32M). But the protocol does not inherently prevent 1-byte
requests, nor does it prohibit a request of nearly 4G on bulk
actions like trim and zero; and nbdkit itself supports 1-byte
requests, read and write up to 64M, and no limit on trim or
zero, even though these requests may fail miserably on some
plugins.

So, nbdkit should make it easier to plug together clients and
servers that have different notions of blocksize limitations.
A new blocksize filter makes it possible to specify a minimum
blocksize handed to the plugin (anything smaller or unaligned is
widened to block boundaries, using read-modify-write as needed),
and maximum limits (maxdata for read/write since they carry a
data buffer, and maxlen for zero/trim since they do not).

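As a rough illustration of the rounding described above, the sketch
below computes how one client request breaks down into an unaligned
head, an aligned body and an unaligned tail. The struct and the
split_request() helper are hypothetical names invented for this
example; they are not part of the patch, which performs the same
arithmetic inline and handles the tail before the body.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: describe how a request
 * (offs, count) breaks down for a plugin, given minblock (a power of 2)
 * and max (the per-call limit, a multiple of minblock). */
struct split {
  uint64_t head_offs;  /* aligned start of the widened head block;
                          left at 0 when there is no unaligned head */
  uint32_t head_len;   /* client bytes that fall in the unaligned head */
  uint32_t body_len;   /* aligned bytes passed through unchanged */
  uint32_t tail_len;   /* client bytes that fall in the unaligned tail */
  uint32_t fragments;  /* plugin calls needed for the aligned body */
};

static struct split
split_request (uint64_t offs, uint32_t count, uint32_t minblock, uint32_t max)
{
  struct split s = { 0 };
  uint32_t mask = minblock - 1;

  if (offs & mask) {
    uint32_t drop = offs & mask;          /* bytes before the request */
    uint32_t room = minblock - drop;      /* bytes left in that block */

    s.head_offs = offs - drop;
    s.head_len = room < count ? room : count;
    count -= s.head_len;
  }
  s.tail_len = count & mask;
  s.body_len = count - s.tail_len;
  /* Round up; widen to 64 bits so the addition cannot overflow. */
  s.fragments = (uint32_t) (((uint64_t) s.body_len + max - 1) / max);
  return s;
}
```

For the 1-byte write at offset 10001 used in the test with
minblock=1024, this yields head_offs 0x2400 and head_len 1, which
matches the offset the test greps for in the log.
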
Testing is easy by reusing the log filter and observing that
requests were rewritten as expected. There is one slight
complication: different versions of qemu-io vary in whether they
obey the NBD spec by limiting themselves to 512-byte I/O when we
don't advertise otherwise with NBD_OPT_GO, or just hand us
byte-based I/O anyway. The test therefore has to round to
something even bigger (minblock=1024) to be sure it sees a
difference.

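The kind of check the test applies to the log can be sketched in C as
well. This is only an illustration: it assumes log lines carry
"offset=0x..." and "count=0x..." fields, as the grep patterns in the
test suggest, and log_field() and line_is_aligned() are hypothetical
helpers, not nbdkit functions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value of a "key0x..." hex field in
 * line, or UINT64_MAX if the key does not appear. */
static uint64_t
log_field (const char *line, const char *key)
{
  const char *p = strstr (line, key);

  if (p == NULL)
    return UINT64_MAX;
  /* Base 16 makes strtoull accept the optional "0x" prefix. */
  return strtoull (p + strlen (key), NULL, 16);
}

/* Return 1 if the line's offset and count are both multiples of align
 * (a power of 2), or if the line has no offset/count pair to check. */
static int
line_is_aligned (const char *line, uint64_t align)
{
  uint64_t offs = log_field (line, "offset=");
  uint64_t count = log_field (line, "count=");

  if (offs == UINT64_MAX || count == UINT64_MAX)
    return 1;
  return (offs & (align - 1)) == 0 && (count & (align - 1)) == 0;
}
```

A line with offset=0x2400 count=0x400 passes for an alignment of
1024, while offset=0x0 count=0x200 passes only for 512.
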
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
v3: rebase to API changes; get unit test working; use (fixed)
nbdkit_parse_size
---
TODO | 5 -
docs/nbdkit-filter.pod | 1 +
docs/nbdkit.pod | 1 +
filters/blocksize/nbdkit-blocksize-filter.pod | 141 +++++++++++
configure.ac | 3 +-
filters/blocksize/blocksize.c | 350 ++++++++++++++++++++++++++
filters/Makefile.am | 1 +
filters/blocksize/Makefile.am | 62 +++++
tests/Makefile.am | 4 +
tests/test-blocksize.sh | 156 ++++++++++++
10 files changed, 718 insertions(+), 6 deletions(-)
create mode 100644 filters/blocksize/nbdkit-blocksize-filter.pod
create mode 100644 filters/blocksize/blocksize.c
create mode 100644 filters/blocksize/Makefile.am
create mode 100755 tests/test-blocksize.sh
diff --git a/TODO b/TODO
index a691ff3..d02671d 100644
--- a/TODO
+++ b/TODO
@@ -86,11 +86,6 @@ Suggestions for filters
unneeded intermediate flushing; hence, where this filter is placed
in the stack may have a performance impact.
-* blocksize filter: setting minblock performs read-modify-write of
- requests that are too small or unaligned for the plugin; setting
- maxdata breaks up too-large read/write; setting maxlen breaks up
- too-large trim/zero
-
nbdkit-cache-filter needs considerable work:
* allow the user to limit the total size of the cache
diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod
index 3ec8f2a..32d50cb 100644
--- a/docs/nbdkit-filter.pod
+++ b/docs/nbdkit-filter.pod
@@ -560,6 +560,7 @@ L<nbdkit-plugin(1)>.
Filters:
+L<nbdkit-blocksize-filter(1)>,
L<nbdkit-cache-filter(1)>,
L<nbdkit-cow-filter(1)>,
L<nbdkit-delay-filter(1)>,
diff --git a/docs/nbdkit.pod b/docs/nbdkit.pod
index 22d91e7..94ddb62 100644
--- a/docs/nbdkit.pod
+++ b/docs/nbdkit.pod
@@ -917,6 +917,7 @@ L<nbdkit-xz-plugin(1)>.
Filters:
+L<nbdkit-blocksize-filter(1)>,
L<nbdkit-cache-filter(1)>,
L<nbdkit-cow-filter(1)>,
L<nbdkit-delay-filter(1)>,
diff --git a/filters/blocksize/nbdkit-blocksize-filter.pod b/filters/blocksize/nbdkit-blocksize-filter.pod
new file mode 100644
index 0000000..39a2ffc
--- /dev/null
+++ b/filters/blocksize/nbdkit-blocksize-filter.pod
@@ -0,0 +1,141 @@
+=encoding utf8
+
+=head1 NAME
+
+nbdkit-blocksize-filter - nbdkit blocksize filter
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=blocksize plugin [minblock=SIZE] [maxdata=SIZE] \
+ [maxlen=SIZE] [plugin-args...]
+
+=head1 DESCRIPTION
+
+C<nbdkit-blocksize-filter> is a filter that ensures various block size
+limits are met on transactions presented to the plugin. The NBD
+protocol permits clients to send requests with a granularity as small
+as 1 byte or as large as nearly 4 gigabytes, although it suggests that
+portable clients should align requests to 512 bytes and not exceed 32
+megabytes without prior coordination with the server.
+
+Meanwhile, some plugins require requests to be aligned to 512-byte
+multiples, or may enforce a maximum transaction size to bound the time
+or memory resources spent by any one command (note that nbdkit itself
+refuses a read or write larger than 64 megabytes, while many other NBD
+servers limit things to 32 megabytes). The blocksize filter can be
+used to modify the client requests to meet the plugin restrictions.
+
+=head1 PARAMETERS
+
+The nbdkit-blocksize-filter accepts the following parameters.
+
+=over 4
+
+=item B<minblock=SIZE>
+
+The minimum block size and alignment to pass to the plugin. This must
+be a power of two, and no larger than 64k. If omitted, this defaults
+to 1 (that is, no minimum size restrictions). The filter rounds up
+read requests to alignment boundaries, performs read-modify-write
+cycles for any unaligned head or tail of a write or zero request, and
+silently ignores any unaligned head or tail of a trim request. The
+filter also truncates the plugin size down to an aligned value (as it
+cannot safely operate on the unaligned tail); it is an error if this
+would result in a size of 0.
+
+This parameter understands the suffix 'k' for 1024.
+
+=item B<maxdata=SIZE>
+
+The maximum block size for any single transaction with data (read and
+write). If omitted, this defaults to 64 megabytes (that is, the
+nbdkit maximum). This need not be a power of two, but must be an
+integer multiple of C<minblock>. The filter fragments any larger
+client request into multiple plugin requests.
+
+This parameter understands the suffixes 'k', 'M', and 'G' for powers
+of 1024.
+
+=item B<maxlen=SIZE>
+
+The maximum length for any single transaction without data (trim and
+zero). If omitted, this defaults to 0xffffffff rounded down to
+C<minblock> alignment (that is, the inherent 32-bit limit of the NBD
+protocol). This need not be a power of two, but must be an integer
+multiple of C<minblock>, and should be at least as large as
+C<maxdata>. The filter fragments any larger client request into
+multiple plugin requests.
+
+This parameter understands the suffixes 'k', 'M', and 'G' for powers
+of 1024.
+
+=back
+
+=head1 EXAMPLES
+
+Allow an arbitrary client to use the VDDK plugin (which is limited to
+512-byte blocks), without having to fix the client to avoid sending
+unaligned requests:
+
+ nbdkit --filter=blocksize vddk minblock=512 file=/absolute/path/to/file.vmdk
+
+Allow an arbitrary client tuned to nbdkit's 64 megabyte sizing to
+connect to a remote server that insists on 32 megabyte sizing, via the
+nbd plugin:
+
+ nbdkit --filter=blocksize nbd maxdata=32M socket=/path/to/socket
+
+=head1 SEE ALSO
+
+L<nbdkit(1)>,
+L<nbdkit-nbd-plugin(1)>,
+L<nbdkit-vddk-plugin(1)>,
+L<nbdkit-filter(3)>.
+
+=head1 AUTHORS
+
+Eric Blake
+
+=head1 COPYRIGHT
+
+Copyright (C) 2018 Red Hat Inc.
+
+=head1 LICENSE
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+=over 4
+
+=item *
+
+Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+=item *
+
+Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+=item *
+
+Neither the name of Red Hat nor the names of its contributors may be
+used to endorse or promote products derived from this software without
+specific prior written permission.
+
+=back
+
+THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGE.
diff --git a/configure.ac b/configure.ac
index 3fcc776..c3a121d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,5 +1,5 @@
# nbdkit
-# Copyright (C) 2013-2017 Red Hat Inc.
+# Copyright (C) 2013-2018 Red Hat Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -513,6 +513,7 @@ AC_CONFIG_FILES([Makefile
plugins/vddk/Makefile
plugins/xz/Makefile
filters/Makefile
+ filters/blocksize/Makefile
filters/cache/Makefile
filters/cow/Makefile
filters/delay/Makefile
diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c
new file mode 100644
index 0000000..8834457
--- /dev/null
+++ b/filters/blocksize/blocksize.c
@@ -0,0 +1,350 @@
+/* nbdkit
+ * Copyright (C) 2018 Red Hat Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <errno.h>
+
+#include <nbdkit-filter.h>
+
+/* XXX See design comment in filters/cow/cow.c. */
+#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS
+
+#define BLOCKSIZE_MIN_LIMIT (64U * 1024)
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+
+/* As long as we don't have parallel requests, we can reuse a common
+ * buffer for alignment purposes; size it to the maximum we allow for
+ * minblock. */
+static char bounce[BLOCKSIZE_MIN_LIMIT];
+static unsigned int minblock;
+static unsigned int maxdata;
+static unsigned int maxlen;
+
+static int
+blocksize_parse (const char *name, const char *s, unsigned int *v)
+{
+ int64_t size = nbdkit_parse_size (s);
+
+ if (size == -1 || UINT_MAX < size) {
+ nbdkit_error ("parameter '%s' is invalid or too large", name);
+ return -1;
+ }
+ *v = size;
+ return 0;
+}
+
+/* Called for each key=value passed on the command line. */
+static int
+blocksize_config (nbdkit_next_config *next, void *nxdata,
+ const char *key, const char *value)
+{
+
+ if (strcmp (key, "minblock") == 0)
+ return blocksize_parse (key, value, &minblock);
+ if (strcmp (key, "maxdata") == 0)
+ return blocksize_parse (key, value, &maxdata);
+ if (strcmp (key, "maxlen") == 0)
+ return blocksize_parse (key, value, &maxlen);
+ return next (nxdata, key, value);
+}
+
+/* Check that limits are sane. */
+static int
+blocksize_config_complete (nbdkit_next_config_complete *next, void *nxdata)
+{
+ if (minblock) {
+ if (minblock & (minblock - 1)) {
+ nbdkit_error ("minblock must be a power of 2");
+ return -1;
+ }
+ if (minblock > BLOCKSIZE_MIN_LIMIT) {
+ nbdkit_error ("minblock must not exceed %u", BLOCKSIZE_MIN_LIMIT);
+ return -1;
+ }
+ }
+ else
+ minblock = 1;
+
+ if (maxdata) {
+ if (maxdata & (minblock - 1)) {
+ nbdkit_error ("maxdata must be a multiple of %u", minblock);
+ return -1;
+ }
+ }
+ else
+ maxdata = 64 * 1024 * 1024;
+
+ if (maxlen) {
+ if (maxlen & (minblock - 1)) {
+ nbdkit_error ("maxlen must be a multiple of %u", minblock);
+ return -1;
+ }
+ }
+ else
+ maxlen = -minblock;
+
+ return next (nxdata);
+}
+
+#define blocksize_config_help \
+ "minblock=<SIZE> Minimum block size, power of 2 <= 64k (default 1).\n" \
+ "maxdata=<SIZE> Maximum size for read/write (default 64M).\n" \
+ "maxlen=<SIZE> Maximum size for trim/zero (default 4G-minblock)."
+
+static int
+blocksize_prepare (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle)
+{
+ /* Early call to get_size to ensure it doesn't truncate to 0. */
+ int64_t size = next_ops->get_size (nxdata);
+
+ if (size == -1)
+ return -1;
+ if (size < minblock) {
+ nbdkit_error ("disk is too small for minblock size %u", minblock);
+ return -1;
+ }
+ return 0;
+}
+
+static int64_t
+blocksize_get_size (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle)
+{
+ int64_t size = next_ops->get_size (nxdata);
+
+ return size == -1 ? size : size & ~(minblock - 1);
+}
+
+static int
+blocksize_pread (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, void *b, uint32_t count, uint64_t offs,
+ uint32_t flags, int *err)
+{
+ char *buf = b;
+ uint32_t keep;
+ uint32_t drop;
+
+ /* Unaligned head */
+ if (offs & (minblock - 1)) {
+ drop = offs & (minblock - 1);
+ keep = MIN (minblock - drop, count);
+ if (next_ops->pread (nxdata, bounce, minblock, offs - drop, flags,
+ err) == -1)
+ return -1;
+ memcpy (buf, bounce + drop, keep);
+ buf += keep;
+ offs += keep;
+ count -= keep;
+ }
+
+ /* Unaligned tail */
+ if (count & (minblock - 1)) {
+ keep = count & (minblock - 1);
+ count -= keep;
+ if (next_ops->pread (nxdata, bounce, minblock, offs + count, flags,
+ err) == -1)
+ return -1;
+ memcpy (buf + count, bounce, keep);
+ }
+
+ /* Aligned body */
+ while (count) {
+ keep = MIN (maxdata, count);
+ if (next_ops->pread (nxdata, buf, keep, offs, flags, err) == -1)
+ return -1;
+ buf += keep;
+ offs += keep;
+ count -= keep;
+ }
+
+ return 0;
+}
+
+static int
+blocksize_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, const void *b, uint32_t count, uint64_t offs,
+ uint32_t flags, int *err)
+{
+ const char *buf = b;
+ uint32_t keep;
+ uint32_t drop;
+
+ /* FIXME: Smarter handling of FUA - pass it through if the next layer
+ * can handle it natively, but just once at end if next layer emulates. */
+
+ /* Unaligned head */
+ if (offs & (minblock - 1)) {
+ drop = offs & (minblock - 1);
+ keep = MIN (minblock - drop, count);
+ if (next_ops->pread (nxdata, bounce, minblock, offs - drop, 0, err) == -1)
+ return -1;
+ memcpy (bounce + drop, buf, keep);
+ if (next_ops->pwrite (nxdata, bounce, minblock, offs - drop, flags,
+ err) == -1)
+ return -1;
+ buf += keep;
+ offs += keep;
+ count -= keep;
+ }
+
+ /* Unaligned tail */
+ if (count & (minblock - 1)) {
+ keep = count & (minblock - 1);
+ count -= keep;
+ if (next_ops->pread (nxdata, bounce, minblock, offs + count, 0, err) == -1)
+ return -1;
+ memcpy (bounce, buf + count, keep);
+ if (next_ops->pwrite (nxdata, bounce, minblock, offs + count, flags,
+ err) == -1)
+ return -1;
+ }
+
+ /* Aligned body */
+ while (count) {
+ keep = MIN (maxdata, count);
+ if (next_ops->pwrite (nxdata, buf, keep, offs, flags, err) == -1)
+ return -1;
+ buf += keep;
+ offs += keep;
+ count -= keep;
+ }
+
+ return 0;
+}
+
+static int
+blocksize_trim (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+ int *err)
+{
+ uint32_t keep;
+
+ /* FIXME: Smarter handling of FUA - pass it through if the next layer
+ * can handle it natively, but just once at end if next layer emulates. */
+
+ /* Unaligned head */
+ if (offs & (minblock - 1)) {
+ keep = MIN (minblock - (offs & (minblock - 1)), count);
+ offs += keep;
+ count -= keep;
+ }
+
+ /* Unaligned tail */
+ if (count & (minblock - 1))
+ count -= count & (minblock - 1);
+
+ /* Aligned body */
+ while (count) {
+ keep = MIN (maxlen, count);
+ if (next_ops->trim (nxdata, keep, offs, flags, err) == -1)
+ return -1;
+ offs += keep;
+ count -= keep;
+ }
+
+ return 0;
+}
+
+static int
+blocksize_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+ int *err)
+{
+ uint32_t keep;
+ uint32_t drop;
+
+ /* FIXME: Smarter handling of FUA - pass it through if the next layer
+ * can handle it natively, but just once at end if next layer emulates. */
+
+ /* Unaligned head */
+ if (offs & (minblock - 1)) {
+ drop = offs & (minblock - 1);
+ keep = MIN (minblock - drop, count);
+ if (next_ops->pread (nxdata, bounce, minblock, offs - drop, 0, err) == -1)
+ return -1;
+ memset (bounce + drop, 0, keep);
+ if (next_ops->pwrite (nxdata, bounce, minblock, offs - drop,
+ flags & ~NBDKIT_FLAG_MAY_TRIM, err) == -1)
+ return -1;
+ offs += keep;
+ count -= keep;
+ }
+
+ /* Unaligned tail */
+ if (count & (minblock - 1)) {
+ keep = count & (minblock - 1);
+ count -= keep;
+ if (next_ops->pread (nxdata, bounce, minblock, offs + count, 0, err) == -1)
+ return -1;
+ memset (bounce, 0, keep);
+ if (next_ops->pwrite (nxdata, bounce, minblock, offs + count,
+ flags & ~NBDKIT_FLAG_MAY_TRIM, err) == -1)
+ return -1;
+ }
+
+ /* Aligned body */
+ while (count) {
+ keep = MIN (maxlen, count);
+ if (next_ops->zero (nxdata, keep, offs, flags, err) == -1)
+ return -1;
+ offs += keep;
+ count -= keep;
+ }
+
+ return 0;
+}
+
+static struct nbdkit_filter filter = {
+ .name = "blocksize",
+ .longname = "nbdkit blocksize filter",
+ .version = PACKAGE_VERSION,
+ .config = blocksize_config,
+ .config_complete = blocksize_config_complete,
+ .config_help = blocksize_config_help,
+ .prepare = blocksize_prepare,
+ .get_size = blocksize_get_size,
+ .pread = blocksize_pread,
+ .pwrite = blocksize_pwrite,
+ .trim = blocksize_trim,
+ .zero = blocksize_zero,
+};
+
+NBDKIT_REGISTER_FILTER(filter)
diff --git a/filters/Makefile.am b/filters/Makefile.am
index 8e070e5..de98f43 100644
--- a/filters/Makefile.am
+++ b/filters/Makefile.am
@@ -31,6 +31,7 @@
# SUCH DAMAGE.
SUBDIRS = \
+ blocksize \
cache \
cow \
delay \
diff --git a/filters/blocksize/Makefile.am b/filters/blocksize/Makefile.am
new file mode 100644
index 0000000..0069403
--- /dev/null
+++ b/filters/blocksize/Makefile.am
@@ -0,0 +1,62 @@
+# nbdkit
+# Copyright (C) 2018 Red Hat Inc.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+EXTRA_DIST = nbdkit-blocksize-filter.pod
+
+CLEANFILES = *~
+
+filterdir = $(libdir)/nbdkit/filters
+
+filter_LTLIBRARIES = nbdkit-blocksize-filter.la
+
+nbdkit_blocksize_filter_la_SOURCES = \
+ blocksize.c \
+ $(top_srcdir)/include/nbdkit-filter.h
+
+nbdkit_blocksize_filter_la_CPPFLAGS = \
+ -I$(top_srcdir)/include
+nbdkit_blocksize_filter_la_CFLAGS = \
+ $(WARNINGS_CFLAGS)
+nbdkit_blocksize_filter_la_LDFLAGS = \
+ -module -avoid-version -shared
+
+if HAVE_POD2MAN
+
+man_MANS = nbdkit-blocksize-filter.1
+CLEANFILES += $(man_MANS)
+
+nbdkit-blocksize-filter.1: nbdkit-blocksize-filter.pod
+ $(POD2MAN) $(POD2MAN_ARGS) --section=1 --name=`basename $@ .1` $< $@.t && \
+ if grep 'POD ERROR' $@.t; then rm $@.t; exit 1; fi && \
+ mv $@.t $@
+
+endif
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 2d6393d..2b3082e 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -40,6 +40,7 @@ EXTRA_DIST = \
shebang.pl \
shebang.py \
shebang.rb \
+ test-blocksize.sh \
test-cache.sh \
test-captive.sh \
test-cow.sh \
@@ -414,6 +415,9 @@ endif HAVE_RUBY
#----------------------------------------------------------------------
# Tests of filters.
+# blocksize filter test.
+TESTS += test-blocksize.sh
+
# cache filter test.
TESTS += test-cache.sh
diff --git a/tests/test-blocksize.sh b/tests/test-blocksize.sh
new file mode 100755
index 0000000..2a78e9c
--- /dev/null
+++ b/tests/test-blocksize.sh
@@ -0,0 +1,156 @@
+#!/bin/bash -
+# nbdkit
+# Copyright (C) 2018 Red Hat Inc.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+set -e
+
+files="blocksize1.img blocksize1.log blocksize1.sock blocksize1.pid
+ blocksize2.img blocksize2.log blocksize2.sock blocksize2.pid"
+rm -f $files
+
+: ${QEMU_IO=qemu-io}
+
+# Prep images, and check that qemu-io understands the actions we plan on doing.
+# TODO: Until we implement NBD_OPT_GO, qemu-io does its own read-modify-write
+# at 512-byte alignment, while we'd like to ultimately test 1-byte accesses
+truncate --size 10M blocksize1.img
+if ! $QEMU_IO -f raw -c 'r 0 1' -c 'w -z 1000 2000' \
+ -c 'w -P 0 1M 2M' -c 'discard 3M 4M' blocksize1.img; then
+ echo "$0: missing or broken qemu-io"
+ rm blocksize1.img
+ exit 77
+fi
+truncate --size 10M blocksize2.img
+
+pid1= pid2=
+
+# Kill any nbdkit processes on exit.
+cleanup ()
+{
+ status=$?
+
+ test "$pid1" && kill $pid1
+ test "$pid2" && kill $pid2
+ # For easier debugging, dump the final log files before removing them.
+ echo "Log 1 file contents:"
+ cat blocksize1.log || :
+ echo "Log 2 file contents:"
+ cat blocksize2.log || :
+ rm -f $files
+
+ exit $status
+}
+trap cleanup INT QUIT TERM EXIT ERR
+
+# Run two parallel nbdkit instances, to compare the logs and see what changes.
+nbdkit -P blocksize1.pid -U blocksize1.sock \
+ --filter=log file logfile=blocksize1.log file=blocksize1.img
+nbdkit -P blocksize2.pid -U blocksize2.sock --filter=blocksize \
+ --filter=log file logfile=blocksize2.log file=blocksize2.img \
+ minblock=1024 maxdata=512k maxlen=1M
+
+# We may have to wait a short time for the pid files to appear.
+for i in `seq 1 10`; do
+ if test -f blocksize1.pid && test -f blocksize2.pid; then
+ break
+ fi
+ sleep 1
+done
+
+pid1="$(cat blocksize1.pid)" || :
+pid2="$(cat blocksize2.pid)" || :
+
+if ! test -f blocksize1.pid || ! test -f blocksize2.pid; then
+ echo "$0: PID files were not created"
+ exit 1
+fi
+
+# Test behavior on short accesses.
+$QEMU_IO -f raw -c 'r 1 1' -c 'w 10001 1' -c 'w -z 20001 1' \
+ -c 'discard 30001 1' 'nbd+unix://?socket=blocksize1.sock'
+$QEMU_IO -f raw -c 'r 1 1' -c 'w 10001 1' -c 'w -z 20001 1' \
+ -c 'discard 30001 1' 'nbd+unix://?socket=blocksize2.sock'
+
+# Read should round up (qemu-io may round to 512, but we must round to 1024)
+grep 'connection=1 Read .* count=0x\(1\|200\) ' blocksize1.log ||
+ { echo "qemu-io can't pass 1-byte reads"; exit 77; }
+grep 'connection=1 Read .* offset=0x0 count=0x400 ' blocksize2.log
+# Write should become read-modify-write
+grep 'connection=1 Write .* count=0x\(1\|200\) ' blocksize1.log ||
+ { echo "qemu-io can't pass 1-byte writes"; exit 77; }
+grep 'connection=1 Read .* offset=0x2400 count=0x400 ' blocksize2.log
+grep 'connection=1 Write .* offset=0x2400 count=0x400 ' blocksize2.log
+# Zero should become read-modify-write
+if grep 'connection=1 Zero' blocksize2.log; then
+ echo "filter should have converted short zero to write"
+ exit 1
+fi
+grep 'connection=1 Read .* offset=0x4c00 count=0x400 ' blocksize2.log
+grep 'connection=1 Write .* offset=0x4c00 count=0x400 ' blocksize2.log
+# Trim should be discarded
+if grep 'connection=1 Trim' blocksize2.log; then
+ echo "filter should have dropped too-small trim"
+ exit 1
+fi
+
+# Test behavior on overlarge accesses.
+$QEMU_IO -f raw -c 'w -P 11 1048575 4094305' -c 'w -z 1050000 1100000' \
+ -c 'r -P 0 1050000 1100000' -c 'r -P 11 3000000 1048577' \
+ -c 'discard 7340031 2097153' 'nbd+unix://?socket=blocksize1.sock'
+$QEMU_IO -f raw -c 'w -P 11 1048575 4094305' -c 'w -z 1050000 1100000' \
+ -c 'r -P 0 1050000 1100000' -c 'r -P 11 3000000 1048577' \
+ -c 'discard 7340031 2097153' 'nbd+unix://?socket=blocksize2.sock'
+
+# Reads and writes should have been split.
+test "$(grep -c '\(Read\|Write\) .*count=0x80000 ' blocksize2.log)" -ge 10
+test "$(grep -c '\(Read\|Write\) .*count=0x[0-9a-f]\{6\} ' blocksize2.log)" = 0
+# Zero and trim should be split, but at different boundary
+grep 'Zero .*count=0x100000 ' blocksize2.log
+test "$(grep -c 'connection=2 Zero' blocksize2.log)" = 2
+if grep Trim blocksize1.log; then
+ test "$(grep -c 'connection=2 Trim .*count=0x100000 ' blocksize2.log)" = 2
+fi
+
+# Final sanity checks.
+if grep 'offset=0x[0-9a-f]*\([1235679abdef]00\|[0-9a-f]\(.[^0]\|[^0].\)\) ' \
+ blocksize2.log; then
+ echo "filter didn't align offset to 1024";
+ exit 1;
+fi
+if grep 'count=0x[0-9a-f]*\([1235679abdef]00\|[0-9a-f]\(.[^0]\|[^0].\)\) ' \
+ blocksize2.log; then
+ echo "filter didn't align count to 1024";
+ exit 1;
+fi
+diff -u blocksize1.img blocksize2.img
+
+# The cleanup() function is called implicitly on exit.
--
2.14.3