[nbdkit PATCH v3 0/5] Play with libnbd for nbdkit-nbd
by Eric Blake
libnbd-0.1.4-1 is now available in Fedora 29/30 updates testing.
Diffs since v2 - rebase to master, bump from libnbd 0.1.2 to 0.1.3+,
add tests to TLS usage which flushed out the need to turn relative
pathnames into absolute, doc tweaks
Now that the testsuite covers TLS and libnbd has been fixed to provide
the things I found lacking when developing v2, I'm leaning towards
pushing this on Monday, or sooner if I get a positive review.
Eric Blake (5):
nbd: Check for libnbd
nbd: s/nbd_/nbdplug_/
nbd: Use libnbd 0.1.3+
nbd: Add magic uri=... parameter
nbd: Add TLS client support
plugins/nbd/nbdkit-nbd-plugin.pod | 112 ++-
configure.ac | 18 +
plugins/nbd/nbd-standalone.c | 1369 +++++++++++++++++++++++++++++
plugins/nbd/nbd.c | 1364 +++++++++-------------------
TODO | 13 +-
plugins/nbd/Makefile.am | 25 +-
tests/Makefile.am | 6 +-
tests/test-nbd-tls-psk.sh | 81 ++
tests/test-nbd-tls.sh | 82 ++
tests/test-tls-psk.sh | 2 +-
tests/test-tls.sh | 4 +-
11 files changed, 2101 insertions(+), 975 deletions(-)
create mode 100644 plugins/nbd/nbd-standalone.c
create mode 100755 tests/test-nbd-tls-psk.sh
create mode 100755 tests/test-nbd-tls.sh
--
2.20.1
5 years, 5 months
[nbdkit PATCH v2] Introduce cacheextents filter
by Martin Kletzander
This filter caches the last result of the extents() call and offers a nice
speed-up for clients that only support req_one=1 in combination with plugins
like vddk, which has no overhead for returning information for multiple extents
in one call, but that call is very time-consuming.
Quick test showed that on a fast connection and a sparsely allocated 16G disk
with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it
takes 5s to the first extents request). For 100G disk with no data on it, that
is one hole extent spanning the whole disk (where there is no space for
improvement) this does not add any noticeable overhead.
Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
---
Notes:
v2:
- Clearing the REQ_ONE when forwarding the request to plugin
- I hate the wording of the comment explaining it, please suggest a reword (or
just just fix it)
- Tests are now using custom shell plugin logging all accesses and also qemu-io
testing all other expected behaviour
configure.ac | 2 +
filters/cacheextents/Makefile.am | 64 ++++++
filters/cacheextents/cacheextents.c | 196 ++++++++++++++++++
.../nbdkit-cacheextents-filter.pod | 47 +++++
tests/Makefile.am | 4 +
tests/test-cacheextents.sh | 110 ++++++++++
6 files changed, 423 insertions(+)
create mode 100644 filters/cacheextents/Makefile.am
create mode 100644 filters/cacheextents/cacheextents.c
create mode 100644 filters/cacheextents/nbdkit-cacheextents-filter.pod
create mode 100755 tests/test-cacheextents.sh
diff --git a/configure.ac b/configure.ac
index 58031f3c6d86..8db5a4cac3c5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -825,6 +825,7 @@ plugins="$(echo $(printf %s\\n $lang_plugins $non_lang_plugins | sort -u))"
filters="\
blocksize \
cache \
+ cacheextents \
cow \
delay \
error \
@@ -900,6 +901,7 @@ AC_CONFIG_FILES([Makefile
filters/Makefile
filters/blocksize/Makefile
filters/cache/Makefile
+ filters/cacheextents/Makefile
filters/cow/Makefile
filters/delay/Makefile
filters/error/Makefile
diff --git a/filters/cacheextents/Makefile.am b/filters/cacheextents/Makefile.am
new file mode 100644
index 000000000000..9e5b59c09c59
--- /dev/null
+++ b/filters/cacheextents/Makefile.am
@@ -0,0 +1,64 @@
+# nbdkit
+# Copyright (C) 2019 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+include $(top_srcdir)/common-rules.mk
+
+EXTRA_DIST = nbdkit-cacheextents-filter.pod
+
+filter_LTLIBRARIES = nbdkit-cacheextents-filter.la
+
+nbdkit_cacheextents_filter_la_SOURCES = \
+ cacheextents.c \
+ $(top_srcdir)/include/nbdkit-filter.h
+
+nbdkit_cacheextents_filter_la_CPPFLAGS = \
+ -I$(top_srcdir)/include \
+ -I$(top_srcdir)/common/include \
+ -I$(top_srcdir)/common/utils
+nbdkit_cacheextents_filter_la_CFLAGS = \
+ $(WARNINGS_CFLAGS)
+nbdkit_cacheextents_filter_la_LDFLAGS = \
+ -module -avoid-version -shared \
+ -Wl,--version-script=$(top_srcdir)/filters/filters.syms
+nbdkit_cacheextents_filter_la_LIBADD = \
+ $(top_builddir)/common/utils/libutils.la
+
+if HAVE_POD
+
+man_MANS = nbdkit-cacheextents-filter.1
+CLEANFILES += $(man_MANS)
+
+nbdkit-cacheextents-filter.1: nbdkit-cacheextents-filter.pod
+ $(PODWRAPPER) --section=1 --man $@ \
+ --html $(top_builddir)/html/$@.html \
+ $<
+
+endif HAVE_POD
diff --git a/filters/cacheextents/cacheextents.c b/filters/cacheextents/cacheextents.c
new file mode 100644
index 000000000000..57c29a8a8c6f
--- /dev/null
+++ b/filters/cacheextents/cacheextents.c
@@ -0,0 +1,196 @@
+/* nbdkit
+ * Copyright (C) 2019 Red Hat Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <errno.h>
+#include <inttypes.h>
+
+#include <pthread.h>
+
+#include <nbdkit-filter.h>
+
+#include "cleanup.h"
+
+#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
+
+/* This lock protects the global state. */
+static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
+
+/* Cached extents from the last extents () call and its start and end for the
+ sake of simplicity. */
+struct nbdkit_extents *cache_extents;
+static uint64_t cache_start;
+static uint64_t cache_end;
+
+static void
+cacheextents_unload (void)
+{
+ nbdkit_extents_free (cache_extents);
+}
+
+static int
+cacheextents_add (struct nbdkit_extents *extents, int *err)
+{
+ size_t i = 0;
+
+ for (i = 0; i < nbdkit_extents_count (cache_extents); i++) {
+ struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i);
+ if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) {
+ *err = errno;
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static int
+cacheextents_fill (struct nbdkit_extents *extents, int *err)
+{
+ size_t i = 0;
+ size_t count = nbdkit_extents_count (extents);
+ struct nbdkit_extent first = nbdkit_get_extent (extents, 0);
+ struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1);
+
+ nbdkit_extents_free (cache_extents);
+ cache_start = first.offset;
+ cache_end = last.offset + last.length;
+ cache_extents = nbdkit_extents_new (cache_start, cache_end);
+
+ for (i = 0; i < count; i++) {
+ struct nbdkit_extent ex = nbdkit_get_extent (extents, i);
+ nbdkit_debug ("cacheextents: updating cache with"
+ ": offset=%" PRIu64
+ ": length=%" PRIu64
+ "; type=%x",
+ ex.offset, ex.length, ex.type);
+ if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) {
+ *err = errno;
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static int
+cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offset, uint32_t flags,
+ struct nbdkit_extents *extents,
+ int *err)
+{
+ ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+
+ nbdkit_debug ("cacheextents:"
+ " cache_start=%" PRIu64
+ " cache_end=%" PRIu64
+ " cache_extents=%p",
+ cache_start, cache_end, cache_extents);
+
+ if (cache_extents &&
+ offset >= cache_start && offset < cache_end) {
+ nbdkit_debug ("cacheextents: returning from cache");
+ return cacheextents_add (extents, err);
+ }
+
+ nbdkit_debug ("cacheextents: cache miss");
+ /* Clearing the REQ_ONE here in order to get (possibly) more data from the
+ * plugin. This should not increase the cost as plugins can still return just
+ * one extent if it's too costly to return more. */
+ flags &= ~(NBDKIT_FLAG_REQ_ONE);
+ if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
+ return -1;
+
+ return cacheextents_fill (extents, err);
+}
+
+/* Any changes to the data need to clean the cache.
+ *
+ * Similarly to readahead filter this could be more intelligent, but there would
+ * be very little benefit.
+ */
+
+static void
+kill_cacheextents (void)
+{
+ ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+ nbdkit_extents_free (cache_extents);
+ cache_extents = NULL;
+}
+
+static int
+cacheextents_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle,
+ const void *buf, uint32_t count, uint64_t offset,
+ uint32_t flags, int *err)
+{
+ kill_cacheextents ();
+ return next_ops->pwrite (nxdata, buf, count, offset, flags, err);
+}
+
+static int
+cacheextents_trim (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle,
+ uint32_t count, uint64_t offset, uint32_t flags,
+ int *err)
+{
+ kill_cacheextents ();
+ return next_ops->trim (nxdata, count, offset, flags, err);
+}
+
+static int
+cacheextents_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle,
+ uint32_t count, uint64_t offset, uint32_t flags,
+ int *err)
+{
+ kill_cacheextents ();
+ return next_ops->zero (nxdata, count, offset, flags, err);
+}
+
+static struct nbdkit_filter filter = {
+ .name = "cacheextents",
+ .longname = "nbdkit cacheextents filter",
+ .version = PACKAGE_VERSION,
+ .unload = cacheextents_unload,
+ .pwrite = cacheextents_pwrite,
+ .trim = cacheextents_trim,
+ .zero = cacheextents_zero,
+ .extents = cacheextents_extents,
+};
+
+NBDKIT_REGISTER_FILTER (filter)
diff --git a/filters/cacheextents/nbdkit-cacheextents-filter.pod b/filters/cacheextents/nbdkit-cacheextents-filter.pod
new file mode 100644
index 000000000000..88f3173b9375
--- /dev/null
+++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod
@@ -0,0 +1,47 @@
+=head1 NAME
+
+nbdkit-cacheextents-filter - cache extents
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=cacheextents plugin
+
+=head1 DESCRIPTION
+
+C<nbdkit-cacheextents-filter> is a filter that caches the result of last
+extents() call.
+
+A common use for this filter is to accelerate returning extents data for
+clients which ask for extents with the REQ_ONE flag set (like
+S<C<qemu-img convert>>) and is especially useful in combination with
+plugins that report multiple extents in one call, but with high latency
+for each of those calls (like L<nbdkit-vddk-plugin(1)>). For example:
+
+ nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ...
+
+For files with big extents (when it is unlikely for one extents() call
+to return multiple different extents) this does not slow down the
+access.
+
+=head1 PARAMETERS
+
+There are no parameters specific to nbdkit-cacheextents-filter. Any
+parameters are passed through to and processed by the underlying
+plugin in the normal way.
+
+=head1 SEE ALSO
+
+L<nbdkit(1)>,
+L<nbdkit-cache-filter(1)>,
+L<nbdkit-readahead-filter(1)>,
+L<nbdkit-vddk-plugin(1)>,
+L<nbdkit-filter(3)>,
+L<qemu-img(1)>.
+
+=head1 AUTHORS
+
+Martin Kletzander
+
+=head1 COPYRIGHT
+
+Copyright (C) 2019 Red Hat Inc.
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 4148793c7297..aaf74502783f 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -47,6 +47,7 @@ EXTRA_DIST = \
test-cache.sh \
test-cache-max-size.sh \
test-cache-on-read.sh \
+ test-cacheextents.sh \
test-captive.sh \
test-cow.sh \
test-cxx.sh \
@@ -865,6 +866,9 @@ TESTS += \
endif HAVE_GUESTFISH
TESTS += test-cache-max-size.sh
+# cacheextents filter test.
+TESTS += test-cacheextents.sh
+
# cow filter test.
if HAVE_GUESTFISH
TESTS += test-cow.sh
diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh
new file mode 100755
index 000000000000..5b0667ad2b95
--- /dev/null
+++ b/tests/test-cacheextents.sh
@@ -0,0 +1,110 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2019 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+source ./functions.sh
+set -x
+set -e
+
+requires grep --version
+requires qemu-img --version
+requires qemu-io --version
+
+sock="$(mktemp -u)"
+sockurl="nbd+unix:///?socket=$sock"
+pidfile="test-cacheextents.pid"
+accessfile="test-cacheextents-access.log"
+accessfile_full="$(pwd)/test-cacheextents-access.log"
+files="$pidfile $sock"
+rm -f $files $accessfile
+cleanup_fn rm -f $files
+
+# Intentionally using EOF rather than 'EOF' so that we can pass in the
+# $accessfile_full
+start_nbdkit \
+ -P $pidfile \
+ -U $sock \
+ --filter=cacheextents \
+ sh - <<EOF
+echo "Call: \$@" >>$accessfile_full
+size=4M
+block_size=\$((1024*1024))
+case "\$1" in
+ get_size) echo \$size ;;
+ can_extents) ;;
+ extents)
+ echo "extents request: \$@" >>$accessfile_full
+ offset=\$((\$4/\$block_size))
+ count=\$((\$3/\$block_size))
+ length=\$((\$offset+\$count))
+ for i in \$(seq \$offset \$length); do
+ echo \${i}M \$block_size \$((i%4)) >>$accessfile_full
+ echo \${i}M \$block_size \$((i%4))
+ done
+ ;;
+ pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;;
+ can_write) ;;
+ pwrite) dd of=/dev/null ;;
+ can_trim) ;;
+ trim) ;;
+ can_zero) ;;
+ trim) ;;
+ *) exit 2 ;;
+esac
+EOF
+
+
+test_me() {
+ num_accesses=$1
+ shift
+
+ qemu-io -f raw "$@" "$sockurl"
+ test "$(grep -c "^extents request: " $accessfile)" -eq "$num_accesses"
+ ret=$?
+ rm -f "$accessfile"
+ return $ret
+}
+
+# First one causes caching, the rest should be returned from cache.
+test_me 1 -c 'map' -c 'map' -c 'map'
+# First one is still cached from last time, discard should kill the cache, then
+# one request should go through.
+test_me 1 -c 'map' -c 'discard 0 1' -c 'map'
+# Same as above, only this time the cache is killed before all the operations as
+# well. This is used from now on to clear the cache as it seems nicer and
+# faster than running new nbdkit for each test.
+test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c 'map'
+# Write should kill the cache as well.
+test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c 'map'
+# Alloc should use cached data from map
+test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0'
+# Read should not kill the cache
+test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c 'map' -c 'alloc 0'
--
2.21.0
5 years, 5 months
[libnbd PATCH] Fix building with for ocaml < 4.06.0
by Martin Kletzander
Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
---
Notes:
This is the simplest solution suggested to me by Rich and it works.
Most of autotools are not friends with me, so the other approach I tried, with
requiring at least OCaml 4.05.0, having this fix in only for OCaml < 4.06.0 and
properly detecting and reporting that, was a bit ugly.
You can see the configure.ac part of it here: http://ix.io/1L0o
If that is preferred, then I can send that one instead.
ocaml/nbd-c.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/ocaml/nbd-c.h b/ocaml/nbd-c.h
index 558581850807..45ae64d77209 100644
--- a/ocaml/nbd-c.h
+++ b/ocaml/nbd-c.h
@@ -27,6 +27,11 @@
#include <caml/memory.h>
#include <caml/mlvalues.h>
+// Workaround for OCaml < 4.06.0
+#ifndef Bytes_val
+#define Bytes_val(x) String_val(x)
+#endif
+
extern void libnbd_finalize (value);
extern void nbd_buffer_finalize (value);
--
2.21.0
5 years, 5 months
[nbdkit PATCH] crypto: Tweak handling of SEND_MORE
by Eric Blake
In the recent commit 3842a080 to add SEND_MORE support, I blindly
implemented the tls code as:
if (SEND_MORE) {
cork
send
} else {
send
uncork
}
because it showed improvements for my test case of aio-parallel-load
from libnbd. But that test sticks to 64k I/O requests.
But with further investigation, I've learned that even though gnutls
corking works great for smaller NBD_CMD_READ replies, it can actually
pessimize behavior for larger requests (such as a client sending lots
of 2M read requests). This is because gnutls loses time both to
realloc()s to copy the entire packet into memory, as well as CPU time
spent encrypting the entire payload before anything is sent, not to
mention that it still ends up fragmenting the message to fit TCP
segment sizing.
So, let's further tweak the logic to make a decision based on a
heuristic: if we're going to split the reply for exceeding a TCP
segment anyway, then uncork before the data (this sends the header
out early as a partial packet, but that happens while the CPU is
encrypting the large payload); otherwise, there's still hope that we
can merge the previous request and this one into a single TCP segment
(which may or may not happen, based on how much overhead TLS adds,
given that 64k is the maximum TCP segment). It may turn out that we
can tune the limit for slightly better performance (maybe 32k is
smarter than 64k), but since the previous commit saw improvement with
the benchmark using 64k packets, that's the value picked for now.
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
server/crypto.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/server/crypto.c b/server/crypto.c
index 3f3d96b..356f04f 100644
--- a/server/crypto.c
+++ b/server/crypto.c
@@ -345,6 +345,12 @@ crypto_recv (struct connection *conn, void *vbuf, size_t len)
return 1;
}
+/* If this send()'s length is so large that it is going to require
+ * multiple TCP segments anyway, there's no need to try and merge it
+ * with any corked data from a previous send that used SEND_MORE.
+ */
+#define MAX_SEND_MORE_LEN (64 * 1024)
+
/* Write buffer to GnuTLS and either succeed completely
* (returns 0) or fail (returns -1). flags is ignored for now.
*/
@@ -357,7 +363,11 @@ crypto_send (struct connection *conn, const void *vbuf, size_t len, int flags)
assert (session != NULL);
- if (flags & SEND_MORE)
+ if (len >= MAX_SEND_MORE_LEN) {
+ if (gnutls_record_uncork (session, GNUTLS_RECORD_WAIT) < 0)
+ return -1;
+ }
+ else if (flags & SEND_MORE)
gnutls_record_cork (session);
while (len > 0) {
--
2.20.1
5 years, 5 months
[nbdkit PATCH 0/2] Few rust plugin fixups/nitpicks
by Martin Kletzander
There are few more things that could be cleaned up related to the coding style
and other things, like explicitly specifying the abi style after "extern"
(i.e. `extern "C" fn` instead of `extern fn`), but since those are configurable
in rustfmt config, I'm not sure whether the config needs to be added or
complying with the defaults should be the priority. But this was just something
I stumbled upon when I saw some warnings during build, nothing more.
Martin Kletzander (2):
rust: Do not use deprecated integer types
rust: Remove libc dependency
plugins/rust/Cargo.toml.in | 1 -
plugins/rust/examples/ramdisk.rs | 15 ++++++-------
plugins/rust/src/lib.rs | 37 +++++++++++++++-----------------
3 files changed, 24 insertions(+), 29 deletions(-)
--
2.21.0
5 years, 5 months
[PATCH libnbd] states: In recv_into_rbuf and send_from_wbuf loop until EAGAIN.
by Richard W.M. Jones
I thought this should produce a fairly dramatic performance gain. In
fact I couldn't measure any performance difference at all. I think
what's happening is we're actually paying an extra syscall (to
discover the socket would block) and then doing the poll anyway.
So I don't know if it's worth having this patch. It could be argued
that it makes the code shorter (therefore simpler), so I'm posting it,
but I'm ambivalent.
Rich.
5 years, 5 months
[nbdkit PATCH v2 0/2] Reduce network overhead with MSG_MORE/corking
by Eric Blake
This time around, the numbers are indeed looking better than in v1;
and I like the interface better.
Eric Blake (2):
server: Prefer send() over write()
server: Group related transmission send()s
server/internal.h | 7 +++-
server/connections.c | 51 +++++++++++++++++++++++++---
server/crypto.c | 11 ++++--
server/protocol-handshake-newstyle.c | 22 ++++++------
server/protocol-handshake-oldstyle.c | 2 +-
server/protocol.c | 22 ++++++------
6 files changed, 85 insertions(+), 30 deletions(-)
--
2.20.1
5 years, 5 months
[libnbd PATCH] tls: Check for pending bytes in gnutls buffers
by Eric Blake
Checking for poll(POLLIN) only wakes us up when the server sends more
bytes over the wire. But the way gnutls is implemented, it reads as
many non-blocking bytes as possible from the wire before then parsing
where the encoded message boundaries lie within those bytes. As a
result, we may already have the server's next reply in memory without
needing to block on the server sending yet more bytes (especially if
the server was employing tricks like corking to batch things up for
fewer packets over the network).
If we are using synchronous API, this is not a problem (we never have
more than one request in flight at once, which implies there are no
outstanding replies from the server when we send a request, so we
always get the needed POLLIN when the server's next reply begins); but
if we are using asynch API and have more than one message in flight,
it is conceivable that the server flushes its complete reply queue
(two or more replies) contained within a single encrypted network
packet, but where our state machine reads only the first reply and
then waits for a POLLIN kick before reading again. Fortunately, I
could not think of a scenario where this would turn into direct
deadlock in our state machine alone (although I also haven't
definitively ruled out that possibility) - in the common case, the
only way the server can send more than one reply at once is if it
supports parallel in-flight requests; which means we are safe that the
next asynch message we send will be received by the server, and its
reply to that message will give us the POLLIN kick we need to finally
process the reply that had been stuck (for a possibly long time) in
the local buffer.
But I _did_ come up with a problematic scenario - if the server's
final two replies before our NBD_CMD_DISC occur in the same packet, we
could end up with the socket closed without ever handling the second
reply, giving us data loss for that reply. It's also a potential
indirect deadlock: the NBD spec recommends that a client not call
NBD_CMD_DISC until it has received all responses to outstanding
in-flight requests, so a client that has finished its work and does
not plan to make any more aio requests other than NBD_CMD_DISC and
wants to block until all server replies are in will hang waiting for a
POLLIN that the server never sends because we already have the reply
and aren't sending more requests.
Testing shows that this patch has a slight performance improvement (to
be expected, since we are eliminating an artificial latency on
starting the REPLY sequence in the state machine) - on my machine,
running tests/aio-parallel-load-tls saw an increase from 11522.9 IOPS
pre-patch to 12558.1 IOPS post-patch.
---
I'm pushing this because of the performance improvement, but wanted
to post it because it was an interesting problem. (Sad when my writeup
is three times longer than the patch itself...)
generator/states.c | 5 +++++
lib/crypto.c | 7 +++++++
lib/internal.h | 1 +
3 files changed, 13 insertions(+)
diff --git a/generator/states.c b/generator/states.c
index bce4f85..145e8c1 100644
--- a/generator/states.c
+++ b/generator/states.c
@@ -113,6 +113,11 @@ send_from_wbuf (struct nbd_handle *h)
READY:
if (h->cmds_to_issue)
SET_NEXT_STATE (%ISSUE_COMMAND.START);
+ else {
+ assert (h->sock);
+ if (h->sock->ops->pending && h->sock->ops->pending (h->sock))
+ SET_NEXT_STATE (%REPLY.START);
+ }
return 0;
DEAD:
diff --git a/lib/crypto.c b/lib/crypto.c
index 3f7c744..e0f173f 100644
--- a/lib/crypto.c
+++ b/lib/crypto.c
@@ -181,6 +181,12 @@ tls_send (struct nbd_handle *h,
return r;
}
+static bool
+tls_pending (struct socket *sock)
+{
+ return gnutls_record_check_pending (sock->u.tls.session) > 0;
+}
+
static int
tls_get_fd (struct socket *sock)
{
@@ -209,6 +215,7 @@ tls_close (struct socket *sock)
static struct socket_ops crypto_ops = {
.recv = tls_recv,
.send = tls_send,
+ .pending = tls_pending,
.get_fd = tls_get_fd,
.close = tls_close,
};
diff --git a/lib/internal.h b/lib/internal.h
index 5b6152b..61ddbde 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -192,6 +192,7 @@ struct socket_ops {
struct socket *sock, void *buf, size_t len);
ssize_t (*send) (struct nbd_handle *h,
struct socket *sock, const void *buf, size_t len);
+ bool (*pending) (struct socket *sock);
int (*get_fd) (struct socket *sock);
int (*close) (struct socket *sock);
};
--
2.20.1
5 years, 5 months