On Fri, Oct 15, 2021 at 03:17:16PM +0100, Richard W.M. Jones wrote:
Similar to nbdkit-retry-filter, but this only retries single failing
requests rather than reopening the whole plugin. This is intended to
be used with the curl filter to solve:
https://bugzilla.redhat.com/show_bug.cgi?id=2013000
---
+++ b/filters/retry-request/nbdkit-retry-request-filter.pod
@@ -0,0 +1,74 @@
+=head1 NAME
+
+nbdkit-retry-request-filter - retry single requests on error
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=retry-request PLUGIN
+ [retry-request-retries=N] [retry-request-delay=N]
+ [retry-request-open=false]
+
+=head1 DESCRIPTION
+
+C<nbdkit-retry-request-filter> is a filter for nbdkit that
+transparently retries single requests if they fail. This is useful
+for plugins that are not completely reliable or have random behaviour.
+For example L<nbdkit-curl-plugin(1)> might behave this way if pointed
+at a load balancer which sometimes redirects to web server that is not
to a web
+responsive.
+
+An alternative filter with different trade-offs is
+L<nbdkit-retry-filter(1)>. That filter is more heavyweight because it
+always reopens the whole plugin connection on failure.
+
+=head1 PARAMETERS
+
+=over 4
+
+=item B<retry-request-retries=>N
+
+The number of times any single request will be retried before we give
+up and fail the operation. The default is 2.
+
+=item B<retry-request-delay=>N
+
+The number of seconds to wait before retrying. The default is 2
+seconds.
+
+=item B<retry-request-open=false>
+
+If set to false, do not retry opening the plugin. The default is to
+treat plugin open in the same way as other requests.
+
+++ b/filters/retry-request/retry-request.c
@@ -0,0 +1,278 @@
+
+/* These macros encapsulate the logic of retrying. The code between
+ * RETRY_START...RETRY_END must set r to 0 or -1 on success or
+ * failure.
+ */
+#define RETRY_START \
+ { \
+ unsigned i; \
+ \
+ r = -1; \
+ for (i = 0; r == -1 && i <= retries; ++i) { \
+ if (i > 0) { \
+ nbdkit_debug ("retry %u: waiting %u seconds before retrying", \
+ i, delay); \
+ if (nbdkit_nanosleep (delay, 0) == -1) { \
+ if (*err == 0) \
+ *err = errno; \
You documented 'r', but not 'err' needing to be in scope as well.
+ break;
\
+ } \
+ } \
+ do
+#define RETRY_END \
+ while (0); \
+ } \
+ }
+
+static void *
+retry_request_open (nbdkit_next_open *next, nbdkit_context *nxdata,
+ int readonly, const char *exportname, int is_tls)
+{
+ int r;
+
+ if (retry_open_call) {
+ int *err = &errno; /* used by the RETRY* macros */
+
+ RETRY_START
+ r = next (nxdata, readonly, exportname);
+ RETRY_END;
Most other calls into the plugin are obviously safe, but there are
lots of asserts around calls into nbdkit_next_open obeying
expectations - if you didn't trip any when .open happened to hit the
bad mirror out of the round robin, then I think we're okay here on
life-cycle.
+ }
+ else {
+ r = next (nxdata, readonly, exportname);
+ }
+
+ return r == 0 ? NBDKIT_HANDLE_NOT_NEEDED : NULL;
+}
+
+static int
+retry_request_pread (nbdkit_next *next,
+ void *handle, void *buf, uint32_t count, uint64_t offset,
+ uint32_t flags, int *err)
+{
+ int r;
+
+ RETRY_START
+ r = next->pread (next, buf, count, offset, flags, err);
+ RETRY_END;
The simple case is still legible,...
+static int
+retry_request_extents (nbdkit_next *next,
+ void *handle,
+ uint32_t count, uint64_t offset, uint32_t flags,
+ struct nbdkit_extents *extents, int *err)
+{
+ CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents2 = NULL;
+ int r;
+
+ RETRY_START {
+ /* Each retry must begin with extents reset to the right beginning. */
+ nbdkit_extents_free (extents2);
+ extents2 = nbdkit_extents_new (offset, next->get_size (next));
+ if (extents2 == NULL) {
+ *err = errno;
+ return -1; /* Not worth a retry after ENOMEM. */
+ }
+ r = next->extents (next, count, offset, flags, extents2, err);
+ } RETRY_END;
...and the complex case seems much nicer. Yeah, I like this layout better.
+
+ if (r == 0) {
+ size_t i;
+
+ /* Transfer the successful extents back to the caller. */
+ for (i = 0; i < nbdkit_extents_count (extents2); ++i) {
+ struct nbdkit_extent e = nbdkit_get_extent (extents2, i);
+
+ if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) {
+ *err = errno;
+ return -1;
This is another spot where failure is not due to a flaky plugin, but
to internal errors (probably ENOMEM); I agree with your analysis that
retrying the transfer is not worthwhile.
+++ b/tests/test-retry-request.sh
@@ -0,0 +1,94 @@
+#!/usr/bin/env bash
+
+requires_plugin sh
+requires nbdcopy --version
+requires dd iflag=count_bytes </dev/null
+
+files="retry-request.img retry-request-count"
+rm -f $files
+cleanup_fn rm -f $files
+
+touch retry-request-count
+start_t=$SECONDS
+
+# Create a custom plugin which will test retrying requests.
+nbdkit -v -U - \
+ sh - \
+ --filter=retry-request retry-request-retries=3 retry-request-delay=1 \
+ --run 'nbdcopy --synchronous "$uri" retry-request.img'
<<'EOF'
+#!/usr/bin/env bash
+case "$1" in
+ get_size) echo 512 ;;
+ pread)
+ # Fail 3 times then succeed.
+ read i < retry-request-count
+ ((i++))
+ echo $i > retry-request-count
+ if [ $i -le 3 ]; then
+ echo "EIO pread failed" >&2
+ exit 1
+ else
+ dd if=/dev/zero count=$3 iflag=count_bytes
+ fi
+ ;;
+
+ *) exit 2 ;;
+esac
+EOF
+
+# In this test we should see 3 failures:
+# pread FAILS
+# retry and wait 1 second
+# pread FAILS
+# retry and wait 1 second
+# pread FAILS
+# retry and wait 1 second
+# pread succeeds
+
+# The minimum time for the test should be 3 seconds.
+end_t=$SECONDS
+if [ $((end_t - start_t)) -lt 3 ]; then
+ echo "$0: test ran too quickly"
+ exit 1
+fi
+
+# Check retry-request-count = 4 (because we write #retries+1 to the file)
+read retry_count < retry-request-count
+if [ $retry_count -ne 4 ]; then
+ echo "$0: retry-request-count ($retry_count) != 4"
+ exit 1
+fi
Looks okay. But we should also test a retry of .open failing, since
that one goes through enough difference in code path to be worth
checking closely.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:
qemu.org |
libvirt.org