On 9/19/19 10:26 AM, Richard W.M. Jones wrote:
This filter can be used to transparently reopen/retry when a plugin
fails. The connection is closed and reopened which for most plugins
causes them to attempt to reconnect to their source.
For example if doing a long or slow SSH copy:
nbdkit -U - ssh host=remote /var/tmp/test.iso \
--run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
if the SSH connection or network goes down in the middle then the
whole operation will fail.
By adding the retry filter:
nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \
--run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
this operation can recover from temporary failures in at least some
circumstances. The NBD connection (a local Unix domain socket in the
example above) is not interrupted during retries, so NBD clients don't
need to be taught how to retry - everything is handled internally by
nbdkit.
---
Pretty cool how fast we've gone from idea to mostly-working code.
nbdkit is awesome for fast prototyping.
+++ b/filters/retry/nbdkit-retry-filter.pod
@@ -0,0 +1,103 @@
+=head1 NAME
+
+nbdkit-retry-filter - reopen connection on error
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=retry PLUGIN
Worth adding [retry-parameter]... here?
+
+=head1 DESCRIPTION
+
+C<nbdkit-retry-filter> is a filter that transparently reopens the
+plugin connection when an error is encountered. It can be used to
+make long-running copy operations reliable in the presence of
+temporary network failures, without requiring any changes to the
+plugin.
+
+A number of parameters are available to control:
+
+=over 4
+
+=item *
+
+How many times we retry.
+
+=item *
+
+The delay between retries, and whether we wait longer each time (known
+as "exponential back-off").
+
+=item *
+
+If we reopen the plugin in read-only mode after the first failure.
+
+=back
+
+The default (no parameters) is designed to offer a happy medium
+between recovering from short temporary failures but not doing
+anything too bad when permanent or unrecoverable failures happen: We
+retry 5 times with exponential back-off, waiting in total about 1
+minute before we give up.
+
+=head1 EXAMPLE
+
+In this example we copy and convert a large file using
+L<nbdkit-ssh-plugin(1)>, L<qemu-img(1)> and L<nbdkit-captive(1)>.
+
+ nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \
+ --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
+
+Without I<--filter=retry> a temporary failure would cause the copy to
+fail (for example, the remote host’s firewall is restarted causing the
+SSH connection to be closed). Adding this filter means that it may be
+possible to transparently recover.
+
+=head1 PARAMETERS
EXAMPLE before PARAMETERS is unusual. Not necessarily wrong, though.
+
+=over 4
+
+=item B<retries=>N
+
+The number of times any single operation will be retried before we
+give up and fail the operation. The default is 5.
+
+=item B<retry-delay=>N
+
+The number of seconds to wait before retrying. The default is 2
+seconds.
Worth allowing sub-second times?
+
+=item B<retry-exponential=yes>
+
+Use exponential back-off. The retry delay is doubled between each
+retry. This is the default.
+
+=item B<retry-exponential=no>
+
+Do not use exponential back-off. The retry delay is the same between
+each retry.
+
+=item B<retry-readonly=yes>
+
+As soon as a failure occurs and we retry, switch the underlying plugin
+permanently to read-only mode.
+
+=item B<retry-readonly=no>
+
+Do not change the read-write/read-only mode of the plugin when
+retrying. This is the default.
Worth an intermediate mode that only forces readonly if a write
operation fails, but tries to preserve read/write if a read operation fails?
+
+static int retries = 5; /* 0 = filter is disabled */
+static int initial_delay = 2;
Would need a different unit here if you allow sub-second retry.
+static bool exponential_backoff = true;
+static bool force_readonly = false;
Initializing a static variable to 0 is sometimes worth avoiding (as it
can force .data instead of .bss for a slightly larger binary), but here
it makes sense.
+
+static int
+retry_config (nbdkit_next_config *next, void *nxdata,
+ const char *key, const char *value)
+{
+ int r;
+
+ if (strcmp (key, "retries") == 0) {
+ if (sscanf (value, "%d", &retries) != 1 || retries < 0) {
+ nbdkit_error ("cannot parse retries: %s", value);
+ return -1;
+ }
+ }
+ else if (strcmp (key, "retry-delay") == 0) {
+ if (sscanf (value, "%d", &initial_delay) != 1 || initial_delay <=
0) {
sscanf("%d") cannot detect overflow; should this use strtol with errno
checking instead?
+#define retry_config_help \
+ "retries=<N> Number of retries (default: 5).\n" \
+ "retry-delay=<N> Second to wait before retry (default: 2).\n"
\
Seconds
+ "retry-exponential=yes|no Exponential back-off (default:
yes).\n" \
+ "retry-readonly=yes|no Force read-only on failure (default: no).\n"
+
+struct retry_handle {
+ int readonly; /* Save original readonly setting. */
+};
+
+static void *
+retry_open (nbdkit_next_open *next, void *nxdata, int readonly)
Out of review time today, I'll get back to the rest of this file later...
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org