Re: [Libguestfs] [PATCH nbdkit v2 1/2] New filter: nbdkit-retry-request-filter

Friday, 15 October 2021

On Fri, Oct 15, 2021 at 03:17:16PM +0100, Richard W.M. Jones wrote:
...
 Similar to nbdkit-retry-filter, but this only retries single failing
 requests rather than reopening the whole plugin.  This is intended to
 be used with the curl filter to solve:

 https://bugzilla.redhat.com/show_bug.cgi?id=2013000
 --- 
...
 +++ b/filters/retry-request/nbdkit-retry-request-filter.pod
 @@ -0,0 +1,74 @@
 +=head1 NAME
 +
 +nbdkit-retry-request-filter - retry single requests on error
 +
 +=head1 SYNOPSIS
 +
 + nbdkit --filter=retry-request PLUGIN
 +                   [retry-request-retries=N] [retry-request-delay=N]
 +                   [retry-request-open=false]
 +
 +=head1 DESCRIPTION
 +
 +C<nbdkit-retry-request-filter> is a filter for nbdkit that
 +transparently retries single requests if they fail.  This is useful
 +for plugins that are not completely reliable or have random behaviour.
 +For example L<nbdkit-curl-plugin(1)> might behave this way if pointed
 +at a load balancer which sometimes redirects to web server that is not 
to a web

...
 +responsive.
 +
 +An alternative filter with different trade-offs is
 +L<nbdkit-retry-filter(1)>.  That filter is more heavyweight because it
 +always reopens the whole plugin connection on failure.
 +
 +=head1 PARAMETERS
 +
 +=over 4
 +
 +=item B<retry-request-retries=>N
 +
 +The number of times any single request will be retried before we give
 +up and fail the operation.  The default is 2.
 +
 +=item B<retry-request-delay=>N
 +
 +The number of seconds to wait before retrying.  The default is 2
 +seconds.
 +
 +=item B<retry-request-open=false>
 +
 +If set to false, do not retry opening the plugin.  The default is to
 +treat plugin open in the same way as other requests.
 +
 +++ b/filters/retry-request/retry-request.c
 @@ -0,0 +1,278 @@ 
...
 +
 +/* These macros encapsulate the logic of retrying.  The code between
 + * RETRY_START...RETRY_END must set r to 0 or -1 on success or
 + * failure.
 + */
 +#define RETRY_START                                                     \
 +  {                                                                     \
 +    unsigned i;                                                         \
 +                                                                        \
 +    r = -1;                                                             \
 +    for (i = 0; r == -1 && i <= retries; ++i) {                         \
 +      if (i > 0) {                                                      \
 +        nbdkit_debug ("retry %u: waiting %u seconds before retrying",   \
 +                      i, delay);                                        \
 +        if (nbdkit_nanosleep (delay, 0) == -1) {                        \
 +          if (*err == 0)                                                \
 +            *err = errno;                                               \ 
You documented 'r', but not 'err' needing to be in scope as well.

...
 +          break;                                                    
   \
 +        }                                                               \
 +      }                                                                 \
 +      do
 +#define RETRY_END                                                       \
 +      while (0);                                                        \
 +    }                                                                   \
 +  }
 +
 +static void *
 +retry_request_open (nbdkit_next_open *next, nbdkit_context *nxdata,
 +                    int readonly, const char *exportname, int is_tls)
 +{
 +  int r;
 +
 +  if (retry_open_call) {
 +    int *err = &errno;          /* used by the RETRY* macros */
 +
 +    RETRY_START
 +      r = next (nxdata, readonly, exportname);
 +    RETRY_END; 
Most other calls into the plugin are obviously safe, but there are
lots of asserts around calls into nbdkit_next_open obeying
expectations - if you didn't trip any when .open happened to hit the
bad mirror out of the round robin, then I think we're okay here on
life-cycle.

...
 +  }
 +  else {
 +    r = next (nxdata, readonly, exportname);
 +  }
 +
 +  return r == 0 ? NBDKIT_HANDLE_NOT_NEEDED : NULL;
 +}
 +
 +static int
 +retry_request_pread (nbdkit_next *next,
 +                     void *handle, void *buf, uint32_t count, uint64_t offset,
 +                     uint32_t flags, int *err)
 +{
 +  int r;
 +
 +  RETRY_START
 +    r = next->pread (next, buf, count, offset, flags, err);
 +  RETRY_END; 
The simple case is still legible,...

...
 +static int
 +retry_request_extents (nbdkit_next *next,
 +                       void *handle,
 +                       uint32_t count, uint64_t offset, uint32_t flags,
 +                       struct nbdkit_extents *extents, int *err)
 +{
 +  CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents2 = NULL;
 +  int r;
 +
 +  RETRY_START {
 +    /* Each retry must begin with extents reset to the right beginning. */
 +    nbdkit_extents_free (extents2);
 +    extents2 = nbdkit_extents_new (offset, next->get_size (next));
 +    if (extents2 == NULL) {
 +      *err = errno;
 +      return -1; /* Not worth a retry after ENOMEM. */
 +    }
 +    r = next->extents (next, count, offset, flags, extents2, err);
 +  } RETRY_END; 
...and the complex case seems much nicer. Yeah, I like this layout better.

...
 +
 +  if (r == 0) {
 +    size_t i;
 +
 +    /* Transfer the successful extents back to the caller. */
 +    for (i = 0; i < nbdkit_extents_count (extents2); ++i) {
 +      struct nbdkit_extent e = nbdkit_get_extent (extents2, i);
 +
 +      if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) {
 +        *err = errno;
 +        return -1; 
This is another spot where failure is not due to a flaky plugin, but
to internal errors (probably ENOMEM); I agree with your analysis that
retrying the transfer is not worthwhile.

...
 +++ b/tests/test-retry-request.sh
 @@ -0,0 +1,94 @@
 +#!/usr/bin/env bash 
...
 +
 +requires_plugin sh
 +requires nbdcopy --version
 +requires dd iflag=count_bytes </dev/null
 +
 +files="retry-request.img retry-request-count"
 +rm -f $files
 +cleanup_fn rm -f $files
 +
 +touch retry-request-count
 +start_t=$SECONDS
 +
 +# Create a custom plugin which will test retrying requests.
 +nbdkit -v -U - \
 +       sh - \
 +       --filter=retry-request retry-request-retries=3 retry-request-delay=1 \
 +       --run 'nbdcopy --synchronous "$uri" retry-request.img'
<<'EOF'
 +#!/usr/bin/env bash
 +case "$1" in
 +    get_size) echo 512 ;;
 +    pread)
 +        # Fail 3 times then succeed.
 +        read i < retry-request-count
 +        ((i++))
 +        echo $i > retry-request-count
 +        if [ $i -le 3 ]; then
 +            echo "EIO pread failed" >&2
 +            exit 1
 +        else
 +            dd if=/dev/zero count=$3 iflag=count_bytes
 +        fi
 +        ;;
 +
 +    *) exit 2 ;;
 +esac
 +EOF
 +
 +# In this test we should see 3 failures:
 +# pread FAILS
 +# retry and wait 1 second
 +# pread FAILS
 +# retry and wait 1 second
 +# pread FAILS
 +# retry and wait 1 second
 +# pread succeeds
 +
 +# The minimum time for the test should be 3 seconds.
 +end_t=$SECONDS
 +if [ $((end_t - start_t)) -lt 3 ]; then
 +    echo "$0: test ran too quickly"
 +    exit 1
 +fi
 +
 +# Check retry-request-count = 4 (because we write #retries+1 to the file)
 +read retry_count < retry-request-count
 +if [ $retry_count -ne 4 ]; then
 +    echo "$0: retry-request-count ($retry_count) != 4"
 +    exit 1
 +fi 
Looks okay. But we should also test a retry of .open failing, since
that one goes through enough difference in code path to be worth
checking closely.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH nbdkit v2 1/2] New filter: nbdkit-retry-request-filter