Re: [Libguestfs] [nbdkit PATCH] retry: Tweak error message

Thursday, 2 April 2020

On Wed, Apr 01, 2020 at 08:46:53AM -0500, Eric Blake wrote:
...
 On 4/1/20 5:12 AM, Richard W.M. Jones wrote:
 >On Tue, Mar 31, 2020 at 04:00:22PM -0500, Eric Blake wrote:
 >>The retry filter defaults to 5 retries, but when run with verbose
 >>details produces some confusing output:
 >>
 >>$ rm -f /tmp/inject; (sleep 5s; touch /tmp/inject)& ./nbdkit -f -v -U - \
 >>   null 1G --filter=retry --filter=noextents --filter=error error-rate=100% \
 >>   error-file=/tmp/inject --filter=delay rdelay=1 \
 >>   --run 'qemu-img convert $nbd out.img'
 >>...
 >>nbdkit: null[1]: debug: noextents: pread count=2097152 offset=14680064
 >>nbdkit: null[1]: debug: error: pread count=2097152 offset=14680064
 >>nbdkit: null[1]: error: injecting EIO error into pread
 >>nbdkit: null[1]: debug: retry 6: original errno = 5
 >>nbdkit: null[1]: debug: could not recover after 5 retries
 >>

 >>   again:
 >>    /* Log the original errno since it will be lost when we retry. */
 >>-  nbdkit_debug ("retry %d: original errno = %d", data->retry+1,
*err);
 >>+  nbdkit_debug ("retry attempt %d: original errno = %d",
data->retry, *err);
 >>
 >>    if (data->retry >= retries) {
 >>      nbdkit_debug ("could not recover after %d retries", retries);
 >
 >I think adding 1 was correct, so that the retries count from 1.

 But the problem is the timing of when the messages are printed.  And
 remember, this code is only reached after a failed attempt, not on
 success.  Pre-patch, we have:

 initial try succeeds - no message

 or (here, with retries limited to 2):

 initial try fails:
  retry 1: original error
 delay
 retry 1 fails:
  retry 2: original error
 delay
 retry 2 fails:
  retry 3: original error
  could not recover after 2 retries

 >
 >How about moving the message to make it conditional instead,
 >like this?
 >
 >again:
 >   /* Log the original errno since it will be lost when we retry. */
 >   if (data->retry < retries)
 >     nbdkit_debug ("retry %d: original errno = %d", data->retry+1,
*err);
 >   else {
 >      nbdkit_debug ("could not recover after %d retries", retries);
 >   ...

 with your suggestion, we'd have:

 initial try fails:
  retry 1: original error
 delay
 retry 1 fails:
  retry 2: original error
 delay
 retry 2 fails:
  could not recover after 2 retries

 where we lost the log of the errno of retry 2. 
It's not printed, but also it's not lost.  do_retry will immediately
return false, causing the called function to return this final errno
(hopefully back to the client).  The reason for printing the "original
errno" was because those intermediate errnos would otherwise be lost.

Attached is my suggested patch - it's a small evolution on top of
yours.  The output is now:

$ rm -f /tmp/inject ; (sleep 5s; touch /tmp/inject ) & ./nbdkit -f -v -U - null
size=1G --filter=retry --filter=noextents --filter=error error-rate=100%
error-file=/tmp/inject --filter=delay rdelay=1 retries=3 --run 'qemu-img convert $nbd
/var/tmp/out.img'

nbdkit: null[1]: error: injecting EIO error into pread
nbdkit: null[1]: debug: pread failed: original errno = 5
nbdkit: null[1]: debug: retry 1: waiting 2 seconds before retrying

nbdkit: null[1]: error: injecting EIO error into pread
nbdkit: null[1]: debug: pread failed: original errno = 5
nbdkit: null[1]: debug: retry 2: waiting 4 seconds before retrying

nbdkit: null[1]: error: injecting EIO error into pread
nbdkit: null[1]: debug: pread failed: original errno = 5
nbdkit: null[1]: debug: retry 3: waiting 8 seconds before retrying

nbdkit: null[1]: error: injecting EIO error into pread
nbdkit: null[1]: debug: could not recover after 3 retries
nbdkit: null[1]: debug: sending error reply: Input/output error

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [nbdkit PATCH] retry: Tweak error message