Re: [Libguestfs] [nbdkit PATCH] sh: Allow pwrite to not consume all data

Thursday, 31 August 2023

On Thu, Aug 31, 2023 at 12:05:17PM +0200, Laszlo Ersek wrote:
...
 On 8/31/23 11:47, Richard W.M. Jones wrote:
 > On Thu, Aug 31, 2023 at 11:12:59AM +0200, Laszlo Ersek wrote:
 >> On 8/31/23 10:02, Richard W.M. Jones wrote:
 >>>
 >>> On Wed, Aug 30, 2023 at 05:21:19PM -0500, Eric Blake wrote:
 >>>> I hit another transient failure in libnbd CI when a poorly-written
 >>>> eval script did not consume all of stdin during .pwrite.  As behaving
 >>>> as a data sink can be a somewhat reasonable feature of a
 >>>> quickly-written sh or eval plugin, we should not be so insistent as
 >>>> treating an EPIPE failure as an immediate return of EIO to the client.
 >>>
 >>> I was thinking about this over night, and came to the conclusion that
 >>> it's always fine to ignore EPIPE errors.
 >>
 >> Interesting; I formed the opposite impression!
 >>
 >>> For example a script might
 >>> be processing the input data gradually and then encounter an error and
 >>> want to exit immediately.  We also have plenty of plugins that discard
 >>> some or all of the written data.
 >>
 >> But that would be associated with a nonzero exit status, right?
 > 
 > In the error case, yes it would exit with a non-zero exit status.
 > In the "don't care about data" case it would exit with 0.
 > 
 > As a concrete example the code currently has to look like this:
 > 
 >   case "$1")
 >     can_write) echo 0 ;;
 >     pwrite)
 >       ...
 >       if [ there is an error ]; then
 >         cat >/dev/null  # discard stdin
 >         echo 'EIO I/O error' >&2
 >         exit 1
 >       else
 >         cat >/dev/null  # discard stdin
 >         exit 0
 >       fi
 > 
 > where we're saying that the 'cat >/dev/null' commands are
unnecessary
 > complication.

 OK, I'm *starting* to form an understanding, I think.

 With a C-language plugin, we never have to worry about the plugin
 *seeing* the input to the pwrite callback. So it is then up to the
 plugin to do something with the data. If it wants to return success (0)
 without processing even one byte from the input, that's on the plugin.
 Furthermore, if the plugin crashes, then all of nbdkit crashes (I think
 -- same process, right?), and the client (libnbd) notices that.

 However, with the shell plugin, we have a more foundational problem. The
 pwrite callback runs in a separate process, and so "parameter passing"
 becomes risky in its own right.

 I'd say:

 - If a shell plugin pwrite exits normally, then ignoring EPIPE in nbdkit
 should be OK. Given the explicit exit status (zero or nonzero), and the
 errno stuff on stderr, we can trust the script to have made a conscious
 decision, and accept that it didn't want to read all input. Fine.

 - But if the same script crashes, possibly before it made up its mind
 about reading all input or not, we don't have a normal exit status (zero
 or nonzero) to rely upon. One way we learn about the script crash is
 EPIPE. It's not fool-proof (the script might crash right after it reads
 the last byte from the input), but when it happens, it indicates
 something wrong.

 Let me check the shell plugin for exit status handling...

 Right, this is what I've been missing:

   if (WIFSIGNALED (status)) {
     nbdkit_error ("%s: script terminated by signal %d",
                   argv0, WTERMSIG (status));
     goto error;
   }

 All good. With this, we can say that we're going to asssume that an
 EPIPE combined with a normal exit (zero or nonzero status) means the
 script didn't care for the rest of the input, and assume that EPIPE
 combined with a child crash is just a child crash, which we're catching
 anyway. Hence EPIPE makes no difference. 
Yes, I think this is the case.  We'd see some exit code like 139(?)
which would be detected by WIFSIGNALED (above) and that should get
turned into EIO back over the wire to the NBD client.  Actually let's
test that ...

  $ ./nbdkit -fv -U - eval can_write="echo 0" pwrite='kill -s BUS $$'
get_size="echo 1M" --run 'nbdsh -u "$uri" -c
"h.pwrite(bytearray(512),0)"'
  [...]
  nbdkit: eval[1]: debug: calling: /tmp/nbdkitj4ajqs/pwrite pwrite "" 512 0
""
  nbdkit: eval[1]: error: /tmp/nbdkitj4ajqs/pwrite: script terminated by signal 7
  nbdkit: eval[1]: error: /tmp/nbdkitj4ajqs/pwrite: script exited with error, but did not
print an error message on stderr
  nbdkit: eval[1]: debug: sending error reply: Input/output error
  nbdsh: command line script failed: nbd_pwrite: write: command failed: Input/output
error

Seems to work.

The third case is where the plugin exits (not crashes) with !0 which
we deal with according to the exit code:

https://libguestfs.org/nbdkit-sh-plugin.3.html#Exit-codes

Rich.

...
 Thanks
 Laszlo

 > 
 > There have been cases where we have forgotten to discard stdin on
 > every exit path and this has caused intermittent test failures in CI:
 > 
 > https://gitlab.com/nbdkit/libnbd/-/commit/4df70870420d1be9348ac45f4aa4675...
 > https://gitlab.com/nbdkit/libnbd/-/commit/c713529e9fd0641b2d73f764517b5f9...
 > 
 > Rich.
 > 
 >> And that way the nbd client would see the pwrite operation as failed.
 >>
 >> Laszlo
 >>
 >>>
 >>> So my counter-proposal (coming soon) is going to simply turn the EPIPE
 >>> error into a debug message and discard the rest of the write buffer.
 >>>
 >>> Rich.
 >>>
 >>>
 >>>> Signed-off-by: Eric Blake <eblake(a)redhat.com&gt;
 >>>> ---
 >>>>
 >>>> I probably need to add unit test coverage of this before committing
 >>>> (although proving that I win the data race on a client process exiting
 >>>> faster than the parent can write enough data to hit EPIPE is hard).
 >>>>
 >>>>  plugins/sh/nbdkit-sh-plugin.pod |  8 +++++++
 >>>>  plugins/sh/call.c               | 38 ++++++++++++++++++++++-----------
 >>>>  2 files changed, 34 insertions(+), 12 deletions(-)
 >>>>
 >>>> diff --git a/plugins/sh/nbdkit-sh-plugin.pod
b/plugins/sh/nbdkit-sh-plugin.pod
 >>>> index b2c946a0..8b83a5b3 100644
 >>>> --- a/plugins/sh/nbdkit-sh-plugin.pod
 >>>> +++ b/plugins/sh/nbdkit-sh-plugin.pod
 >>>> @@ -505,6 +505,14 @@ Unlike in other languages, if you provide a
C<pwrite> method you
 >>>>  B<must> also provide a C<can_write> method which exits with
code C<0>
 >>>>  (true).
 >>>>
 >>>> +With nbdkit E<ge> 1.36, this method may return C<0> without
consuming
 >>>> +any data from stdin, and without producing any output, in order to
 >>>> +behave as an intentional data sink.  But in older versions, nbdkit
 >>>> +would treat any C<EPIPE> failure in writing to your script as an
error
 >>>> +condition even if your script returns success; to avoid unintended
 >>>> +failures, you may want to include C<"cat
>/dev/null"> in a script
 >>>> +intending to ignore the client's write requests.
 >>>> +
 >>>>  =item C<flush>
 >>>>
 >>>>   /path/to/script flush <handle>
 >>>> diff --git a/plugins/sh/call.c b/plugins/sh/call.c
 >>>> index 888c6459..79c67a04 100644
 >>>> --- a/plugins/sh/call.c
 >>>> +++ b/plugins/sh/call.c
 >>>> @@ -34,6 +34,7 @@
 >>>>
 >>>>  #include <assert.h>
 >>>>  #include <fcntl.h>
 >>>> +#include <stdbool.h>
 >>>>  #include <stdio.h>
 >>>>  #include <stdlib.h>
 >>>>  #include <inttypes.h>
 >>>> @@ -130,6 +131,7 @@ debug_call (const char **argv)
 >>>>   */
 >>>>  static int
 >>>>  call3 (const char *wbuf, size_t wbuflen, /* sent to stdin (can be NULL)
*/
 >>>> +       bool *pipe_full,                  /* set if wbuf not fully
written */
 >>>>         string *rbuf,                     /* read from stdout */
 >>>>         string *ebuf,                     /* read from stderr */
 >>>>         const char **argv)                /* script + parameters */
 >>>> @@ -275,15 +277,8 @@ call3 (const char *wbuf, size_t wbuflen, /* sent to
stdin (can be NULL) */
 >>>>        r = write (pfds[0].fd, wbuf, wbuflen);
 >>>>        if (r == -1) {
 >>>>          if (errno == EPIPE) {
 >>>> -          /* We tried to write to the script but it didn't consume
 >>>> -           * the data.  Probably the script exited without reading
 >>>> -           * from stdin.  This is an error in the script.
 >>>> -           */
 >>>> -          nbdkit_error ("%s: write to script failed because of a
broken pipe: "
 >>>> -                        "this can happen if the script exits
without "
 >>>> -                        "consuming stdin, which usually indicates
a bug "
 >>>> -                        "in the script",
 >>>> -                        argv0);
 >>>> +          *pipe_full = true;
 >>>> +          r = wbuflen;
 >>>>          }
 >>>>          else
 >>>>            nbdkit_error ("%s: write: %m", argv0);
 >>>> @@ -555,7 +550,7 @@ call (const char **argv)
 >>>>    CLEANUP_FREE_STRING string rbuf = empty_vector;
 >>>>    CLEANUP_FREE_STRING string ebuf = empty_vector;
 >>>>
 >>>> -  r = call3 (NULL, 0, &rbuf, &ebuf, argv);
 >>>> +  r = call3 (NULL, 0, NULL, &rbuf, &ebuf, argv);
 >>>>    return handle_script_error (argv[0], &ebuf, r);
 >>>>  }
 >>>>
 >>>> @@ -568,7 +563,7 @@ call_read (string *rbuf, const char **argv)
 >>>>    int r;
 >>>>    CLEANUP_FREE_STRING string ebuf = empty_vector;
 >>>>
 >>>> -  r = call3 (NULL, 0, rbuf, &ebuf, argv);
 >>>> +  r = call3 (NULL, 0, NULL, rbuf, &ebuf, argv);
 >>>>    r = handle_script_error (argv[0], &ebuf, r);
 >>>>    if (r == ERROR)
 >>>>      string_reset (rbuf);
 >>>> @@ -584,7 +579,26 @@ call_write (const char *wbuf, size_t wbuflen, const
char **argv)
 >>>>    int r;
 >>>>    CLEANUP_FREE_STRING string rbuf = empty_vector;
 >>>>    CLEANUP_FREE_STRING string ebuf = empty_vector;
 >>>> +  bool pipe_full = false;
 >>>>
 >>>> -  r = call3 (wbuf, wbuflen, &rbuf, &ebuf, argv);
 >>>> +  r = call3 (wbuf, wbuflen, &pipe_full, &rbuf, &ebuf,
argv);
 >>>> +  if (pipe_full && r == OK) {
 >>>> +    /* We allow scripts to intentionally ignore data, but they must
 >>>> +     * have no output when doing so.
 >>>> +     */
 >>>> +    if (rbuf.len > 0 || ebuf.len > 0) {
 >>>> +      nbdkit_error ("%s: write to script failed because of a
broken pipe: "
 >>>> +                    "this can happen if the script exits without
"
 >>>> +                    "consuming stdin, which usually indicates a
bug "
 >>>> +                    "in the script",
 >>>> +                    argv[0]);
 >>>> +      r = ERROR;
 >>>> +    }
 >>>> +    else
 >>>> +      nbdkit_debug ("%s: write to script failed because of a
broken pipe; "
 >>>> +                    "assuming this was an intentional data sink,
although it "
 >>>> +                    "may indicate a bug in the script",
 >>>> +                    argv[0]);
 >>>> +  }
 >>>>    return handle_script_error (argv[0], &ebuf, r);
 >>>>  }
 >>>> -- 
 >>>> 2.41.0
 >>>>
 >>>> _______________________________________________
 >>>> Libguestfs mailing list
 >>>> Libguestfs(a)redhat.com
 >>>> https://listman.redhat.com/mailman/listinfo/libguestfs
 >>>
 >  
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [nbdkit PATCH] sh: Allow pwrite to not consume all data