[Adding the mailing list address back]
On Thu, Nov 21, 2019 at 12:44:38PM +0200, Nir Soffer wrote:
On Thu, Nov 21, 2019 at 12:16 PM Richard W.M. Jones
<rjones(a)redhat.com> wrote:
>
> On Wed, Nov 20, 2019 at 06:00:52PM +0200, Nir Soffer wrote:
> > 5. can_zero fails, not sure why (Maybe Richard knows more about this)
> >
> > nbdkit: python[1]: debug: python: can_zero
> > nbdkit: python[1]: debug: sending error reply: Operation not supported
> >
> > rhv-upload-plugin always reports that it can zero, I cannot explain this.
>
> I believe this is a bug in nbdkit’s python bindings. Will take a
> closer look and reply in a new thread. I don't believe it affects
> anything here however.
>
> > 6. writing the first sector fails
> >
> > python[1]: nbdkit: debug: vddk[3]python: pwrite count=65536 offset=0 fua=0:
> > ...
> > nbdkit: python[1]: error:
> > /var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py: pwrite: error:
> > ['Traceback (most recent call last):\n', ' File
> > "/var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py", line 189, in
> > pwrite\n http.endheaders()\n', ' File
> > "/usr/lib64/python3.7/http/client.py", line 1247, in endheaders\n
> > self._send_output(message_body, encode_chunked=encode_chunked)\n', '
> > File "/usr/lib64/python3.7/http/client.py", line 1026, in
> > _send_output\n self.send(msg)\n', ' File
> > "/usr/lib64/python3.7/http/client.py", line 966, in send\n
> > self.connect()\n', ' File
> > "/var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py", line 357, in
> > connect\n self.sock.connect(self.path)\n', 'ConnectionRefusedError:
> > [Errno 111] Connection refused\n']
> >
> > But it fails in send(), and we don't handle errors proeply around
> > send(), only around getresponse().
> > So the error is forgotten by the plugin.
>
> That sounds bad?
Indeed, but we never had this issue in real environment.
Basically we should handle any exception in the plugin as failure,
but I'm not sure where we should fix it.
We can add error handler around all the entry points
(e.g. pwrite()), setting h['failed'] = True. But this makes it
harder to write correct plugins; everyone has to ensure correct
error handling instead of getting it right by default.
In nbdkit-python-plugin, any exception which escapes will be caught in
the C code and turned into an NBD error on the wire. The function
which does this is:
https://github.com/libguestfs/nbdkit/blob/a52ebb4e8d18a73b9ae009cd768bfab...
Since each plugin method corresponds* (* = mostly) to a single NBD
command on the wire, returning an error from the method signals an
error back to the NBD client. If the NBD client is ‘qemu-img convert’
then it will exit the first time it sees an error.
(The retry filter
http://libguestfs.org/nbdkit-retry-filter.1.html
changes this behaviour, but we are not using that for output to
rhv-upload.)
We can handle this in the python plugin (e.g. h['failed'] =
True)
but the handle is controlled by the plugin, so this does not sound
like a good way.
The close() method is always called, whether or not there was an
error. The same connection handle (‘h’) is passed back to the close
method. It's quite safe and normal to store state in this handle.
nbdkit itself doesn't save the fact that there was an error during the
connection (it can't since it doesn't know if the error was
recoverable or not). But rhv-upload-plugin needs to handle failed vs
successful transfers differently. Thus we save the failure state in
the handle the first time we see an error.
Maybe we can change close() to:
def close(failed=True):
if failed:
# cleanup after some call failed
...
# Cleanup after successful run
The python plugin will remember unhandled errors and will call:
close(failed=True)
To write correct plugin author must not handle fatal errors in the plugin.
This changes how the Python plugin would work versus other plugins,
and the C code can't know if the NBD connection failed, because some
NBD errors are recoverable. Also the client does not send us any
error indication when it disconnects.
The answer is probably to wrap every method in a big try...except
which sets the failed flag in the handle.
How do we handle errors in other plugins?
It's kind of easier to study this using the sh plugin. Here's an
example showing the order in which callbacks happen on error:
----------------------------------------------------------------------
nbdkit sh -fv - --run 'qemu-img convert $nbd /tmp/out' <<'EOF'
case "$1" in
get_size) echo 512 ;;
pread) echo 'EIO error' >&2; exit 1 ;;
*) exit 2 ;;
esac
EOF
----------------------------------------------------------------------
Running this gives the following output (filtered a bit to show the
key points):
$ bash test.sh
nbdkit: debug: sh: load
nbdkit: debug: sh: load: tmpdir: /tmp/nbdkitshB6KCs0
nbdkit: debug: sh: config key=script, value=-
nbdkit: debug: sh: config_complete
nbdkit: sh[1]: debug: sh: open readonly=0
nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh open false
""
nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh open: status 2
nbdkit: sh[1]: debug: sh: open returned handle 0x7face4002070
nbdkit: sh[1]: debug: sh: get_size
nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh get_size
""
nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh get_size: status
0
nbdkit: sh[1]: debug: sh: can_write
nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh can_write
""
nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh can_write: status
2
...
nbdkit: sh[1]: debug: sh: pread count=512 offset=0
nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh pread ""
512 0
nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh pread: status 1
nbdkit: sh[1]: error: /tmp/nbdkitshB6KCs0/inline-script.sh: error
nbdkit: sh[1]: debug: sending error reply: Input/output error
nbdkit: sh[1]: debug: client sent NBD_CMD_DISC, closing connection
qemu-img: Could not open 'nbd:localhost:10809': Could not read image for
determining its format: Input/output error
nbdkit: sh[1]: debug: sh: close
nbdkit: sh[1]: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh close ""
nbdkit: sh[1]: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh close: status 2
nbdkit: debug: sh: unload plugin
nbdkit: debug: calling: /tmp/nbdkitshB6KCs0/inline-script.sh unload
nbdkit: debug: completed: /tmp/nbdkitshB6KCs0/inline-script.sh unload: status 2
Notice in particular how close() and unload() are both called in the
plugin even though qemu-img convert has already exited with an error.
Also the same handle (empty string above) is passed back to close().
We probably should continue to the mailing list.
Nir
Thanks, Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v