On Thu, Nov 21, 2019 at 1:39 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
[Adding the mailing list address back]
On Thu, Nov 21, 2019 at 12:44:38PM +0200, Nir Soffer wrote:
> On Thu, Nov 21, 2019 at 12:16 PM Richard W.M. Jones <rjones(a)redhat.com>
wrote:
> >
> > On Wed, Nov 20, 2019 at 06:00:52PM +0200, Nir Soffer wrote:
> > > 5. can_zero fails, not sure why (Maybe Richard knows more about this)
> > >
> > > nbdkit: python[1]: debug: python: can_zero
> > > nbdkit: python[1]: debug: sending error reply: Operation not supported
> > >
> > > rhv-upload-plugin always reports that it can zero, I cannot explain this.
> >
> > I believe this is a bug in nbdkit’s python bindings. Will take a
> > closer look and reply in a new thread. I don't believe it affects
> > anything here however.
> >
> > > 6. writing the first sector fails
> > >
> > > python[1]: nbdkit: debug: vddk[3]python: pwrite count=65536 offset=0
fua=0:
> > > ...
> > > nbdkit: python[1]: error:
> > > /var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py: pwrite: error:
> > > ['Traceback (most recent call last):\n', ' File
> > > "/var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py", line 189, in
> > > pwrite\n http.endheaders()\n', ' File
> > > "/usr/lib64/python3.7/http/client.py", line 1247, in
endheaders\n
> > > self._send_output(message_body, encode_chunked=encode_chunked)\n',
'
> > > File "/usr/lib64/python3.7/http/client.py", line 1026, in
> > > _send_output\n self.send(msg)\n', ' File
> > > "/usr/lib64/python3.7/http/client.py", line 966, in send\n
> > > self.connect()\n', ' File
> > > "/var/tmp/rhvupload.RShGz0/rhv-upload-plugin.py", line 357, in
> > > connect\n self.sock.connect(self.path)\n',
'ConnectionRefusedError:
> > > [Errno 111] Connection refused\n']
> > >
> > > But it fails in send(), and we don't handle errors proeply around
> > > send(), only around getresponse().
> > > So the error is forgotten by the plugin.
> >
> > That sounds bad?
>
> Indeed, but we never had this issue in real environment.
>
> Basically we should handle any exception in the plugin as failure,
> but I'm not sure where we should fix it.
>
> We can add error handler around all the entry points
> (e.g. pwrite()), setting h['failed'] = True. But this makes it
> harder to write correct plugins; everyone has to ensure correct
> error handling instead of getting it right by default.
In nbdkit-python-plugin, any exception which escapes will be caught in
the C code and turned into an NBD error on the wire. The function
which does this is:
https://github.com/libguestfs/nbdkit/blob/a52ebb4e8d18a73b9ae009cd768bfab...
Since each plugin method corresponds* (* = mostly) to a single NBD
command on the wire, returning an error from the method signals an
error back to the NBD client. If the NBD client is ‘qemu-img convert’
then it will exit the first time it sees an error.
(The retry filter
http://libguestfs.org/nbdkit-retry-filter.1.html
changes this behaviour, but we are not using that for output to
rhv-upload.)
> We can handle this in the python plugin (e.g. h['failed'] = True)
> but the handle is controlled by the plugin, so this does not sound
> like a good way.
The close() method is always called, whether or not there was an
error. The same connection handle (‘h’) is passed back to the close
method. It's quite safe and normal to store state in this handle.
nbdkit itself doesn't save the fact that there was an error during the
connection (it can't since it doesn't know if the error was
recoverable or not). But rhv-upload-plugin needs to handle failed vs
successful transfers differently. Thus we save the failure state in
the handle the first time we see an error.
> Maybe we can change close() to:
>
> def close(failed=True):
> if failed:
> # cleanup after some call failed
> ...
>
> # Cleanup after successful run
>
> The python plugin will remember unhandled errors and will call:
>
> close(failed=True)
>
> To write correct plugin author must not handle fatal errors in the plugin.
This changes how the Python plugin would work versus other plugins,
and the C code can't know if the NBD connection failed, because some
NBD errors are recoverable. Also the client does not send us any
error indication when it disconnects.
The answer is probably to wrap every method in a big try...except
which sets the failed flag in the handle.
This works with current plugin, and can be backported to downstream if needed.
However it makes it harder to write plugins in python. The basic idea is that
not handling an error in python will crash with a traceback *without* having to
do anything. It would be best if the python plugin ensure this.
If we want to support recoverable errors, we can define a special
RecoverableError
exception that signal the python plugin that this request failed, but
the error is not
fatal.
We can also define more specific errors like UnsupportedOperation, for
example if
plugin's zero() fail and the author wants nbdkit to stop calling
zero() and emulate
it with write.
Nir