Thanks for looking into this Rich and Daniel, and thanks for the
profound
explanation.
See my answers below.
`'i' is not used inside the loop? Or is this error intermittent?` - 'i'
can be
removed, the error is intermittent.
This reproduces on libguestfs 1.46.2 and 1.48.6.
`This is specifically a Python API problem or would it affect
the C API too?` - I assume it's not in the C API, since this issue doesn't
reproduce when using guestfish.
I would likely need a complete self-contained reproducer to really go
anywhere with this.` - I can supply a container image with libguestfs 1.46.2
running the for loop. Let me know if that helps.
Could you come up with a Python program, or a Python + guestfish
program. Something like ...
$ guestfish -N fs -m /dev/sda1 touch '/xxxó'
+ something in Python that uses the image (test1.img) to
reproduce it?
Rich.
Also, as a workaround, I avoided calling the event callback if null
is returned
by Py_BuildValue (still print the error and release the thread).
This seems to work for our usage (we only use the event callbacks for logging),
any potential issues I'm missing?
On Mon, Feb 13, 2023 at 8:14 PM Daniel P. Berrangé <berrange(a)redhat.com> wrote:
On Mon, Feb 13, 2023 at 06:07:58PM +0000, Richard W.M. Jones wrote:
> On Sun, Feb 12, 2023 at 03:31:08PM +0200, Yonatan Shtarkman wrote:
> > Hey,
> > When downloading a file whose path contains multi-byte utf-8,
libguestfs
> > sometimes crashes.
> > This reproduces when using python, and not when using guestfish.
> >
> > Code to reproduce:
> > for i in range(2000):
> > g.download ('/xxxó', '/tmp/1')
>
> 'i' is not used inside the loop? Or is this error intermittent?
>
> > #0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/
raise.c:50
> > #1 0x00007ffff7fac140 in <signal handler called> () at /lib/
x86_64-linux-gnu/
> > libpthread.so.0
> > #2 0x00007ffff6f77701 in _Py_INCREF (op=<optimized out>) at /usr/
include/
> > python3.9/object.h:408
> > #3 guestfs_int_py_event_callback_wrapper
> > (g=<optimized out>, flags=<optimized out>, array_len=0,
array=0x0,
buf_len=
> > 47, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \
303by",
> > event_handle=0, event=16, callback=0x7ffff2516600) at handle.c:137
> > #4 guestfs_int_py_event_callback_wrapper
> > (g=<optimized out>, callback=0x7ffff2516600, event=16,
event_handle
=0,
> > flags=<optimized out>, buf=0x113b8a0 "gs=0x0\r\ncommandrvf:
udevadm
--debug
> > settle -E \303by", buf_len=47, array=0x0, array_len=0) at
handle.c:104
> > #5 0x00007ffff6e0076a in guestfs_int_call_callbacks_message (g=
0xf31290, event
> > =16, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \
303by",
> > buf_len=47)
> > at events.c:117
> > #6 0x00007ffff6e1702e in guestfs_int_log_message_callback
> > (g=g@entry=0xf31290, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm
--debug
> > settle -E \303by", len=len@entry=47) at proto.c:145
> > #7 0x00007ffff6dfb759 in handle_log_message (g=g@entry=0xf31290, conn=
> > conn@entry=0x110e280) at conn-socket.c:395
> > #8 0x00007ffff6dfbd63 in read_data (len=4, bufv=<optimized out>,
connv
=
> > <optimized out>, g=<optimized out>) at conn-socket.c:179
> > #9 read_data (g=0xf31290, connv=0x110e280, bufv=<optimized out>,
len=
4) at
> > conn-socket.c:142
> > #10 0x00007ffff6e1764a in recv_from_daemon (buf_rtn=0x7fffffffd858,
size_rtn=
> > 0x7fffffffd854, g=0xf31290) at proto.c:545
> > #11 guestfs_int_recv_from_daemon (g=g@entry=0xf31290, size_rtn=
size_rtn@entry=
> > 0x7fffffffd854, buf_rtn=buf_rtn@entry=0x7fffffffd858) at proto.c:623
> > #12 0x00007ffff6e17a5a in guestfs_int_recv
> > (g=g@entry=0xf31290, fn=fn@entry=0x7ffff6e3b3e8 "download",
hdr=
hdr@entry=
> > 0x7fffffffd920, err=err@entry=0x7fffffffd8f0, xdrp=xdrp@entry=0x0, ret=
> > ret@entry=0x0)
> > at proto.c:668
> >
> > I debugged this issue and noticed that the appliance logs
from commandrvf are
> > truncated, leading to parse failure (missing utf-8 additional bytes):
> >
https://github.com/libguestfs/libguestfs/blob/master/python/handle.c#
L134
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in
position 0:
invalid
> > start byte
>
> So I thought we'd fixed this in:
>
>
https://github.com/libguestfs/libguestfs/commit/
0ee02e0117527b86a31b2a88a14994ce7f15571f
>
> This is specifically a Python API problem or would it affect
> the C API too?
The difference with any C API is that almost nothing at the C level will
be validating that the bytes are actually valid utf-8 sequences.
So the truncated data is unlikely to result in a fatal error. Python is
aggressively validating all bytes, and so you get a hard error from the
truncated UTF-8. Other languages may vary, but I've not seen anything
that makes validation errors a failure in quite such an aggressive way
as python. The problems with decode exceptions have hit soooo many apps
using python over the past few years. Even worse if running in a C
locale as python will reject anything with 8th bit set as being outside
7-bit asciii, instead of being 8-bit clean in its stream handling.
With regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange
:|
|:
https://libvirt.org -o-
https://fstop138.berrange.com
:|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange
:|
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.