On Mon, Feb 13, 2023 at 06:07:58PM +0000, Richard W.M. Jones wrote:
On Sun, Feb 12, 2023 at 03:31:08PM +0200, Yonatan Shtarkman wrote:
> Hey,
> When downloading a file whose path contains multi-byte utf-8, libguestfs
> sometimes crashes.
> This reproduces when using python, and not when using guestfish.
>
> Code to reproduce:
> for i in range(2000):
>     g.download ('/xxxó', '/tmp/1')
'i' is not used inside the loop? Or is this error intermittent?
> #0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1 0x00007ffff7fac140 in <signal handler called> ()
>     at /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00007ffff6f77701 in _Py_INCREF (op=<optimized out>)
>     at /usr/include/python3.9/object.h:408
> #3 guestfs_int_py_event_callback_wrapper
>     (g=<optimized out>, flags=<optimized out>, array_len=0, array=0x0, buf_len=47,
>     buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by",
>     event_handle=0, event=16, callback=0x7ffff2516600) at handle.c:137
> #4 guestfs_int_py_event_callback_wrapper
>     (g=<optimized out>, callback=0x7ffff2516600, event=16, event_handle=0,
>     flags=<optimized out>,
>     buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by",
>     buf_len=47, array=0x0, array_len=0) at handle.c:104
> #5 0x00007ffff6e0076a in guestfs_int_call_callbacks_message
>     (g=0xf31290, event=16,
>     buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by",
>     buf_len=47) at events.c:117
> #6 0x00007ffff6e1702e in guestfs_int_log_message_callback
>     (g=g@entry=0xf31290,
>     buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \303by",
>     len=len@entry=47) at proto.c:145
> #7 0x00007ffff6dfb759 in handle_log_message
>     (g=g@entry=0xf31290, conn=conn@entry=0x110e280) at conn-socket.c:395
> #8 0x00007ffff6dfbd63 in read_data
>     (len=4, bufv=<optimized out>, connv=<optimized out>, g=<optimized out>)
>     at conn-socket.c:179
> #9 read_data (g=0xf31290, connv=0x110e280, bufv=<optimized out>, len=4)
>     at conn-socket.c:142
> #10 0x00007ffff6e1764a in recv_from_daemon
>     (buf_rtn=0x7fffffffd858, size_rtn=0x7fffffffd854, g=0xf31290) at proto.c:545
> #11 guestfs_int_recv_from_daemon
>     (g=g@entry=0xf31290, size_rtn=size_rtn@entry=0x7fffffffd854,
>     buf_rtn=buf_rtn@entry=0x7fffffffd858) at proto.c:623
> #12 0x00007ffff6e17a5a in guestfs_int_recv
>     (g=g@entry=0xf31290, fn=fn@entry=0x7ffff6e3b3e8 "download",
>     hdr=hdr@entry=0x7fffffffd920, err=err@entry=0x7fffffffd8f0,
>     xdrp=xdrp@entry=0x0, ret=ret@entry=0x0) at proto.c:668
>
> I debugged this issue and noticed that the appliance logs from commandrvf are
> truncated, leading to parse failure (missing utf-8 additional bytes):
> https://github.com/libguestfs/libguestfs/blob/master/python/handle.c#L134
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 0: invalid start byte
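
For illustration only (this is not code from libguestfs): the sketch below shows how a message cut between the two bytes of 'ó' (0xC3 0xB3) fails strict UTF-8 decoding in Python on both sides of the cut. The message text and the split point are hypothetical, chosen to mimic the truncation described above.

```python
# Hypothetical log message containing a two-byte UTF-8 character at the end.
message = "settle -E /xxxó".encode("utf-8")

# Split between the lead byte (0xC3) and the continuation byte (0xB3),
# mimicking a buffer that was truncated mid-character.
head, tail = message[:-1], message[-1:]


def try_decode(chunk):
    """Return the decoded text, or the decoder's error reason on failure."""
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason


head_result = try_decode(head)          # chunk ends with a lone lead byte
tail_result = try_decode(tail)          # chunk starts with a continuation byte
whole_result = try_decode(head + tail)  # the intact message decodes fine

print(head_result)
print(tail_result)
print(whole_result)
```

The second case produces the same kind of "invalid start byte" error as the one reported above: a chunk that begins with a continuation byte cannot be decoded on its own.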
So I thought we'd fixed this in:
https://github.com/libguestfs/libguestfs/commit/0ee02e0117527b86a31b2a88a...
Is this specifically a Python API problem, or would it affect
the C API too?
The difference with the C API is that almost nothing at the C level
validates that the bytes are actually valid UTF-8 sequences, so the
truncated data is unlikely to result in a fatal error. Python
aggressively validates all bytes, so you get a hard error from the
truncated UTF-8. Other languages may vary, but I've not seen anything
that makes validation errors a failure in quite such an aggressive way
as Python. The problems with decode exceptions have hit so many apps
using Python over the past few years. It is even worse when running in
the C locale, as Python will reject anything with the 8th bit set as
being outside 7-bit ASCII, instead of being 8-bit clean in its stream
handling.
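
A minimal sketch of the contrast described above, using only the Python standard library. The byte strings are made-up examples, and nothing here is claimed to be what the libguestfs bindings actually do; it only shows the strict default next to two tolerant alternatives a caller could choose.

```python
import codecs

# Hypothetical truncated log data: ends with a lone UTF-8 lead byte.
raw = b"udevadm --debug settle -E \xc3"

# Python's default (strict) decoding raises on the incomplete sequence.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for the bad byte instead of raising.
replaced = raw.decode("utf-8", errors="replace")

# Decoding as ASCII rejects any byte with the 8th bit set.
try:
    "ó".encode("utf-8").decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# An incremental decoder buffers a partial sequence until the rest arrives,
# which suits data delivered in arbitrary chunks.
decoder = codecs.getincrementaldecoder("utf-8")()
part1 = decoder.decode(b"/xxx\xc3")  # lead byte is held back
part2 = decoder.decode(b"\xb3")      # completes the character

print(strict_ok, ascii_ok)
print(replaced)
print(part1 + part2)
```

The incremental decoder only helps when the missing bytes eventually arrive in a later chunk; for data that is truncated for good, an error handler such as "replace" is the tolerant option.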
With regards,
Daniel