On Wed, May 13, 2020 at 10:06 PM Richard W.M. Jones <rjones@redhat.com> wrote:
>
> On Sun, Apr 26, 2020 at 09:14:03PM +0300, Sam Eiderman wrote:
> > The Python 3 bindings create PyUnicode objects from application strings
> > read from the guest (e.g. installed RPM and deb packages).
> > RPM package fields such as the description are documented to be
> > UTF-8 encoded, but in some cases they are not valid UTF-8. On SLES11
> > SP4 the descriptions of the following packages are encoded in Latin-1,
> > so converting them with guestfs_int_py_fromstring() (which invokes
> > PyUnicode_FromString()) fails:
> >
> > PackageKit
> > aaa_base
> > coreutils
> > dejavu
> > desktop-data-SLED
> > gnome-utils
> > hunspell
> > hunspell-32bit
> > hunspell-tools
> > libblocxx6
> > libexif
> > libgphoto2
> > libgtksourceview-2_0-0
> > libmpfr1
> > libopensc2
> > libopensc2-32bit
> > liborc-0_4-0
> > libpackagekit-glib10
> > libpixman-1-0
> > libpixman-1-0-32bit
> > libpoppler-glib4
> > libpoppler5
> > libsensors3
> > libtelepathy-glib0
> > m4
> > opensc
> > opensc-32bit
> > permissions
> > pinentry
> > poppler-tools
> > python-gtksourceview
> > splashy
> > syslog-ng
> > tar
> > tightvnc
> > xorg-x11
> > xorg-x11-xauth
> > yast2-mouse
> >
> > Fix this by changing guestfs_int_py_fromstring() and
> > guestfs_int_py_fromstringsize() globally to fall back to Latin-1
> > decoding if UTF-8 decoding fails.
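For readers less familiar with the CPython calls involved, here is a rough standalone sketch of the same fallback idea in plain C, with no Python headers. It accepts the bytes unchanged if they are well-formed UTF-8. Otherwise it treats each byte as Latin-1 and re-encodes it as UTF-8. utf8_valid() and decode_utf8_or_latin1() are hypothetical helpers for illustration, not libguestfs or CPython functions, and the validator only approximates CPython's strict UTF-8 decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8.
   Rejects overlong forms, surrogates and code points above U+10FFFF,
   roughly what a strict UTF-8 decoder refuses. */
static int
utf8_valid (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len) {
    unsigned char c = buf[i];
    size_t n;                  /* number of continuation bytes */
    unsigned long cp;
    if (c < 0x80) { i++; continue; }
    else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
    else return 0;             /* 0x80-0xC1, 0xF5-0xFF never start a sequence */
    if (len - i <= n) return 0;        /* sequence truncated */
    for (size_t k = 1; k <= n; k++) {
      if ((buf[i + k] & 0xC0) != 0x80) return 0;
      cp = (cp << 6) | (buf[i + k] & 0x3F);
    }
    if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
        || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return 0;                /* overlong, out of range, or surrogate */
    i += n + 1;
  }
  return 1;
}

/* Hypothetical helper: copy the input if it is valid UTF-8, otherwise
   decode it as Latin-1 (byte 0xNN -> U+00NN) and re-encode as UTF-8.
   Returns a NUL-terminated malloc'd string, or NULL if malloc fails. */
static char *
decode_utf8_or_latin1 (const char *str, size_t size)
{
  const unsigned char *in = (const unsigned char *) str;
  char *out;
  size_t j = 0;

  assert (str != NULL || size == 0);
  if (utf8_valid (in, size)) {
    out = malloc (size + 1);
    if (out == NULL) return NULL;
    memcpy (out, str, size);
    out[size] = '\0';
    return out;
  }
  /* Latin-1 -> UTF-8 needs at most 2 output bytes per input byte. */
  out = malloc (2 * size + 1);
  if (out == NULL) return NULL;
  for (size_t i = 0; i < size; i++) {
    if (in[i] < 0x80)
      out[j++] = (char) in[i];
    else {
      out[j++] = (char) (0xC0 | (in[i] >> 6));
      out[j++] = (char) (0x80 | (in[i] & 0x3F));
    }
  }
  out[j] = '\0';
  return out;
}
```

Because the fallback only runs when UTF-8 validation fails, descriptions that really are UTF-8 are never reinterpreted. The trade-off is that text in some other 8-bit encoding (e.g. CP1252) will be decoded as Latin-1 and may show a few wrong characters instead of raising an error.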
> >
> > The choice of error handler does not matter for Latin-1: every byte
> > maps to a code point, so decoding never fails and "strict" has the
> > same effect as "replace":
> >
> > https://docs.python.org/3/library/codecs.html#error-handlers
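To illustrate why the error handler is irrelevant here: Latin-1 maps byte 0xNN directly to code point U+00NN, so all 256 byte values decode successfully and there is nothing for "strict" to reject. Here is a minimal sketch. latin1_to_codepoints() is a hypothetical name, not a CPython function.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: decode Latin-1 bytes into Unicode code points.  Every byte
   value 0x00-0xFF is itself a valid code point (U+0000-U+00FF), so this
   decoder has no error path: it never fails, drops or replaces a byte. */
static size_t
latin1_to_codepoints (const unsigned char *in, size_t len, unsigned int *out)
{
  assert ((in != NULL && out != NULL) || len == 0);
  for (size_t i = 0; i < len; i++)
    out[i] = in[i];            /* byte 0xNN -> U+00NN */
  return len;                  /* always consumes the whole input */
}
```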
> >
> > Signed-off-by: Sam Eiderman <sameid@google.com>
> > ---
> > python/handle.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/python/handle.c b/python/handle.c
> > index 2fb8c18f0..fe89dc58a 100644
> > --- a/python/handle.c
> > +++ b/python/handle.c
> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> > #if PY_MAJOR_VERSION < 3
> > return PyString_FromString (str);
> > #else
> > - return PyUnicode_FromString (str);
> > + return guestfs_int_py_fromstringsize (str, strlen (str));
> > #endif
> > }
> >
> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
> > #if PY_MAJOR_VERSION < 3
> > return PyString_FromStringAndSize (str, size);
> > #else
> > - return PyUnicode_FromStringAndSize (str, size);
> > + PyObject *s = PyUnicode_FromStringAndSize (str, size);
> > + if (s == NULL) {
> > + PyErr_Clear ();
> > + s = PyUnicode_Decode (str, size, "latin1", "strict");
> > + }
> > + return s;
> > #endif
> > }
>
> Looks OK to me. Pino - any objections to merging this?
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-df lists disk usage of guests without needing to install any
> software inside the virtual machine. Supports Linux and Windows.
> http://people.redhat.com/~rjones/virt-df/
>