gentle ping

On Wed, Jun 3, 2020 at 2:52 PM Sam Eiderman <sameid@google.com> wrote:
On Wed, May 13, 2020 at 10:06 PM Richard W.M. Jones <rjones@redhat.com> wrote:
>
> On Sun, Apr 26, 2020 at 09:14:03PM +0300, Sam Eiderman wrote:
> > The python3 bindings create PyUnicode objects from application strings
> > on the guest (e.g. the fields of installed rpm and deb packages).
> > It is documented that rpm package fields such as the description should
> > be UTF-8 encoded. However, in some cases they are not valid UTF-8.
> > On SLES11 SP4 the descriptions of the following packages are encoded
> > as latin1, so they fail to be converted to unicode by
> > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
> >
> >  PackageKit
> >  aaa_base
> >  coreutils
> >  dejavu
> >  desktop-data-SLED
> >  gnome-utils
> >  hunspell
> >  hunspell-32bit
> >  hunspell-tools
> >  libblocxx6
> >  libexif
> >  libgphoto2
> >  libgtksourceview-2_0-0
> >  libmpfr1
> >  libopensc2
> >  libopensc2-32bit
> >  liborc-0_4-0
> >  libpackagekit-glib10
> >  libpixman-1-0
> >  libpixman-1-0-32bit
> >  libpoppler-glib4
> >  libpoppler5
> >  libsensors3
> >  libtelepathy-glib0
> >  m4
> >  opensc
> >  opensc-32bit
> >  permissions
> >  pinentry
> >  poppler-tools
> >  python-gtksourceview
> >  splashy
> >  syslog-ng
> >  tar
> >  tightvnc
> >  xorg-x11
> >  xorg-x11-xauth
> >  yast2-mouse
> >
> > Fix this by globally changing guestfs_int_py_fromstring()
> > and guestfs_int_py_fromstringsize() to fall back to latin1 decoding if
> > utf-8 decoding fails.
> >
> > The choice of the "strict" error handler doesn't matter for latin1,
> > since every byte value decodes to a code point; it has the same effect
> > as "replace":
> >
> >  https://docs.python.org/3/library/codecs.html#error-handlers
> >
> > Signed-off-by: Sam Eiderman <sameid@google.com>
> > ---
> >  python/handle.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/python/handle.c b/python/handle.c
> > index 2fb8c18f0..fe89dc58a 100644
> > --- a/python/handle.c
> > +++ b/python/handle.c
> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromString (str);
> >  #else
> > -  return PyUnicode_FromString (str);
> > +  return guestfs_int_py_fromstringsize (str, strlen (str));
> >  #endif
> >  }
> >
> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromStringAndSize (str, size);
> >  #else
> > -  return PyUnicode_FromStringAndSize (str, size);
> > +  PyObject *s = PyUnicode_FromStringAndSize (str, size);
> > +  if (s == NULL) {
> > +    PyErr_Clear ();
> > +    s = PyUnicode_Decode (str, size, "latin1", "strict");
> > +  }
> > +  return s;
> >  #endif
> >  }
>
> Looks OK to me.  Pino - any objections to merging this?
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-df lists disk usage of guests without needing to install any
> software inside the virtual machine.  Supports Linux and Windows.
> http://people.redhat.com/~rjones/virt-df/
>
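
For reference, the fallback described in the commit message can be sketched in pure Python. This is only an illustration of the intended behaviour, not the code in python/handle.c; the helper name decode_guest_string is hypothetical and does not exist in the bindings.

```python
def decode_guest_string(raw: bytes) -> str:
    """Hypothetical helper: decode guest bytes as UTF-8, else Latin-1."""
    try:
        return raw.decode("utf-8", "strict")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this
        # decode cannot fail regardless of the error handler chosen.
        return raw.decode("latin-1", "strict")


# The same word encoded as UTF-8 and as Latin-1 decodes to the same str.
utf8_bytes = "caf\u00e9".encode("utf-8")
latin1_bytes = b"caf\xe9"
print(decode_guest_string(utf8_bytes) == decode_guest_string(latin1_bytes))
```

One caveat of this approach: a Latin-1 byte sequence that happens to be valid UTF-8 is decoded as UTF-8, so the fallback only triggers when UTF-8 decoding actually fails.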