On Wed, May 13, 2020 at 10:06 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
On Sun, Apr 26, 2020 at 09:14:03PM +0300, Sam Eiderman wrote:
> The python3 bindings create PyUnicode objects from application strings
> on the guest (i.e. installed rpm, deb packages).
> It is documented that rpm package fields such as description should be
> UTF-8 encoded. However, in some cases they are not valid UTF-8. On
> SLES11 SP4 the descriptions of the following packages are encoded in
> latin1, so they fail to be converted to unicode using
> guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
>
> PackageKit
> aaa_base
> coreutils
> dejavu
> desktop-data-SLED
> gnome-utils
> hunspell
> hunspell-32bit
> hunspell-tools
> libblocxx6
> libexif
> libgphoto2
> libgtksourceview-2_0-0
> libmpfr1
> libopensc2
> libopensc2-32bit
> liborc-0_4-0
> libpackagekit-glib10
> libpixman-1-0
> libpixman-1-0-32bit
> libpoppler-glib4
> libpoppler5
> libsensors3
> libtelepathy-glib0
> m4
> opensc
> opensc-32bit
> permissions
> pinentry
> poppler-tools
> python-gtksourceview
> splashy
> syslog-ng
> tar
> tightvnc
> xorg-x11
> xorg-x11-xauth
> yast2-mouse
>
> Fix this by globally changing guestfs_int_py_fromstring()
> and guestfs_int_py_fromstringsize() to fall back to latin1 decoding if
> utf-8 decoding fails.
>
> The choice of the "strict" error handler doesn't matter in the case of
> latin1: every byte value maps to a code point, so decoding cannot fail
> and "strict" has the same effect as "replace":
>
> https://docs.python.org/3/library/codecs.html#error-handlers
>
> Signed-off-by: Sam Eiderman <sameid(a)google.com>
> ---
> python/handle.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/python/handle.c b/python/handle.c
> index 2fb8c18f0..fe89dc58a 100644
> --- a/python/handle.c
> +++ b/python/handle.c
> @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> #if PY_MAJOR_VERSION < 3
> return PyString_FromString (str);
> #else
> - return PyUnicode_FromString (str);
> + return guestfs_int_py_fromstringsize (str, strlen (str));
> #endif
> }
>
> @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
> #if PY_MAJOR_VERSION < 3
> return PyString_FromStringAndSize (str, size);
> #else
> - return PyUnicode_FromStringAndSize (str, size);
> + PyObject *s = PyUnicode_FromStringAndSize (str, size);
> + if (s == NULL) {
> + PyErr_Clear ();
> + s = PyUnicode_Decode (str, size, "latin1", "strict");
> + }
> + return s;
> #endif
> }
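For reference, the fallback described in the commit message can be sketched
in Python itself. This only illustrates the decoding behaviour and is not the
C code in python/handle.c; the helper name decode_guest_string and the sample
strings are made up for the example.

```python
def decode_guest_string(raw: bytes) -> str:
    """Decode as UTF-8, falling back to latin1 if that fails.

    Hypothetical helper mirroring the logic of the patch; it is not
    part of libguestfs.
    """
    try:
        return raw.decode("utf-8", "strict")
    except UnicodeDecodeError:
        # latin1 maps each byte 0x00-0xff to the code point with the same
        # value, so this decode cannot raise, whichever handler is used.
        return raw.decode("latin1", "strict")


# Sample text (assumed for illustration): "Gruesse" with u-umlaut and sharp s.
text = "Gr\u00fc\u00dfe"

# Valid UTF-8 input is decoded as UTF-8.
from_utf8 = decode_guest_string(text.encode("utf-8"))

# The latin1 encoding of the same text is not valid UTF-8 (0xfc can never
# start a UTF-8 sequence), so the fallback path is taken.
from_latin1 = decode_guest_string(text.encode("latin1"))

# "strict" and "replace" agree for every possible byte value under latin1.
all_bytes = bytes(range(256))
strict_equals_replace = (
    all_bytes.decode("latin1", "strict") == all_bytes.decode("latin1", "replace")
)
```

One consequence of this design: input that is neither valid UTF-8 nor really
latin1 still decodes without error, but possibly to the wrong characters.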
Looks OK to me. Pino - any objections to merging this?
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones