The Python 3 bindings create unicode objects from application strings
on the guest (such as the fields of installed rpm or deb packages).
RPM package fields such as the description are documented to be
UTF-8 encoded. In some cases, however, they are not valid UTF-8.
On SLES11 SP4 the following packages fail to be converted to
unicode by guestfs_int_py_fromstring() (which invokes
PyUnicode_FromString()):
PackageKit
aaa_base
coreutils
dejavu
desktop-data-SLED
gnome-utils
hunspell
hunspell-32bit
hunspell-tools
libblocxx6
libexif
libgphoto2
libgtksourceview-2_0-0
libmpfr1
libopensc2
libopensc2-32bit
liborc-0_4-0
libpackagekit-glib10
libpixman-1-0
libpixman-1-0-32bit
libpoppler-glib4
libpoppler5
libsensors3
libtelepathy-glib0
m4
opensc
opensc-32bit
permissions
pinentry
poppler-tools
python-gtksourceview
splashy
syslog-ng
tar
tightvnc
xorg-x11
xorg-x11-xauth
yast2-mouse
Fix this by globally changing guestfs_int_py_fromstring()
and guestfs_int_py_fromstringsize() to decode UTF-8 with the "replace"
error handler, which substitutes U+FFFD for malformed input instead of
raising an error:
https://docs.python.org/3/library/codecs.html#error-handlers
For example, PackageKit's description on SLES11 SP4 is now decoded
as follows:
Backend: pisi
S.�ağlar Onur <caglar(a)pardus.org.tr>
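The same behaviour can be reproduced from Python as a rough sketch.
The byte string below is an assumption made for illustration (a stray
0xC7 byte followed by valid UTF-8), not the actual data read from the
package, and decode_strict() / decode_replace() are hypothetical helper
names that are not part of the bindings:

```python
# Hypothetical bytes: the real package data is not shown in this message.
# 0xC7 is a stray non-UTF-8 byte, while 0xC4 0x9F is valid UTF-8 for "ğ".
RAW = b"Backend: pisi\nS.\xc7a\xc4\x9flar Onur"


def decode_strict(raw: bytes) -> str:
    """Strict decoding, comparable to what PyUnicode_FromString() does."""
    return raw.decode("utf-8")  # raises UnicodeDecodeError on bad input


def decode_replace(raw: bytes) -> str:
    """Lenient decoding with the "replace" error handler."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        decode_strict(RAW)
    except UnicodeDecodeError as exc:
        print("strict decoding failed:", exc)
    # The stray byte becomes U+FFFD; the rest of the text is preserved.
    print(decode_replace(RAW))
```

Only the undecodable byte is lost. The surrounding text, including
correctly encoded non-ASCII characters, is kept intact.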
Signed-off-by: Sam Eiderman <sameid(a)google.com>
---
python/handle.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/python/handle.c b/python/handle.c
index 2fb8c18f0..427424707 100644
--- a/python/handle.c
+++ b/python/handle.c
@@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
#if PY_MAJOR_VERSION < 3
return PyString_FromString (str);
#else
- return PyUnicode_FromString (str);
+ return PyUnicode_Decode(str, strlen(str), "utf-8", "replace");
#endif
}
@@ -397,7 +397,7 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
#if PY_MAJOR_VERSION < 3
return PyString_FromStringAndSize (str, size);
#else
- return PyUnicode_FromStringAndSize (str, size);
+ return PyUnicode_Decode(str, size, "utf-8", "replace");
#endif
}
--
2.26.1.301.g55bc3eb7cb9-goog