On Sunday, 26 April 2020 20:14:03 CEST Sam Eiderman wrote:
The python3 bindings create PyUnicode objects from application
strings
on the guest (i.e. installed rpm, deb packages).
It is documented that rpm package fields such as description should be
utf8 encoded - however in some cases they are not a valid unicode
string, on SLES11 SP4 the encoding of the description of the following
packages is latin1 and they fail to be converted to unicode using
guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
Sorry, I wanted to reach our resident Python maintainers to get their
feedback, and so far had no time for it. Will do it shortly.
BTW do you have a reproducer I can actually try freely?
diff --git a/python/handle.c b/python/handle.c
index 2fb8c18f0..fe89dc58a 100644
--- a/python/handle.c
+++ b/python/handle.c
@@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
#if PY_MAJOR_VERSION < 3
return PyString_FromString (str);
#else
- return PyUnicode_FromString (str);
+ return guestfs_int_py_fromstringsize (str, strlen (str));
#endif
}
@@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
#if PY_MAJOR_VERSION < 3
return PyString_FromStringAndSize (str, size);
#else
- return PyUnicode_FromStringAndSize (str, size);
+ PyObject *s = PyUnicode_FromString (str);
+ if (s == NULL) {
+ PyErr_Clear ();
+ s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");
Minor nit: space between "strlen" and the opening bracket.
Also, isn't there any error we can check as a way to detect this
situation, rather than always attempting to decode it as latin1?
Thanks,
--
Pino Toscano