On Mon, Jul 6, 2020 at 2:39 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
> Hi Sam,
> I was doing some work on the Python bindings, starting with removing
> support for Python 2 since it's EOL. I thought I would have a look at
> this patch.
This is great. I'm currently working on adding Python 3 type hints to
the auto-generated functions. Given how libguestfs generates its
bindings, this is fairly easy, although it gets a bit tricky if we want
to keep the 79-character line limit and correct indentation.
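To illustrate the line-length issue, here is a minimal sketch of how a generator could emit a type-hinted method header. It keeps the header on one line when that fits in 79 columns and otherwise falls back to a PEP 8 hanging indent. `format_signature` is a hypothetical helper written for this example. It is not libguestfs's actual generator code.

```python
from typing import List, Tuple


def format_signature(name: str, params: List[Tuple[str, str]], ret: str,
                     indent: str = "    ", width: int = 79) -> str:
    """Render a type-hinted method header, wrapping it if it is too long."""
    args = ["self"] + [f"{p}: {t}" for p, t in params]
    one_line = f"{indent}def {name}({', '.join(args)}) -> {ret}:"
    if len(one_line) <= width:
        return one_line
    # PEP 8 hanging indent: one argument per line, indented one level
    # deeper than the method body so it is not mistaken for the body.
    hang = indent + " " * 8
    lines = [f"{indent}def {name}("]
    lines += [f"{hang}{arg}," for arg in args[:-1]]
    lines.append(f"{hang}{args[-1]}) -> {ret}:")
    return "\n".join(lines)


print(format_signature("example", [("path", "str"), ("flag", "bool")],
                       "None"))
```

A single very long annotation can still overflow the limit. A real generator would need a further fallback for that case.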
> So firstly I think the last version posted is:
> https://www.redhat.com/archives/libguestfs/2020-April/msg00190.html
> My impression of this is that we shouldn't just hack the Python
> bindings to make this apparently work. But I wanted to ask you a few
> questions about this:
> - Does the SUSE RPM output contain a mix of encodings? Or is
> it all latin-1 or utf-8?
I'm not sure. Only the packages mentioned in the commit message failed
the UTF-8 conversion function. I don't know whether any of the other
packages are also Latin-1 encoded but still happen to decode
successfully as UTF-8.
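One quick way to see which strings are affected is to attempt a strict UTF-8 decode on each one. The sketch below uses a hypothetical helper, `non_utf8`, and made-up sample values rather than real RPM data. Pure-ASCII text is byte-for-byte identical in Latin-1 and UTF-8, so a Latin-1 package whose metadata happens to be ASCII would pass unnoticed.

```python
from typing import Iterable, List


def non_utf8(values: Iterable[bytes]) -> List[bytes]:
    """Return the byte strings that fail a strict UTF-8 decode."""
    bad = []
    for raw in values:
        try:
            raw.decode("utf-8")  # "strict" is the default error handler
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


# Hypothetical sample values, not taken from a real RPM database.
samples = [
    b"bash",                   # ASCII: identical in Latin-1 and UTF-8
    "café".encode("utf-8"),    # proper UTF-8
    "café".encode("latin-1"),  # Latin-1: b"caf\xe9", invalid as UTF-8
]
print(non_utf8(samples))  # [b'caf\xe9']
```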
> - Is there any indication of the correct encoding from RPM?
As far as I know, it is always supposed to be UTF-8.
> - Can we not instead escape the bad sequences using whatever is the
> C-level equivalent of str.encode(..., 'backslashreplace')?
> Or I guess better, escape them as Unicode compatibility characters
> https://en.wikipedia.org/wiki/Unicode_compatibility_characters
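For reference, this is what the `backslashreplace` handler does when decoding, which Python supports since 3.5. Each invalid byte becomes a literal `\xNN` escape, so the original bytes stay visible in the resulting text. `decode_backslashreplace` is a hypothetical wrapper for illustration. At the C level, the same handler name should be usable as the `errors` argument of `PyUnicode_Decode`, but that is worth confirming against the Python versions the bindings target.

```python
def decode_backslashreplace(raw: bytes) -> str:
    """Decode UTF-8, turning each invalid byte into a literal \\xNN escape."""
    return raw.decode("utf-8", "backslashreplace")


print(decode_backslashreplace(b"caf\xe9"))              # caf\xe9
print(decode_backslashreplace("café".encode("utf-8")))  # café
```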
The v2 of my patch, which Daniel reviewed, does something along those
lines: it replaces undecodable bytes with U+FFFD (REPLACEMENT
CHARACTER). I used:

    return PyUnicode_Decode(str, size, "utf-8", "replace");

However, see Nir's comment.
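At the Python level, the v2 call corresponds to `bytes.decode("utf-8", "replace")`. The sketch below uses a hypothetical `decode_replace` wrapper to show one property worth keeping in mind. Every invalid byte becomes U+FFFD, so different inputs can collapse to the same string and the original bytes cannot be recovered.

```python
def decode_replace(raw: bytes) -> str:
    """Python equivalent of PyUnicode_Decode(str, size, "utf-8", "replace")."""
    return raw.decode("utf-8", "replace")


latin1 = decode_replace(b"caf\xe9")  # Latin-1 'é'
junk = decode_replace(b"caf\xff")    # 0xff is never valid in UTF-8
print(ascii(latin1))                 # 'caf\ufffd'
print(latin1 == junk)                # True: both collapse to U+FFFD
```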
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat
> http://people.redhat.com/~rjones
Thanks!