I've posted a v2 that does what you requested, applied globally across all Python bindings. Let me know what you think.

On Mon, Apr 20, 2020 at 2:42 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Mon, Apr 20, 2020 at 01:17:35PM +0300, Sam Eiderman wrote:
> The python3 bindings create unicode objects from application strings
> on the guest (e.g. from installed rpm or deb packages).
> It is documented that rpm package fields such as description should be
> UTF-8 encoded. However, in some cases they are not valid UTF-8.
> On SLES11 SP4 the following packages fail to be converted to
> unicode using guestfs_int_py_fromstring() (which invokes
> PyUnicode_FromString()):
>
>  PackageKit
>  aaa_base
>  coreutils
>  dejavu
>  desktop-data-SLED
>  gnome-utils
>  hunspell
>  hunspell-32bit
>  hunspell-tools
>  libblocxx6
>  libexif
>  libgphoto2
>  libgtksourceview-2_0-0
>  libmpfr1
>  libopensc2
>  libopensc2-32bit
>  liborc-0_4-0
>  libpackagekit-glib10
>  libpixman-1-0
>  libpixman-1-0-32bit
>  libpoppler-glib4
>  libpoppler5
>  libsensors3
>  libtelepathy-glib0
>  m4
>  opensc
>  opensc-32bit
>  permissions
>  pinentry
>  poppler-tools
>  python-gtksourceview
>  splashy
>  syslog-ng
>  tar
>  tightvnc
>  xorg-x11
>  xorg-x11-xauth
>  yast2-mouse
>
> This is a surgical fix for inspect_list_applications2()'s description
> field.
>
> Signed-off-by: Sam Eiderman <sameid@google.com>
> ---
>  generator/python.ml | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/generator/python.ml b/generator/python.ml
> index f0d6b5d96..7394a943a 100644
> --- a/generator/python.ml
> +++ b/generator/python.ml
> @@ -170,6 +170,14 @@ and generate_python_structs () =
>          function
>          | name, FString ->
>              pr "  value = guestfs_int_py_fromstring (%s->%s);\n" typ name;
> +            (match typ, name with
> +            | "application", "app_description"
> +            | "application2", "app2_description" ->
> +                pr "  if (value == NULL) {\n";
> +                pr "    value = guestfs_int_py_fromstring (\"\");\n";
> +                pr "    PyErr_Clear ();\n";
> +                pr "  }\n";

I don't think this is especially friendly/helpful to users.

I'm assuming that there's just a handful of characters that are not
valid UTF-8. I think we really want a graceful conversion that will
convert as much as possible, replacing any invalid UTF-8 with some
generic placeholder character.
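
To illustrate the kind of graceful conversion meant here, below is a
minimal Python sketch. It is not the binding code itself, and the helper
name decode_lossy is hypothetical. It relies only on the standard
"replace" error handler of the UTF-8 codec, which substitutes U+FFFD
(the Unicode replacement character) for each undecodable byte sequence
and keeps the rest of the string intact. On the C side, the CPython
function PyUnicode_DecodeUTF8() accepts the same "replace" handler name
in its errors argument; whether that is the right call for the generated
bindings is an assumption, not something settled in this thread.

```python
def decode_lossy(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD.

    Hypothetical helper for illustration only: valid UTF-8 passes
    through unchanged, and undecodable bytes become the replacement
    character instead of raising UnicodeDecodeError.
    """
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 0xE9 is 'e-acute' in Latin-1, but on its own it is not valid UTF-8.
    description = b"Caf\xe9 tools"

    try:
        description.decode("utf-8")  # strict decoding, the default
    except UnicodeDecodeError as exc:
        print("strict decoding failed:", exc.reason)

    # Lossy decoding keeps everything that is valid.
    print(repr(decode_lossy(description)))
```

With this approach the caller still gets the bulk of the description
rather than an empty string, at the cost of not being able to recover
the original bytes from the result.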

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|