On Friday, 2 November 2018 15:23:07 CET Richard W.M. Jones wrote:
We only use xmlParseURI to parse our own "homebrew" URIs, for example
the ones used by guestfish --add or virt-v2v. Unfortunately
xmlParseURI cannot handle URIs with spaces or other non-RFC-compliant
characters so simple commands like these fail:
$ guestfish -a 'ssh://example.com/virtual machine.img'
guestfish: --add: could not parse URI 'ssh://example.com/virtual machine.img'
$ guestfish -a 'ssh://example.com/バーチャルマシン.img'
guestfish: --add: could not parse URI 'ssh://example.com/バーチャルマシン.img'
This is a usability problem. However, since these are not expected to
be generic RFC-compliant URIs we can perform the required
percent-escaping ourselves instead of demanding that the user does
this.
Note that the wrapper function should not be used on real URLs or
libvirt URLs.
---
I do not think this is a good idea at all.
First of all, converting the URI to UTF-8 is a bad idea, since that
is not the encoding of the URI. Second, it does a search-and-replace on
the whole string, skipping only some characters, without considering
the various parts of a URI. Also, this will break well-formed URIs
that use e.g. Punycode.
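
To illustrate the second point, here is a rough sketch in Python (not the
patch's actual code). naive_escape is a hypothetical stand-in for a
whole-string escaper that leaves the RFC 3986 reserved characters and '%'
untouched; the file name "disk #2.img" is made up for the example.

```python
from urllib.parse import quote, urlsplit

# Assumption: the wrapper escapes everything except RFC 3986 reserved
# characters and '%', without looking at which URI component it is in.
RESERVED = ":/?#[]@!$&'()*+,;=%"

def naive_escape(uri: str) -> str:
    """Percent-escape the whole string in one pass (hypothetical helper)."""
    return quote(uri, safe=RESERVED)

# The space is escaped, but the '#' in the file name is left alone,
# so the parser treats everything after it as a fragment.
escaped = naive_escape("ssh://example.com/disk #2.img")
parts = urlsplit(escaped)
print(escaped)          # ssh://example.com/disk%20#2.img
print(parts.path)       # /disk%20
print(parts.fragment)   # 2.img
```

The result parses without error, but the path is no longer the file the
user meant, and no error is reported.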
In the end, users must provide compliant URIs anyway, so letting them
always do the proper job seems the better option to me. Yes, I know it
is not the best option for users manually invoking the tools, but it is
certainly less problematic than dealing with all the possible issues
of partially encoded URIs. The ycombinator link mentioned in a comment
also explains this, and how it is a mess in e.g. modern web browsers.
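
A small Python sketch of the partially-encoded problem, under the assumption
that the escaper has to pick one policy for '%'. Both helper names and both
file names are hypothetical; neither is taken from the patch.

```python
from urllib.parse import quote, unquote

def escape_keep_percent(path: str) -> str:
    """Policy A (assumed): treat '%' as already-escaped input, leave it alone."""
    return quote(path, safe="/%")

def escape_all_percent(path: str) -> str:
    """Policy B (assumed): treat '%' as a literal character, escape it too."""
    return quote(path, safe="/")

# Policy A: a file literally named "100%25.img" can no longer be addressed,
# because the "%25" is decoded as an escape for '%'.
print(unquote(escape_keep_percent("/100%25.img")))             # /100%.img

# Policy B: a URI the user already escaped correctly gets double-encoded,
# so it decodes to a name containing a literal "%20", not a space.
print(escape_all_percent("/virtual%20machine.img"))            # /virtual%2520machine.img
print(unquote(escape_all_percent("/virtual%20machine.img")))   # /virtual%20machine.img
```

Whichever policy is chosen, one class of input is silently misread, and the
tool cannot tell from the string alone which one the user intended.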
Let's not get into this mess, and just stay with the simple and
effective solution: always require compliant URIs.
--
Pino Toscano