On 09/10/21 09:48, Richard W.M. Jones wrote:
On Fri, Sep 10, 2021 at 01:06:17AM +0200, Laszlo Ersek wrote:
> There are multiple problems with using strcasecmp() for ordering registry
> keys:
>
> (1) strcasecmp() is influenced by LC_CTYPE.
>
> (2) strcasecmp() cannot implement case conversion for multibyte UTF-8
> sequences.
>
> (3) Even with LC_CTYPE=POSIX and key names consisting solely of ASCII
> characters, strcasecmp() converts characters to lowercase, for
> comparison. But on Windows, the CompareStringOrdinal() function
> converts characters to uppercase. This makes a difference when
> comparing a letter to one of the characters that fall between 'Z'
> (0x5A) and 'a' (0x61), namely {'[', '\\', ']', '^', '_', '`'}. For
> example,
>
> 'c' (0x63) > '_' (0x5F)
> 'C' (0x43) < '_' (0x5F)
>
> Compare key names byte for byte, eliminating problems (1) and (3).
>
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1648520
> Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
> ---
> lib/write.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/lib/write.c b/lib/write.c
> index 70105c9d9907..d9a13a3c18b6 100644
> --- a/lib/write.c
> +++ b/lib/write.c
> @@ -462,7 +462,37 @@ compare_name_with_nk_name (hive_h *h, const char *name, hive_node_h nk_offs)
> return 0;
> }
>
> - int r = strcasecmp (name, nname);
> + /* Perform a limited case-insensitive comparison. ASCII letters will be
> + * *upper-cased*. Multibyte sequences will produce nonsensical orderings.
> + */
> + int r = 0;
> + const char *s1 = name;
> + const char *s2 = nname;
> +
> + for (;;) {
> + unsigned char c1 = *(s1++);
> + unsigned char c2 = *(s2++);
> +
> + if (c1 >= 'a' && c1 <= 'z')
> + c1 = 'A' + (c1 - 'a');
> + if (c2 >= 'a' && c2 <= 'z')
> + c2 = 'A' + (c2 - 'a');
> + if (c1 < c2) {
> + /* Also covers the case when "name" is a prefix of "nname". */
> + r = -1;
> + break;
> + }
> + if (c1 > c2) {
> + /* Also covers the case when "nname" is a prefix of "name". */
> + r = 1;
> + break;
> + }
> + if (c1 == '\0') {
> + /* Both strings end. */
> + break;
> + }
> + }
> +
> free (nname);
>
> return r;
Thanks for the detailed analysis on the BZ.

ACK - since it's an incremental improvement over what we have now and
fixes the bug.
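
For anyone reading along, point (3) of the commit message can be
reproduced with a small standalone sketch. This is not the code from
lib/write.c: the helper names compare_upper_ascii and
compare_lower_ascii are hypothetical, and the lower-casing variant only
approximates what strcasecmp() does under LC_CTYPE=POSIX.

```c
#include <stddef.h>

/* Fold an ASCII lowercase letter to uppercase; leave other bytes alone. */
static unsigned char
ascii_upper (unsigned char c)
{
  return (c >= 'a' && c <= 'z') ? (unsigned char) ('A' + (c - 'a')) : c;
}

/* Fold an ASCII uppercase letter to lowercase; leave other bytes alone. */
static unsigned char
ascii_lower (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char) ('a' + (c - 'A')) : c;
}

/* Hypothetical helper: byte-wise comparison after ASCII upper-casing,
 * mirroring the loop added by the patch.  Returns -1, 0 or 1.
 */
static int
compare_upper_ascii (const char *s1, const char *s2)
{
  for (;;) {
    unsigned char c1 = ascii_upper ((unsigned char) *s1++);
    unsigned char c2 = ascii_upper ((unsigned char) *s2++);

    if (c1 != c2)
      return c1 < c2 ? -1 : 1;   /* also covers the prefix cases */
    if (c1 == '\0')
      return 0;                  /* both strings end */
  }
}

/* Hypothetical helper: the same loop, but lower-casing instead.  This
 * stands in for strcasecmp() with ASCII-only input in the POSIX locale.
 */
static int
compare_lower_ascii (const char *s1, const char *s2)
{
  for (;;) {
    unsigned char c1 = ascii_lower ((unsigned char) *s1++);
    unsigned char c2 = ascii_lower ((unsigned char) *s2++);

    if (c1 != c2)
      return c1 < c2 ? -1 : 1;
    if (c1 == '\0')
      return 0;
  }
}
```

With these, comparing "c" against "_" gives -1 when upper-casing
('C' is 0x43, below 0x5F) and 1 when lower-casing ('c' is 0x63, above
0x5F), which is the ordering difference the patch is about.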
There may be registries with multibyte keys (nothing surprises me
about the Windows registry), but as this only affects the ability to
insert new keys into a node that has such keys, the problem that we
don't handle those is limited in practice.
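
To make the multibyte limitation concrete, here is a minimal sketch,
assuming key names are handled as UTF-8 byte strings. The function name
cmp_ascii_upper_only is hypothetical; it repeats the ASCII-only folding
loop from the patch so that the example is self-contained.

```c
#include <stddef.h>

/* Hypothetical helper: compare byte for byte, upper-casing only the
 * ASCII letters 'a'..'z'.  Bytes >= 0x80 (UTF-8 lead and continuation
 * bytes) are compared as they are.
 */
static int
cmp_ascii_upper_only (const char *s1, const char *s2)
{
  for (;;) {
    unsigned char c1 = (unsigned char) *s1++;
    unsigned char c2 = (unsigned char) *s2++;

    if (c1 >= 'a' && c1 <= 'z')
      c1 = (unsigned char) ('A' + (c1 - 'a'));
    if (c2 >= 'a' && c2 <= 'z')
      c2 = (unsigned char) ('A' + (c2 - 'a'));

    if (c1 != c2)
      return c1 < c2 ? -1 : 1;
    if (c1 == '\0')
      return 0;
  }
}

/* U+00E9 (e with acute) is 0xC3 0xA9 in UTF-8; its uppercase form
 * U+00C9 is 0xC3 0x89.  The folding above touches neither byte, so the
 * two names compare as different even though they differ only in case.
 */
static const char lower_e_acute[] = "\xC3\xA9";
static const char upper_e_acute[] = "\xC3\x89";
```

Here cmp_ascii_upper_only (lower_e_acute, upper_e_acute) returns 1
rather than 0, because 0xA9 is greater than 0x89, whereas "e" and "E"
compare equal.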