Re: [Libguestfs] [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names

Monday, 25 November 2013

On Mon, Nov 25, 2013 at 10:52:30PM +0100, Hilko Bengen wrote:
...
 * Richard W.M. Jones:

 >> -  nk->name_len = htole16 (strlen (name));
 >> -  strcpy (nk->name, name);
 >> +  nk->name_len = htole16 (recoded_name_len);
 >> +  memcpy (nk->name, recoded_name, recoded_name_len);
 >> +  free(recoded_name);
 >
 > Please put spaces after function names!  It improves readability:

 Sorry, I'll fix those. I also forgot to add a free() in
 hivex_node_set_values.

 >>    /* Update max_subkey_name_len in parent nk. */
 >> -  uint16_t max = le16toh (parent_nk->max_subkey_name_len);
 >> -  if (max < strlen (name) * 2)  /* *2 because "recoded" in
UTF16-LE. */
 >> -    parent_nk->max_subkey_name_len = htole16 (strlen (name) * 2);
 >> +  size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2;
 >
 > * 2 is probably wrong here for non-BMP characters, but the original
 > code makes the same mistake ...  Could we get the true length from the
 > hivex_encode_string function? 
[Note: I read this email after my reply to version 2 of this patch]

...
 Are there any non-BMP characters that can be encoded in Latin1 -- or
 whatever 1-byte encoding one is supposed to use there? 
OK I guess the original code is correct.

...
 Peter Norris' master's thesis[1] suggests that

     recoded_name_len : recoded_name_len * 2

 is probably right. 
However I still think *2 is incorrect, despite what the thesis says.
(The thesis is -- how shall I put this -- "unclear" in the way he uses
the word "Unicode", never mentioning "UTF" at all).

For example, the encoding of U13057 (a rather elegant Egyptian
hieroglyph 𓁗, if you can find a font that can render it) is
{ 0xD80C, 0xDC57 } (4 bytes) in UTF-16.

However, I also doubt that Windows works correctly with non-BMP
UTF-16.  ie. Windows probably means UCS-2.

...
 Cheers,
 -Hilko

 [1]
http://amnesia.gtisc.gatech.edu/~moyix/suzibandit.ltd.uk/MSc/Registry%20S...,
p.79 
Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names