On Mon, Nov 25, 2013 at 10:52:30PM +0100, Hilko Bengen wrote:
* Richard W.M. Jones:
>> - nk->name_len = htole16 (strlen (name));
>> - strcpy (nk->name, name);
>> + nk->name_len = htole16 (recoded_name_len);
>> + memcpy (nk->name, recoded_name, recoded_name_len);
>> + free(recoded_name);
>
> Please put spaces after function names! It improves readability:
Sorry, I'll fix those. I also forgot to add a free() in
hivex_node_set_values.
>> /* Update max_subkey_name_len in parent nk. */
>> - uint16_t max = le16toh (parent_nk->max_subkey_name_len);
>> - if (max < strlen (name) * 2) /* *2 because "recoded" in
UTF16-LE. */
>> - parent_nk->max_subkey_name_len = htole16 (strlen (name) * 2);
>> + size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2;
>
> * 2 is probably wrong here for non-BMP characters, but the original
> code makes the same mistake ... Could we get the true length from the
> hivex_encode_string function?
[Note: I read this email after my reply to version 2 of this patch]
Are there any non-BMP characters that can be encoded in Latin1 -- or
whatever 1-byte encoding one is supposed to use there?
OK I guess the original code is correct.
Peter Norris' master's thesis[1] suggests that
recoded_name_len : recoded_name_len * 2
is probably right.
However I still think *2 is incorrect, despite what the thesis says.
(The thesis is -- how shall I put this -- "unclear" in the way he uses
the word "Unicode", never mentioning "UTF" at all).
For example, the encoding of U13057 (a rather elegant Egyptian
hieroglyph 𓁗, if you can find a font that can render it) is
{ 0xD80C, 0xDC57 } (4 bytes) in UTF-16.
However, I also doubt that Windows works correctly with non-BMP
UTF-16. ie. Windows probably means UCS-2.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming blog:
http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)