On Mon, Aug 11, 2014 at 01:11:54AM +0200, Peter Wu wrote:
> Would there be interest in including such an API in hivex? Since it is
> built on the existing Python methods, it should only break if those
> methods change, which would also break other programs relying on them.
It really depends on the details, but I suspect that a second API is
only going to cause confusion. However you are certainly welcome to
maintain a nicer hivex API outside the project, and we can even point
to it in the documentation.
> > It's worth saying that encoding in the registry itself is not always
> > UTF-16LE. It's sometimes UTF-8, ASCII or (in a case I found last
> > week) an NLS like ISO-8859-1 or Big5. Essentially the consuming app
> > always has to know what encoding to use. Doing "clever" stuff in the
> > bindings is therefore almost always going to be wrong in some case.
> > (This is also why the C functions like hivex_value_string are
> > deprecated).
> When doing a registry export (.reg), all strings like "Key"="Value"
> appear to be UTF-16 strings. Trying to push a UTF-8 string into
> the registry results in Chinese characters (UTF-16?). Could you
> confirm/reject this against the exports of your keys? Also, when the
> trailing NUL byte is missing in the services values, a BSOD can be
> observed.
Well it depends on the OS you are using. Try it with Windows XP which
uses (some of the time) an old CodePage or ISO-8859-X encoding for at
least registry key names. Also it depends on the Windows application
that is reading the registry.
See:
http://git.annexia.org/?p=hivex-test-data.git;a=commit;h=2145ff5774ecbd4c...
http://git.annexia.org/?p=hivex-test-data.git;a=commit;h=e296fba552f57c63...
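The "Chinese characters" effect mentioned above is what you would expect when UTF-8 bytes are read by something that assumes UTF-16LE. A small illustration in plain Python (no hivex involved; the sample string is made up):

```python
# Illustration only: the same bytes read under two different assumptions.
text = "Service1"                      # made-up sample value
utf8_bytes = text.encode("utf-8")      # 8 bytes, one per ASCII character

# A reader that assumes UTF-16LE pairs the bytes up into 16-bit code units,
# so 8 ASCII bytes turn into 4 unrelated characters.
misread = utf8_bytes.decode("utf-16-le")

for ch in misread:
    print("U+%04X" % ord(ch))
```

The first two bytes, 0x53 and 0x65, become the single code unit U+6553, which lies in the CJK range. Nothing in the bytes themselves says which reading is the intended one.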
> If it is necessary to support other encodings, it may be worth adding a
> function to wrap the encoding, (type?) and value:
>
>     UTF_16_LE = "utf-16-le"
>     class RegistryString(object):
>         def __init__(self, type, value, encoding=UTF_16_LE):
>             ...
>         def value(self):
>             return self.value.encode(self.encoding) + u"\0".encode(self.encoding)
>
> (maybe introduce a wrapper function for this to avoid long lines)
Right, this can be made easier to use in the common case, but it'll
break in other cases.
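For the common case, a runnable version of the wrapper sketched above might look roughly like this. It is only a sketch: the class is hypothetical and not part of hivex, the method is named raw() here (my choice) so that it does not collide with the stored attribute, and REG_SZ = 1 is the usual Windows type code for a string value.

```python
UTF_16_LE = "utf-16-le"
REG_SZ = 1  # Windows registry type code for a NUL-terminated string


class RegistryString:
    """Hypothetical wrapper pairing a string with its type and encoding."""

    def __init__(self, type, value, encoding=UTF_16_LE):
        self.type = type
        self.text = value        # kept under a different name than raw()
        self.encoding = encoding

    def raw(self):
        """Return the encoded bytes followed by one encoded NUL."""
        return (self.text + "\0").encode(self.encoding)


print(RegistryString(REG_SZ, "ab").raw())
print(RegistryString(REG_SZ, "ab", encoding="ascii").raw())
```

The caller still has to pick the encoding; the default only covers the UTF-16LE case.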
The fundamental problem here is the Registry format is not
well-specified. Consumers can put whatever junk their developers felt
like at the time. Consider it to be a store of arbitrary binary
strings, which sometimes happen to have a well-known encoding.
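In code, treating values as arbitrary binary strings means keeping them as bytes and decoding only when the caller names an encoding. A minimal sketch in plain Python, where decode_value is a hypothetical helper and not a hivex function:

```python
def decode_value(raw, encoding):
    """Decode raw registry bytes using an encoding chosen by the caller.

    If the bytes are not valid in that encoding, hand them back unchanged
    rather than guessing.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw


print(decode_value(b"a\x00b\x00", "utf-16-le"))   # valid UTF-16LE
print(decode_value(b"\xff\xfe\xfd", "utf-8"))     # not valid UTF-8, stays bytes
```

Note that a successful decode does not prove the encoding was the right one, as many byte sequences are valid in several encodings at once.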
> Strings are always NUL-terminated, right? I recall reading something
> like that in the MSDN documentation.
Yup, in well-formed values, but I bet you can find places where that
is not the case.
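A reader that wants to cope with both cases could strip the terminator only when it is actually there. A rough sketch, assuming the caller knows the code unit width (2 bytes for UTF-16LE, 1 for ASCII or UTF-8); strip_terminator is a hypothetical helper name:

```python
def strip_terminator(raw, width=2):
    """Remove one trailing NUL code unit if present.

    Returns (data, had_terminator). Data without a terminator, or whose
    length is not a multiple of the code unit width, is returned unchanged.
    """
    nul = b"\0" * width
    if len(raw) >= width and len(raw) % width == 0 and raw.endswith(nul):
        return raw[:-width], True
    return raw, False


print(strip_terminator(b"a\x00\x00\x00"))    # well-formed UTF-16LE "a"
print(strip_terminator(b"a\x00"))            # terminator missing
print(strip_terminator(b"ab\x00", width=1))  # single-byte encoding
```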
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones