On Sunday 10 August 2014 22:18:04 Richard W.M. Jones wrote:
On Sun, Aug 10, 2014 at 10:19:39PM +0200, Peter Wu wrote:
> The Python documentation is scarce on the types of the various parameters
> and return values. Moreover, it states
>
> "Read the hivex(3) man page to find out how to use the API."
>
> Perhaps a second API should be created that is more pythonic (read:
> easier to use)?
One (negative) thing we learned doing libvirt is that unless you
generate the language bindings and C API together, the language
bindings inevitably get out of date, or (worse) contain non-systematic
errors which are difficult to discover and correct.
Therefore you're welcome to create a more Pythonic hivex API either on
top of the existing Python API or talking directly to C, but we
couldn't accept it upstream (well, unless it was fully generated and
included in generator.ml, but that seems unlikely to be possible).
I was thinking of basing the more Pythonic API on top of the current
hivex.Hivex class, not adding more functionality to that wrapper. If someone
wants to create a broken registry, (s)he still has the full power of the
low-level API. If on the other hand one is looking for a way to access a
registry without breaking it, a friendlier API would help: something that
prevents a programmer from writing 1 byte to a DWORD value, for example, and
that makes traversing registry keys easier (as demonstrated before).
Would there be interest in including such an API in hivex? Since it only uses
the existing Python methods, it should not break unless those methods change,
which would also break other programs relying on them.
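To illustrate the "1 byte to a DWORD" point, here is a minimal sketch of the kind of check such a layer could do before handing bytes to the low-level API. The helper name pack_dword is hypothetical and not part of hivex; the only assumption is that a DWORD is stored as 4 little-endian bytes with registry type 4.

```python
import struct

REG_DWORD = 4  # registry type number for a 32-bit little-endian integer


def pack_dword(value):
    """Return the 4-byte little-endian encoding of value.

    Hypothetical helper: rejects anything that would not fit a DWORD
    instead of silently writing a value of the wrong size.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("DWORD values must be integers")
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DWORD out of range: %r" % (value,))
    return struct.pack("<I", value)
```

The result always has length 4, so a caller cannot end up with a truncated DWORD by accident.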
Having said that ...
> hive = hivex.Hivex2("system", write=True)
> ccs_name = "ControlSet001"
> svc_viostor = hive.root()[ccs_name].Services.viostor
>
> if svc_viostor.Start != 4:
>     # Automatically detect that int '4' is a DWORD
>     svc_viostor.Start = 4
>
> hive.commit()
... a possible exception would be if it just involves adding some
extra code to the existing hivex.py file, e.g. adding just the extra
classes with __setattr__ and __getattr__ methods.
Yes, the low-level binding is left intact, it's just a new Hivex2 class that
is being added. No more changes are needed in libhivexmod.
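A rough sketch of what such a wrapper class could look like, to make the __getattr__/__setattr__ idea concrete. Everything here is an assumption for illustration: Node is a hypothetical name, and FakeHive is an in-memory stand-in that only mimics the method names I believe the low-level binding exposes (root, node_get_child, node_get_value, value_value, node_set_value), so the example runs without a hive file. Only DWORD values are handled.

```python
import struct

REG_DWORD = 4  # registry type number for a 32-bit little-endian integer


class FakeHive(object):
    """Hypothetical in-memory stand-in for the low-level hive object."""

    def __init__(self):
        # node id -> (children by name, values by name -> (type, raw bytes))
        self.nodes = {0: ({}, {})}

    def root(self):
        return 0

    def add_child(self, parent, name):  # setup helper for the example only
        node = len(self.nodes)
        self.nodes[node] = ({}, {})
        self.nodes[parent][0][name] = node
        return node

    def node_get_child(self, node, name):
        return self.nodes[node][0].get(name)

    def node_get_value(self, node, key):
        return (node, key)

    def value_value(self, val):
        node, key = val
        return self.nodes[node][1][key]

    def node_set_value(self, node, val):
        self.nodes[node][1][val["key"]] = (val["t"], val["value"])


class Node(object):
    """Attribute-style access to the subkeys and values of one key."""

    def __init__(self, hive, handle):
        # Bypass our own __setattr__ when storing internal state.
        object.__setattr__(self, "_hive", hive)
        object.__setattr__(self, "_handle", handle)

    def __getitem__(self, name):
        # Subkeys take precedence over values of the same name.
        child = self._hive.node_get_child(self._handle, name)
        if child is not None:
            return Node(self._hive, child)
        val = self._hive.node_get_value(self._handle, name)
        t, raw = self._hive.value_value(val)
        if t == REG_DWORD:
            return struct.unpack("<I", raw)[0]
        return raw  # other types are returned as plain bytes

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("only int (DWORD) values are handled here")
        raw = struct.pack("<I", value)  # raises struct.error if out of range
        self._hive.node_set_value(
            self._handle, {"key": name, "t": REG_DWORD, "value": raw})
```

With a real hive, FakeHive would be replaced by the existing low-level object; the wrapper itself only calls methods on it and needs no changes to the C module.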
> In the current implementation, Python 3 bytes (Python 2 strings) are
> treated as plain bytes(*). That is fine, but Unicode is not handled
> correctly. This might also be an opportunity to treat Unicode strings as
> UTF-16 (LE) strings which must be nul-terminated. So u'Bar' should become
> b'B\0a\0r\0\0\0'.
It's worth saying that encoding in the registry itself is not always
UTF-16LE. It's sometimes UTF-8, ASCII or (in a case I found last
week) an NLS like ISO-8859-1 or Big5. Essentially the consuming app
always has to know what encoding to use. Doing "clever" stuff in the
bindings is therefore almost always going to be wrong in some case.
(This is also why the C functions like hivex_value_string are
deprecated).
When doing a registry export (.reg), all strings like "Key"="Value" appear to
be UTF-16 strings. Trying to push a UTF-8 string into the registry results in
Chinese characters (UTF-16?). Could you confirm/reject this against the
exports of your keys? Also, when the trailing NUL byte is missing from the
services values, a BSOD can be observed.
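The "Chinese characters" effect is consistent with UTF-8 (or ASCII) bytes being read back as UTF-16LE. A small illustration using only Python's built-in codecs; the sample strings are arbitrary and nothing here touches a real hive:

```python
# What a NUL-terminated UTF-16LE string looks like as raw bytes.
text = u"Bar"
utf16 = text.encode("utf-16-le") + u"\0".encode("utf-16-le")

# ASCII/UTF-8 bytes misread as UTF-16LE: each pair of bytes is taken
# as one 16-bit code unit, which for ASCII letters tends to land in
# the CJK range.
utf8 = u"Hell".encode("utf-8")
misread = utf8.decode("utf-16-le")
```

Here utf16 is the 8-byte sequence from the earlier u'Bar' example, and misread consists of two CJK ideographs rather than four Latin letters.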
If it is necessary to support other encodings, it may be worth adding a
wrapper for the encoding, (type?) and value:
UTF_16_LE = "utf-16-le"
class RegistryString(object):
    def __init__(self, type, value, encoding=UTF_16_LE):
        ...
    def value(self):
        return self.value.encode(self.encoding) + u"\0".encode(self.encoding)
(maybe introduce a wrapper function for this to avoid long lines)
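A runnable variant of the sketch above, under a few assumptions of mine: the text is stored as self.text, because an instance attribute named value would shadow the value() method, the _encode helper is the hypothetical wrapper function that keeps the lines short, and REG_SZ = 1 is the registry type number for a plain string.

```python
UTF_16_LE = "utf-16-le"
REG_SZ = 1  # registry type number for a NUL-terminated string


class RegistryString(object):
    def __init__(self, type, value, encoding=UTF_16_LE):
        self.type = type
        self.text = value  # not "self.value": that would hide the method
        self.encoding = encoding

    def _encode(self, text):
        # Hypothetical helper to avoid repeating the encoding argument.
        return text.encode(self.encoding)

    def value(self):
        # Encoded text followed by an encoded NUL terminator.
        return self._encode(self.text) + self._encode(u"\0")
```

For example, RegistryString(REG_SZ, u"Bar").value() gives the 8 bytes mentioned earlier, while passing encoding="utf-8" gives the 4 bytes of "Bar" plus a single NUL.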
Strings are always NUL-terminated, right? I recall reading something like that
in the MSDN documentation.
Kind regards,
Peter
https://lekensteyn.nl