On Tue, Jan 31, 2012 at 12:21:44PM -0500, Todd Mummert wrote:
I've made some changes to hivexml.c, that I think addresses the
invalid XML that is being generated (as of version 1.3.3). I'm only
addressing invalid XML characters in string values and string-lists.
As I saw mentioned in earlier forum messages
(
https://www.redhat.com/archives/libguestfs/2011-September/msg00082.html),
there could be invalid XML character issues in keys as well, but that
is not addressed in this patch.
Secondly, I slightly modified the string element in that I used a
value attribute to more closely align with the earlier change that
Alex did to singular string values.
I've addressed the three of you because you all commented on the
earlier patch, and I would like to get your feedback on mine. If
you'd like me to post to the libguestfs mailing list, I could do so...
let me know if there are patch standards that I should use.
The approach I took was toI use XMLisCharQ() to check each UTF-8
string value in value_string and value_mulitple_string.
The issue I encountered was that the following UTF-8 string was
occurring in a clean Win7 install --
e689a4 02 eb8288c 7937 46 5
The 02 (^B) is an invalid XML character in the 1.0 spec, and was
causing xmllint to fail in the following manner
[hivex-1.3.3-orig]# xmllint xmlout.orig
xmlout.orig:2: parser error : PCDATA invalid Char value 2
18T02:23:42Z</mtime><value type="string-list" key="Autorecover
MOFs"><string>æ¤
^
After the patch, ...with all the surrounding stuff cut away
[hivex-1.3.3-new]# xmllint xmlout.new | tidy -raw -xml -i
<?xml version="1.0" encoding="utf-8"?>
<hive>
<mtime>2011-02-18T02:23:42Z</mtime>
<value type="string-list" key="Autorecover MOFs">
<string encoding="base64" value="5omkAuuCiMeTdGU=" />
</value>
</hive>
And the patch itself...
[hivex]# cat hivexml.patch
diff -uprN hivex-1.3.3-orig/xml/hivexml.c hivex-1.3.3-new/xml/hivexml.c
--- hivex-1.3.3-orig/xml/hivexml.c 2011-09-22 09:17:09.000000000 -0400
+++ hivex-1.3.3-new/xml/hivexml.c 2012-01-30 23:39:14.995639422 -0500
@@ -33,6 +33,7 @@
#endif
#include <libxml/xmlwriter.h>
+#include <libxml/chvalid.h>
#include "hivex.h"
@@ -209,6 +210,22 @@ filetime_to_8601 (int64_t windows_ticks)
}
static int
+isValidXMLString(const char *string)
Can we call this is_valid_xml_string? I hate camel-casing.
+{
+ int c;
+ int len = strlen(string);
+ int pos = 0;
+ int charlen = len;
+ while ((c = xmlGetUTF8Char(string+pos, &charlen)) >= 0) {
+ if (xmlIsCharQ(c) == 0)
+ return 0;
+ pos += charlen;
+ charlen = len - pos;
+ }
+ return 1;
+}
+
+static int
node_start (hive_h *h, void *writer_v, hive_node_h node, const char *name)
{
int64_t last_modified;
@@ -265,6 +282,20 @@ end_value (xmlTextWriterPtr writer)
XML_CHECK (xmlTextWriterEndElement, (writer));
}
+static void
+start_string(xmlTextWriterPtr writer, const char *encoding)
+{
+ XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST "string"));
+ if (encoding)
+ XML_CHECK (xmlTextWriterWriteAttribute, (writer, BAD_CAST
"encoding", BAD_CAST encoding));
+}
+
+static void
+end_string(xmlTextWriterPtr writer)
+{
+ XML_CHECK (xmlTextWriterEndElement, (writer));
+}
+
static int
value_string (hive_h *h, void *writer_v, hive_node_h node, hive_value_h value,
hive_type t, size_t len, const char *key, const char *str)
@@ -292,9 +323,14 @@ value_string (hive_h *h, void *writer_v,
type = "unknown";
}
- start_value (writer, key, type, NULL);
+ int validXML = isValidXMLString(str);
+ start_value (writer, key, type, validXML ? NULL : "base64");
XML_CHECK (xmlTextWriterStartAttribute, (writer, BAD_CAST "value"));
- XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST str));
+ if (validXML)
+ XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST str));
+ else
+ XML_CHECK (xmlTextWriterWriteBase64, (writer, str, 0, strlen(str)));
+
XML_CHECK (xmlTextWriterEndAttribute, (writer));
end_value (writer);
return 0;
@@ -310,9 +346,14 @@ value_multiple_strings (hive_h *h, void
size_t i;
for (i = 0; argv[i] != NULL; ++i) {
- XML_CHECK (xmlTextWriterStartElement, (writer, BAD_CAST "string"));
- XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST argv[i]));
- XML_CHECK (xmlTextWriterEndElement, (writer));
+ int validXML = isValidXMLString(argv[i]);
+ start_string(writer, validXML ? NULL : "base64");
+ XML_CHECK (xmlTextWriterStartAttribute, (writer, BAD_CAST "value"));
+ if (validXML)
+ XML_CHECK (xmlTextWriterWriteString, (writer, BAD_CAST argv[i]));
+ else
+ XML_CHECK (xmlTextWriterWriteBase64, (writer, argv[i], 0,
strlen(argv[i])));
+ end_string(writer);
}
end_value (writer);
[hivex]#
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw