Chiming in here as someone who has worked on file system and Registry differencing for a few years now.  Taking diffs of a storage system is not a straightforward task.  Hopefully, this message saves you some re-implementation heartache.

In the forensics world, there is a tool called Fiwalk, which enumerates the contents of a file system and its metadata (with some basic data summaries, including libmagic and checksums).  The tool "idifference" compares file system states and enumerates differences, using the Digital Forensics XML output from Fiwalk.
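
To make the comparison step concrete, here is a rough Python sketch of diffing two DFXML-style listings.  This is not how idifference is implemented.  It assumes each file appears as a `fileobject` element with `filename` and `hashdigest` children, it ignores XML namespaces, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix like '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def load_listing(xml_text):
    """Map filename -> first hash digest for every fileobject element.

    Assumption: the element names follow the DFXML convention
    (fileobject / filename / hashdigest); adjust if your output differs.
    """
    listing = {}
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local(elem.tag) != "fileobject":
            continue
        name = digest = None
        for child in elem:
            if local(child.tag) == "filename":
                name = child.text
            elif local(child.tag) == "hashdigest" and digest is None:
                digest = child.text
        if name is not None:
            listing[name] = digest
    return listing


def diff_listings(before, after):
    """Return (new, deleted, changed) filename lists between two listings."""
    new = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    changed = sorted(n for n in set(before) & set(after)
                     if before[n] != after[n])
    return new, deleted, changed
```

A real differencing tool also has to deal with renames, timestamps, and unallocated files, which is where most of the heartache comes from.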

A research publication on the forensic differencing process and idifference is here:

Fiwalk is a component of The Sleuth Kit, here:
If you wish to use Fiwalk on your images, you should convert any of your disk images to a raw image or Expert Witness Format.
Actually, I don't suppose qemu-img has a FUSE-like wrapper that exposes the underlying image as a raw file?

DFXML has an entry on the Forensics Wiki:

As for your external-to-filesystem data question:  I think you got the essential non-file-system data.  I can imagine data fragments from past/shrunken file systems, or hidden-data regions that fall outside what's recorded in the partition table.  My imagination runs dry there, though.
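
For the regions outside the partition table, a small Python sketch along these lines could list the sector ranges that no partition claims.  It assumes a classic MBR layout with 512-byte sectors and only looks at the four primary entries (no GPT, no extended partitions); the function names are hypothetical.

```python
import struct

SECTOR = 512  # assumption: 512-byte sectors


def mbr_partitions(sector0):
    """Return sorted (start_lba, sector_count) pairs from an MBR sector."""
    if len(sector0) < SECTOR or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    parts = []
    for i in range(4):
        # Each of the four primary entries is 16 bytes, starting at offset 446.
        entry = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]
        # Little-endian 32-bit start LBA and sector count at entry offset 8.
        start, count = struct.unpack_from("<II", entry, 8)
        if ptype != 0 and count > 0:
            parts.append((start, count))
    return sorted(parts)


def unallocated_gaps(parts, total_sectors):
    """Return inclusive (first_lba, last_lba) ranges not covered by parts."""
    gaps = []
    cursor = 1  # sector 0 is the MBR itself
    for start, count in sorted(parts):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, start + count)
    if cursor < total_sectors:
        gaps.append((cursor, total_sectors - 1))
    return gaps
```

Hashing each gap in two images would at least show whether anything differs in the space the partition table does not account for.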


Thank you all for your suggestions!

> I keep meaning to write a comprehensive "virt-diff" tool.  I needed it
> myself just yesterday.

Most interesting. I guess there are two reasons for creating such a
tool: just compare the images (show the diff) and/or check for malicious
additions in the other image.

Did you consider implementing the former or both?

Do you think it's realistic to compare VM images with the goal of
eventually finding deliberately hard-to-detect (malicious) changes?

At the moment I am not trying to write a virt-diff-like tool, but
something simpler: a tool that creates a report of all of a VM image's
contents (checksums for all files, the filesystem, the MBR, and the
Volume Boot Record). When publishing VM images, it might be useful to
publish such a report together with the image, so that others who
rebuild from source can be certain that they ended up with a very
similar image. Given two such reports, one could easily build a
virt-diff-like tool.
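
A rough Python sketch of what I have in mind, assuming the image is a
raw file and its filesystem is already mounted somewhere. The function
names are placeholders, and the sector number of the Volume Boot Record
has to be supplied by the caller (it is the first sector of the
partition).

```python
import hashlib
import os


def sha256_file(path, chunk=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def sector_digest(image_path, lba, sector_size=512):
    """SHA-256 of one sector of a raw image (lba=0 is the MBR)."""
    with open(image_path, "rb") as f:
        f.seek(lba * sector_size)
        return hashlib.sha256(f.read(sector_size)).hexdigest()


def tree_report(root):
    """Map relative path -> SHA-256 for every regular file under root."""
    report = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only hash regular files.
            if os.path.isfile(full) and not os.path.islink(full):
                report[os.path.relpath(full, root)] = sha256_file(full)
    return report


def compare_reports(a, b):
    """Return the sorted keys whose checksums differ or exist on one side."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))
```

This only covers file contents and individual sectors. Permissions,
ownership and timestamps would need to be added to the report as well.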

> although that *only*
> compares files, not the other data outside the filesystem

What other data can there be outside the filesystem?

I can think of:

- Volume Boot Record

Anything else?

If all of these have been compared and match, should the compared image
be as safe to use as the original one?

(I could imagine that there can be extra data outside filesystem, maybe
in regions outside the partition table, but those data shouldn't get
executed after starting the image in a VM.)


