Matthew Booth:
On Fri, 2013-11-22 at 20:14 +0000, Richard W.M. Jones wrote:
> On Fri, Nov 22, 2013 at 05:56:00PM +0000, adrelanos wrote:
>> Thank you all for your suggestions!
>>
>> Richard W.M. Jones:
>>> I keep meaning to write a comprehensive "virt-diff" tool. I needed it
>>> myself just yesterday.
>>
>> Most interesting. I guess there are two reasons for creating such a
>> tool: just compare the images (show the diff) and/or check for malicious
>> additions in the other image.
>>
>> Did you consider implementing the former or both?
>
> For all the reasons that Alex goes into, it would just be for checking
> NON-malicious differences. The use case is to reverse engineer what
> files change in a guest when you perform an action (eg. install a
> Windows driver or run some Linux administrative command).
>
> [...]
>> At the moment I am not trying to write a virt-diff like tool, but
>> something simpler. A tool to create a report of all of a vm image's
>> contents. (Checksums for all files, filesystem, for MBR and Volume Boot
>> Record.) When publishing VM images, it might be useful to publish such a
>> report together with the image, so others who re-build from source can
>> be certain, they ended up with a very similar image. When having created
>> two such reports, one could easily get a virt-diff like tool.
>
> I think Matt Booth was doing something like this for Windows systems,
> with the aim of being able to recreate a Windows VM from a (smaller)
> description. Don't know what state that was/is in.
I wrote a POC tool to store an MD5 of every file on a Windows
filesystem. It looked like a good idea for what it was, but not very
applicable here.
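
For illustration, the per-file checksum report discussed above, and the
"two reports give you a virt-diff" idea, could be sketched roughly as
follows. This is only a sketch under assumptions: it expects the guest
filesystem to be already mounted read-only at some directory, it uses
SHA-256 rather than the MD5 mentioned above (the algorithm is a
parameter), and the function names checksum_report and diff_reports are
made up for this example. It ignores symlinks, permissions, ownership
and timestamps, which a real report would probably want to cover.

```python
import hashlib
import os


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash one file in chunks so large files do not need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_report(root, algorithm="sha256"):
    """Map each regular file below root (relative path) to its hex digest."""
    report = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk in a stable order so reports are reproducible
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are hashed.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            report[os.path.relpath(full, root)] = file_digest(full, algorithm)
    return report


def diff_reports(old, new):
    """Compare two reports and list added, removed and changed paths."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }
```

Calling checksum_report() on two mounted images (for example on a
hypothetical /mnt/guest-a and /mnt/guest-b) and passing both results to
diff_reports() gives the file-level differences. It says nothing about
data outside the filesystems.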
> [...]
>> What other data can there be outside the filesystem?
>>
>> I can think of:
>>
>> - MBR
>> - Volume Boot Record
>>
>> Anything else?
>
> Potentially all unused space inside and between partitions /
> filesystems / logical volumes. The boot loader is sometimes stored in
> the space between the MBR and the first partition. Other peculiar
> things lie in other spaces.
Any mechanism for doing volume management, e.g. MBR, GPT, LVM (Linux),
LDM (Windows). Sometimes these overlap and interact in complex ways,
e.g. LDM has an MBR and a GPT, both of which it ignores in favour of its
own metadata.
> However if you don't care about guests that are malicious / hiding
> data, then you can ignore everything except for the MBR and any
> non-zero data between the MBR and the first partition. Note for GPT
> you have to take into account two partition tables as well.
>
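
As a rough illustration of the non-malicious case described above (the
MBR plus any non-zero data between the MBR and the first partition),
here is a minimal sketch. Assumptions: the image is a raw disk image
with 512-byte sectors and a classic MBR partition table; GPT, with its
two partition tables, is not handled; the names first_partition_lba and
mbr_gap_report are invented for this example.

```python
import hashlib
import struct

SECTOR = 512  # assumed sector size


def first_partition_lba(mbr):
    """Return the lowest starting LBA among the four primary MBR entries."""
    if len(mbr) != SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    starts = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]                               # partition type byte
        lba_start = struct.unpack_from("<I", entry, 8)[0]  # little-endian LBA
        if ptype != 0 and lba_start > 0:
            starts.append(lba_start)
    if not starts:
        raise ValueError("no partitions defined in MBR")
    return min(starts)


def mbr_gap_report(image_path):
    """Checksum the MBR and the gap between it and the first partition."""
    with open(image_path, "rb") as f:
        mbr = f.read(SECTOR)
        first = first_partition_lba(mbr)
        gap = f.read((first - 1) * SECTOR)  # sectors 1 .. first-1
    return {
        "mbr_sha256": hashlib.sha256(mbr).hexdigest(),
        "first_partition_lba": first,
        "gap_sha256": hashlib.sha256(gap).hexdigest(),
        "gap_has_nonzero_data": any(gap),  # True if any byte is non-zero
    }
```

On a disk with a protective MBR (GPT), the single entry starts at LBA 1,
so the gap is empty and this sketch reports nothing useful there.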
>> If these have been compared, the compared image should be as safe to use
>> as the original one?
>>
>> (I could imagine that there can be extra data outside filesystem, maybe
>> in regions outside the partition table, but those data shouldn't get
>> executed after starting the image in a VM.)
I'm coming in to this discussion late, so I don't know what you're doing
or how paranoid you need to be.

A few years ago, I could say very paranoid. Otherwise, I wouldn't do it
in the first place. :) Nowadays after the news coverage, I'd say no
paranoia at all, just reasonable precautions. ;)

However, cranking up the paranoia a
little, imagine the following scenario:
There's a bug in a critical boot element which means the boot relies on
uninitialised disk space. As it happens, in a normal installation this
uninitialised disk space is always safe and it's located somewhere which
will rarely, if ever, be touched, so nobody has ever noticed it.
(Paranoia level: state actor. Somebody put the bug there deliberately.)

Malicious person modifies the uninitialised disk space. Your tool will
never notice. The boot process is now compromised.

You could probably come up with more with a few minutes of thought. I'm
pretty sure a dedicated team given a few months to work on this project
could come up with some inventive ideas :)

I hope you are wrong. :) I am going to ask for more feedback on another
mailing list after the initial implementation of the script is done.
(At the moment I am making good progress, the initial report creation
script is almost finished, currently ironing out a few non-deterministic
/var/cache... files and folders and recreating them during the first boot.)