On 02/02/16 21:35, Richard W.M. Jones wrote:
On Tue, Feb 02, 2016 at 07:40:12PM +0200, noxdafox wrote:
> Greetings,
>
> I'm playing around with an idea and I'd like to ask you some questions.
>
> I'd like to extract the MFT table from a disk image file. The idea
> is to employ it to build a sort of reverse lookup table which, given
> a cluster, could retrieve the corresponding file with the related
> metadata.
>
> Such a table could be used to optimize the analysis of disk snapshots
> in order to collect the changes which happened on the disk. As the
> disk snapshots contain only the new or modified clusters, I could
> avoid exploring the whole FS content and focus on what has really
> changed on disk.
>
> Have you explored the concept at all?
No.
> Is there a way I can use libguestfs to locate and extract the MFT
> table from a disk image?
If there's an ntfsprogs command that does this (ntfsinfo --mft maybe?)
then it's really easy to extract the output from that command. You
could hack it together using `debug sh', search this page:
http://libguestfs.org/guestfs-faq.1.html
... but if you wanted to do it "properly" then you could add an API
modelled on one of the `FileOut' APIs, eg:
https://github.com/libguestfs/libguestfs/blob/master/daemon/base64.c#L100
For information on adding APIs, see:
http://libguestfs.org/guestfs-hacking.1.html#adding-a-new-api

I played around a bit and I must confess I am impressed by how easy
it is to add functionality to libguestfs.
I could easily extract the Master File Table using the download API and
parse it with third-party tools.
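For context on what such a parser has to do to get from the MFT to clusters (the mapping the reverse lookup table at the start of this thread needs): each non-resident attribute, such as a file's $DATA, records its allocation as a compact list of "data runs". Each run starts with a header byte whose low nibble is the size of the length field and whose high nibble is the size of the signed LCN offset, relative to the previous run. A minimal Python sketch of decoding that list, assuming `buf` already starts at the run list of a single attribute. Locating the attribute and applying the MFT record's update-sequence fixups are left out, and `decode_data_runs` is a hypothetical name.

```python
def decode_data_runs(buf):
    """Decode the data-run (mapping pairs) list of a non-resident NTFS attribute.

    Returns a list of (lcn, length) tuples, where lcn is None for sparse runs.
    Assumes `buf` is well formed and starts at the first run header.
    """
    runs = []
    pos = 0
    lcn = 0  # offsets are relative to the previous non-sparse run's LCN
    while pos < len(buf) and buf[pos] != 0:  # a 0x00 header terminates the list
        header = buf[pos]
        len_size = header & 0x0F   # bytes used by the run length
        off_size = header >> 4     # bytes used by the signed LCN delta
        pos += 1
        length = int.from_bytes(buf[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            # Sparse run: no clusters on disk, current LCN unchanged.
            runs.append((None, length))
        else:
            delta = int.from_bytes(buf[pos:pos + off_size], "little", signed=True)
            pos += off_size
            lcn += delta
            runs.append((lcn, length))
    return runs
```

Each non-sparse (lcn, length) pair, tagged with the owning file, becomes one entry of a cluster-to-file table; sparse runs own no clusters and can be skipped.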
I'd also like to extract the Update Sequence Number Journal
($UsnJrnl), but it seems inaccessible via its path (C:\$Extend\$UsnJrnl).
I tried on a real disk and it seems to be a limitation of the NTFS-3G
driver: it can extract C:\$MFT and C:\$LogFile, and it can list the
content of C:\$Extend, but it cannot access the files inside it.
Curiously enough, the stat() syscall on C:\$Extend\$UsnJrnl seems to work
and returns the correct inode number. Yet the size is wrong: it
reports 0 while the real one is > 9 MB.
Next I tried the ntfscat command:

    ntfscat -i <UsnJrnl inode number> /dev/sdXX

and it worked flawlessly.
So I proceeded to add such an API to libguestfs and could extract the
journal without any issue. The UsnJrnl file is very handy for checking
what changes were made on disk. Not only is it faster than running
virt-diff on two different snapshots, it also shows much more relevant
information. For example, I could track down temporary files that were
created and deleted between the two snapshots.
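Once the journal has been extracted, finding files like those temporary ones amounts to scanning its records for matching create and delete reasons. A minimal Python sketch, assuming the dump is the raw content of the journal's $J data stream and contains records in the layout Microsoft documents as USN_RECORD_V2; records of other versions are skipped by length rather than parsed. All function names are hypothetical. Records are keyed by the full 64-bit file reference so that a reused MFT record is not confused with the file that previously occupied it; `split_file_reference` recovers the MFT record number, which is what ntfsprogs tools such as ntfscat treat as the inode number.

```python
import struct
from dataclasses import dataclass

# Subset of the documented USN_REASON_* flags.
USN_REASON_FILE_CREATE = 0x00000100
USN_REASON_FILE_DELETE = 0x00000200

# Fixed 60-byte USN_RECORD_V2 header (little-endian); the UTF-16 name follows.
_V2 = struct.Struct("<IHHQQqqIIIIHH")


@dataclass(frozen=True)
class UsnRecord:
    file_ref: int    # FileReferenceNumber: 48-bit MFT record + 16-bit sequence
    parent_ref: int  # ParentFileReferenceNumber
    usn: int
    timestamp: int   # FILETIME: 100 ns ticks since 1601-01-01 UTC
    reason: int      # bitmask of USN_REASON_* flags
    name: str


def split_file_reference(ref):
    """Split a 64-bit NTFS file reference into (MFT record number, sequence)."""
    return ref & 0xFFFF_FFFF_FFFF, ref >> 48


def parse_usn_journal(data):
    """Yield UsnRecord objects from a raw dump of the $J stream."""
    pos = 0
    while pos + _V2.size <= len(data):
        (rec_len, major, _minor, file_ref, parent_ref, usn, ts, reason,
         _src, _sec, _attrs, name_len, name_off) = _V2.unpack_from(data, pos)
        if rec_len == 0:
            pos += 8  # zero padding or sparse region; records are 8-byte aligned
            continue
        if major == 2:
            start = pos + name_off
            name = data[start:start + name_len].decode("utf-16-le")
            yield UsnRecord(file_ref, parent_ref, usn, ts, reason, name)
        pos += rec_len  # records of other versions are skipped by their length


def created_and_deleted(records):
    """Names of files that were both created and deleted within the journal."""
    created, deleted = {}, set()
    for r in records:
        if r.reason & USN_REASON_FILE_CREATE:
            created[r.file_ref] = r.name
        if r.reason & USN_REASON_FILE_DELETE:
            deleted.add(r.file_ref)
    return sorted(name for ref, name in created.items() if ref in deleted)


def build_v2_record(file_ref, reason, name, usn=0):
    """Pack a synthetic USN_RECORD_V2, handy for testing without a real journal."""
    raw_name = name.encode("utf-16-le")
    rec_len = (_V2.size + len(raw_name) + 7) & ~7  # pad to 8-byte alignment
    header = _V2.pack(rec_len, 2, 0, file_ref, 5, usn, 0, reason,
                      0, 0, 0, len(raw_name), _V2.size)
    return (header + raw_name).ljust(rec_len, b"\x00")
```

The FILETIME timestamps could also be used to restrict the scan to the window between two snapshots.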
All of this is to say that I'd like to add the possibility of extracting
files via their inode number. This has the advantage of not requiring
the filesystem to be mounted. Would libguestfs benefit from this?
If so, how should I proceed, and which API names should I use?
The most straightforward names would be something like:
    ntfsicat(device, inode)
or
    ntfsidownload(device, inode)
I guess Linux guest disks would also benefit from this, but that
requires a bit more research.
This question of how to find which disk block is associated with a
particular file comes up often enough that I have looked at it several
times on my blog:
https://rwmj.wordpress.com/2014/02/21/use-guestfish-and-nbdkit-to-examine...
https://rwmj.wordpress.com/2014/11/23/mapping-files-to-disk/
Rich.