On 07/03/16 13:29, Richard W.M. Jones wrote:
On Sun, Mar 06, 2016 at 05:42:24PM +0200, Matteo Cafasso wrote:
> As discussed in the topic:
> https://www.redhat.com/archives/libguestfs/2016-March/msg00018.html
>
> I'd like to add to libguestfs the disk forensics capabilities offered by The
> Sleuth Kit.
>
> http://www.sleuthkit.org/
>
> The two APIs I'm adding with the patch are a simple example of which type of
> features TSK can enable.
A few comments in general terms:
The current splitting of the commits doesn't make much sense to me.
I think it would be better as:
- commit to add TSK to the appliance
- commit to add the icat API
- tests for icat
- commit to add the fls0 API
- tests for fls0
although it would be fine to combine the tests with the new API, or
even have all the tests as a single separate commit (as now).
This benefits you because it will allow patches to go upstream
earlier. For example, a commit to add TSK to the appliance is a
simple and obvious change that I see no problem with. Also the icat
API is closer to being ready than the fls0 API (see below for
explanation).
Indeed, I did quite a poor job of this. I will split it as
suggested.
>> <fs> fls0 /dev/sda2 /home/noxdafox/disk-content.txt
> r/r 15711-128-1: $Recycle.Bin/S-1-5-21-2379395878-2832339042-1309242031-1000/desktop.ini
> -/r * 60015-128-1: $Recycle.Bin/S-1-5-21-2379395878-2832339042-1309242031-1000/$R07QQZ2.txt
> -/r * 60015-128-3: $Recycle.Bin/S-1-5-21-2379395878-2832339042-1309242031-1000/$R07QQZ2.txt:Zone.Identifier
What is `/home/noxdafox/disk-content.txt'?
It's the local (host-side) file where the command output is stored.
The problem with this API is it pushes all the parsing up in the
stack, to libguestfs consumers.
In general we'd like to avoid that and have just one place where all
parsing needs to be done (ie. libguestfs itself), so it'd be nicer to
have an API that returns a list of structs (RStructList) with all the
important fields parsed out.
As the API documentation says, this is the low-level API, which I have
provided as an example.
I took inspiration from the guestfs_ls0 API, which does a similar job
by storing the contents of a directory in a host file.
If I understood correctly (the dynamic code generation is still
confusing me a bit), the way libguestfs implements commands that could
produce a large output is by first dumping the output to a local file
and then iterating over it.
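To make the "dump to a host file, then iterate" pattern concrete, here is a
minimal sketch of the consumer side. It assumes the file holds records
separated by NUL bytes, which is the layout documented for the guestfs_ls0
output file; the function name `iter_nul_records` and the chunk size are made
up for illustration and are not part of libguestfs.

```python
import io
from typing import BinaryIO, Iterator


def iter_nul_records(stream: BinaryIO, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield NUL-separated records from a binary stream without loading it whole."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last NUL is complete; the tail may be partial.
        *complete, pending = pending.split(b"\0")
        yield from complete
    if pending:
        # Final record that was not NUL-terminated.
        yield pending


# Hypothetical stand-in for the host-side file written by the low-level call.
demo = io.BytesIO(b"bin\0etc\0home\0")
print(list(iter_nul_records(demo, chunk_size=4)))  # [b'bin', b'etc', b'home']
```

Reading in fixed-size chunks keeps memory use bounded even when the listing
covers a whole disk.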
This command would list the entire content of a disk, including the
deleted files, so we need to expect a large output.
What is missing is the higher-level implementation, which would look
much like the guestfs_ls API. I need to better understand how to
implement it, and suggestions are more than appreciated. I tried to
trace how guestfs_find is implemented, for example, but I'm still a
bit disoriented by the automagic code generation.
Does TSK have a machine-readable mode? If it does, it'll definitely
make things easier if (eg) JSON or XML output is available. If not,
push upstream to add that to TSK -- it's a simple change for them,
which will make their tools much more usable, a win for everyone.
I personally disagree with this. The TSK `fls` command is a clone of the
bash `ls` one. Maybe it's more similar to `ls -al`, as it returns
additional information. IMHO, asking upstream to add a JSON or XML
output format would be much like asking the same of bash for the
`ls` utility.
The end result is to still return a list of structs or a list of
strings. But parsing the `fls` output shouldn't be that hard. Its
documentation is here:
http://wiki.sleuthkit.org/index.php?title=Fls
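As a rough illustration of what that parsing could look like, here is a small
sketch that turns one `fls` line into a record. The line layout is inferred
only from the three sample lines quoted earlier in this thread (type pair,
optional `*` for deleted entries, inode with optional attribute type and id,
colon, name); the `FlsEntry` type and its field names are hypothetical, and
lines with any other shape are simply rejected rather than guessed at.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout assumed from the sample lines in this thread, e.g.
#   "-/r * 60015-128-3: <name>"
FLS_LINE = re.compile(
    r"^(?P<dirent_type>[-a-zA-Z])/(?P<meta_type>[-a-zA-Z])\s+"
    r"(?P<deleted>\*\s+)?"
    r"(?P<inode>\d+)(?:-(?P<attr_type>\d+)-(?P<attr_id>\d+))?:\s*"
    r"(?P<name>.*)$"
)


@dataclass
class FlsEntry:
    """Hypothetical record type; the field names are illustrative only."""
    dirent_type: str
    meta_type: str
    deleted: bool
    inode: int
    attr_type: Optional[int]
    attr_id: Optional[int]
    name: str


def parse_fls_line(line: str) -> Optional[FlsEntry]:
    """Parse one line of fls output; return None if it does not match."""
    m = FLS_LINE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    attr_type = m.group("attr_type")
    attr_id = m.group("attr_id")
    return FlsEntry(
        dirent_type=m.group("dirent_type"),
        meta_type=m.group("meta_type"),
        deleted=m.group("deleted") is not None,
        inode=int(m.group("inode")),
        attr_type=int(attr_type) if attr_type is not None else None,
        attr_id=int(attr_id) if attr_id is not None else None,
        name=m.group("name"),
    )
```

Only the first colon after the inode field is treated as the separator, so a
name such as `$R07QQZ2.txt:Zone.Identifier` keeps its own colon.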
Rich.