2016-10-11 11:56 GMT+03:00 Pino Toscano <ptoscano(a)redhat.com>:
On Saturday, 8 October 2016 18:27:21 CEST Matteo Cafasso wrote:
> Patch ready for merging.
>
> v4:
>
> - check return code of tsk_fs_attr_walk
> - pass TSK_FS_FILE_WALK_FLAG_NOSPARSE as additional flag to
> tsk_fs_attr_walk
>
> After discussing with TSK authors the behaviour is clear. [1]
Thanks, this improves the situation a bit.
> In case of COMPRESSED blocks, the callback will be called for all the
> attributes no matter whether they are on disk or not (sparse). In
> such cases, the block address will be 0. [2]
Note that the API docs say:
For compressed and sparse attributes, the address *may* be zero.
(emphasis is mine)
My concern is that, if the address in such cases is "unspecified", then
the comparisons in "attrwalk_callback" are done against a
random/uninitialized value (which would be bad).
I understand your concerns. The data will not be wrong; it is the API
documentation that is misleading.
The address *will* be 0 for SPARSE blocks and *might* or might not be 0 for
compressed blocks, depending on what the compression did. See below.
Also, if the block address would be zero, what's the point of having it
among the blocks tsk_fs_attr_walk() iterates over?
This is due to the way NTFS organizes information and handles compression,
and to the way the API loops over the blocks.
For each file or directory, there is an MFT (Master File Table) record
(typically 1 KB in size) which consists of a linear sequence of attributes.
Attributes can be resident within the MFT or non-resident according to
their size. The $DATA attribute, which stores the actual file content, is
typically non-resident.
Non-resident attributes are stored on disk in what are referred to as data
runs (contiguous blocks), which are then mapped within the attribute itself. A
typical file greater than 800 bytes has the $DATA attribute containing a
map of data runs with their location on the disk. If the map itself is too
big for the $DATA attribute (this can happen if the actual content is too
fragmented), then extra records are created and their mapping is placed in
a special attribute called $ATTRIBUTE_LIST. [1]
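
To make the run map concrete, here is a rough sketch of how a block index
within a file could be translated to a disk address. It is an assumption-laden
model, not the NTFS on-disk encoding and not TSK code: "struct data_run" and
"file_block_to_disk" are hypothetical names, and a start address of 0 is used
to stand for a sparse run.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one data run (not the on-disk encoding). */
struct data_run {
    uint64_t start;   /* disk address of the first block; 0 if sparse */
    uint64_t length;  /* number of contiguous blocks in the run */
};

/*
 * Translate a block index within the file into a disk address by walking
 * the run map. Returns 0 for a sparse block or an index beyond the map.
 */
uint64_t
file_block_to_disk(const struct data_run *runs, size_t nruns, uint64_t index)
{
    size_t i;

    for (i = 0; i < nruns; i++) {
        if (index < runs[i].length)
            return runs[i].start ? runs[i].start + index : 0;
        index -= runs[i].length;  /* skip this run, keep looking */
    }
    return 0;
}
```

With a map such as { {1000, 4}, {0, 2}, {5000, 3} }, the first four file
blocks resolve into the first run, the next two are sparse, and the remaining
three resolve into the run starting at 5000.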
When the given file is compressed (native NTFS compression, not
application-level compression), the algorithm goes over each data run within
the attribute [2] and:
1. if the data run is zero-filled, it sets the corresponding blocks as
sparse and sets their address to 0.
2. if compressing the data run does not save any disk block, it leaves
it as is.
3. if compressing the data run does save one or more blocks, the spared
blocks are again marked as sparse and their address is set to 0.
Note that the entire attribute will be marked as compressed no matter what
happened to the clusters on disk.
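
The three outcomes above can be modelled roughly as follows. This is only a
sketch under stated assumptions, not what NTFS or TSK actually implement:
"layout_unit", "enum unit_outcome" and the "stored" parameter (how many blocks
are still needed on disk after compression) are hypothetical, and the first
on-disk address is assumed to be non-zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the three cases described above. */
enum unit_outcome {
    UNIT_ALL_SPARSE,    /* case 1: zero-filled, every block is sparse */
    UNIT_UNCHANGED,     /* case 2: compression saved no block */
    UNIT_PARTLY_SPARSE  /* case 3: spared blocks are marked sparse */
};

/*
 * Fill addrs[0..nblocks-1] with the address each block would report.
 * Blocks that remain on disk keep consecutive addresses starting at
 * first_addr; spared (sparse) blocks get address 0.
 */
enum unit_outcome
layout_unit(uint64_t first_addr, size_t nblocks, size_t stored,
            uint64_t *addrs)
{
    size_t i;

    if (stored > nblocks)
        stored = nblocks;

    for (i = 0; i < nblocks; i++)
        addrs[i] = (i < stored) ? first_addr + i : 0;

    if (stored == 0)
        return UNIT_ALL_SPARSE;
    if (stored == nblocks)
        return UNIT_UNCHANGED;
    return UNIT_PARTLY_SPARSE;
}
```

In case 3 the same run yields both real addresses and zeroes, which is why the
attribute-level COMPRESSED mark alone says nothing about a single block.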
The logic loops through all non-resident attributes (which is what we want:
we want all the disk blocks allocated for that file). For each attribute,
it loops over all the blocks which that attribute maps and calls the
callback for each of them.
Our issue is the origin of the information behind the sparse flag: it
might come from the block (BAD/ALLOC/UNALLOC) or from the file
metadata (RAW, SPARSE, COMPRESSED, CONT, META). [3]
tsk_fs_attr_walk() walks over the given attribute's blocks. In case we
are inspecting attributes of compressed files, the flag will report the
*file* status (COMPRESSED) yet will not be able to tell us what the
compression algorithm did (1, 2, 3) to that block. It will still correctly
give us the address: 0 if sparse (case 1 or 3) or the correct number
otherwise (case 2 or 3).
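
Given that, a callback can rely on the address alone. Below is a minimal
sketch of the idea, not the code from the patch and not the TSK callback
signature: "block_seen", "struct search" and the BLK_FLAG_* values are
hypothetical stand-ins, and it assumes the block being searched for is never
block 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the TSK flag names or values. */
#define BLK_FLAG_SPARSE     0x1u
#define BLK_FLAG_COMPRESSED 0x2u

/* State shared across callback invocations. */
struct search {
    uint64_t wanted;  /* disk block we are looking for (assumed non-zero) */
    bool found;       /* set once the block has been seen */
};

/* Called once per block; returns true to keep walking, false to stop. */
bool
block_seen(uint64_t addr, unsigned flags, struct search *s)
{
    /* COMPRESSED describes the file, not this block, so it is ignored. */
    (void) flags;

    if (addr == 0)
        return true;   /* sparse block: nothing on disk to compare */

    if (addr == s->wanted) {
        s->found = true;
        return false;  /* match: no need to visit further blocks */
    }
    return true;
}
```

The flags argument is kept only to mirror the shape of a walk callback; the
decision is taken on the address, which is 0 for every sparse block.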
[1] https://en.wikipedia.org/wiki/NTFS#Attribute_lists.2C_attributes.2C_and_s...
[2] http://www.digital-evidence.org/fsfa/ - Chapter 11
[3] http://www.sleuthkit.org/sleuthkit/docs/api-docs/4.2/tsk__fs_8h.html#a1e6...
- "Note that some of these are set only by file_walk because they are
file-level details, such as compression and sparse."
Thanks,
--
Pino Toscano