On Wed, Apr 24, 2019 at 02:39:03PM -0500, Eric Blake wrote:
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:
> This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and
> holes in the underlying file.
> ---
> plugins/file/file.c | 141 ++++++++++++++++++++++++++++++++++---
> tests/Makefile.am | 5 ++
> tests/test-file-extents.sh | 57 +++++++++++++++
> 3 files changed, 193 insertions(+), 10 deletions(-)
>
> +#ifdef SEEK_HOLE
> +/* Extents. */
> +
> +static int
> +file_can_extents (void *handle)
> +{
> + struct handle *h = handle;
> + off_t r;
> +
> + /* A simple test to see whether SEEK_HOLE etc is likely to work on
> + * the current filesystem.
> + */
> + pthread_mutex_lock (&lseek_lock);
> + r = lseek (h->fd, 0, SEEK_HOLE);
> + pthread_mutex_unlock (&lseek_lock);
> + if (r == -1) {
> + nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m");
> + return 0;
> + }
> + return 1;
> +}
Should we also return 0 if the lseek() returned the offset of EOF? More
concretely, what if we test:
r = lseek(h->fd, lseek(h->fd, 0, SEEK_HOLE), SEEK_DATA);
If r is non-negative, then we have a sparse file; if the inner SEEK_HOLE
failed then the outer lseek(, -1, SEEK_DATA) should fail with EINVAL;
while if SEEK_HOLE succeeded but reported the offset of EOF, then
SEEK_DATA should fail due to ENXIO and we know the file has no holes at
all. In favor of the double check: a completely non-sparse file is just
wasting time on further lseek() calls that won't tell us anything
different than our default of all data (and which may be helpful for
things like tmpfs that have abysmal lseek() performance). Perhaps a
reason against: if we expect NBD_CMD_TRIM/NBD_CMD_WRITE_ZEROES or even a
third-party to be punching holes in the meantime, then opting out of
future lseek() won't see those later additions of holes.
This last point seems crucial - the file could become sparse, even if
it starts off as non-sparse.
The test is designed really to eliminate the case where SEEK_HOLE
isn't supported and returns an error.
However you rightly point out that there's another case we don't
cover: what if we're using tmpfs where SEEK_HOLE is supported but has
poor performance? I would say the best solution is to fix tmpfs! But
if that's not possible, perhaps we can do {f,}statfs(2) on the file
and blacklist certain combinations of f_type and kernel version?
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html