Re: [Libguestfs] [PATCH nbdkit] cache, cow: Do not round size down.

Wednesday, 17 February 2021

On 2/17/21 6:57 AM, Richard W.M. Jones wrote:
...
 Both the cache and cow filters have the annoying behaviour that the
 size of the underlying plugin is rounded down to the block size used
 by the filters (ie. 4K).  This is especially visible for single sector
 disk images:

 $ ./nbdkit memory size=512 --filter=cow --run 'nbdinfo --size $uri'
 0
 $ ./nbdkit memory size=512 --filter=cache --run 'nbdinfo --size $uri'
 0

 but affects all users of these filters somewhat randomly and
 unexpectedly.  (For example I had a GPT partitioned disk image from a
 customer which did not have a 4K aligned size, which was unexpectedly
 "corrupted" when I tried to open it using the cow filter - the reason
 for the corruption was the backup partition table at the end of the
 disk being truncated.)

 Getting rid of this assumption is awkward: the general idea is that we
 round up the size of the backing file / bitmap by a full block.  But
 whenever we are asked to read or write to the underlying plugin we
 handle the tail case carefully.

 This also tests various corner cases.
 --- 
...
 +++ b/filters/cache/nbdkit-cache-filter.pod
 @@ -25,12 +25,6 @@ does not have effective caching, or (with C<cache=unsafe>) to
defeat
  flush requests from the client (which is unsafe and can cause data
  loss, as the name suggests).

 -Note that the use of this filter rounds the image size down to a
 -multiple of the caching granularity (the larger of 4096 or the
 -C<f_bsize> field of L<fstatvfs(3)>), to ease the implementation. If
 -you need to round the image size up instead to access the last few
 -bytes, combine this filter with L<nbdkit-truncate-filter(1)>. 
Nice to see this restriction go.

Do you also want to tweak the sentence in nbdkit-truncate-filter.pod
that mentions cache and cow filters?

...
 +++ b/filters/cache/blk.c 
...
  int
  blk_set_size (uint64_t new_size)
  {
 -  if (bitmap_resize (&bm, new_size) == -1)
 +  size = new_size;
 +
 +  if (bitmap_resize (&bm, size) == -1)
      return -1;

 -  if (ftruncate (fd, new_size) == -1) {
 +  if (ftruncate (fd, ROUND_UP (size, blksize)) == -1) { 
So the underlying fd is aligned...

...
      nbdkit_error ("ftruncate: %m");
      return -1;
    }

 -  if (lru_set_size (new_size) == -1)
 +  if (lru_set_size (size) == -1) 
...while the lru size is still the original size.

...
 @@ -199,9 +207,22 @@ blk_read (struct nbdkit_next_ops *next_ops, void
*nxdata,
                  "unknown");

    if (state == BLOCK_NOT_CACHED) { /* Read underlying plugin. */
 -    if (next_ops->pread (nxdata, block, blksize, offset, 0, err) == -1)
 +    unsigned n = blksize, tail = 0;
 +
 +    if (offset + n > size) {
 +      tail = offset + n - size;
 +      n -= tail;
 +    }
 +
 +    if (next_ops->pread (nxdata, block, n, offset, 0, err) == -1)
        return -1; 
Good, you were careful to never read beyond EOF of the underlying plugin.

...

 +    /* Normally we're reading whole blocks, but at the very end of the
 +     * file we might read a partial block.  Deal with that case by
 +     * zeroing the tail.
 +     */
 +    memset (block + n, 0, tail);
 + 
I don't know if that tail can ever leak out to the client, but agree
that it is safer to zero it anyway.

>      /* If cache-on-read, copy the block to the cache. */
>      if (cache_on_read) {
>        nbdkit_debug ("cache: cache-on-read block %" PRIu64
> @@ -247,9 +268,22 @@ blk_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
>  
>    if (state == BLOCK_NOT_CACHED) {
>      /* Read underlying plugin, copy to cache regardless of cache-on-read. */
> -    if (next_ops->pread (nxdata, block, blksize, offset, 0, err) == -1)
> +    unsigned n = blksize, tail = 0;
> +
> +    if (offset + n > size) {
> +      tail = offset + n - size;
> +      n -= tail;
> +    }
> +
> +    if (next_ops->pread (nxdata, block, n, offset, 0, err) == -1)
>        return -1;
...

 +    /* Normally we're reading whole blocks, but at the very end of the
 +     * file we might read a partial block.  Deal with that case by
 +     * zeroing the tail.
 +     */
 +    memset (block + n, 0, tail);
 + 
Duplicated code, but I'm not sure if it warrants a helper function.

...
      nbdkit_debug ("cache: cache block %" PRIu64 "
(offset %" PRIu64 ")",
                    blknum, (uint64_t) offset);

 @@ -281,6 +315,12 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata,
                    int *err)
  {
    off_t offset = blknum * blksize;
 +  unsigned n = blksize, tail = 0;
 +
 +  if (offset + n > size) {
 +    tail = offset + n - size;
 +    n -= tail;
 +  }

    reclaim (fd, &bm);

 @@ -293,7 +333,7 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata,
      return -1;
    }

 -  if (next_ops->pwrite (nxdata, block, blksize, offset, flags, err) == -1)
 +  if (next_ops->pwrite (nxdata, block, n, offset, flags, err) == -1)
      return -1; 
Okay, you're careful to not write beyond EOF of the plugin, regardless
of what extra may live in the tail in the local fd.

LGTM

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH nbdkit] cache, cow: Do not round size down.