On 2/17/21 6:57 AM, Richard W.M. Jones wrote:
Both the cache and cow filters have the annoying behaviour that the
size of the underlying plugin is rounded down to the block size used
by the filters (ie. 4K). This is especially visible for single sector
disk images:
$ ./nbdkit memory size=512 --filter=cow --run 'nbdinfo --size $uri'
0
$ ./nbdkit memory size=512 --filter=cache --run 'nbdinfo --size $uri'
0
but affects all users of these filters somewhat randomly and
unexpectedly. (For example I had a GPT partitioned disk image from a
customer which did not have a 4K aligned size, which was unexpectedly
"corrupted" when I tried to open it using the cow filter - the reason
for the corruption was the backup partition table at the end of the
disk being truncated.)
Getting rid of this assumption is awkward: the general idea is that we
round up the size of the backing file / bitmap by a full block. But
whenever we are asked to read or write to the underlying plugin we
handle the tail case carefully.
This also tests various corner cases.
---
+++ b/filters/cache/nbdkit-cache-filter.pod
@@ -25,12 +25,6 @@ does not have effective caching, or (with C<cache=unsafe>) to
defeat
flush requests from the client (which is unsafe and can cause data
loss, as the name suggests).
-Note that the use of this filter rounds the image size down to a
-multiple of the caching granularity (the larger of 4096 or the
-C<f_bsize> field of L<fstatvfs(3)>), to ease the implementation. If
-you need to round the image size up instead to access the last few
-bytes, combine this filter with L<nbdkit-truncate-filter(1)>.
Nice to see this restriction go.
Do you also want to tweak the sentence in nbdkit-truncate-filter.pod
that mentions cache and cow filters?
+++ b/filters/cache/blk.c
int
blk_set_size (uint64_t new_size)
{
- if (bitmap_resize (&bm, new_size) == -1)
+ size = new_size;
+
+ if (bitmap_resize (&bm, size) == -1)
return -1;
- if (ftruncate (fd, new_size) == -1) {
+ if (ftruncate (fd, ROUND_UP (size, blksize)) == -1) {
So the underlying fd is aligned...
nbdkit_error ("ftruncate: %m");
return -1;
}
- if (lru_set_size (new_size) == -1)
+ if (lru_set_size (size) == -1)
...while the lru size is still the original size.
@@ -199,9 +207,22 @@ blk_read (struct nbdkit_next_ops *next_ops, void
*nxdata,
"unknown");
if (state == BLOCK_NOT_CACHED) { /* Read underlying plugin. */
- if (next_ops->pread (nxdata, block, blksize, offset, 0, err) == -1)
+ unsigned n = blksize, tail = 0;
+
+ if (offset + n > size) {
+ tail = offset + n - size;
+ n -= tail;
+ }
+
+ if (next_ops->pread (nxdata, block, n, offset, 0, err) == -1)
return -1;
Good, you were careful to never read beyond EOF of the underlying plugin.
+ /* Normally we're reading whole blocks, but at the very end of the
+ * file we might read a partial block. Deal with that case by
+ * zeroing the tail.
+ */
+ memset (block + n, 0, tail);
+
I don't know if that tail can ever leak out to the client, but agree
that it is safer to zero it anyway.
> /* If cache-on-read, copy the block to the cache. */
> if (cache_on_read) {
> nbdkit_debug ("cache: cache-on-read block %" PRIu64
> @@ -247,9 +268,22 @@ blk_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
>
> if (state == BLOCK_NOT_CACHED) {
> /* Read underlying plugin, copy to cache regardless of cache-on-read. */
> - if (next_ops->pread (nxdata, block, blksize, offset, 0, err) == -1)
> + unsigned n = blksize, tail = 0;
> +
> + if (offset + n > size) {
> + tail = offset + n - size;
> + n -= tail;
> + }
> +
> + if (next_ops->pread (nxdata, block, n, offset, 0, err) == -1)
> return -1;
+ /* Normally we're reading whole blocks, but at the very end of the
+ * file we might read a partial block. Deal with that case by
+ * zeroing the tail.
+ */
+ memset (block + n, 0, tail);
+
Duplicated code, but I'm not sure if it warrants a helper function.
nbdkit_debug ("cache: cache block %" PRIu64 "
(offset %" PRIu64 ")",
blknum, (uint64_t) offset);
@@ -281,6 +315,12 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata,
int *err)
{
off_t offset = blknum * blksize;
+ unsigned n = blksize, tail = 0;
+
+ if (offset + n > size) {
+ tail = offset + n - size;
+ n -= tail;
+ }
reclaim (fd, &bm);
@@ -293,7 +333,7 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata,
return -1;
}
- if (next_ops->pwrite (nxdata, block, blksize, offset, flags, err) == -1)
+ if (next_ops->pwrite (nxdata, block, n, offset, flags, err) == -1)
return -1;
Okay, you're careful to not write beyond EOF of the plugin, regardless
of what extra may live in the tail in the local fd.
LGTM
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org