On 6/11/19 3:49 AM, Martin Kletzander wrote:
This filter caches the result of the last extents() call and offers a nice
speed-up for clients that only support req_one=1, in combination with plugins
like vddk, where one call can return information for multiple extents at no
extra cost but each call is very time-consuming.
A quick test showed that on a fast connection, with a sparsely allocated 16G
disk with an OS installed, `qemu-img map` runs in 16s instead of 33s (of which
5s are spent on the first extents request). For a 100G disk with no data on
it, that is one hole extent spanning the whole disk (where there is no room
for improvement), this does not add any noticeable overhead.
Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
---
Notes:
v2:
- Clearing the REQ_ONE flag when forwarding the request to the plugin
- I hate the wording of the comment explaining it, please suggest a reword (or
just fix it)
I'll give it a shot.
- Tests now use a custom shell plugin logging all accesses, and qemu-io to
test all other expected behaviour
+++ b/filters/cacheextents/cacheextents.c
+
+/* Cached extents from the last extents () call and its start and end for the
+ sake of simplicity. */
The prevailing style seems to be the use of leading ' *' on subsequent
comment lines, and */ on its own line.
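Perhaps something like (untested):

  /* Cached extents from the last extents () call and its start and end
   * for the sake of simplicity.
   */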
+static struct nbdkit_extents *cache_extents;
+static uint64_t cache_start;
+static uint64_t cache_end;
+
+static void
+cacheextents_unload (void)
+{
+ nbdkit_extents_free (cache_extents);
+}
+
+static int
+cacheextents_add (struct nbdkit_extents *extents, int *err)
+{
+ size_t i = 0;
+
+ for (i = 0; i < nbdkit_extents_count (cache_extents); i++) {
+ struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i);
+ if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) {
+ *err = errno;
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static int
+cacheextents_fill (struct nbdkit_extents *extents, int *err)
+{
+ size_t i = 0;
+ size_t count = nbdkit_extents_count (extents);
+ struct nbdkit_extent first = nbdkit_get_extent (extents, 0);
+ struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1);
+
+ nbdkit_extents_free (cache_extents);
+ cache_start = first.offset;
+ cache_end = last.offset + last.length;
+ cache_extents = nbdkit_extents_new (cache_start, cache_end);
Needs a check for failure.
Hmm - my earlier comments about appending to the cache are hard if we
cap it to cache_end; I may still play with tweaking the cache algorithm
to allow appending, but it would be a separate patch on top.
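Maybe something along these lines (untested sketch):

  cache_extents = nbdkit_extents_new (cache_start, cache_end);
  if (cache_extents == NULL) {
    *err = errno;
    return -1;
  }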
+
+ for (i = 0; i < count; i++) {
+ struct nbdkit_extent ex = nbdkit_get_extent (extents, i);
+ nbdkit_debug ("cacheextents: updating cache with"
+ ": offset=%" PRIu64
+ ": length=%" PRIu64
+ "; type=%x",
Punctuation looks a bit funny; I'll probably normalize to
updating cache with: offset=512 length=512 type=0
to match...
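i.e. roughly this, keeping the existing format specifiers:

  nbdkit_debug ("cacheextents: updating cache with:"
                " offset=%" PRIu64
                " length=%" PRIu64
                " type=%x",
                ex.offset, ex.length, ex.type);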
+ ex.offset, ex.length, ex.type);
+ if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) {
+ *err = errno;
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static int
+cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offset, uint32_t flags,
+ struct nbdkit_extents *extents,
+ int *err)
+{
+ ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+
+ nbdkit_debug ("cacheextents:"
+ " cache_start=%" PRIu64
+ " cache_end=%" PRIu64
+ " cache_extents=%p",
...this style.
+ cache_start, cache_end, cache_extents);
+
+ if (cache_extents &&
+ offset >= cache_start && offset < cache_end) {
+ nbdkit_debug ("cacheextents: returning from cache");
+ return cacheextents_add (extents, err);
+ }
+
+ nbdkit_debug ("cacheextents: cache miss");
+ /* Clearing the REQ_ONE here in order to get (possibly) more data from the
+ * plugin. This should not increase the cost as plugins can still return just
+ * one extent if it's too costly to return more. */
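Here's a rough stab at the reword mentioned above, feel free to adjust:

  /* Clear the REQ_ONE flag so the plugin is free to return as many
   * extents as it can.  This should not increase the cost, because a
   * plugin may still return just a single extent if computing more
   * would be too expensive.
   */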
+ flags &= ~(NBDKIT_FLAG_REQ_ONE);
+ if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
+ return -1;
+
+ return cacheextents_fill (extents, err);
+}
+
+/* Any changes to the data need to invalidate the cache.
+ *
+ * Similarly to the readahead filter this could be more intelligent, but
+ * there would be very little benefit.
+ */
+
+static void
+kill_cacheextents (void)
+++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod
@@ -0,0 +1,47 @@
+=head1 NAME
+
+nbdkit-cacheextents-filter - cache extents
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=cacheextents plugin
+
+=head1 DESCRIPTION
+
+C<nbdkit-cacheextents-filter> is a filter that caches the result of the last
+extents() call.
+
+A common use for this filter is to accelerate returning extents data for
+clients which ask for extents with the REQ_ONE flag set (like
+S<C<qemu-img convert>>) and is especially useful in combination with
+plugins that report multiple extents in one call, but with high latency
+for each of those calls (like L<nbdkit-vddk-plugin(1)>). For example:
You asked for a re-word; how about this (with proper markup):
A common use for this filter is to improve performance when using a
client performing a linear pass over the entire image while asking for
only one extent at a time (such as qemu-img convert), but where the
plugin can provide multiple extents for the same high latency as a
single extent (such as nbdkit-vddk-plugin).
+
+ nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ...
+
+For files with big extents (where it is unlikely that one extents() call
+would return multiple different extents) this does not slow down
+access.
I'd also like to see a mention of nbdkit-cache-filter, probably worded as:
Note that this caches only plugin metadata; this filter can be combined
with L<nbdkit-cache-filter(1)> to also cache data.
and a parallel sentence added to nbdkit-cache-filter.pod.
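Untested, but one possible combined invocation for the docs might look like:

  nbdkit -U - --filter=cacheextents --filter=cache \
         --run 'qemu-img convert $nbd out.img' vddk ...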
diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh
new file mode 100755
index 000000000000..5b0667ad2b95
--- /dev/null
+++ b/tests/test-cacheextents.sh
@@ -0,0 +1,110 @@
+requires grep --version
+requires qemu-img --version
+requires qemu-io --version
+
+sock="$(mktemp -u)"
+sockurl="nbd+unix:///?socket=$sock"
+pidfile="test-cacheextents.pid"
+accessfile="test-cacheextents-access.log"
+accessfile_full="$(pwd)/test-cacheextents-access.log"
$PWD saves a process compared to $(pwd)
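i.e.:

  accessfile_full="$PWD/test-cacheextents-access.log"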
+files="$pidfile $sock"
+rm -f $files $accessfile
+cleanup_fn rm -f $files
+
+# Intentionally using EOF rather than 'EOF' so that we can pass in the
+# $accessfile_full
+start_nbdkit \
+ -P $pidfile \
+ -U $sock \
+ --filter=cacheextents \
+ sh - <<EOF
+echo "Call: \$@" >>$accessfile_full
+size=4M
+block_size=\$((1024*1024))
+case "\$1" in
+ get_size) echo \$size ;;
+ can_extents) ;;
+ extents)
+ echo "extents request: \$@" >>$accessfile_full
+ offset=\$((\$4/\$block_size))
+ count=\$((\$3/\$block_size))
+ length=\$((\$offset+\$count))
Adding spaces around the operators would make this more legible.
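That is, within the heredoc, something like:

  offset=\$(( \$4 / \$block_size ))
  count=\$(( \$3 / \$block_size ))
  length=\$(( \$offset + \$count ))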
+ for i in \$(seq \$offset \$length); do
+ echo \${i}M \$block_size \$((i%4)) >>$accessfile_full
+ echo \${i}M \$block_size \$((i%4))
+ done
+ ;;
+ pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;;
+ can_write) ;;
+ pwrite) dd of=/dev/null ;;
+ can_trim) ;;
+ trim) ;;
+ can_zero) ;;
+ trim) ;;
The second trim) here was presumably meant to be zero).
+ *) exit 2 ;;
+esac
+EOF
+
+
+test_me() {
+ num_accesses=$1
+ shift
+
+ qemu-io -f raw "$@" "$sockurl"
+ test "$(grep -c "^extents request: " $accessfile)" -eq
"$num_accesses"
+ ret=$?
+ rm -f "$accessfile"
+ return $ret
+}
+
+# First one causes caching, the rest should be returned from cache.
+test_me 1 -c 'map' -c 'map' -c 'map'
+# First one is still cached from last time, discard should kill the cache, then
+# one request should go through.
+test_me 1 -c 'map' -c 'discard 0 1' -c 'map'
+# Same as above, only this time the cache is also killed before the other
+# operations.  This approach is used from now on to clear the cache, as it
+# seems nicer and faster than running a new nbdkit for each test.
+test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c 'map'
+# Write should kill the cache as well.
+test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c 'map'
+# Alloc should use cached data from map
+test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0'
+# Read should not kill the cache
+test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c 'map' -c 'alloc 0'
Looks good. Unless you have objections, I'll go ahead and make the
tweaks mentioned, and push this.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org