On Tue, Jun 11, 2019 at 08:49:37AM -0500, Eric Blake wrote:
On 6/11/19 3:49 AM, Martin Kletzander wrote:
> This filter caches the last result of the extents() call and offers a nice
> speed-up for clients that only support req_one=1, in combination with plugins
> like vddk, for which returning information for multiple extents in one call
> adds no overhead, although each such call is very time-consuming.
>
> A quick test showed that on a fast connection, with a sparsely allocated 16G
> disk with an OS installed, `qemu-img map` runs in 16s instead of 33s (out of
> which it takes 5s to get to the first extents request). For a 100G disk with
> no data on it, that is, one hole extent spanning the whole disk (where there
> is no room for improvement), this does not add any noticeable overhead.
>
> Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
> ---
>
> Notes:
> v2:
>
> - Clearing the REQ_ONE flag when forwarding the request to the plugin
>
> - I hate the wording of the comment explaining it, please suggest a reword (or
> just fix it)
I'll give it a shot.
>
> - Tests now use a custom shell plugin that logs all accesses, plus qemu-io to
> test all other expected behaviour
>
> +++ b/filters/cacheextents/cacheextents.c
> +
> +/* Cached extents from the last extents () call and its start and end for the
> + sake of simplicity. */
The prevailing style seems to be the use of leading ' *' on subsequent
comment lines, and */ on its own line.
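That is, something like this (a sketch of the restyled comment, not
necessarily the final committed form):

  /* Cached extents from the last extents () call and its start and end
   * for the sake of simplicity.
   */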
> +struct nbdkit_extents *cache_extents;
> +static uint64_t cache_start;
> +static uint64_t cache_end;
> +
> +static void
> +cacheextents_unload (void)
> +{
> + nbdkit_extents_free (cache_extents);
> +}
> +
> +static int
> +cacheextents_add (struct nbdkit_extents *extents, int *err)
> +{
> + size_t i = 0;
> +
> + for (i = 0; i < nbdkit_extents_count (cache_extents); i++) {
> + struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i);
> + if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) {
> + *err = errno;
> + return -1;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int
> +cacheextents_fill (struct nbdkit_extents *extents, int *err)
> +{
> + size_t i = 0;
> + size_t count = nbdkit_extents_count (extents);
> + struct nbdkit_extent first = nbdkit_get_extent (extents, 0);
> + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1);
> +
> + nbdkit_extents_free (cache_extents);
> + cache_start = first.offset;
> + cache_end = last.offset + last.length;
> + cache_extents = nbdkit_extents_new (cache_start, cache_end);
Needs a check for failure.
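Something along these lines should do (a sketch, assuming
nbdkit_extents_new returns NULL and sets errno on failure):

  cache_extents = nbdkit_extents_new (cache_start, cache_end);
  if (cache_extents == NULL) {
    *err = errno;
    return -1;
  }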
Hmm - my earlier comments about appending to the cache are hard if we
cap it to cache_end; I may still play with tweaking the cache algorithm
to allow appending, but it would be a separate patch on top.
Well, it just means that if this is to be rewritten, then it cannot use the
provided nbdkit_extents struct; instead an internal one (that would allow
growing in both directions, maybe also being non-contiguous) would have to be
used here.
> +
> + for (i = 0; i < count; i++) {
> + struct nbdkit_extent ex = nbdkit_get_extent (extents, i);
> + nbdkit_debug ("cacheextents: updating cache with"
> + ": offset=%" PRIu64
> + ": length=%" PRIu64
> + "; type=%x",
Punctuation looks a bit funny; I'll probably normalize to

  updating cache with: offset=512 length=512 type=0

to match...

Really weird leftover, thanks for noticing.
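Concretely, the normalized call might read (a sketch following the example
output above, not the final committed form):

  nbdkit_debug ("cacheextents: updating cache with:"
                " offset=%" PRIu64
                " length=%" PRIu64
                " type=%x",
                ex.offset, ex.length, ex.type);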
> + ex.offset, ex.length, ex.type);
> + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) {
> + *err = errno;
> + return -1;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int
> +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
> + void *handle, uint32_t count, uint64_t offset, uint32_t flags,
> + struct nbdkit_extents *extents,
> + int *err)
> +{
> + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
> +
> + nbdkit_debug ("cacheextents:"
> + " cache_start=%" PRIu64
> + " cache_end=%" PRIu64
> + " cache_extents=%p",
...this style.
> + cache_start, cache_end, cache_extents);
> +
> + if (cache_extents &&
> + offset >= cache_start && offset < cache_end) {
> + nbdkit_debug ("cacheextents: returning from cache");
> + return cacheextents_add (extents, err);
> + }
> +
> + nbdkit_debug ("cacheextents: cache miss");
> + /* Clearing the REQ_ONE here in order to get (possibly) more data from the
> + * plugin. This should not increase the cost as plugins can still return just
> + * one extent if it's too costly to return more. */
> + flags &= ~(NBDKIT_FLAG_REQ_ONE);
> + if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
> + return -1;
> +
> + return cacheextents_fill (extents, err);
> +}
> +
> +/* Any changes to the data need to clean the cache.
> + *
> + * Similarly to the readahead filter, this could be more intelligent, but
> + * there would be very little benefit.
> + */
> +
> +static void
> +kill_cacheextents (void)
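The body got snipped in the quoting here; presumably it just frees and
clears the cached extents, along these lines (a sketch, not necessarily the
exact patch text):

  static void
  kill_cacheextents (void)
  {
    nbdkit_extents_free (cache_extents);
    cache_extents = NULL;
  }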
> +++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod
> @@ -0,0 +1,47 @@
> +=head1 NAME
> +
> +nbdkit-cacheextents-filter - cache extents
> +
> +=head1 SYNOPSIS
> +
> + nbdkit --filter=cacheextents plugin
> +
> +=head1 DESCRIPTION
> +
> +C<nbdkit-cacheextents-filter> is a filter that caches the result of the
> +last extents() call.
> +
> +A common use for this filter is to accelerate returning extents data for
> +clients which ask for extents with the REQ_ONE flag set (like
> +S<C<qemu-img convert>>) and is especially useful in combination with
> +plugins that report multiple extents in one call, but with high latency
> +for each of those calls (like L<nbdkit-vddk-plugin(1)>). For example:
You asked for a re-word; how about this (with proper markup):
A common use for this filter is to improve performance when using a
client performing a linear pass over the entire image while asking for
only one extent at a time (such as qemu-img convert), but where the
plugin can provide multiple extents for the same high latency as a
single extent (such as nbdkit-vddk-plugin).
Yes, this reads better, although the one I really wanted to reword was the
comment in cacheextents_extents():
+ /* Clearing the REQ_ONE here in order to get (possibly) more data from the
+ * plugin. This should not increase the cost as plugins can still return just
+ * one extent if it's too costly to return more. */
+ flags &= ~(NBDKIT_FLAG_REQ_ONE);
...
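One possible rewording, offered only as a sketch:

  /* Clear the REQ_ONE flag so that the plugin may return as many extents
   * as it can cheaply provide.  This should not increase the cost: a
   * plugin for which returning more extents is expensive is still free
   * to return just one.
   */
  flags &= ~(NBDKIT_FLAG_REQ_ONE);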
> +
> + nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ...
> +
> +For files with big extents (when it is unlikely for one extents() call
> +to return multiple different extents) this does not slow down the
> +access.
I'd also like to see a mention of nbdkit-cache-filter, probably worded as:
Note that this caches only plugin metadata; this filter can be combined
with L<nbdkit-cache-filter(1)> to also cache data.
and a parallel sentence added to nbdkit-cache-filter.pod.
And nbdkit-filters, yes. I forgot to do this even though you already pointed
it out in the previous version; I just got to it after too long.
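For example, the two filters could be stacked like this (illustrative
invocation; "plugin" is a placeholder as in the synopsis):

  nbdkit --filter=cacheextents --filter=cache plugin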
> diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh
> new file mode 100755
> index 000000000000..5b0667ad2b95
> --- /dev/null
> +++ b/tests/test-cacheextents.sh
> @@ -0,0 +1,110 @@
> +requires grep --version
> +requires qemu-img --version
> +requires qemu-io --version
> +
> +sock="$(mktemp -u)"
> +sockurl="nbd+unix:///?socket=$sock"
> +pidfile="test-cacheextents.pid"
> +accessfile="test-cacheextents-access.log"
> +accessfile_full="$(pwd)/test-cacheextents-access.log"
$PWD saves a process compared to $(pwd)
> +files="$pidfile $sock"
> +rm -f $files $accessfile
> +cleanup_fn rm -f $files
> +
> +# Intentionally using EOF rather than 'EOF' so that we can pass in the
> +# $accessfile_full
> +start_nbdkit \
> + -P $pidfile \
> + -U $sock \
> + --filter=cacheextents \
> + sh - <<EOF
> +echo "Call: \$@" >>$accessfile_full
> +size=4M
> +block_size=\$((1024*1024))
> +case "\$1" in
> + get_size) echo \$size ;;
> + can_extents) ;;
> + extents)
> + echo "extents request: \$@" >>$accessfile_full
> + offset=\$((\$4/\$block_size))
> + count=\$((\$3/\$block_size))
> + length=\$((\$offset+\$count))
Adding spaces around the operators would make this more legible.
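I.e., something like this (shown without the heredoc backslash escapes):

  offset=$(( $4 / $block_size ))
  count=$(( $3 / $block_size ))
  length=$(( $offset + $count ))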
> + for i in \$(seq \$offset \$length); do
> + echo \${i}M \$block_size \$((i%4)) >>$accessfile_full
> + echo \${i}M \$block_size \$((i%4))
> + done
> + ;;
> + pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;;
> + can_write) ;;
> + pwrite) dd of=/dev/null ;;
> + can_trim) ;;
> + trim) ;;
> + can_zero) ;;
> + trim) ;;
zero)  (i.e. the second 'trim' case should be 'zero')
> + *) exit 2 ;;
> +esac
> +EOF
> +
> +
> +test_me() {
> + num_accesses=$1
> + shift
> +
> + qemu-io -f raw "$@" "$sockurl"
> + test "$(grep -c "^extents request: " $accessfile)" -eq "$num_accesses"
> + ret=$?
> + rm -f "$accessfile"
> + return $ret
> +}
> +
> +# First one causes caching, the rest should be returned from cache.
> +test_me 1 -c 'map' -c 'map' -c 'map'
> +# First one is still cached from last time, discard should kill the cache, then
> +# one request should go through.
> +test_me 1 -c 'map' -c 'discard 0 1' -c 'map'
> +# Same as above, only this time the cache is killed before all the operations as
> +# well. This is used from now on to clear the cache as it seems nicer and
> +# faster than running new nbdkit for each test.
> +test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c 'map'
> +# Write should kill the cache as well.
> +test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c 'map'
> +# Alloc should use cached data from map
> +test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0'
> +# Read should not kill the cache
> +test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c 'map' -c 'alloc 0'
>
Looks good. Unless you have objections, I'll go ahead and make the
tweaks mentioned, and push this.
Yes, all of them make sense, thank you very much.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org