On Tue, Jun 11, 2019 at 08:49:37AM -0500, Eric Blake wrote:
On 6/11/19 3:49 AM, Martin Kletzander wrote:
> This filter caches the last result of the extents() call and offers a nice
> speed-up for clients that only support req_one=1, in combination with plugins
> like vddk, for which returning information for multiple extents in one call
> adds no overhead, although each such call is very time-consuming.
>
> A quick test showed that on a fast connection, with a sparsely allocated 16G
> disk with an OS installed, `qemu-img map` runs in 16s instead of 33s (out of
> which it takes 5s to get to the first extents request). For a 100G disk with
> no data on it, that is, one hole extent spanning the whole disk (where there
> is no room for improvement), this does not add any noticeable overhead.
>
> Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
> ---
>
> Notes:
> v2:
>
> - Clearing the REQ_ONE flag when forwarding the request to the plugin
>
> - I hate the wording of the comment explaining it, please suggest a reword (or
> just fix it)
I'll give it a shot.
>
> - Tests now use a custom shell plugin that logs all accesses, plus qemu-io to
> test all other expected behaviour
>
> +++ b/filters/cacheextents/cacheextents.c
> +
> +/* Cached extents from the last extents () call and its start and end for the
> + sake of simplicity. */
The prevailing style seems to be the use of leading ' *' on subsequent
comment lines, and */ on its own line.
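That is, something like this (a sketch of the restyled comment, not
necessarily the final committed form):

  /* Cached extents from the last extents () call and its start and end
   * for the sake of simplicity.
   */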
> +struct nbdkit_extents *cache_extents;
> +static uint64_t cache_start;
> +static uint64_t cache_end;
> +
> +static void
> +cacheextents_unload (void)
> +{
> + nbdkit_extents_free (cache_extents);
> +}
> +
> +static int
> +cacheextents_add (struct nbdkit_extents *extents, int *err)
> +{
> + size_t i = 0;
> +
> + for (i = 0; i < nbdkit_extents_count (cache_extents); i++) {
> + struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i);
> + if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) {
> + *err = errno;
> + return -1;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int
> +cacheextents_fill (struct nbdkit_extents *extents, int *err)
> +{
> + size_t i = 0;
> + size_t count = nbdkit_extents_count (extents);
> + struct nbdkit_extent first = nbdkit_get_extent (extents, 0);
> + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1);
> +
> + nbdkit_extents_free (cache_extents);
> + cache_start = first.offset;
> + cache_end = last.offset + last.length;
> + cache_extents = nbdkit_extents_new (cache_start, cache_end);
Needs a check for failure.
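Something along these lines should do (a sketch, assuming
nbdkit_extents_new returns NULL and sets errno on failure):

  cache_extents = nbdkit_extents_new (cache_start, cache_end);
  if (cache_extents == NULL) {
    *err = errno;
    return -1;
  }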
Hmm - my earlier comments about appending to the cache are hard if we
cap it to cache_end; I may still play with tweaking the cache algorithm
to allow appending, but it would be a separate patch on top.
Well, it just means that if this is to be rewritten, then it cannot use the
provided nbdkit_extents struct; instead an internal one (that would allow
growing in both directions, maybe also being non-contiguous) would have to be
used here.
> +
> + for (i = 0; i < count; i++) {
> + struct nbdkit_extent ex = nbdkit_get_extent (extents, i);
> + nbdkit_debug ("cacheextents: updating cache with"
> + ": offset=%" PRIu64
> + ": length=%" PRIu64
> + "; type=%x",
Punctuation looks a bit funny; I'll probably normalize to

  updating cache with: offset=512 length=512 type=0

to match...

Really weird leftover, thanks for noticing.
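Concretely, the normalized call might read (a sketch following the example
output above, not the final committed form):

  nbdkit_debug ("cacheextents: updating cache with:"
                " offset=%" PRIu64
                " length=%" PRIu64
                " type=%x",
                ex.offset, ex.length, ex.type);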
> + ex.offset, ex.length, ex.type);
> + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) {
> + *err = errno;
> + return -1;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int
> +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
> + void *handle, uint32_t count, uint64_t offset, uint32_t flags,
> + struct nbdkit_extents *extents,
> + int *err)
> +{
> + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
> +
> + nbdkit_debug ("cacheextents:"
> + " cache_start=%" PRIu64
> + " cache_end=%" PRIu64
> + " cache_extents=%p",
...this style.
> + cache_start, cache_end, cache_extents);
> +
> + if (cache_extents &&
> + offset >= cache_start && offset < cache_end) {
> + nbdkit_debug ("cacheextents: returning from cache");
> + return cacheextents_add (extents, err);
> + }
> +
> + nbdkit_debug ("cacheextents: cache miss");
> + /* Clearing the REQ_ONE here in order to get (possibly) more data from the
> + * plugin. This should not increase the cost as plugins can still return just
> + * one extent if it's too costly to return more. */
> + flags &= ~(NBDKIT_FLAG_REQ_ONE);
> + if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
> + return -1;
> +
> + return cacheextents_fill (extents, err);
> +}
> +
> +/* Any changes to the data need to clean the cache.
> + *
> + * Similarly to the readahead filter, this could be more intelligent, but
> + * there would be very little benefit.
> + */
> +
> +static void
> +kill_cacheextents (void)
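The body got snipped in the quoting here; presumably it just frees and
clears the cached extents, along these lines (a sketch, not necessarily the
exact patch text):

  static void
  kill_cacheextents (void)
  {
    nbdkit_extents_free (cache_extents);
    cache_extents = NULL;
  }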
> +++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod
> @@ -0,0 +1,47 @@
> +=head1 NAME
> +
> +nbdkit-cacheextents-filter - cache extents
> +
> +=head1 SYNOPSIS
> +
> + nbdkit --filter=cacheextents plugin
> +
> +=head1 DESCRIPTION
> +
> +C<nbdkit-cacheextents-filter> is a filter that caches the result of the
> +last extents() call.
> +
> +A common use for this filter is to accelerate returning extents data for
> +clients which ask for extents with the REQ_ONE flag set (like
> +S<C<qemu-img convert>>) and is especially useful in combination with
> +plugins that report multiple extents in one call, but with high latency
> +for each of those calls (like L<nbdkit-vddk-plugin(1)>). For example:
You asked for a re-word; how about this (with proper markup):
A common use for this filter is to improve performance when using a
client performing a linear pass over the entire image while asking for
only one extent at a time (such as qemu-img convert), but where the
plugin can provide multiple extents for the same high latency as a
single extent (such as nbdkit-vddk-plugin).
Yes, this reads better, although the one I really wanted to reword was the
comment in cacheextents_extents():
+ /* Clearing the REQ_ONE here in order to get (possibly) more data from the
+ * plugin. This should not increase the cost as plugins can still return just
+ * one extent if it's too costly to return more. */
+ flags &= ~(NBDKIT_FLAG_REQ_ONE);
...
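One possible rewording, offered only as a sketch:

  /* Clear the REQ_ONE flag so that the plugin may return as many extents
   * as it can cheaply provide.  This should not increase the cost: a
   * plugin for which returning more extents is expensive is still free
   * to return just one.
   */
  flags &= ~(NBDKIT_FLAG_REQ_ONE);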
> +
> + nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ...
> +
> +For files with big extents (when it is unlikely for one extents() call
> +to return multiple different extents) this does not slow down the
> +access.
I'd also like to see a mention of nbdkit-cache-filter, probably worded as:
Note that this caches only plugin metadata; this filter can be combined
with L<nbdkit-cache-filter(1)> to also cache data.
and a parallel sentence added to nbdkit-cache-filter.pod.
And nbdkit-filters, yes. I forgot to do this even though you already pointed
it out in the previous version; I just got to it after too long.
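For example, the two filters could be stacked like this (illustrative
invocation; "plugin" is a placeholder as in the synopsis):

  nbdkit --filter=cacheextents --filter=cache plugin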
> diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh
> new file mode 100755
> index 000000000000..5b0667ad2b95
> --- /dev/null
> +++ b/tests/test-cacheextents.sh
> @@ -0,0 +1,110 @@
> +requires grep --version
> +requires qemu-img --version
> +requires qemu-io --version
> +
> +sock="$(mktemp -u)"
> +sockurl="nbd+unix:///?socket=$sock"
> +pidfile="test-cacheextents.pid"
> +accessfile="test-cacheextents-access.log"
> +accessfile_full="$(pwd)/test-cacheextents-access.log"
$PWD saves a process compared to $(pwd)
> +files="$pidfile $sock"
> +rm -f $files $accessfile
> +cleanup_fn rm -f $files
> +
> +# Intentionally using EOF rather than 'EOF' so that we can pass in the
> +# $accessfile_full
> +start_nbdkit \
> + -P $pidfile \
> + -U $sock \
> + --filter=cacheextents \
> + sh - <<EOF
> +echo "Call: \$@" >>$accessfile_full
> +size=4M
> +block_size=\$((1024*1024))
> +case "\$1" in
> + get_size) echo \$size ;;
> + can_extents) ;;
> + extents)
> + echo "extents request: \$@" >>$accessfile_full
> + offset=\$((\$4/\$block_size))
> + count=\$((\$3/\$block_size))
> + length=\$((\$offset+\$count))
Adding spaces around the operators would make this more legible.
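I.e., something like this (shown without the heredoc backslash escapes):

  offset=$(( $4 / $block_size ))
  count=$(( $3 / $block_size ))
  length=$(( $offset + $count ))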
> + for i in \$(seq \$offset \$length); do
> + echo \${i}M \$block_size \$((i%4)) >>$accessfile_full
> + echo \${i}M \$block_size \$((i%4))
> + done
> + ;;
> + pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;;
> + can_write) ;;
> + pwrite) dd of=/dev/null ;;
> + can_trim) ;;
> + trim) ;;
> + can_zero) ;;
> + trim) ;;
zero)  (i.e. the second 'trim' case should be 'zero')
> + *) exit 2 ;;
> +esac
> +EOF
> +
> +
> +test_me() {
> + num_accesses=$1
> + shift
> +
> + qemu-io -f raw "$@" "$sockurl"
> + test "$(grep -c "^extents request: " $accessfile)" -eq "$num_accesses"
> + ret=$?
> + rm -f "$accessfile"
> + return $ret
> +}
> +
> +# First one causes caching, the rest should be returned from cache.
> +test_me 1 -c 'map' -c 'map' -c 'map'
> +# First one is still cached from last time, discard should kill the cache, then
> +# one request should go through.
> +test_me 1 -c 'map' -c 'discard 0 1' -c 'map'
> +# Same as above, only this time the cache is killed before all the operations as
> +# well. This is used from now on to clear the cache as it seems nicer and
> +# faster than running new nbdkit for each test.
> +test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c 'map'
> +# Write should kill the cache as well.
> +test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c 'map'
> +# Alloc should use cached data from map
> +test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0'
> +# Read should not kill the cache
> +test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c 'map' -c 'alloc 0'
>
Looks good. Unless you have objections, I'll go ahead and make the
tweaks mentioned, and push this.
Yes, all of them make sense, thank you very much.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org