On Wed, May 26, 2021 at 04:49:50PM +0300, Nir Soffer wrote:
On Wed, May 26, 2021 at 4:03 PM Richard W.M. Jones
<rjones(a)redhat.com> wrote:
> In my testing, nbdcopy is a clear 4x faster than qemu-img convert, with
> 4 also happening to be the default number of connections/threads.
> Why use nbdcopy --connections=1? That completely disables threads in
> nbdcopy.
Because qemu-nbd does not report multi-conn when writing, in practice
you get only one NBD handle for writing.
Let's see if we can fix that. Crippling nbdcopy because of a missing
feature in qemu-nbd isn't right. I wonder what Eric's reasoning for
multi-conn not being safe is.
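For reference, a quick way to check what the server advertises is
nbdinfo.  A rough sketch, assuming a writable qemu-nbd export of a
local scratch file (the path and options are only an example):

  $ qemu-nbd --fork --persistent --shared=4 --format=raw /var/tmp/disk.img
  $ nbdinfo nbd://localhost | grep can_multi_conn

If qemu-nbd really never advertises multi-conn for writable exports,
that should print can_multi_conn: false.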
> Also I'm not sure if --flush is fair (it depends on what
> qemu-img does, which I don't know).
qemu is flushing at the end of the operation. Not flushing is cheating :-)
That's fair enough. I will add that flag to my future tests.
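So the command line I will be timing looks something like this (the
URIs are placeholders; --connections and --requests are spelled out
even though I believe they are the defaults):

  $ nbdcopy --flush --connections=4 --requests=64 \
        nbd://localhost/src nbd://localhost/dst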
I also pushed these commits to disable malloc checking outside tests:
https://gitlab.com/nbdkit/libnbd/-/commit/88e72dcb1631b315957f5f98e3cdfcd...
https://gitlab.com/nbdkit/nbdkit/-/commit/6039780f3bb0617650fa1fa4c1399b0...
> The other interesting things are the qemu-img convert flags
> you're using:
>
> -m 16 number of coroutines, default is 8
We use 8 in RHV since the difference is very small, and when running
concurrent copies it does not matter. Since we use up to 64 concurrent
requests in nbdcopy, it is useful to compare a similar setup in qemu.
I'm not really clear on the relationship (in qemu-img) between number
of coroutines, number of pthreads and number of requests in flight.
At this rate I'm going to have to look at the source :-)
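For reference, the qemu-img invocation we are comparing against is
roughly this (paths and the NBD URI are placeholders; -n because the
NBD target already exists):

  $ qemu-img convert -n -m 16 -W -f raw /var/tmp/src.img \
        -O raw nbd://localhost/dst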
> -W out of order writes, but the manual says "This is
> only recommended
> for preallocated devices like host devices or other raw block
> devices" which is a very unclear recommendation to me.
> What's special about host devices versus (eg) files or
> qcow2 files which means -W wouldn't always be recommended?
This is how RHV uses qemu-img convert when copying to raw preallocated
volumes. Using -W can be up to 6x faster. We use the same settings in imageio
for any type of disk. This is the reason I tested this way.
-W is equivalent to the nbdcopy multithreaded copy using a single connection.
qemu-img does N concurrent reads. If you don't specify -W, it writes
the data in the right order (based on offset). If a read has not
finished, the copy loop waits until that read completes before
writing. This ensures there is at most one write in flight, and it is
much slower.
Thanks - interesting. Still not sure why you wouldn't want to use
this flag all the time.
See also:
https://lists.nongnu.org/archive/html/qemu-discuss/2021-05/msg00070.html
...
This shows that nbdcopy works better when the latency is
(practically) zero, copying data from memory to memory. This is
useful for minimizing overhead in nbdcopy, but when copying real
images on real storage with direct I/O, the time to write the data
to storage hides everything else.
Would it be useful to add latency to the sparse-random plugin, so it
behaves more like real storage? (Or maybe it is already possible
with a filter?)
We could use one of these filters:
https://libguestfs.org/nbdkit-delay-filter.1.html
https://libguestfs.org/nbdkit-rate-filter.1.html
Something like "--filter=delay wdelay=1ms" might be more realistic.
To simulate NVMe we might need to be able to specify microseconds there.
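i.e. something along these lines (just a sketch, reusing the
sparse-random plugin from the earlier tests; the size is a placeholder
and null: simply discards the copied data):

  $ nbdkit -U - --filter=delay sparse-random size=1G wdelay=1ms \
        --run 'nbdcopy --flush "$uri" null:'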
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/