On Thu, Nov 16, 2017 at 02:47:46PM +0000, Stefan Hajnoczi wrote:
 The threads you observed are the thread pool that performs
 preadv(2)/pwritev(2) syscalls.  The Linux AIO API could be used instead
 and does not use threads for read and write operations. 
I guess if I used AIO then I wouldn't get any parallelism at all since
Linux doesn't block on local file access (at least, it never used to)?
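
For illustration only, here is a minimal Python sketch of the thread-pool
model described above: worker threads each issue positional
preadv(2)/pwritev(2) calls on shared file descriptors.  This is not QEMU's
code; the names copy_chunk and parallel_copy, the 64 KiB chunk size and the
default of 8 workers are all hypothetical, and os.preadv/os.pwritev assume
Linux with Python 3.7 or later.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # hypothetical chunk size, chosen only for the example


def copy_chunk(src_fd, dst_fd, offset, length):
    """Copy one chunk using positional I/O.

    preadv/pwritev take an explicit offset and do not move the shared
    file position, so several threads can use the same descriptors.
    """
    buf = bytearray(length)
    view = memoryview(buf)
    done = 0
    while done < length:  # tolerate short reads
        n = os.preadv(src_fd, [view[done:]], offset + done)
        if n == 0:  # end of file
            break
        done += n
    written = 0
    while written < done:  # tolerate short writes
        written += os.pwritev(dst_fd, [view[written:done]], offset + written)
    return done


def parallel_copy(src_path, dst_path, workers=8):
    """Copy src_path over dst_path with a pool of worker threads.

    The target is opened without truncation, so an existing file is
    written in place.  Returns the number of bytes copied.
    """
    size = os.path.getsize(src_path)
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [
                pool.submit(copy_chunk, src_fd, dst_fd, off,
                            min(CHUNK, size - off))
                for off in range(0, size, CHUNK)
            ]
            return sum(f.result() for f in futures)
    finally:
        os.close(src_fd)
        os.close(dst_fd)
```

Whether extra workers help in a sketch like this depends on whether the
individual syscalls actually block, which is the question raised above.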
 Interesting.  Did you perform multiple runs of each setting to verify
 that the benchmark results are stable with little volatility?
I retested the -m 8 no-W/-W ones because those were so unexpected and
those are repeatable.
 Which command-line did you use to create the preallocated qcow2 file?
What I actually did was qemu-img convert -n into the existing qcow2
file, so there was no separate command for that.
 Are the source and target files on the same file system and host block
 device?  The benefit of using multiple coroutines depends on the
 performance characteristics of the source and target files.
Both local filesystems, but on different SATA devices.
Rich.
-- 
Richard Jones, Virtualization Group, Red Hat 
http://people.redhat.com/~rjones
Read my programming and virtualization blog: 
http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html