On Sat, Jan 05, 2019 at 01:19:50PM +0000, Richard W.M. Jones wrote:
(2) I moved the locking to around calls to the sparse array code, and
changed the thread model to parallel:
read: IOPS=112k, BW=437MiB/s (458MB/s)(51.2GiB/120001msec)
write: IOPS=112k, BW=437MiB/s (458MB/s)(51.2GiB/120001msec)
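
For readers following along, here is a rough sketch of what "locking around calls to the sparse array code" can look like. This is not the actual plugin code: the flat buffer stands in for the sparse array, and every name (store, store_lock, handle_pread, handle_pwrite, demo_parallel) is made up for illustration. The point is only that the handlers can be entered from many threads at once, and the mutex is held just for the duration of the data structure access.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the sparse array: a small flat buffer. */
#define NWRITERS   4
#define CHUNK      1024
#define STORE_SIZE (NWRITERS * CHUNK)
static uint8_t store[STORE_SIZE];

/* The lock protects only the shared data structure, not whole requests. */
static pthread_mutex_t store_lock = PTHREAD_MUTEX_INITIALIZER;

/* Request handlers.  These may be called concurrently from any thread;
 * the lock is taken only around the access to the shared store. */
int handle_pread(void *buf, size_t count, size_t offset)
{
  if (offset > STORE_SIZE || count > STORE_SIZE - offset)
    return -1;                          /* out of range */
  pthread_mutex_lock(&store_lock);
  memcpy(buf, store + offset, count);
  pthread_mutex_unlock(&store_lock);
  return 0;
}

int handle_pwrite(const void *buf, size_t count, size_t offset)
{
  if (offset > STORE_SIZE || count > STORE_SIZE - offset)
    return -1;                          /* out of range */
  pthread_mutex_lock(&store_lock);
  memcpy(store + offset, buf, count);
  pthread_mutex_unlock(&store_lock);
  return 0;
}

/* Each writer thread fills its own CHUNK-sized region with (id + 1). */
static void *writer(void *arg)
{
  size_t id = (size_t)(uintptr_t)arg;
  uint8_t chunk[CHUNK];

  memset(chunk, (int)(id + 1), sizeof chunk);
  handle_pwrite(chunk, sizeof chunk, id * CHUNK);
  return NULL;
}

/* Run NWRITERS threads in parallel, then read everything back.
 * Returns 0 if every region holds the value its writer stored. */
int demo_parallel(void)
{
  pthread_t threads[NWRITERS];
  uint8_t chunk[CHUNK];
  size_t i, j;

  for (i = 0; i < NWRITERS; i++)
    if (pthread_create(&threads[i], NULL, writer, (void *)(uintptr_t)i) != 0)
      return -1;
  for (i = 0; i < NWRITERS; i++)
    pthread_join(threads[i], NULL);

  for (i = 0; i < NWRITERS; i++) {
    if (handle_pread(chunk, sizeof chunk, i * CHUNK) != 0)
      return -1;
    for (j = 0; j < CHUNK; j++)
      if (chunk[j] != (uint8_t)(i + 1))
        return -1;
  }
  return 0;
}
```

Build with -pthread and call demo_parallel() from a small main. A single mutex is the simplest choice and still serializes the data structure access itself; the gain over a fully serialized thread model is that socket I/O and request parsing can overlap across threads.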
FWIW I looked into this case with perf (25 GB of event data!) and
unfortunately there are no obvious candidates for optimization. About
one third of the time is spent reading or writing to the Unix domain
socket, only about 4% waiting on locks, and the rest either in fio
[the test program], in the kernel, or doing actual work.
In other words, no easy wins ...
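
To put a rough number on that: a back-of-the-envelope bound, assuming (optimistically) that the ~4% lock wait could be removed entirely, that the time saved converts directly into throughput, and that nothing else becomes the bottleneck. The helper name max_speedup is made up for this sketch.

```c
/* Ideal upper bound on speedup if a fraction of the total run time
 * (0 <= fraction_removed < 1) were eliminated completely. */
double max_speedup(double fraction_removed)
{
  return 1.0 / (1.0 - fraction_removed);
}

/* Throughput after applying that ideal speedup to a measured IOPS figure. */
double max_iops(double measured_iops, double fraction_removed)
{
  return measured_iops * max_speedup(fraction_removed);
}
```

With the figures above, max_speedup(0.04) is 1/0.96, or about 1.04, so even perfectly lock-free code would move 112k IOPS to roughly 117k at best.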
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones