[Libguestfs] FYI: perf commands I'm using to benchmark nbdcopy

Tuesday, 25 May 2021

Hi Abhay,

FYI I thought I would document the successful commands I am using to
benchmark nbdcopy and produce the flame graphs that you saw this
morning.  Attached is a very recent flame graph produced using this
method.

Firstly I'm running everything on Fedora 34, with selected packages
upgraded to Fedora Rawhide.  However any reasonably recent Linux
distro should work fine.  You will need to install the perf tool.

Compile libnbd & nbdkit from git source, following the instructions in
the respective README files.

  https://gitlab.com/nbdkit/libnbd
  https://gitlab.com/nbdkit/nbdkit

I have nbdkit and libnbd checked out in adjacent directories.  This is
important so that commands like "./nbdkit" and "../libnbd/run nbdcopy"
work.  There's more information about this in the READMEs.
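
As a quick sanity check of that layout, something like the following
sketch may help.  It assumes the checkouts are named "nbdkit" and
"libnbd" (the default names when cloning); check_layout and its
optional argument are made up for illustration and are not part of
either project.

```shell
# Check that the nbdkit and libnbd trees sit side by side and are built.
# $1 is the parent directory holding both checkouts (hypothetical
# argument; defaults to ".." for when you are inside the nbdkit tree).
check_layout() {
    parent=${1:-..}
    status=0
    # ./nbdkit and ../libnbd/run are the two wrappers used in this mail;
    # the directory names themselves are an assumption.
    for f in nbdkit/nbdkit libnbd/run; do
        if [ ! -x "$parent/$f" ]; then
            echo "missing or not executable: $parent/$f" >&2
            status=1
        fi
    done
    return $status
}
```

Run from inside the nbdkit directory, "check_layout" with no argument
returns non-zero and names whichever wrapper it could not find.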

I ran perf as below.  Although nbdcopy and nbdkit themselves do not
require root (and usually should _not_ be run as root), in this case
perf must be run as root, so everything has to be run as root.

  # perf record -a -g --call-graph=dwarf \
      ./nbdkit -U - sparse-random size=1T \
      --run "MALLOC_CHECK_= ../libnbd/run nbdcopy \$uri \$uri"

Some things to explain:

 * The output is perf.data in the local directory.  This file may be
   huge (22GB for me!)

 * I am running this from the nbdkit directory, so ./nbdkit runs the
   locally compiled copy of nbdkit.  This allows me to make quick
   changes to nbdkit and see the effects immediately.

 * I am running nbdcopy using "../libnbd/run nbdcopy", so that's from
   the adjacent locally compiled libnbd directory.  Again the reason
   for this is so I can make changes, recompile libnbd, and see the
   effect quickly.

 * "MALLOC_CHECK_=" is needed because of complicated reasons to do
   with how the nbdkit wrapper enables malloc-checking.  We should
   probably provide a way to disable malloc-checking when benchmarking
   because it adds overhead for no benefit, but I've not done that yet
   (patches welcome!)

 * The test harness is nbdkit-sparse-random-plugin, documented here:
   https://libguestfs.org/nbdkit-sparse-random-plugin.1.html

 * I'm using DWARF debugging info to generate call stacks, which is
   more reliable than the default (frame pointers).

 * The -a option means I'm measuring events on the whole machine.  You
   can read the perf manual to find out how to measure only a single
   process (eg. just nbdkit or just nbdcopy).  But actually measuring
   the whole machine gives a truer picture, I believe.

 * If the test takes too long to run or runs out of space, try
   adjusting the size (1T = 1 terabyte) downwards, eg. 512G, 256G, ...
   until it fits.  Although nbdkit doesn't store the virtual disk or
   use very much memory at all, the test does appear to stress the
   Linux VMM, and the amount of perf.data generated can be huge.
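
If you end up stepping the size down several times, a small helper
along these lines can produce the next value to try.  This is only a
sketch: halve_size is a made-up name, and it assumes the number part is
a power of two and that the suffixes are powers of 1024 (so half of 1T
is 512G).

```shell
# Print half of a size written as NUMBER plus optional K/M/G/T suffix.
# Assumes NUMBER is a power of two; odd values round down.
halve_size() {
    num=${1%[KMGT]}        # e.g. "512G" -> "512"
    unit=${1#"$num"}       # e.g. "512G" -> "G"
    if [ "$num" -gt 1 ]; then
        echo "$((num / 2))$unit"
    else
        # 1 of a unit becomes 512 of the next smaller unit.
        case $unit in
            T) echo 512G ;;
            G) echo 512M ;;
            M) echo 512K ;;
            *) return 1 ;;   # nothing smaller to step down to
        esac
    fi
}
```

For example "halve_size 1T" prints 512G and "halve_size 512G" prints
256G, which you can then pass as size=... on the nbdkit command line.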

Then I run this long command to generate the flame graph.  Again
it must be run as root:

  # perf script | ../FlameGraph/stackcollapse-perf.pl | \
      ../FlameGraph/flamegraph.pl > nbdcopy.svg

 * This reads perf.data as input.

 * Brendan Gregg's FlameGraph code is checked out in another adjacent
   directory.
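
If you also want a rough text summary alongside the SVG, the
intermediate "folded" output of stackcollapse-perf.pl (one line per
unique stack, frames separated by semicolons, followed by a sample
count) can be summed per innermost function.  This is only a sketch:
hot_leaves and the file name nbdcopy.folded are names I made up for
illustration and are not part of FlameGraph.

```shell
# Sum sample counts per leaf (innermost) frame from folded stacks on stdin.
# Each folded line looks like "comm;outer;...;leaf COUNT".
hot_leaves() {
    awk '{
        count = $NF                  # last field is the sample count
        sub(/ [0-9]+$/, "")          # drop the count, leaving the stack
        n = split($0, frames, ";")   # frames[n] is the innermost frame
        total[frames[n]] += count
    }
    END { for (f in total) print total[f], f }' | sort -rn
}

# Example use (assumes perf.data exists and FlameGraph is in ../FlameGraph):
#   perf script | ../FlameGraph/stackcollapse-perf.pl > nbdcopy.folded
#   hot_leaves < nbdcopy.folded | head -20
```

Keeping the folded file around also means you can re-render the SVG
without running "perf script" over the huge perf.data again.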

You can open the SVG file in a web browser.  Try clicking around -
it's interactive.

If you get stuck, ask questions, we're here to help.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
