On Mon, Apr 15, 2019 at 03:49:05PM +0100, Richard W.M. Jones wrote:
On Mon, Apr 15, 2019 at 04:30:47PM +0200, Martin Kletzander wrote:
> On Fri, Apr 12, 2019 at 04:30:02PM +0100, Richard W.M. Jones wrote:
> > +-- nbdkit monitoring process
> >       |
> >       +-- first child = nbdkit
> >       |
> >       +-- second child = ‘--run’ command
> >
> >so when the second child exits, the monitoring process (which is doing
> >nothing except waiting for the second child to exit) can kill nbdkit.
> >
>
> Oh, I thought the "monitoring process" would just be a signal
> handler. If the monitoring process is just checking those two
> underlying ones, how come the PID changes for the APIs? Is the Init
> called before the first child forks off?
Right, for convenience reasons the configuration steps (ie. .config,
.config_complete in [1]) are done before we fork either to act as a
server or to run commands, and the VDDK plugin does its initialization
in .config_complete, which is the only sensible place to do it.
While this is specific to using the --run option, I assume it would
also happen if nbdkit forks into the background to become a server.
But if you run nbdkit without --run and with --foreground then it
remains in the foreground and the hang doesn't occur.
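
To illustrate, the --run arrangement above is roughly equivalent to this
shell sketch (the real monitor is C code inside nbdkit, the variable names
here are only placeholders, and the point is that the VDDK initialization
has already happened before either child is forked):

  # rough sketch of what `nbdkit ... --run CMD' does internally
  ./nbdkit --foreground ... &      # first child = the actual NBD server
  server_pid=$!
  sh -c "$run_command"             # second child = the --run command
  kill "$server_pid"               # kill nbdkit once the command exits
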
Yes, also, the delay I noticed was amplified by the req_one from qemu-img.
Since I am testing this on a 100G file, there are 50 requests for extents to
check the allocation size of the image and then another 50 requests when
actually "copying the data". I changed the script to use --exit-with-parent
and it still takes a significant amount of time, although it's roughly 2
minutes faster ;)
[1] https://github.com/libguestfs/nbdkit/blob/master/docs/nbdkit-plugin.pod
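
FWIW the --exit-with-parent variant of the script boils down to roughly
this (the socket path is just a placeholder and the vddk options, along
the lines of the command quoted further down, are elided):

  # nbdkit exits automatically when its parent (the script) exits, so
  # there is no monitoring process and no explicit kill is needed.
  ./nbdkit --exit-with-parent -r -U /tmp/vddk.sock vddk file=... &

  # (a real script would wait for the socket to appear before connecting)
  qemu-img convert -p 'nbd+unix:///?socket=/tmp/vddk.sock' /var/tmp/out
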
> >If VDDK cannot handle this situation (and I'm just guessing that this
> >is the bug) then VDDK has a bug.
> >
>
> Sure, but having a workaround could be nice, if it's not too much work.
Patches welcome, but I suspect there's not a lot we can do in nbdkit.
> >>>(3) Using nbdkit-noextents-filter and nbdkit-stats-filter we can
> >>>nicely measure the benefits of extents:
> >>>
> >>>With noextents (ie. force full copy):
> >>>
> >>> elapsed time: 323.815 s
> >>> read: 8194 ops, 17179869696 bytes, 4.24437e+08 bits/s
> >>>
> >>>Without noextents (ie. rely on qemu-img skipping sparse bits):
> >>>
> >>> elapsed time: 237.41 s
> >>> read: 833 ops, 1734345216 bytes, 5.84423e+07 bits/s
> >>> extents: 70 ops, 135654246400 bytes, 4.57114e+09 bits/s
> >>>
> >>>Note if you deduct 120 seconds (see point (1) above) from these times
> >>>then it goes from 203s -> 117s, about a 40% saving. We can likely do
> >>>better by having > 32 bit requests and qemu not using
> >>>NBD_CMD_FLAG_REQ_ONE.
> >>>
> >>How did you run qemu-img?
> >
> >The full command was:
> >
> >LD_LIBRARY_PATH=vmware-vix-disklib-distrib/lib64 \
> >./nbdkit -r -U - vddk file="[datastore1] Fedora 28/Fedora 28.vmdk" \
> > libdir=vmware-vix-disklib-distrib \
> > server=vmware user=root password=+/tmp/passwd \
> > thumbprint=xyz \
> > vm=moref=3 \
> > --filter=stats statsfile=/dev/stderr \
> > --run '
> > unset LD_LIBRARY_PATH
> > /home/rjones/d/qemu/qemu-img convert -p $nbd /var/tmp/out
> > '
> >
> >(with extra filters added to the command line as appropriate for each
> >test).
> >
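So I guess the noextents measurement is just the same command with the
extra filter added, roughly like this (I'm guessing at the exact placement;
the vddk options and the --run command are elided):

  ./nbdkit -r -U - vddk file=... \
      --filter=noextents \
      --filter=stats statsfile=/dev/stderr \
      --run '...'

and the second measurement simply drops --filter=noextents again.
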
> >>I think on a slow CPU and a fast disk the difference might be even
> >>bigger if qemu-img could write whatever it gets instead of searching
> >>for zeros.
> >
> >This is RHEL 8 so /var/tmp is XFS. The hardware is relatively new and
> >the disk is an SSD.
> >
>
> The reason I'm asking is that what you are measuring above still
> includes QEMU looking for zero blocks in the data. I haven't found
> a way to make qemu write the sparse data exactly as it reads it,
> i.e. without it either sparsifying even more by checking for zeros
> or creating a fully allocated image.
While qemu-img is still trying to detect zeroes, it won't find too
many because the image is thin provisioned. However, I take your point
that when copying a snapshot using the "single link" flag you don't
want qemu-img to do this because that means it may omit parts of the
snapshot that happen to be zero. It would still be good to see the
output of ‘qemu-img map --output=json’ to see if qemu is really
sparsifying the zeroes or is actually writing them as zero non-holes
(which is IMO correct behaviour and shouldn't cause any problem).
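
Something like this against the destination file from the convert command
above should show it:

  # Each JSON entry describes one extent of the output: "data" says whether
  # the range is actually allocated and "zero" whether it reads back as
  # zeroes, so holes show up as "data": false, "zero": true.
  qemu-img map --output=json /var/tmp/out
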
I *thought* it was not writing them as zero data, nor punching holes. I
tried with both raw and qcow2 images (with the options -n, -W, -C and
combinations of them). And then I realized that the single-link patch is
incomplete, so it read more zeroes than it actually should. That means it
might just work, but I need to finish the patch and test it out. And each
test takes an infuriating amount of time. Not that it takes *so* long, but
waiting just to see that it failed is a bad enough experience on its own.
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html