Thanks Richard,

The experiment was indeed done with nested virtualization enabled. I am not sure about the internals, but I thought that once the overlay is set up, the two main processes are sshd and qemu-img convert (which reads the data from sshd and does the conversion).
I don't see any qemu process running.
The initial overlay setup was pretty quick, and the rest of the time was spent in the qemu-img convert operation.
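
In case it helps to double-check the KVM side, here is a minimal sketch (not something I ran as part of the experiment) that reports whether /dev/kvm exists and what the kvm_intel / kvm_amd "nested" module parameter says. The function name kvm_status is made up for illustration. I am assuming the usual sysfs paths; the nested parameter is only meaningful on the machine acting as the hypervisor, while /dev/kvm should be checked on the machine where the conversion runs.

```python
import os


def kvm_status(dev_kvm="/dev/kvm",
               nested_params=("/sys/module/kvm_intel/parameters/nested",
                              "/sys/module/kvm_amd/parameters/nested")):
    """Return (kvm_available, nested_value).

    kvm_available is True if the KVM device node exists.
    nested_value is the raw content of the first readable "nested"
    parameter file (typically "Y"/"N" or "1"/"0"), or None if neither
    module parameter can be read.
    """
    kvm_available = os.path.exists(dev_kvm)
    nested_value = None
    for path in nested_params:
        try:
            with open(path) as f:
                nested_value = f.read().strip()
            break
        except OSError:
            # Module not loaded (or not this CPU vendor); try the next path.
            continue
    return kvm_available, nested_value


if __name__ == "__main__":
    available, nested = kvm_status()
    print("/dev/kvm present:", available)
    print("nested parameter:", nested)
```

If /dev/kvm is missing where the conversion runs, any qemu started there would fall back to TCG, which would match the KVM vs TCG explanation quoted below.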


I think this is just KVM vs TCG?  You could try enabling nested KVM to
see if that makes things faster, but it very much depends on your host


