[Adding Paolo and Vitaly, but FYI only as the bug seems to have an
upstream fix already.]
On Mon, Mar 26, 2018 at 09:13:45AM +0300, Roman Kagan wrote:
> On Sat, Mar 24, 2018 at 03:11:12PM +0000, Richard W.M. Jones wrote:
> > On Sat, Mar 24, 2018 at 03:08:16PM +0000, Tanmoy Sinha wrote:
> > > Even though force_tcg works, I don't intend to run it under emulation. Is there a
> > > way I can run it over KVM? The other observation is that, without force_tcg, if I
> > > use the machine type pc-i440fx-2.1,accel=kvm it works fine. The
> > > default machine type for my host is pc-i440fx-2.8, which seems to fail.
> >
> > I don't know, but this is basically a bug in VMware, so you need
> > to ask them to fix their nested KVM-on-ESXi use case.
>
> We've encountered this problem, too.
>
> Strictly speaking, the bug is not in VMware, but rather in KVM: on
> EPT_MISCONFIG vmexits it assumed the processor would set the instruction
> length field. This wasn't mandated by the spec, but real processors
> did it. OTOH some hypervisors (VMware, Hyper-V) didn't do that for
> the nested hypervisor. As a result, when handling MMIO the guest
> instruction pointer didn't get advanced, i.e. the guest got stuck in an
> infinite loop.
>
> The difference between the old and the new machine type is that the
> latter turns on a newer virtio protocol version that uses MMIO, exposing
> this bug.
>
> The fix is commit d391f1207067268261add0485f0f34503539c5b0 which went
> into 4.16-rc1.
Can you (Tanmoy) please try a newer kernel inside the VMware guest?
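
If it helps, here is a rough sketch for checking whether the kernel
running inside the VMware guest is at least 4.16, the release Roman
says the fix went into. The names parse_release and has_upstream_fix
are made up for illustration. A plain version comparison is only a
heuristic: it is an assumption that your kernel matches upstream, and
a distro kernel older than 4.16 may carry a backport of the fix.

```python
import platform
import re

# Per Roman's message, the fix went into 4.16-rc1.
FIXED_IN = (4, 16)


def parse_release(release):
    """Return (major, minor) from a string like '4.9.0-6-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def has_upstream_fix(release):
    """True if the release is 4.16 or later (ignores distro backports)."""
    return parse_release(release) >= FIXED_IN


if __name__ == "__main__":
    release = platform.release()
    if has_upstream_fix(release):
        print("%s: 4.16 or later, should include the fix" % release)
    else:
        print("%s: older than 4.16, fix present only if backported" % release)
```

Run it in the VMware guest (the machine acting as the KVM host), not in
the libguestfs appliance.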
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org