Community edition VM on ESXi 6.7 running on Dual Processor

Re: Community edition VM on ESXi 6.7 running on Dual Processor

Post by mathewfer » Wed Jan 31, 2024 5:53 am

Updated:

Installed ESXi 8 (ESXi-8.0.0-20513097-standard) and ran the same LAB with Cumulus, Juniper, Cisco IOL, Arista and the DevOps Linux image.

Tested results:
Same as ESXi 7.0 and ESXi 6.7.


I ran further tests, and they gave some possible clues about the issue.

1. On ESXi, with "Expose IOMMU to the guest OS" enabled, the errors seen with the original Linux DevOps image, such as "kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 55s!", are completely gone. This is an improvement on ESXi 6.7, 7.0 and 8.0. Thanks to Uldis.
2. On ESXi, in a LAB with only BGP and interfaces without any LAG/LACP, BGP peers on Cumulus, Cisco IOL and Arista are all 100% stable. For Juniper vMX (VFP & VCP) and vSRX NextGen, BGP peers without LAG/LACP are much more stable than with LACP, but they still bounce randomly every 1-2 hours.
3. On bare-metal, in a LAB containing the DevOps Linux image, BGP peers on Cumulus, Cisco IOL and Arista are all 100% stable with LAG/LACP interfaces, and Juniper vMX and vSRX are also 100% stable with LAG/LACP. There are no CPU errors, node freezes or interface/BGP flaps.
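
For anyone who wants to compare runs before and after enabling the IOMMU option, here is a minimal Python sketch that counts soft-lockup messages in saved kernel log lines. The regular expression is based only on the single message quoted in point 1, and the function name `summarize_soft_lockups` is made up for illustration; other kernel versions may word the message differently, so treat this as a starting point rather than a tested tool.

```python
import re
from collections import Counter

# Pattern derived from the one message quoted in this post (an assumption):
#   kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 55s!
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s")


def summarize_soft_lockups(lines):
    """Return (Counter of lockups per CPU number, longest stall in seconds)."""
    per_cpu = Counter()
    longest = 0
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            per_cpu[int(match.group(1))] += 1
            longest = max(longest, int(match.group(2)))
    return per_cpu, longest


# Example with the message from point 1 plus an unrelated line.
sample = [
    "kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 55s!",
    "kernel: some unrelated message",
]
per_cpu, longest = summarize_soft_lockups(sample)
print(dict(per_cpu), longest)
```

The same function can be fed the lines of a log file saved from the guest (for example the output of `dmesg` written to a text file), which makes it easier to compare the counts with the IOMMU option on and off.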

In summary, EVE-ng on bare-metal is 100% stable with all my LABs, including all the above combinations, while on ESXi, LACP seems to be a major contributing factor to the instability (node freezes or frequent interface/BGP flaps). The Juniper images are also not stable on ESXi in my setup.

Still, I am not 100% sure whether the cause is the server hardware I use or an issue with EVE-ng running on ESXi with LACP and the above combinations. Any comments are welcome.

As a workaround, I have settled on dual boot on the same server: one option boots ESXi 8.0 running EVE-ng CE, and the other boots EVE-ng on bare-metal. This works well for my LAB requirements.

Having said that, I am still curious to find out why EVE-ng on ESXi behaves this way and whether it is really my server hardware or not.

Thanks again for all the feedback/comments/support from teams on this forum.
