vSphere 6.5 and 6.7 qfle3 driver is really unstable

Edit 2018.07.11

HPE support confirmed that the qfle3 bundle is dead in the water. Our VAR was astonished that the sales branch was completely unaware of the severe stability issues. Edited the subject to reflect the findings.

Edit 2018.07.09

Qlogic qfle3i (and the whole Qlogic 57810 driver bundle) seems to be thoroughly broken. qfle3i crashes no matter what. Even the basic NIC driver, qfle3, crashes occasionally. So if you’re planning to switch from bnx2 to qfle3 as required by HPE, don’t! bnx2 is at least stable for now. The latest HPE images already contain this fix; however, it doesn’t fix these specific crashes. VMware support also confirmed that there’s an ongoing investigation into this known, common issue and that it also affects vSphere 6.5. I’m seeing this on HPE 534FLR-SFP+ adapters, but your OEM may use other names for the Qlogic/Cavium/Broadcom 57810 chipset.

A few days ago I was setting up a new green-field VMware deployment. As a team effort, we were ironing out configuration bugs and oversights, but despite all the fixes, the vSphere hosts kept PSODing consistently. The stack traces showed crashes in the Qlogic hardware iSCSI adapter driver, qfle3i.

Firmware was updated and updates were installed, to no effect. After some looking around and trial and error, one fiber cable turned out to be faulty, causing occasional packet loss on the SAN-to-switch path. In theory TCP should recover from that, but hardware adapters seem to be much pickier. Monitoring was not yet configured, so it was quite annoying to track down. Also, as the SAN was not properly accessible, there was no persistent storage for logs or dumps.

So if you’re using hardware adapters and seeing PSODs, check for packet loss on your switches. I won’t engage support for this, as I have no logs or dumps. But if you see “qfle3i_tear_down_conn” in a PSOD, look for Ethernet problems.
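Without monitoring in place, one low-tech way to spot a bad link is to take two snapshots of the per-port error counters a few minutes apart and see which ones moved. The Python below is only a rough sketch of that comparison, not what I actually ran. It assumes you have already collected the counters yourself (switch CLI, SNMP, whatever you have) into plain dictionaries; the port names and counter names such as crc_errors and in_discards are made-up placeholders.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: {port name: {counter name: value}}.
# How the numbers are collected is deliberately left out.
Counters = Dict[str, Dict[str, int]]


def ports_with_new_errors(before: Counters, after: Counters) -> List[Tuple[str, str, int]]:
    """Return (port, counter, increase) for every counter that grew between snapshots."""
    findings = []
    for port, now in after.items():
        earlier = before.get(port, {})
        for name, value in now.items():
            # A counter missing from the first snapshot is treated as 0.
            delta = value - earlier.get(name, 0)
            # Negative deltas (counter reset, reboot) are ignored.
            if delta > 0:
                findings.append((port, name, delta))
    # Largest increase first, so the worst link is at the top.
    return sorted(findings, key=lambda f: f[2], reverse=True)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    first = {"san-a-1": {"crc_errors": 12, "in_discards": 0},
             "san-a-2": {"crc_errors": 0, "in_discards": 0}}
    second = {"san-a-1": {"crc_errors": 57, "in_discards": 3},
              "san-a-2": {"crc_errors": 0, "in_discards": 0}}
    for port, name, delta in ports_with_new_errors(first, second):
        print(f"{port}: {name} increased by {delta}")
```

Any port whose error counters keep climbing while the others stay flat is a good candidate for swapping the cable or the optics first.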