vSphere 6.5 guest UNMAP may cause VM I/O latency spikes

I converted some VMs to thin provisioning and upgraded their VM hardware to version 13 to test out the savings. The initial retrim caused a transient I/O slowdown in the VM, but the issue kept reappearing randomly: I/O latency would spike to 400 ms for minutes at a time for no apparent reason. It also seemed to affect other surrounding VMs, just not as badly. After several days I converted the VMs back to thick and the issues disappeared.
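For what it's worth, the in-guest side of a retrim on a Windows VM looks roughly like this (C: is just a placeholder drive letter):

    rem Check that Windows is allowed to send TRIM/UNMAP down the stack (0 = enabled)
    fsutil behavior query DisableDeleteNotify

    rem Manually retrim the volume (Optimize-Volume -DriveLetter C -ReTrim does the same from PowerShell)
    defrag C: /L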

I’m not sure where the problem is. It might be a bug in vSphere, or it might be the IBM v7000 G2 SAN going crazy. I can’t investigate it any further, but I’ll update the post if I ever hear anything.

PS! The savings were great, on some systems nearly 100% from the VMFS perspective. On some larger VMs with possible alignment issues, reclamation can take several days, though. For example, a 9 TB thick file server took 3 days to shrink to 5 TB.
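An easy way to watch reclamation progress, assuming shell access on the ESXi host, is to compare the provisioned size of the thin VMDK with the blocks it actually has allocated (the paths below are made-up examples):

    # Provisioned size of the thin disk
    ls -lh /vmfs/volumes/datastore1/fileserver/fileserver-flat.vmdk

    # Blocks actually allocated on VMFS; this is what shrinks as UNMAPs land
    du -h /vmfs/volumes/datastore1/fileserver/fileserver-flat.vmdk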

Update 2017.06.29:

Veeam’s (or rather Anton Gostev’s) newsletter mentioned a similar issue just as I ran into it again in a new vSphere cluster. In the end VMware support confirmed the issue, with the fix expected in 6.5 Update 1 at the end of July.

vSphere 6.5 virtual NVMe does not support TRIM/UNMAP/Deallocate

I was playing with guest TRIM/UNMAP the other day and took a look at the shiny new virtual NVMe controller. While it would not help much with my workloads, cutting overhead never hurts. So I tried “defrag /L” in the VM and it returned that the device doesn’t support it.

So I looked up the release notes. Virtual NVMe device: “Supports NVMe Specification v1.0e mandatory admin and I/O commands”.

The thing is that the part of NVMe that handles Deallocate (that’s ATA TRIM / SCSI UNMAP in NVMe-speak) is optional. So back to pvscsi for space savings…
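If you want to check this yourself from a Linux guest (assuming nvme-cli is installed), the controller’s ONCS field in the Identify Controller data tells you whether the optional Dataset Management command, which carries Deallocate, is implemented at all:

    # Optional NVM Command Support (ONCS): bit 2 set means the controller
    # implements Dataset Management, i.e. Deallocate/TRIM
    # (nvme id-ctrl -H decodes the individual bits for you)
    nvme id-ctrl /dev/nvme0 | grep oncs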