Windows 2016 deduplication data corruption with huge files – fixed in KB3216755

Update 27.01.2017: Corruption issue has been fixed in KB3216755 and newer!

Pre-emptive remark! Currently NTFS dedup supports up to 1TB files, anything beyond that may or (in this case) may not work. Windows Server 2016 is not generally available so any bugs may be fixed by GA.

First, some background and info on 2012R2 dedup.

As you may know, WS2016 RTM bits are already out there (same version as Windows 10 1607). Deduplication is supposed to be much better this time. It wasn’t too bad in WS2012R2, but optimization was slow on big volumes or with lots of modified data.

File size limit is still 1TB though. Why would you ever want to go higher? Image-based backup such as Veeam. Big VMs result in multi-TB files. Storage may be cheap but it’s not that cheap. NTFS dedup will still allow you to reduce storage costs, even when using cheap solutions.

WS2012R2 could go much higher in practice if you followed some guidelines. The main limitation is not the dedup engine itself but NTFS, which limits how fragmented big files can become. Fragmentation increases the number of file extents, and once you hit the File Record limits, the file can no longer grow. Deduped size on disk is often reported as zero bytes, but behind the scenes the reparse point seems to hold all the metadata needed to rebuild the file. It is also likely that modified data gets stored in the reparse point until the next optimization. I’m not 100% sure about that, but you can make some reasonable guesses from the article “The Four Stages of NTFS File Growth, Part 2” and from real-life behavior.

Anyways, guidelines

  • Format NTFS with Large File Records. This is IMHO critical for any NTFS volume containing big (dozens of GiB) files, especially if using NTFS compression (yeah, it actually is a good idea… in some rare scenarios) and/or 4KiB clusters. In this case it allows for file record to be bigger before hitting fragmentation limits.
  • Format NTFS with 64KiB cluster size. Might increase performance but mainly helps to reduce amount of metadata per file, again allowing to keep NTFS limitations at bay.
  • Write files out in one run and never modify them later. Optimizing huge files is much slower anyways, reprocessing them is even slower and a good way to fragment your data.
  • Defragment file system. This will reduce amount of extents, allowing bigger files.
  • Apply registry flag EnablePriorityOptimization. This causes Dedup engine to aggressively defrag fragmented files.
  • Keep reasonable free space (20%), with thin provision if possible. Again, to keep fragmentation under control as NTFS tends to heavily fragment when free disk space runs low. Thin provisioning allows to not waste actual disk space.
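The formatting and registry guidelines above can be sketched as command lines. This is a rough sketch rather than a tested procedure: it only builds the commands and does not run them. The drive letter is a placeholder, and the registry key path is an assumption recalled from Microsoft's hotfix guidance, so verify it before applying. The `/L` switch of `format` requests large file records and `/A:64K` sets the cluster size.

```python
from typing import List

# ASSUMPTION: key path recalled from Microsoft guidance, verify before use.
DEDUP_SETTINGS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\ddpsvc\Settings"


def format_command(drive: str) -> List[str]:
    """Build a format command: NTFS, 64KiB clusters, large file records, quick format."""
    return ["format", drive, "/FS:NTFS", "/A:64K", "/L", "/Q"]


def priority_optimization_command() -> List[str]:
    """Build a reg.exe command that sets EnablePriorityOptimization to 1."""
    return [
        "reg", "add", DEDUP_SETTINGS_KEY,
        "/v", "EnablePriorityOptimization",
        "/t", "REG_DWORD", "/d", "1", "/f",
    ]


def has_enough_free_space(total_bytes: int, free_bytes: int,
                          min_free_fraction: float = 0.20) -> bool:
    """Check the 20% free space guideline for a volume."""
    return free_bytes >= total_bytes * min_free_fraction


if __name__ == "__main__":
    # "D:" is a placeholder; formatting destroys all data on the volume.
    print(" ".join(format_command("D:")))
    print(" ".join(priority_optimization_command()))
```

Printing the commands instead of executing them is deliberate: formatting is destructive, so review them first.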

Some optional stuff

  • Do free space consolidation in addition to usual defrag. This will reduce fragmentation of future data.
  • Use contig.exe from SysInternals. This is basically per-file defrag. This may help if you do hit limitations. Just make sure you have enough free space and that it is more or less contiguous.

With care, you could easily get 4TB files and even more. I’ve tested with 6TB files but processing gets sloooooooow (2 days at least) due to single-threaded nature of dedup. If you do want to run it this way, select a CPU with high clock rate, additional cores won’t help you (unless you have a lot of volumes). If you look around forums (mainly Veeam), you’ll see people using 4TB files in production. Again, YMMV, it’s unsupported but it works with some care.

There are also some forum threads with information from product team. Supposedly WS2016 will skip processing data over 4TB. And WS2012R2 dedup will work fine with 1TB+ files if you are careful. But unofficial information is unofficial.

So WS2016 is supposed to be multi-threaded. And it is; processing is much faster. With my testbed, it used 4 threads. I didn’t have server-grade hardware with sufficient storage at hand so I threw together some surplus desktop hardware. Test environment:

  • Windows Server 2016 RTM patched to .82 (no changes to dedup components though)
  • HP desktop with i5 4th generation
  • 32GB RAM
  • 3*6TB HGST UltraStar – Storage Spaces striped (for performance): 16.4TB total

I wanted to generate data that was semirandom: not random in the crypto sense, but random enough while still being compressible, to simulate a realistic workload. Dummy File Generator works great for that. It’s much faster than true random and results in about 1:40 dedup ratio. Could be less but good enough. I used this great tool to create huge files in 1TB increments.
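To illustrate what “semirandom but compressible” can mean, here is a minimal sketch that mixes fresh random blocks with blocks repeated from a small pool. This is not how Dummy File Generator works internally (I don’t know its algorithm); the function names, block size, pool size and repeat ratio are all made up for illustration.

```python
import random


def semi_random_blocks(total_bytes, block_size=4096, pool_size=16,
                       repeat_ratio=0.75, seed=1):
    """Yield blocks that are partly fresh random data, partly repeats from a pool."""
    rng = random.Random(seed)  # seeded PRNG: fast and repeatable, not crypto-grade

    def fresh_block():
        return rng.getrandbits(8 * block_size).to_bytes(block_size, "little")

    pool = [fresh_block() for _ in range(pool_size)]
    written = 0
    while written < total_bytes:
        # Repeated blocks are what a dedup engine (or compressor) can exploit.
        block = rng.choice(pool) if rng.random() < repeat_ratio else fresh_block()
        block = block[: total_bytes - written]  # trim the final block if needed
        written += len(block)
        yield block


def write_test_file(path, total_bytes, **kwargs):
    """Stream the generated blocks to a file without holding them all in memory."""
    with open(path, "wb") as f:
        for block in semi_random_blocks(total_bytes, **kwargs):
            f.write(block)


if __name__ == "__main__":
    blocks = list(semi_random_blocks(1 << 20))
    print(len(blocks), "blocks,", len(set(blocks)), "unique")
```

Raising `repeat_ratio` or shrinking the pool makes the data more dedup-friendly; setting `repeat_ratio` to 0 gives plain pseudo-random data.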

Aaand I get chunk store corruption with files over 1TB. It will process one point something TB, report an error in the event log and abort the file. Get-DedupMetaData recommends running a scrubbing job. If you do, it will report that some of your files have been corrupted.

Rinse and repeat. Well actually, wipe, test disks, reinstall, test. Still corruption. Well that sucks. I didn’t actually try to read back data.

I haven’t yet ruled out all variables but WS2012R2 works great on that very same system. But WS2016 general availability is still over a month away. It could still get patched: even in an unsupported scenario, data corruption is still data corruption.

I’ll add more info when I have some.

Update 23.08.2016

Several runs of Memtest, a few hours of Prime95, checked disks again, reinstalled Windows. Still corruption.

After doublechecking I discovered that WS2012R2 ran fine on another but otherwise identical PC.

So I grabbed another similar system from shelf and started over, even only using inbox drivers this time. I also replaced Corsair MX100 boot SSD. It has awful reputation for stability and could be to blame here. Might be that during memory-intensive processing some RAM spilled to pagefile and got corrupted.

Still, another test run just failed at 2,1TiB of 3TiB. I don’t have any other ideas to check. I think that deduplication is currently simply broken for huge files.

Some event log bits.

Data Deduplication aborted a file range.
FileId: 0x40000000000B1
FilePath: D:\1-3
RangeOffset: 0x25000000000
RangeLength: 0x1000000000
ErrorCode: 0x80070017
ErrorMessage: Data error (cyclic redundancy check).
Details: Failed to notify optimization for range

Data Deduplication aborted a file.
FileId: 0x40000000000B1
FilePath: D:\1-3
FileSize: 0x30000000000
Flags: 0x0
TotalRanges: 48
SkippedRanges: 0
AbortedRanges: 0
CommittedRanges: 33
ErrorCode: 0x80070017
ErrorMessage: Data error (cyclic redundancy check).
Details: Failed to add range to session
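The hex values in these events are easier to read once decoded. The sketch below only converts the numbers logged above; the helper name is mine. It shows that ranges are 64 GiB each, that a 3 TiB file splits into 48 of them (matching TotalRanges), that the aborted range starts at about 2.31 TiB, and that the error code is a Win32 error wrapped in an HRESULT (facility 7, code 23, which is ERROR_CRC).

```python
GIB = 1 << 30
TIB = 1 << 40

# Values copied from the event log entries above.
RANGE_OFFSET = 0x25000000000
RANGE_LENGTH = 0x1000000000
FILE_SIZE = 0x30000000000
ERROR_CODE = 0x80070017
COMMITTED_RANGES = 33


def decode_hresult(hr):
    """Split an HRESULT into (severity, facility, code)."""
    severity = (hr >> 31) & 0x1   # 1 means failure
    facility = (hr >> 16) & 0x7FF  # 7 is FACILITY_WIN32
    code = hr & 0xFFFF             # the underlying Win32 error number
    return severity, facility, code


if __name__ == "__main__":
    print("range length (GiB):", RANGE_LENGTH / GIB)
    print("file size (TiB):", FILE_SIZE / TIB)
    print("ranges in file:", FILE_SIZE // RANGE_LENGTH)
    print("aborted range index:", RANGE_OFFSET // RANGE_LENGTH)
    print("aborted range offset (TiB):", RANGE_OFFSET / TIB)
    print("committed data (TiB):", COMMITTED_RANGES * RANGE_LENGTH / TIB)
    print("severity, facility, code:", decode_hresult(ERROR_CODE))
```

The 33 committed ranges add up to roughly 2.06 TiB, which lines up with the run failing at about 2,1TiB.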

The next event was logged at the start of the run, several hours earlier. Not sure if it’s relevant, but several similar emergency files were created.

Data Deduplication created emergency file GCReservedSpaceBitmap.tmp.
   Running the deduplication job.
   Volume name: D: (\\?\Volume{de37720c-fc77-4cdb-9a4d-1864a0e07953}\)

Scrubbing hasn’t yet started. I’ll report its results when it completes.

Update 26.08.2016

I forgot to export event logs before wiping the system. I installed 2012R2 to be sure, and it has processed 15TB+ of data so far and is currently processing a 6TB test file with 7TB in queue. No problems so far. If all goes well, it’ll probably have processed a 10TB file by Monday. The main bottleneck is dedup data processing performance, only about 200MB/s at 3,8GHz clock (i7-4770 this time). The RAID can write new files at 600-650MB/s but it’s really one or the other. Even when trying to make the dedup job high performance (no throttling, high priority, realtime process, increased IO priority…), it doesn’t even attempt to compete with other IO, slowing down about 10 times. WS2016 does much better but we know the story with that…
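For a rough feel of what 200MB/s means for huge files, here is a back-of-the-envelope estimate. It assumes decimal units (1TB = 10^12 bytes) and a steady rate for the whole run, which real jobs won’t have, so treat the numbers as a lower bound. The function name is just for illustration.

```python
def hours_to_process(size_tb, rate_mb_per_s):
    """Estimate processing time in hours, assuming a constant throughput."""
    size_bytes = size_tb * 1e12
    rate_bytes_per_s = rate_mb_per_s * 1e6
    return size_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    for size_tb in (1, 6, 10):
        alone = hours_to_process(size_tb, 200)
        # About 10x slower when competing with other IO, as observed above.
        contended = hours_to_process(size_tb, 200 / 10)
        print(f"{size_tb}TB: {alone:.1f}h alone, {contended:.1f}h with competing IO")
```

At the full 200MB/s a 10TB file needs roughly 14 hours; with the tenfold slowdown under competing IO that stretches to several days.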

I’m not a fan of the new, less detailed release notes for updates. But the new production preview patch (.103?) has updated NTFS components, with vague notes about improved reliability. I’d like to know what the original issue was without updating blindly… But that’s the new normal. I’ll try it out next week.

Update 28.08.2016

WS2012R2 had successfully deduped a 9TB file and was midway through a 10TB one before I cancelled the process and installed WS2016 with .103. It’s running overnight, we’ll see the results in the morning.

Update 30.08.2016

Corrupted. I’ll put this on hold until further patches or more information.