Deduplication, offline files and Microsoft Office don’t mix

There’s a bug in the way that dedup, offline files (Client Side Cache – CSC) and Microsoft Office interact. My guess is that the problem lies in CSC, but let’s get into the details.

Scenario:

  • SMB share stored on a deduplicated volume on a WS2012R2 server (probably WS2012 as well)
  • Windows 7 or Windows 8.1 client (not tested on Windows 10)
  • Share or files on share are set as available offline
  • Client stores a 32 KB+ Office file (doc, docx, etc.) on the share
  • File gets deduplicated and obtains the reparse point attribute (important)
  • Changed attributes get synced to client
  • Client is working offline (disconnected or with always offline policy)
  • Client attempts to save changes to file in Microsoft Office

Boom, error! Gotcha! Why is this a problem? Let’s go over details.

  • CSC downloads the actual file contents from the server and stores them in flat files without metadata.
  • ACLs and attributes get stored in some separate database. I haven’t bothered to go deeper, but the flat files have neither the relevant ACLs nor the attributes in the actual backing file system. Maybe extended attributes or ADS…? I’ve never noticed anything similar to a “CSC.db”.
  • CSC presents files with the relevant attributes to applications. E.g. ACLs work (mostly, not going into details) and server-side attributes get presented to applications, including the hidden, read-only, file-system-specific ones.
  • Microsoft Office is being smart and trying to enumerate reparse point data (probably for cases such as https://blogs.msdn.microsoft.com/oldnewthing/20051128-10/?p=33193). Remember we’re working offline.

Now things go wrong

  • Querying reparse information fails because…
  • CSC only masks the reparse point attribute onto the file in the stack, without the actual metadata behind it.
  • The data in the backing disk files is not actually a reparse point.
  • The client does not have the dedup filter driver anyway.
  • Boom, query fail, no saving for you today.
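
To make the mismatch concrete, here is a minimal Python sketch of the kind of check involved. It is not what Office actually does internally (that part is my guess above); it only shows how an application sees the reparse point bit in a file's attribute mask. The constant values come from the Windows SDK headers, while the helper names `has_reparse_attribute` and `file_attributes` are made up for this example.

```python
import os

# Values from the Windows SDK headers (winnt.h).
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
IO_REPARSE_TAG_DEDUP = 0x80000013  # reparse tag used for deduplicated files


def has_reparse_attribute(attributes: int) -> bool:
    """Return True if the reparse point bit is set in a file attribute mask."""
    return bool(attributes & FILE_ATTRIBUTE_REPARSE_POINT)


def file_attributes(path: str) -> int:
    """Read Windows file attributes without following links.

    On platforms that have no st_file_attributes field this returns 0.
    """
    return getattr(os.lstat(path), "st_file_attributes", 0)
```

In the offline case, a check like `has_reparse_attribute(file_attributes(path))` would say "yes" because CSC reports the server-side attribute, but any follow-up query for the actual reparse data (which would carry `IO_REPARSE_TAG_DEDUP` on the server) has nothing to read from.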

Most other applications (in fact MSO is the only case I’ve found) just don’t care about reparse points and write to the file just fine. For example, Notepad doesn’t check attributes and just works. In my case, I was using the always offline policy for folder redirection (online performance is awful on slow links) and it destroyed user productivity, as users always had to save changes into new files.

It took me about a year to get through Microsoft support and get this issue confirmed. A hotfix was promised after April 2016 but so far it doesn’t seem to have been fixed.

The workaround is to exclude from deduplication every file type that you expect Microsoft Office users to modify, and then rehydrate all those files server-side. If you have tons of these, your storage requirements will blow up, especially if you’ve come to depend on dedup. But end-users are happy and don’t constantly have to save edited data into new files.
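
Before excluding anything, it helps to know how many files are affected. The following is a rough Python sketch for taking inventory on the server side. The extension list is only an example and the function name `find_affected_files` is hypothetical; the 32 KB threshold is the one from the scenario above. The exclusion and rehydration themselves are done with the dedup PowerShell cmdlets (as far as I know, `Set-DedupVolume -ExcludeFileType` and `Expand-DedupFile`), not with this script.

```python
import os

# Example extension list (an assumption); adjust it to what your users edit.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}
MIN_DEDUP_SIZE = 32 * 1024  # size threshold taken from the scenario above
FILE_ATTRIBUTE_REPARSE_POINT = 0x400


def find_affected_files(root):
    """Yield (path, is_reparse_point) for Office files of 32 KB or more."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in OFFICE_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if info.st_size < MIN_DEDUP_SIZE:
                continue
            # st_file_attributes only exists on Windows; treat it as 0 elsewhere.
            attrs = getattr(info, "st_file_attributes", 0)
            yield path, bool(attrs & FILE_ATTRIBUTE_REPARSE_POINT)
```

Files reported with `is_reparse_point` set are the ones that would still need rehydrating after the exclusion is in place.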

Windows 2016 deduplication data corruption with huge files – fixed in KB3216755

Update 27.01.2017: Corruption issue has been fixed in KB3216755 and newer!

Pre-emptive remark! Currently NTFS dedup supports up to 1TB files, anything beyond that may or (in this case) may not work. Windows Server 2016 is not generally available so any bugs may be fixed by GA.

First, some background and info on 2012R2 dedup.

As known, WS2016 RTM bits are already out there (same version as Windows 10 1607). Deduplication is supposed to be much better this time. It wasn’t too bad in WS2012R2 but optimization was slow on big volumes or with lots of modified data.

File size limit is still 1TB though. Why would you ever want to go higher? Image-based backup such as Veeam. Big VMs result in multi-TB files. Storage may be cheap but it’s not that cheap. NTFS dedup will still allow you to reduce storage costs, even when using cheap solutions.

WS2012R2 could go much higher in practice if you followed some guidelines. The main limitation is not the dedup engine itself but NTFS, which limits how fragmented big files can become. Fragmentation causes the number of file extents to increase. If you hit the File Record limitations, the file can no longer grow. Deduped size on disk is often reported as zero bytes but behind the scenes, the reparse point seems to hold all the metadata needed to rebuild the file. It is also likely that modified data gets stored in the reparse point until optimization. I’m not 100% sure about that but you can make some reasonable guesses from the article “The Four Stages of NTFS File Growth, Part 2” and real life behavior.

Anyways, guidelines

  • Format NTFS with Large File Records. This is IMHO critical for any NTFS volume containing big (dozens of GiB) files, especially if using NTFS compression (yeah, it actually is a good idea… in some rare scenarios) and/or 4KiB clusters. In this case it allows for file record to be bigger before hitting fragmentation limits.
  • Format NTFS with 64KiB cluster size. Might increase performance but mainly helps to reduce the amount of metadata per file, again keeping NTFS limitations at bay.
  • Write files out in one run and never modify them later. Optimizing huge files is much slower anyways, reprocessing them is even slower and a good way to fragment your data.
  • Defragment file system. This will reduce amount of extents, allowing bigger files.
  • Apply registry flag EnablePriorityOptimization. This causes Dedup engine to aggressively defrag fragmented files.
  • Keep reasonable free space (20%), with thin provisioning if possible. Again, this keeps fragmentation under control, as NTFS tends to fragment heavily when free disk space runs low. Thin provisioning lets you avoid wasting actual disk space.
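
To put a number on the cluster size guideline, here is a small worked calculation in Python. It only counts clusters, which is an upper bound on how many separate extents a file could ever be split into; real extent counts depend on how the file was allocated. The 4TiB file size is just an example value.

```python
KIB = 1024
TIB = 1024 ** 4


def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Number of clusters a file of file_size bytes occupies (rounded up)."""
    return -(-file_size // cluster_size)


file_size = 4 * TIB  # example size, not a measured value
for cluster_size in (4 * KIB, 64 * KIB):
    count = clusters_needed(file_size, cluster_size)
    print(f"{cluster_size // KIB:>2} KiB clusters: {count:,}")
```

A 4TiB file spans 2^30 clusters at 4KiB but only 2^26 at 64KiB, so the worst case for fragmentation shrinks by a factor of 16.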

Some optional stuff

  • Do free space consolidation in addition to usual defrag. This will reduce fragmentation of future data.
  • Use contig.exe from SysInternals. This is basically per-file defrag. This may help if you do hit limitations. Just make sure you have enough free space and that it is more or less contiguous.

With care, you could easily get 4TB files and even more. I’ve tested with 6TB files but processing gets sloooooooow (2 days at least) due to single-threaded nature of dedup. If you do want to run it this way, select a CPU with high clock rate, additional cores won’t help you (unless you have a lot of volumes). If you look around forums (mainly Veeam), you’ll see people using 4TB files in production. Again, YMMV, it’s unsupported but it works with some care.

There are also some forum threads with information from product team. Supposedly WS2016 will skip processing data over 4TB. And WS2012R2 dedup will work fine with 1TB+ files if you are careful. But unofficial information is unofficial.

So WS2016 is supposed to be multi-threaded. And it is, processing is much faster. With my testbed, it used 4 threads. I didn’t have server-grade hardware with sufficient storage at hand so I threw together some surplus desktop hardware. Test environment:

  • Windows Server 2016 RTM patched to .82 (no changes to dedup components though)
  • HP desktop with i5 4th generation
  • 32GB RAM
  • 3*6TB HGST UltraStar – Storage Spaces striped (for performance): 16.4TB total

I wanted to generate data that was semirandom: not random in the crypto sense, but random enough to look realistic while still being compressible. Dummy File Generator works great for that. It’s much faster than true random and results in about 1:40 dedup ratio. Could be less but good enough. I used this great tool to create huge files in 1TB increments.
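
For illustration, here is one way such semirandom data could be produced in Python. This is not how Dummy File Generator works (I don't know its internals); it is a sketch of the general idea under my own assumptions: most blocks are drawn from a small pool of random blocks, with a configurable share of fresh random blocks mixed in. The function name and all parameters are hypothetical.

```python
import os
import random


def write_semirandom_file(path, total_bytes, block_size=64 * 1024,
                          pool_blocks=16, unique_fraction=0.1, seed=0):
    """Write total_bytes of data built mostly from a small pool of random blocks.

    Repeated pool blocks give a deduplicator something to find, while the
    occasional fresh block keeps the file from being trivially redundant.
    """
    rng = random.Random(seed)  # only controls the block order, not the contents
    pool = [os.urandom(block_size) for _ in range(pool_blocks)]
    written = 0
    with open(path, "wb") as out:
        while written < total_bytes:
            if rng.random() < unique_fraction:
                block = os.urandom(block_size)  # fresh, never-repeated data
            else:
                block = rng.choice(pool)  # repeated data
            block = block[: total_bytes - written]  # trim the final block
            out.write(block)
            written += len(block)
    return written
```

A pure-Python loop like this is far too slow for multi-TB files, so treat it as a description of the data pattern rather than a replacement for a dedicated tool.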

Aaand I get chunk store corruption with files over 1TB. It will process one point something TB, report an error in the event log and abort the file. Get-DedupMetadata recommends running scrubbing. If you do, it will report that some of your files have been corrupted.

Rinse and repeat. Well actually, wipe, test disks, reinstall, test. Still corruption. Well that sucks. I didn’t actually try to read back data.

I haven’t yet ruled out all variables but WS2012R2 works great on that very same system. WS2016 general availability is still over a month away, so it could get patched. Even in an unsupported scenario, data corruption is still data corruption.

I’ll add more info when I have some.

Update 23.08.2016

Several runs of Memtest, a few hours of Prime95, checked disks again, reinstalled Windows. Still corruption.

After doublechecking I discovered that WS2012R2 ran fine on another but otherwise identical PC.

So I grabbed another similar system from shelf and started over, even only using inbox drivers this time. I also replaced Corsair MX100 boot SSD. It has awful reputation for stability and could be to blame here. Might be that during memory-intensive processing some RAM spilled to pagefile and got corrupted.

Still, another test run just failed at 2,1TiB of 3TiB. I don’t have any other ideas to check. I think that deduplication is currently simply broken for huge files.

Some event log bits.

Data Deduplication aborted a file range.
FileId: 0x40000000000B1
FilePath: D:\1-3
RangeOffset: 0x25000000000
RangeLength: 0x1000000000
ErrorCode: 0x80070017
ErrorMessage: Data error (cyclic redundancy check).
Details: Failed to notify optimization for range

Data Deduplication aborted a file.
FileId: 0x40000000000B1
FilePath: D:\1-3
FileSize: 0x30000000000
Flags: 0x0
TotalRanges: 48
SkippedRanges: 0
AbortedRanges: 0
CommittedRanges: 33
ErrorCode: 0x80070017
ErrorMessage: Data error (cyclic redundancy check).
Details: Failed to add range to session
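
The hex values in these events are easier to read once converted. The short Python sketch below does that for the numbers copied from the log. The HRESULT decoding follows the standard layout (facility in bits 16-28, code in the low 16 bits); the helper name `win32_code_from_hresult` is made up for this example.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Values copied from the two events above.
range_offset = 0x25000000000
range_length = 0x1000000000
file_size = 0x30000000000
committed_ranges = 33
error_code = 0x80070017


def win32_code_from_hresult(hresult: int):
    """Return the Win32 error code if the HRESULT uses FACILITY_WIN32, else None."""
    facility = (hresult >> 16) & 0x1FFF
    return hresult & 0xFFFF if facility == 7 else None


print(f"range length : {range_length // GIB} GiB")
print(f"file size    : {file_size / TIB} TiB in {file_size // range_length} ranges")
print(f"failed range : starts at {range_offset / TIB} TiB")
print(f"committed    : {committed_ranges * range_length / TIB} TiB")
print(f"Win32 error  : {win32_code_from_hresult(error_code)}")
```

So the 3TiB file is processed in 48 ranges of 64GiB each, the aborted range starts at 2.3125TiB, and the 33 committed ranges add up to 2.0625TiB. The error is Win32 code 23 (ERROR_CRC), which matches the message text in the log.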

The following event was logged at the start of the run, several hours earlier. Not sure if it’s relevant, but several similar emergency files were created.

Data Deduplication created emergency file GCReservedSpaceBitmap.tmp.
Operation:
   Running the deduplication job.
Context:
   Volume name: D: (\\?\Volume{de37720c-fc77-4cdb-9a4d-1864a0e07953}\)

Scrubbing hasn’t yet started. I’ll report its results when it completes.

Update 26.08.2016

I forgot to export event logs before wiping the system. I installed 2012R2 to be sure, and it has processed 15TB+ of data so far and is currently processing a 6TB test file with 7TB in the queue, no problems so far. If all goes well, it will probably have processed a 10TB file by Monday. The main bottleneck is dedup data processing performance, only about 200MB/s at a 3,8GHz clock (i7-4770 this time). The RAID can write new files at 600-650MB/s but it’s really one or the other. Even when trying to make the dedup job high performance (no throttling, high priority, realtime process, increased IO priority…), it doesn’t even attempt to compete with other IO, slowing down about 10 times. WS2016 does much better but we know the story with that…

I’m not a fan of the new, less detailed release notes for updates. But the new production preview patch (.103?) has updated components for NTFS, with vague notes about improved reliability. I’d like to know what the original issue was without updating blindly… But that’s the new normal. I’ll try it out next week.

Update 28.08.2016

WS2012R2 had successfully deduped a 9TB file and was midway through a 10TB one before I cancelled the process and installed WS2016 with .103. It’s running overnight, we’ll see the results in the morning.

Update 30.08.2016

Corrupted. I’ll put this on hold until further patches or more information.