Deduplication, offline files and Microsoft Office don’t mix

There’s a bug in the way dedup, offline files (Client Side Cache – CSC) and Microsoft Office interact. My guess is that the problem lies in CSC, but let’s get into the details.

Scenario:

  • SMB share stored on a deduplicated volume on a Windows Server 2012 R2 server (probably 2012 as well)
  • Windows 7 or Windows 8.1 client (not tested on Windows 10)
  • Share, or files on the share, set as available offline
  • Client stores an Office file (doc, docx, etc.) of 32 kB or larger on the share
  • File gets deduplicated and obtains the reparse point attribute (important)
  • Changed attributes get synced to the client
  • Client is working offline (disconnected, or with the always offline policy)
  • Client attempts to save changes to the file in Microsoft Office

Boom, error! Gotcha! Why is this a problem? Let’s go over the details.

  • CSC downloads the actual file contents from the server and stores them in flat files without metadata.
  • ACLs and attributes get stored in a separate database. I haven’t bothered to dig deeper, but the flat files don’t carry the relevant ACLs or attributes in the actual backing file system. Maybe extended attributes or ADS…? I’ve never noticed anything resembling a “CSC.db”.
  • CSC presents files to applications with the relevant attributes. E.g. ACLs (mostly; not going into details) work, and server-side attributes get presented to applications, including hidden, read-only and file-system-specific ones.
  • Microsoft Office is being smart and tries to enumerate reparse point data (probably for cases such as https://blogs.msdn.microsoft.com/oldnewthing/20051128-10/?p=33193). Remember, we’re working offline.

Now things go wrong:

  • Querying reparse information fails because…
  • CSC only overlays the reparse point attribute in the file system stack, without the actual reparse metadata.
  • The data in the backing disk files is not actually a reparse point.
  • The client doesn’t have the dedup filter driver anyway.
  • Boom, the query fails, no saving for you today.
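To make the failure path concrete, here is a toy Python model of it. It is not Windows or CSC code. The classes and functions are hypothetical, and the model assumes, as described above, that the cached copy is a plain file while the server-side attribute set, reparse point bit included, is what applications see. FILE_ATTRIBUTE_REPARSE_POINT (0x400) and IO_REPARSE_TAG_DEDUP (0x80000013) are the real Windows values. Which query Office actually issues is an assumption, modelled here as a reparse tag lookup of the kind FSCTL_GET_REPARSE_POINT performs.

```python
"""Toy model of the offline save failure; not real Windows or CSC code."""
from dataclasses import dataclass
from typing import Optional

# Real Windows values: the reparse point attribute bit and the dedup reparse tag.
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
IO_REPARSE_TAG_DEDUP = 0x80000013


@dataclass
class BackingFile:
    """Flat file in the local cache: contents only, no reparse data."""
    data: bytes
    reparse_tag: Optional[int] = None  # CSC never stores one in this model


@dataclass
class CachedEntry:
    """What an application sees offline: cached contents plus server-side attributes."""
    backing: BackingFile
    server_attributes: int  # synced from the server, reparse bit included

    def attributes(self) -> int:
        # CSC reports the server-side attributes, not the backing file's own.
        return self.server_attributes

    def query_reparse_tag(self) -> int:
        # Hypothetical stand-in for a reparse data query: it can only succeed
        # if the backing file really is a reparse point.
        if self.backing.reparse_tag is None:
            raise OSError("the file or directory is not a reparse point")
        return self.backing.reparse_tag

    def write(self, data: bytes) -> None:
        self.backing.data = data


def careful_save(entry: CachedEntry, data: bytes) -> bool:
    """Office-like save: inspects reparse data whenever the attribute is set."""
    if entry.attributes() & FILE_ATTRIBUTE_REPARSE_POINT:
        try:
            entry.query_reparse_tag()
        except OSError:
            return False  # query failed, so the save is aborted
    entry.write(data)
    return True


def naive_save(entry: CachedEntry, data: bytes) -> bool:
    """Notepad-like save: ignores attributes and just writes."""
    entry.write(data)
    return True


offline = CachedEntry(BackingFile(b"old"), FILE_ATTRIBUTE_REPARSE_POINT)
print(careful_save(offline, b"new"))  # False: attribute set, no reparse data
print(naive_save(offline, b"new"))    # True
```

With the reparse bit set and nothing behind it, careful_save gives up and leaves the cached contents untouched, while naive_save writes happily. That is the Office versus Notepad difference in miniature.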

Most other applications just don’t care about the reparse point and write to the file just fine (in fact, Microsoft Office is the only affected application I’ve found). For example, Notepad doesn’t check attributes and just works. In my case, I was using the always offline policy for folder redirection (online performance is awful on slow links), and this destroyed user productivity, as users always had to save their changes into new files.

It took me about a year to get this issue through Microsoft support and confirmed. A hotfix was promised for after April 2016, but so far the issue doesn’t seem to have been fixed.

The workaround is to exclude from deduplication every file type you expect Microsoft Office users to modify, and then rehydrate all of those files server-side. If you have tons of these, your storage requirements will blow up, especially if you’ve come to depend on dedup. But end users are happy and no longer have to keep saving edited data into new files.
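If you go the workaround route, it helps to know which files actually need rehydrating. Here is a Python sketch that walks a directory on the file server and lists Office files that are at least 32 kB (smaller files aren’t deduplicated) and currently carry the reparse point attribute. The extension list is a hypothetical example. The attribute check only works where Python exposes st_file_attributes, which means on Windows. The exclusion itself is a volume setting (Set-DedupVolume with -ExcludeFileType on 2012 R2), and individual files can be rehydrated with Expand-DedupFile. Verify both cmdlets against your server’s documentation before relying on them.

```python
"""Find Office files on a share that may need rehydrating (sketch)."""
import os
import stat
from typing import Iterable, List

# Hypothetical list: replace with the file types your users edit in Office.
OFFICE_EXTENSIONS = frozenset({"doc", "docx", "xls", "xlsx", "ppt", "pptx"})
# Files below 32 kB are never deduplicated, so they cannot be affected.
MIN_DEDUP_SIZE = 32 * 1024


def is_candidate(name: str, size: int,
                 extensions: Iterable[str] = OFFICE_EXTENSIONS) -> bool:
    """Office file type and big enough to have been deduplicated."""
    _, dot, ext = name.rpartition(".")
    return bool(dot) and ext.lower() in set(extensions) and size >= MIN_DEDUP_SIZE


def has_reparse_attribute(st: os.stat_result) -> bool:
    """True if the file carries FILE_ATTRIBUTE_REPARSE_POINT."""
    # st_file_attributes exists only on Windows; elsewhere this reports False.
    attrs = getattr(st, "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT)


def rehydration_candidates(root: str) -> List[str]:
    """Walk root and return candidate Office files that are reparse points."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # the file's own metadata; symlinks are not followed
            if is_candidate(name, st.st_size) and has_reparse_attribute(st):
                found.append(path)
    return sorted(found)
```

For example, rehydration_candidates(r"D:\Shares\Home") (a hypothetical path) returns the matching paths, which you could then pass to Expand-DedupFile one at a time. Checking the attribute, rather than filtering on extension alone, skips Office files that were never deduplicated in the first place.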