PowerShell arrays are passed by reference, unlike basic variables

PowerShell is great in many ways yet very unintuitive in others.

Consider the following example:

$a = 0
$b = $a
$b = 1
$a #0
$b #1

All seems good and logical? Now introduce arrays:

$a=@(1)
$a #1
$b=$a
$b[0]=2
$a #2!
$b #2

What? How did $a change? Surely this is an artifact of direct modification or something. Let’s try passing the array to a function.

$a=@(1)
function b {param($c);$d=$c;$d[0]=2;$d}
$a #1
b $a #2
$a #2!

Now that’s annoying if you’re passing the same array around in a script. No level of scoping or any other tinkering will fix that. A bit of MSDN and StackOverflow reveals that arrays are always, and I mean always, passed by reference, something inherited from .NET: an array is a reference type, so assigning it or passing it to a function copies the reference, not the contents. There are a few not-so-pretty workarounds.
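To make the difference visible, here is a minimal sketch that checks object identity directly instead of watching values change. The variable names are only for illustration.

```powershell
$a = @(1)
$b = $a            # copies the reference; both names point at one array object
$c = $a.Clone()    # creates a new array object holding the same elements

# Identity check: is it literally the same object?
$sameObject  = [object]::ReferenceEquals($a, $b)   # True
$cloneIsSame = [object]::ReferenceEquals($a, $c)   # False

# Reassigning a variable is different from modifying the array it points to:
$b = @(2)          # $b now points at a new array, $a is untouched
$a[0]              # still 1
```

So the surprise only appears when the shared array is modified in place ($b[0]=2), not when a variable is assigned a new value.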

Use the .Clone() method. The caveat is that it is a shallow copy, so it only works one level deep. If you use nested (multidimensional) arrays, you’re out of luck. Example:

$a=@(1,@(1))
function b {param($c);$d=$c.Clone();$d[0]=2;$d[1][0]=2;$d}
$a #1,1
b $a #2,2
$a #1,2!

As you can see, the first level of the array is copied fine but the second level is still shared with the original.

Serialize-deserialize the array. That’s a really ugly workaround but it’s guaranteed to work. Take a look here. I haven’t tested it because cloning worked for my needs but I have a feeling that it is much slower. That may or may not be an issue depending on your requirements. Might be a good idea to wrap it in a function for easy use.
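A minimal sketch of such a wrapper, assuming PowerShell 3.0 or newer where the PSSerializer class is available. The function name Copy-Deep and the default depth of 10 are my own picks, not anything standard. Note that the copy may come back as a different collection type than the original array, so don't rely on it being exactly Object[].

```powershell
function Copy-Deep {
    param(
        [Parameter(Mandatory)] $InputObject,
        [int] $Depth = 10   # assumption: deep enough for the nesting you have
    )
    # Round-trip through CLIXML; nothing in the result is shared with the input.
    $xml  = [System.Management.Automation.PSSerializer]::Serialize($InputObject, $Depth)
    $copy = [System.Management.Automation.PSSerializer]::Deserialize($xml)
    # The unary comma stops PowerShell from unrolling the collection on output.
    , $copy
}

$a = @(1, @(1))
$d = Copy-Deep $a
$d[0] = 2
$d[1][0] = 2
$a[0]      # 1
$a[1][0]   # 1, the nested array is no longer shared
```

This only round-trips what serialization can represent, so live objects (file handles, COM objects and the like) will not survive it.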

Wishlist: runtime flag or global variable to pass arrays by value.

Deleting a System Center Configuration Manager application will not delete its supersedence relationship

Imagine a scenario where you have the following applications in SCCM:

  • Application X v1.0
  • Application X v1.1 – supersedes v1.0
  • Application X v1.2 – supersedes v1.1, deployed to clients

At some point you might want to delete v1.0 from SCCM. Keep in mind that supersedence data will not be updated for v1.1. v1.1 will still contain a broken supersedence reference, and that will break deployment even for v1.2 because the client can no longer build the supersedence chain (v1.1 references a nonexistent application). You must manually remove the supersedence information from v1.1.

Observed in 1602 and 1606. I’ve thought about scripting this validation-remediation but SCCM PowerShell is quite cryptic past very basic operations (WQL or sparsely documented .NET classes for most things). I guess a deep dive into the SCCM SDK is in order someday.

Funny thing is that this seems to be the only relation that is not enforced. And no, I haven’t contacted MS support about this as it’s not important enough for me to burn a support ticket.

Sertifitseerimiskeskus OCSP is not RFC compliant

This issue appeared a few months ago when SK introduced OCSP for KLASS-SK 2010 CA. Previously there was no OCSP at all, only CRL.

The issue is that OCSP responds “revoked” to expired certificates. You might think one should never use an expired certificate. True, but the world is not always so black and white. You might not really care for retired-archived systems or internal services. One might simply forget to renew a certificate, or the admin is on vacation etc. People are imperfect and processes do fail. Previously you’d get a warning that the certificate is expired but it’s easy to click through that, no worries. Now you get hardblocked.

The current revision is RFC 6960, which basically says that you may reply “revoked” only if the certificate actually is revoked or if it has never been issued. For any other case, the correct response is “good” or “unknown”. The obsoleted RFC 2560 makes basically the same statement.

SK support is aware of the issue but their statement was that this will not be fixed. I guess that this is a business decision (you must order new one – $$$ – or use self-signed/internal CA) as I know of no other major CA that behaves like that. I’m not a security guy, but I don’t think it’s really an issue if certificate is used a few days past expiration date in case of a human mistake.

Windows 2016 deduplication data corruption with huge files – fixed in KB3216755

Update 27.01.2017: Corruption issue has been fixed in KB3216755 and newer!

Pre-emptive remark! Currently NTFS dedup supports up to 1TB files, anything beyond that may or (in this case) may not work. Windows Server 2016 is not generally available so any bugs may be fixed by GA.

First, some background and info on 2012R2 dedup.

As known, WS2016 RTM bits are already out there (same version as Windows 10 1607). Deduplication is supposed to be much better this time. It wasn’t too bad in WS2012R2 but optimization was slow on big volumes or with lots of modified data.

File size limit is still 1TB though. Why would you ever want to go higher? Image-based backup such as Veeam. Big VMs result in multi-TB files. Storage may be cheap but it’s not that cheap. NTFS dedup will still allow you to reduce storage costs, even when using cheap solutions.

WS2012R2 could go much higher in practice if you followed some guidelines. Main limitation is not dedup engine itself but NTFS that has limitations on how fragmented big files can become. Fragmentation causes number of file extents to increase. If you hit File Record limitations, file can no longer grow. Deduped size on disk is often reported as zero bytes but behind the scenes, reparse point seems to hold all the metadata to rebuild the file. It is also likely that modified data gets stored in reparse point until optimization. I’m not 100% sure about that but you can make some reasonable guesses from this article The Four Stages of NTFS File Growth, Part 2 and real life behavior.

Anyways, guidelines

  • Format NTFS with Large File Records. This is IMHO critical for any NTFS volume containing big (dozens of GiB) files, especially if using NTFS compression (yeah, it actually is a good idea… in some rare scenarios) and/or 4KiB clusters. In this case it allows for file record to be bigger before hitting fragmentation limits.
  • Format NTFS with 64KiB cluster size. Might increase performance but mainly helps to reduce amount of metadata per file, again allowing to keep NTFS limitations at bay.
  • Write files out in one run and never modify them later. Optimizing huge files is much slower anyways, reprocessing them is even slower and a good way to fragment your data.
  • Defragment file system. This will reduce amount of extents, allowing bigger files.
  • Apply registry flag EnablePriorityOptimization. This causes Dedup engine to aggressively defrag fragmented files.
  • Keep reasonable free space (20%), with thin provision if possible. Again, to keep fragmentation under control as NTFS tends to heavily fragment when free disk space runs low. Thin provisioning allows to not waste actual disk space.
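The first two guidelines and the registry flag can be sketched in PowerShell roughly like this. The function name Initialize-DedupVolume is made up for illustration. The registry location (the ddpsvc service Settings key) is from my recollection of Microsoft's dedup tuning documentation, so treat it as an assumption and verify before use. Formatting obviously destroys all data on the volume.

```powershell
function Initialize-DedupVolume {
    param([Parameter(Mandatory)] [char] $DriveLetter)

    # NTFS with Large File Records and 64 KiB clusters (65536 bytes).
    Format-Volume -DriveLetter $DriveLetter -FileSystem NTFS `
        -AllocationUnitSize 65536 -UseLargeFRS

    # Assumed location of the EnablePriorityOptimization flag; check it first.
    $key = 'HKLM:\SYSTEM\CurrentControlSet\Services\ddpsvc\Settings'
    if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }
    New-ItemProperty -Path $key -Name 'EnablePriorityOptimization' `
        -PropertyType DWord -Value 1 -Force | Out-Null
}

# Usage (destructive, so left commented out):
# Initialize-DedupVolume -DriveLetter D
```

Defragmentation, free space and the "write once" rule are operational habits rather than settings, so they are not part of the sketch.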

Some optional stuff

  • Do free space consolidation in addition to usual defrag. This will reduce fragmentation of future data.
  • Use contig.exe from SysInternals. This is basically per-file defrag. This may help if you do hit limitations. Just make sure you have enough free space and it is more or less contiguous.

With care, you could easily get 4TB files and even more. I’ve tested with 6TB files but processing gets sloooooooow (2 days at least) due to single-threaded nature of dedup. If you do want to run it this way, select a CPU with high clock rate, additional cores won’t help you (unless you have a lot of volumes). If you look around forums (mainly Veeam), you’ll see people using 4TB files in production. Again, YMMV, it’s unsupported but it works with some care.

There are also some forum threads with information from product team. Supposedly WS2016 will skip processing data over 4TB. And WS2012R2 dedup will work fine with 1TB+ files if you are careful. But unofficial information is unofficial.

So WS2016 is supposed to be multi-threaded. And it is, processing is much faster. With my testbed, it used 4 threads. I didn’t have server-grade hardware with sufficient storage at hand so I threw together some surplus desktop hardware. Test environment:

  • Windows Server 2016 RTM patched to .82 (no changes to dedup components though)
  • HP desktop with i5 4th generation
  • 32GB RAM
  • 3*6TB HGST UltraStar – Storage Spaces striped (for performance): 16.4TB total

I wanted to generate data that was semirandom. Not random in the crypto sense, but random enough while still being compressible, to simulate a realistic workload. Dummy File Generator works great for that. It’s much faster than true random and results in about 1:40 dedup ratio. Could be less but good enough. I used this great tool to create huge files in 1TB increments.

Aaand I get chunk store corruption with files over 1TB. It will process one point something TB, report an error in the event log and abort on the file. Get-DedupMetaData recommends running scrubbing. If you do, it will report that some of your files have been corrupted.
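For reference, the check described above can be run roughly like this. It's a sketch: the function name Test-DedupChunkStore is made up, and the event log channel name is what I'd expect on a server with the dedup feature installed, so treat it as an assumption.

```powershell
function Test-DedupChunkStore {
    param([Parameter(Mandatory)] [string] $Volume)   # e.g. 'D:'

    # Run a scrubbing job and block until it finishes.
    Start-DedupJob -Volume $Volume -Type Scrubbing -Wait

    # Chunk store metadata; this is where the scrubbing recommendation shows up.
    Get-DedupMetadata -Volume $Volume

    # Assumed channel name for scrubbing results (corrupted files are listed here).
    Get-WinEvent -LogName 'Microsoft-Windows-Deduplication/Scrubbing' -MaxEvents 20
}

# Usage:
# Test-DedupChunkStore -Volume 'D:'
```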

Rinse and repeat. Well actually, wipe, test disks, reinstall, test. Still corruption. Well that sucks. I didn’t actually try to read back data.

I haven’t yet ruled out all variables but WS2012R2 works great on that very same system. But WS2016 general availability is still over a month away. It could get patched as even in an unsupported scenario – data corruption is still data corruption.

I’ll add more info when I have some.

Update 23.08.2016

Several runs of Memtest, a few hours of Prime95, checked disks again, reinstalled Windows. Still corruption.

After doublechecking I discovered that WS2012R2 ran fine on another but otherwise identical PC.

So I grabbed another similar system from shelf and started over, even only using inbox drivers this time. I also replaced Corsair MX100 boot SSD. It has awful reputation for stability and could be to blame here. Might be that during memory-intensive processing some RAM spilled to pagefile and got corrupted.

Still, another test run just failed at 2,1TiB of 3TiB. I don’t have any other ideas to check. I think that deduplication is currently simply broken for huge files.

Some event log bits.

Data Deduplication aborted a file range.
FileId: 0x40000000000B1
FilePath: D:\1-3
RangeOffset: 0x25000000000
RangeLength: 0x1000000000
ErrorCode: 0x80070017
ErrorMessage: Data error (cyclic redundancy check).
Details: Failed to notify optimization for range

Data Deduplication aborted a file.
FileId: 0x40000000000B1
FilePath: D:\1-3
FileSize: 0x30000000000
Flags: 0x0
TotalRanges: 48
SkippedRanges: 0
AbortedRanges: 0
CommittedRanges: 33
ErrorCode: 0x80070017
ErrorMessage: Data error (cyclic redundancy check).
Details: Failed to add range to session
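The hex values in these events are easier to read after a bit of arithmetic. A small sketch, using only the numbers from the log above (the variable names are mine):

```powershell
$rangeOffset = 0x25000000000
$rangeLength = 0x1000000000
$fileSize    = 0x30000000000

$rangeLengthGiB = $rangeLength / 1GB        # 64   -> dedup works in 64 GiB ranges
$rangeCount     = $fileSize / $rangeLength  # 48   -> matches TotalRanges
$offsetTiB      = $rangeOffset / 1TB        # 2.3125 -> where the aborted range starts
$committedTiB   = 33 * $rangeLength / 1TB   # 2.0625 -> CommittedRanges as TiB

# Low 16 bits of the HRESULT hold the Win32 error code.
$win32Code = 0x80070017 -band 0xFFFF        # 23 = ERROR_CRC
```

So the 3TiB file was split into 48 ranges, 33 of them (about 2,1TiB) were committed before the abort, and the error is a plain Win32 CRC error wrapped in an HRESULT.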

The following event was logged at the start of the run, several hours earlier. Not sure if it is relevant, but several similar emergency files were created.

Data Deduplication created emergency file GCReservedSpaceBitmap.tmp.
Operation:
   Running the deduplication job.
Context:
   Volume name: D: (\\?\Volume{de37720c-fc77-4cdb-9a4d-1864a0e07953}\)

Scrubbing hasn’t yet started. I’ll report its results when it completes.

Update 26.08.2016

I forgot to export event logs before wiping the system. I installed 2012R2 to be sure and it has processed 15TB+ data so far and is currently processing a 6TB test file with 7TB in queue, no problems so far. If all goes well, it’ll probably have processed a 10TB file by Monday. The main bottleneck is dedup data processing performance, only about 200MB/s at 3,8GHz clock (i7-4770 this time). RAID can write new files at 600-650MB/s but it’s really one or the other. Even when trying to make the dedup job high performance (no throttling, high priority, realtime process, increased IO priority…), it doesn’t even attempt to compete with other IO, slowing down about 10 times. WS2016 does much better but we know the story with that…

I’m not a fan of the new, less detailed release notes for updates. But the new production preview patch (.103?) has updated components for NTFS with vague notes about improved reliability. I’d like to know what the original issue was without updating blindly… But that’s the new normal, I’ll try it out next week.

Update 28.08.2016

WS2012R2 had successfully deduped 9TB file and was midway through 10TB before I cancelled the process and installed WS2016 with .103. It’s running overnight, we’ll see the results in the morning.

Update 30.08.2016

Corrupted. I’ll put this on hold until further patches or more information.

Why did I start blogging again?

I had a blog for many years but at the time, it had little focus so I decided to discard it as having little value to anybody.

In years since previous blog’s closure, every now and then I’ve had subjects that might need sharing with world. Perhaps solution to some problem or even just documenting some issue. So today I installed WordPress and I actually have a few posts in mind. Focus is around my work (IT) so let’s hope that I’ll have time to keep writing.

I try to write only on subjects that have had very little coverage (edge cases, implementation details, little-known problems etc) or are very hard to find on Google. No generic news or RTFM/UTFG tutorials.

Scripts are mostly unedited (some irrelevant details, logging functions etc are usually removed). A lazy sysadmin is a good sysadmin so I usually don’t polish my scripts beyond “it works fine and is reasonably readable”. If you need something better, DIY or improve my examples. It’d be nice if you shared improvements though.