SCOM management packs in Zabbix – a year later

I discussed this about a year ago but in the end I didn’t publish anything. I actually did get the “Windows Server Operating System” MP to be pretty much feature-complete (little to no OS metadata – health checks only) and it pretty much blows away any Zabbix built-in template and any other I’ve seen. There are a few additional bits that I found useful. Works fine on Windows 2012+ and… more or less fine on 2008 and 2008R2. Some items are missing there due to different performance counters (physical disk and networking, if I remember correctly) but I really haven’t bothered to fix that. All items and triggers use macros so it’s easy to override checks.

The main issue remains the 256-character item limit. I did make some progress in packing extra PowerShell into this small limit, so previous posts may not be up to date, but the templates still don’t require any changes to the agent or any local scripts. Another issue is that I can’t reference items from other (linked) templates in triggers. And as you can’t add the same item to another template, it makes some templates REALLY annoying.

The 30-second command timeout remains an issue, so you can’t actively defrag/chkdsk/unmap/trim or do very expensive checks. A command timeout with a proxy seems to cause the proxy to reissue commands every few minutes, causing performance issues as commands never complete and just repeat indefinitely. I did leave these checks in but disabled them. File system health is checked from just the dirty flag, and fragmentation is checked from the last-run data in the registry. It seems to trigger false positives occasionally from VMware snapshots but works reasonably well.

I did figure out how to change disk optimization from weekly to daily in PowerShell but it’s waaaaay too big to fit in an item for all OS versions. I did consider building the item command from multiple macros but this change would have little value. For reference (2012+ only):

$v=[environment]::OSVersion.Version;If($v.major -gt 6 -or ($v.major -eq 6 -and $v.minor -ge 2)){$s='ScheduledDefrag';[xml]$t=Get-ScheduledTask $s|export-scheduledtask;$t.Task.Settings.MaintenanceSettings.Period='P1D';register-scheduledtask -TaskN $s -TaskP '\Microsoft\Windows\Defrag' -X $t.outerxml -F}
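
A quick way to see how close a command is to the limit mentioned above, as a minimal sketch. Test-ItemLength is a hypothetical helper name, and I'm assuming the 256 characters apply to the raw command text; if the wrapping item key counts too, the real budget is smaller.

```powershell
# Hypothetical helper (not part of Zabbix or the templates): reports how much
# of the character budget a command string uses.
function Test-ItemLength {
    param(
        [Parameter(Mandatory)] [string] $Command,
        [int] $Limit = 256   # the item limit quoted in the text; assumption that it covers only the command
    )
    [pscustomobject]@{
        Length    = $Command.Length
        Remaining = $Limit - $Command.Length
        Fits      = $Command.Length -le $Limit
    }
}

# Single quotes keep PowerShell from expanding anything while measuring.
# The sample command is the dirty-flag query, used here only as an illustration.
Test-ItemLength -Command 'fsutil dirty query C:'
```

Paste the one-liner you want to check between single quotes (or a single-quoted here-string if it contains quotes itself) and look at Remaining before putting it into an item.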

I did some work on the ADDS and File Server MPs but it’s really time-consuming and they remain incomplete (they have helped to catch a few incidents though). The Exchange template is close to complete, but it’s largely telemetry (as in the original MP) and alerting works mainly by querying the health monitor. Again, it has helped to diagnose issues and catch incidents early.

I’ll try to clean them up and release somehow… someday.

PS! I still think that Zabbix sucks but it’s one of the best among free stuff. 🙂

Workaround for NTFS deduplication error 0x8007000E Not enough storage is available to complete this operation

This can pop up when starting an optimization job, even when you have plenty of RAM and even if you give tons of memory to the job. The error message is misleading: “storage” here means memory.

The workaround is to just increase the page file. I came across this issue on a Server Core 2016 box that had 24GB of RAM for a 16TB volume. The analysis job caused commit to grow to almost 90% (without releasing it in time), so the optimization job could not allocate any memory. I didn’t go in depth (RAMMap etc.) though. After increasing the page file from the automatic ~2GB to 16GB, jobs work just fine.
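
On Server Core there is no GUI for this, so here is a rough sketch of one way to set a fixed page file through CIM. This is not necessarily how I did it at the time. The 16384 MB figure is just the 16GB from above, and I'm assuming a Win32_PageFileSetting instance shows up once automatic management is switched off. Run elevated and reboot afterwards.

```powershell
# Sketch only. Sizes are in MB; 16384 MB = 16 GB, the value from the text.
$sizeMB = [uint32]16384

# Turn off automatic page file management so a fixed size can be set.
$cs = Get-CimInstance -ClassName Win32_ComputerSystem
Set-CimInstance -InputObject $cs -Property @{ AutomaticManagedPagefile = $false }

# Assumption: a page file setting instance exists once automatic management is off.
$pf = Get-CimInstance -ClassName Win32_PageFileSetting | Select-Object -First 1
if ($pf) {
    Set-CimInstance -InputObject $pf -Property @{ InitialSize = $sizeMB; MaximumSize = $sizeMB }
} else {
    Write-Warning 'No Win32_PageFileSetting instance found; nothing was changed.'
}

# The new size only takes effect after a reboot.
```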

Keep in mind that commit does not mean that memory or page file is actually used. It just means that the application has been promised that this memory will be available when it is actually needed. Unused commit is backed by the page file first, so it’s basically free performance-wise, except for the increased disk space use.
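
To see where commit stands before and after a job, something like this should do. It's a sketch that reads the standard Memory performance counters; the counter names are the English ones, so on a localized OS they will differ.

```powershell
# Read committed bytes and the commit limit (RAM + page file) in one sample.
$samples = (Get-Counter -Counter '\Memory\Committed Bytes', '\Memory\Commit Limit').CounterSamples

# Counter paths come back as \\host\memory\<counter name>; -like is case-insensitive.
$committed = ($samples | Where-Object { $_.Path -like '*\committed bytes' }).CookedValue
$limit     = ($samples | Where-Object { $_.Path -like '*\commit limit' }).CookedValue

# Show both values in GB plus the ratio as a percentage.
'{0:N1} GB committed of {1:N1} GB limit ({2:P0})' -f ($committed / 1GB), ($limit / 1GB), ($committed / $limit)
```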

Online P2V of domain controllers

Don’t do it, or do it in DSRM. Until, for various reasons, you just… can’t. Unacceptable downtime, Exchange/SBS, Windows 2003 (can’t stop AD services), etc. Doesn’t matter, you just have to do the P2V online.

It’s (probably) not supported and certainly not recommended, but if you really need to, then (skipping obvious steps):

  1. Stop replication some time before finalizing conversion
    repadmin /options %COMPUTERNAME% +DISABLE_OUTBOUND_REPL
    repadmin /options %COMPUTERNAME% +DISABLE_INBOUND_REPL
  2. Disconnect target VM network and boot to DSRM.
  3. Set “database restored from backup” flag in registry – just in case!
    https://technet.microsoft.com/nl-nl/library/dd363545(v=ws.10).aspx
  4. Boot normally
  5. Enable replication
    repadmin /options %COMPUTERNAME% -DISABLE_OUTBOUND_REPL
    repadmin /options %COMPUTERNAME% -DISABLE_INBOUND_REPL

     

Again, neither supported nor recommended, but it has worked for me.
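
For step 3, the flag is a DWORD value under the NTDS parameters key. A minimal sketch of setting it from PowerShell while in DSRM follows, assuming PowerShell is available on the target (it won't be on a stock Windows 2003; use reg.exe or regedit there). The repadmin calls afterwards are only a sanity check and are my addition, not part of the steps above.

```powershell
# Step 3 sketch: mark the AD database as restored from backup.
# Run on the converted VM in DSRM, before the first normal boot.
$ntds = 'HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Parameters'
New-ItemProperty -Path $ntds -Name 'Database restored from backup' -PropertyType DWord -Value 1 -Force

# After step 5 (normal boot, replication re-enabled), a quick sanity check:
repadmin /options $env:COMPUTERNAME   # shows the current replication options on this DC
repadmin /showrepl                    # replication status with partners
```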