Discovering multi-instance performance counters in Zabbix

I’m not a fan of Zabbix but you can’t always select your tools. I’m no expert on Zabbix so feel free to improve my solution.

The original problem was that most Zabbix templates available online for Windows are plain rubbish. Pretty much everything monitored is hardcoded (N volumes to check for free space, N SQL Server instances to check etc). Needless to say, this is ugly and doesn’t work well with more complex scenarios (think mount points or volumes without disk letter…). Agent built-in discovery is also quite limited.

My first instinct was to use Performance Counters but agent doesn’t know how to discover counter instances, once again requiring hardcoding. Someone actually patched agent to allow that but it has never been included in official agent.

Low Level Discovery is your way out but it’s implied to use local scripts. I used it with local scripts for a while but keeping them in sync and in-place was quite annoying. Another option is to use UserParameter in agent configuration. There are less limitations but this requires custom configuration on client and I’d like to keep agent basically stateless. I did use this implementation as inspiration though.

So one day I tried to squeeze it in 255 characters allowed for a run command. And i got to work.

Notes:

  • It’s trimmed every way possible to reduce characters as best as I could.
  • 255 characters is actually very little and you need to be really conservative…
  • …because you need to escape special characters 3 times. First escape strings in PowerShell. Then escape special characters to execute PowerShell commands directly in CMD. And finally escape some characters for Zabbix run command.
  • Double quotes are the main problem. I think that this is the best solution as I can’t use single quotes for JSON values.
  • If counter doesn’t exist or there are no instances, returns NULL.
  • You should be reasonably proficient in PowerShell and Zabbix to use that
  • Should work with reasonably modern Zabbix server and agents (2.2+)
  • I only used it on Server 2012 R2 but it should work also on 2008 R2 (not 2008) and 2012. Let me know how it works for you.

Update 2.09.2016
I’ve update the script to shave off a few more characters. I’ll update when I have some time.

So let’s figure this out. The original PowerShell script:

'{"data":['+(((Get-Counter -L 'PhysicalDisk'2>$null).PathsWithInstances|%{If($_){$_.Split('\')[1].Trim(')').Split('(')[1]}}|?{$_ -ne '_Total'}|Select -U|%{"{`"{#PCI}`":`"$_`"}"}) -join ',')+']}'

Phew, that’s hard to read even for myself. But remember, characters matter. I’ll explain it in parts.

'{"data":['

That’s just JSON header for LLD. I found it easier and to use less characters to hardcode some data rather than format data for JSON CmdLets.

(Get-Counter -L 'PhysicalDisk'2>$null).PathsWithInstances

As you might think, this retrieves instances of PhysicalDisk. You need it keep track on IO queues for examples. Replace it with counter you need. This actually retrieves all instances for all counters but we’ll clear this up later.
Sending errors to null allows to discover counters that might not exist on all servers (think IIS or SQL Server) – otherwise you’d get error (Zabbix reads back both StdOut and StdErr) but now it just returns NULL (eg nothing was discovered).
You can use * wildcard. For SQL Server, this is a must.

%{If($_){$_.Split('\')[1].Trim(')').Split('(')[1]}}

First I check if there was anything in pipeline. Without this, you’d get a pipeline error if there was no counter or no instances. Then I cut out the name on the instance.

Actually you can leave out the cutting part. In multi-instance SQL Server servers (when you used wildcard for counter name) you actually have to keep full name (both counter and counter instance) as counter name contains SQL Server instance name. For example:

%{If($_){$_.Split('\')[1]}}

I usually prefer to keep only instance names but it’s optional. Let’s go on…

?{$_ -ne '_Total'}

This is optional and can be omitted. Most counters have “_Total” aggregated instance that may or may not useful based on the instance. For example with PhysicalDisk, it’s more or less useless as you’d need per-instance counters for anything useful. On the other hand, Processor Information can be used to get both total and per-CPU/core/NUMA-node metrics.

Select -U

Remember that we’re actually working with all counters for all instances? This cleans them up, keeping single entry for instance.

%{"{`"{#PCI}`":`"$_`"}"}

Builds JSON entry for each discovered instance. {#PCI} is macro name for prototypes. PCI is arbitrary name – Performance Counter Instances. You can change that or trim to just one character – {#I}.

-join ','

Concentrates all instance JSON entries into one string.

']}'

JSON footer, nothing fancy, hardcoded.

Now the escaping. First PowerShell to CMD:

  • ” –> “””
  • | –> ^|
  • > –> ^>
  • prefix with “powershell -c”

Result that should run without errors in CMD and return instances in JSON.

powershell -c '{"""data""":['+(((Get-Counter -L 'PhysicalDisk'2^>$null).PathsWithInstances^|%{If($_){$_.Split('\')[1].Trim(')').Split('(')[1]}}^|?{$_ -ne '_Total'}^|Select -U^|%{"""{`"""{#I}`""":`"""$_`"""}"""}) -join ',')+']}'

Escaping for Zabbix

  • ” –> \”
  • Add system.run[” to start
  • Add “] to end
system.run["powershell -c '{\"\"\"data\"\"\":['+(((Get-Counter -L 'PhysicalDisk'2^>$null).PathsWithInstances^|%{If($_){$_.Split('\')[1].Trim(')').Split('(')[1]}}^|?{$_ -ne '_Total'}^|Select -U^|%{\"\"\"{`\"\"\"{#PCI}`\"\"\":`\"\"\"$_`\"\"\"}\"\"\"}) -join ',')+']}'"]

But oh no, it’s now 268 characters! You need to cut something out. Luckily you now have some examples for that. Here’s some more Zabbix formatted examples:

system.run["powershell -c '{\"\"\"data\"\"\":['+(((Get-Counter -L 'Processor Information'2^>$null).PathsWithInstances^|%{If($_){$_.Split('\')[1].Trim(')').Split('(')[1]}}^|Select -U^|%{\"\"\"{`\"\"\"{#I}`\"\"\":`\"\"\"$_`\"\"\"}\"\"\"}) -join ',')+']}'"]
system.run["powershell -c '{\"\"\"data\"\"\":['+(((Get-Counter -L 'MSSQL*Databases'2^>$null).PathsWithInstances^|%{If($_){$_.Split('\')[1]}}^|Select -U^|%{\"\"\"{`\"\"\"{#I}`\"\"\":`\"\"\"$_`\"\"\"}\"\"\"}) -join ',')+']}'"]

Now for item prototypes, if you cut instance down to counter instance name.

  • Name: IO Read Latency {#PCI}
  • Key: perf_counter[“\PhysicalDisk({#PCI})\Avg. Disk sec/Read”,60]

If you didn’t trim name and kept counter name

  • Name: IO Read Latency {#PCI}
  • Key: perf_counter[“\{#PCI}\Avg. Disk sec/Read”,60]

Keep in mind that name will now be something like “IO Read Latency PhysicalDisk\0 C:”

Again, if you have any improvements, especially to cut character count – let me know.

Leave a Reply

Your email address will not be published. Required fields are marked *