There are several API functions that have changed slightly in the newest version of the NetApp appliances. There are also new requirements to use iterative APIs and to explicitly define instances, which was not the case in the past. I am using a Clustered ONTAP 8.2.1 system.
I've fixed up the Perl script to work with clusters. I'm also making some changes to the graphs, though I don't know how much effort I'm going to invest in it. When I'm done, I'll post the results.
Here is some documentation coming back directly from the perf-object-counter-list-info API on the filer:
    'content' => 'Average latency in microseconds for the WAFL filesystem to process read request to the volume; not including request processing or network communication time'
    'name' => 'properties'    'content' => 'average',
    'name' => 'unit'          'content' => 'microsec',
I understand what gheppner is doing, but I had to give it some thought. It's all about how you choose to represent the data. Neither approach is technically incorrect; you just need to understand what you're looking at.
It's somewhat complicated to understand, but the value coming back from avg_latency, read_latency and write_latency is a COUNTER of the number of microseconds of latency that has occurred since some arbitrary point in the past (system reboot perhaps, or counter roll-over). This is quite typical of how most storage systems report latency. To see this in real-time run a command like this:
watch -n 1 ./netapp-ontapsdk-perf.pl FILER USER PASSWORD volume get write_latency VOLUME
You'll watch the number of microseconds increase slowly (or quickly) depending on how much latency is being generated. Now all Cacti does is read this COUNTER once every minute (or 5 minutes), factor in the step value of 60 (or 300) and compute a "microseconds of latency per second" value to put up on a graph.
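To make that concrete, here is a minimal Python sketch of the arithmetic Cacti (via RRDtool) performs on a COUNTER data source. The function name and the sample numbers are hypothetical, just for illustration; Cacti does this internally.

```python
# Hypothetical sketch of how a COUNTER data source becomes a per-second rate.
# Cacti/RRDtool does this internally; the numbers below are made up.

def counter_rate(prev, curr, step):
    """Microseconds of latency accumulated per second over one polling step."""
    return (curr - prev) / step

# Two successive polls of the write_latency counter, 60 seconds apart:
prev_sample = 81_000_000   # counter value at the first poll (microsec)
curr_sample = 81_360_000   # counter value one minute later

rate = counter_rate(prev_sample, curr_sample, step=60)
print(rate)  # 6000.0 -> "microseconds of latency per second"
```

That 6000.0 is the kind of value that ends up plotted on the graph.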
What gheppner did was simply modify this to produce a "microseconds of latency per second per operation" value and graph it. This may be more indicative of what the filer reports through CLI commands ... but I wouldn't know ... I don't have access to the filer's CLI.
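As I understand gheppner's variant, it can be sketched like this: since the counter's 'properties' field is 'average', the latency delta is meant to be divided by the delta of a matching operations counter over the same interval, yielding average microseconds per operation. The function name, the ops counter pairing, and all values below are my own assumptions for illustration.

```python
# Sketch of the per-operation variant: divide the latency counter's delta by
# the delta of its base operations counter over the same polling interval.
# Counter pairing and values here are assumptions, not SDK output.

def per_op_latency(lat_prev, lat_curr, ops_prev, ops_curr):
    """Average microseconds of latency per operation over one polling step."""
    ops_delta = ops_curr - ops_prev
    if ops_delta == 0:
        return 0.0  # no operations completed in the interval; avoid div-by-zero
    return (lat_curr - lat_prev) / ops_delta

lat = per_op_latency(81_000_000, 81_360_000,   # write_latency samples
                     500_000,    501_200)      # write_ops samples (assumed pairing)
print(lat)  # 300.0 -> average microseconds per write operation
```

Same raw counters, different denominator: per-second tells you how much total latency the volume is accumulating, per-operation tells you how slow an individual request is on average.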
In your particular case, what's really skewing the graph is other_latency. It's certainly larger than the read/write values. I haven't investigated what other_latency really is, and I'm working with a brand-new storage system that has no production traffic on it, so it's also difficult to compare against what I'm capturing now. Looking back into ancient history at our older 3040C data, while the system was in production I would typically see other_latency: 300, write_latency: 170, read_latency: 50. These were functioning as storage back-ends for a large mail system. It's entirely possible that you indeed have an application / storage system with a large amount of latency. But overall, you're still going to see trends in latency: it will go up, and it will go down, and when it goes up you should take notice.