First post for me!
So. We have a bunch of FAS3240 Filers and I was quite happy to find Cacti Templates which seemed to work right away. I was a little annoyed by the fact that my filer's APIs give microseconds so I added a CDEF which divided by 10^6.
However, the volume latency numbers were totally off. They showed latencies of around 2-7 seconds for read and average, but only 20 milliseconds for write. OK, this filer churns about 450 MB/s, but *anyway*, this is surely not normal. I'm also currently trying to debug some performance problems, so I investigated a little further.
I logged into the Solaris clients and measured the NFS latency with iostat -x. It showed latencies of around 4-5 milliseconds. I was so confused that I took it one step further and checked out the nfsv3 API and the nfs_read_latency counters, which are in milliseconds by the way. They also gave readings of about 4-5 milliseconds.
So we took a whiteboard and made up some figures.
The original formula
does basically the following. You grab a counter at t0. Then the Cacti poller waits for its 300 seconds and gets the next value (at time t1 = t0 + 300 seconds). The above formula is then applied. But, and now comes the point, you still have to divide by the 300-second polling interval!
So now, because I really like to have seconds, a logarithmic graph with SI-units, I divide by 300 * 10^6 to get the correct readings.
I'm using this formula now in all my time-based graphs:
(for dt the polling interval in seconds - might differ if you run scripts with data input methods, like I do most of the time)
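To make the arithmetic concrete, here is a minimal Python sketch of the scaling described above: divide by 10^6 for microseconds and by dt for the polling interval. The function names and the raw value of 1,500,000 are made up for illustration; they are not from the Cacti templates or the filer API.

```python
MICROSECONDS_PER_SECOND = 10**6


def uncorrected_seconds(raw_value):
    """Scale microseconds to seconds only (the first attempt, which was off)."""
    return raw_value / MICROSECONDS_PER_SECOND


def latency_seconds(raw_value, dt_seconds=300):
    """Scale microseconds to seconds AND divide by the polling interval dt.

    dt_seconds defaults to the 300-second Cacti poller interval; pass a
    different value if a script or data input method polls at another rate.
    """
    if dt_seconds <= 0:
        raise ValueError("dt_seconds must be positive")
    return raw_value / (dt_seconds * MICROSECONDS_PER_SECOND)


if __name__ == "__main__":
    raw = 1_500_000  # hypothetical raw value, for illustration only
    print(uncorrected_seconds(raw))  # 1.5 (seconds, looks alarming)
    print(latency_seconds(raw))      # 0.005 (seconds, i.e. 5 milliseconds)
```

With these made-up numbers, the uncorrected value lands in the seconds range and the corrected one in the low-millisecond range, which is the same kind of gap I saw between the graph and iostat.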
And - abracadabra! - the values from volume-latency, nfsv3-latency and iostat from Solaris match.
EDIT:
You don't need to divide by the polling interval for processor:processor_busy by the way - because these are timeticks.
And, just for fun, here is the graph where I didn't divide by the polling interval. I almost had a heart attack:
What I haven't found out yet (will do that tomorrow) is whether the read and write counters have different bases. Because, to be honest, I don't believe that 20 MICROseconds is a reasonable write latency... More on that later.