Cacti: official forums and support
It is currently Sun Aug 20, 2017 10:14 am

All times are UTC - 5 hours




 Post subject: No Latency Data - LUN or Volume
Posted: Fri Feb 13, 2009 3:59 pm

Joined: Fri Feb 13, 2009 3:46 pm
Posts: 1
Hi,

I'm trying to get latency data for a volume or LUN, but there is no data. Other read/write data is OK.

cacti-spine-0.8.7a-1.el5.rf
cacti-0.8.7c-1.el5.rf
rrdtool-1.2.29-1.el5.rf
NetApp Release 7.2.5.1

Does anyone have any idea?

Thanks,

Dang

filename = "storage1_read_ops_14272.rrd"
rrd_version = "0003"
step = 300
last_update = 1234558204
ds[read_ops].type = "COUNTER"
ds[read_ops].minimal_heartbeat = 600
ds[read_ops].min = 0.0000000000e+00
ds[read_ops].max = NaN
ds[read_ops].last_ds = "210982540"
ds[read_ops].value = 8.3401993355e+01
ds[read_ops].unknown_sec = 0
ds[write_ops].type = "COUNTER"
ds[write_ops].minimal_heartbeat = 600
ds[write_ops].min = 0.0000000000e+00
ds[write_ops].max = NaN
ds[write_ops].last_ds = "1502227036"
ds[write_ops].value = 6.9506976744e+02
ds[write_ops].unknown_sec = 0
ds[total_ops].type = "COUNTER"
ds[total_ops].minimal_heartbeat = 600
ds[total_ops].min = 0.0000000000e+00
ds[total_ops].max = NaN
ds[total_ops].last_ds = "1797812046"
ds[total_ops].value = 8.7536212625e+02
ds[total_ops].unknown_sec = 0
ds[avg_latency].type = "COUNTER"
ds[avg_latency].minimal_heartbeat = 600
ds[avg_latency].min = 0.0000000000e+00
ds[avg_latency].max = NaN
ds[avg_latency].last_ds = "U"
ds[avg_latency].value = NaN
ds[avg_latency].unknown_sec = 4
ds[read_latency].type = "COUNTER"
ds[read_latency].minimal_heartbeat = 600
ds[read_latency].min = 0.0000000000e+00
ds[read_latency].max = NaN
ds[read_latency].last_ds = "U"
ds[read_latency].value = NaN
ds[read_latency].unknown_sec = 4
ds[write_latency].type = "COUNTER"
ds[write_latency].minimal_heartbeat = 600
ds[write_latency].min = 0.0000000000e+00
ds[write_latency].max = NaN
ds[write_latency].last_ds = "U"
ds[write_latency].value = NaN
ds[write_latency].unknown_sec = 4
ds[other_latency].type = "COUNTER"
ds[other_latency].minimal_heartbeat = 600
ds[other_latency].min = 0.0000000000e+00
ds[other_latency].max = NaN
ds[other_latency].last_ds = "U"
ds[other_latency].value = NaN
ds[other_latency].unknown_sec = 4
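
To see at a glance which data sources in a dump like this are not receiving values, you can scan the `rrdtool info <file>.rrd` output for `last_ds = "U"` (unknown). Below is a minimal Python sketch that assumes the dump has been captured as text. The function name is illustrative only.

```python
import re

# Matches lines such as: ds[read_ops].last_ds = "210982540"
LAST_DS_RE = re.compile(r'^ds\[(?P<name>[^\]]+)\]\.last_ds = "(?P<value>[^"]*)"$')


def unknown_data_sources(info_text: str) -> list[str]:
    """Return the names of data sources whose last_ds is "U" (unknown)."""
    unknown = []
    for line in info_text.splitlines():
        match = LAST_DS_RE.match(line.strip())
        if match and match.group("value") == "U":
            unknown.append(match.group("name"))
    return unknown


if __name__ == "__main__":
    sample = '''
ds[read_ops].last_ds = "210982540"
ds[avg_latency].last_ds = "U"
ds[read_latency].last_ds = "U"
'''
    print(unknown_data_sources(sample))  # ['avg_latency', 'read_latency']
```

Applied to the full dump above, this would flag avg_latency, read_latency, write_latency and other_latency, which are exactly the latency sources that are missing data.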


 Post subject:
Posted: Tue Mar 10, 2009 6:40 pm

Joined: Tue Mar 10, 2009 6:31 pm
Posts: 3
The latency is pulled via the API, not SNMP, so check that you configured the script with a user that has api-* and http-login permissions on the NetApp.

Or check out LogicMonitor if you don't want to spend the time rolling your own Cacti graphing and alerting for automated NetApp monitoring (and load balancers, databases, etc.).


 Post subject: no graphs showing up
Posted: Fri Mar 13, 2009 4:19 pm

Joined: Fri Mar 13, 2009 4:16 pm
Posts: 2
I can run the verbose query, but there are still no graphs.

If I run the command by itself, it runs with no problems. I'm just wondering if I am missing something.

Nothing shows up in the logs as an error, and I am not getting HTTP errors.


 Post subject: Update
Posted: Fri Mar 13, 2009 4:38 pm

Joined: Fri Mar 13, 2009 4:16 pm
Posts: 2
I reloaded the poller cache, and now I get a NaN total on my graphs.

So the NetApp script isn't running? But if I look in the logs, it shows that it ran.


 Post subject: Re: SNMP versions
Posted: Mon Mar 16, 2009 10:16 am

Joined: Fri Apr 11, 2008 9:19 am
Posts: 16
wolf31o2 wrote:
adamshand wrote:
wolf31o2 wrote:


This looks great, thanks for posting it. Any chance of a quick readme on what all the bits are for?

Cheers,
Adam.


It's quite simple. Copy the things under scripts to <path_cacti>/scripts, and copy the things under script_server and snmp_queries to their directories under <path_cacti>/resource. After that, you import the templates, which I need to update with my latest changes. In fact, I need to upload some newer scripts and such, too.
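
The copy steps described above could be scripted roughly as follows. This is a minimal sketch. It assumes the template package has been unpacked into a local directory with `scripts/`, `script_server/` and `snmp_queries/` subdirectories as described, and that `cacti_root` stands in for `<path_cacti>`. The function name is hypothetical. Importing the templates still has to be done separately.

```python
import shutil
from pathlib import Path

# Destination of each package subdirectory, relative to the Cacti root
# (<path_cacti>), following the instructions in the post.
LAYOUT = {
    "scripts": Path("scripts"),
    "script_server": Path("resource") / "script_server",
    "snmp_queries": Path("resource") / "snmp_queries",
}


def install_package(package_dir: str, cacti_root: str) -> list[Path]:
    """Copy top-level files from each package subdirectory; return copied paths."""
    copied = []
    for src_name, dest_rel in LAYOUT.items():
        src = Path(package_dir) / src_name
        if not src.is_dir():
            continue  # skip subdirectories this package version does not ship
        dest = Path(cacti_root) / dest_rel
        dest.mkdir(parents=True, exist_ok=True)
        for item in sorted(src.iterdir()):
            if item.is_file():
                target = dest / item.name
                shutil.copy2(item, target)  # preserves timestamps and mode bits
                copied.append(target)
    return copied
```

The copy is deliberately flat (no nested directories) because the post only describes files sitting directly under each folder.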

I'm planning on supporting everything that I can via several methods.

- SNMPv1 for ONTAP versions prior to 7.3
- SNMPv2/v3 using 64-bit counters for 7.3 and above
- ONTAP Manage API for people who prefer it
- SMI-S Agent scripts for SMI-S software

Of course, I'm open to any help anyone wants to give, and everything I've written is released under the GPLv2. I am adding an installer script to it, and I could use some help with documentation, too. I'd like for the installer to detect the available methods and do some initial setup based on that, so it should work out of the box for everybody, and all they should need to know is the IP addresses of their Filers and the location of their Cacti installation.
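
The version rules in the list above are the kind of thing such an installer could check first. This is a minimal sketch. It assumes the Data ONTAP release string looks like `7.2.5.1` (as in `NetApp Release 7.2.5.1` earlier in the thread). The function names and labels are hypothetical. It does not probe whether SNMP or the API is actually reachable, and it cannot detect SMI-S agent software.

```python
def parse_release(release: str) -> tuple[int, ...]:
    """Turn '7.2.5.1' or 'NetApp Release 7.2.5.1' into (7, 2, 5, 1)."""
    version = release.split()[-1]
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:  # keep leading digits only, e.g. '1P2' -> '1'
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def candidate_methods(release: str) -> list[str]:
    """Suggest collection methods following the version rules in the post."""
    methods = []
    if parse_release(release) < (7, 3):
        methods.append("SNMPv1")
    else:
        methods.append("SNMPv2/v3 (64-bit counters)")
    methods.append("ONTAP Manage API")  # available regardless of version, per the list
    return methods
```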



Let us know what we can do to help on the project.

Roger L

Twitter:rogerlund
Blog:http://rogerlunditblog.blogspot.com


 Post subject:
Posted: Fri Mar 27, 2009 1:03 pm

Joined: Thu Dec 04, 2008 5:10 pm
Posts: 20
I'm finding that all the LUN stats give me accurate data (as verified on the filer itself), with the exception of average latency. These numbers do not look accurate at all.

For instance, doing a "lun stats -o" for a given LUN shows me average latencies around 7 or 8 ms, but Cacti is showing me data in the 100 - 200 (usec? ms?) area.

I'm also wondering if the latency is really being returned in microseconds. If you use netapp-ontapsdk-perf.pl and do a "lun counter-list", you get this for latency:

Counter Name = avg_latency Base Counter = total_ops Privilege_level = basic Unit = millisec

So, I guess there are two questions here: 1) has anyone else verified that the data you get with these templates is accurate, and 2) is it milliseconds or microseconds?

It seems to me that some sort of CDEF might be required to adjust the data, but I can't figure out what.


 Post subject:
Posted: Fri Mar 27, 2009 1:21 pm

Joined: Fri Apr 11, 2008 9:19 am
Posts: 16
I am having trouble getting the API working with my FAS3140 running Data ONTAP 7.2.6.1.

Does anyone know if you need a certain version of Data ONTAP for this to work?


 Post subject:
Posted: Mon Mar 30, 2009 1:31 pm

Joined: Thu Dec 04, 2008 5:10 pm
Posts: 20
gheppner wrote:
I'm finding that all the LUN stats give me accurate data (as verified on the filer itself), with the exception of average latency. These numbers do not look accurate at all.

For instance, doing a "lun stats -o" for a given LUN shows me average latencies around 7 or 8 ms, but Cacti is showing me data in the 100 - 200 (usec? ms?) area.

I'm also wondering if the latency is really being returned in microseconds. If you use netapp-ontapsdk-perf.pl and do a "lun counter-list", you get this for latency:

Counter Name = avg_latency Base Counter = total_ops Privilege_level = basic Unit = millisec

So, I guess there are two questions here: 1) has anyone else verified that the data you get with these templates is accurate, and 2) is it milliseconds or microseconds?

It seems to me that some sort of CDEF might be required to adjust the data, but I can't figure out what.


... Ok, after some additional investigation I've concluded the following:

1) the units returned by the API are in milliseconds, not microseconds.
2) the value returned by a call to avg_latency is not representative of the average latency per operation, but the avg latency of the total ops in a given polling period.

I added total_ops as a data source to the lun latency graph template, and then used a CDEF to divide the latency by the total ops. I now get values in the 3 - 8 ms range that are consistent with what the filer shows with lun stats -o -i 5 <lun name>.
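
Here is why dividing by total_ops works. Both data sources are COUNTERs, so for each step RRDtool stores a per-second rate: Δlatency/Δt and Δops/Δt. Dividing one rate by the other cancels Δt and leaves Δlatency/Δops, which is the average latency per operation over that step, expressed in the counter's own latency unit. The Python sketch below shows that arithmetic using made-up counter values, not data from any filer. In an RRDtool graph, assuming DEFs named `avg_latency` and `total_ops`, the corresponding expression would be something like `CDEF:per_op=avg_latency,total_ops,/`.

```python
import math

STEP = 300  # seconds per poll; matches "step = 300" in the rrd dump earlier in the thread


def counter_rate(prev: float, curr: float, seconds: float) -> float:
    """Per-second rate RRDtool stores for a COUNTER (counter wrap ignored here)."""
    return (curr - prev) / seconds


def per_op_latency(lat_prev, lat_curr, ops_prev, ops_curr, seconds=STEP):
    """Divide the latency rate by the ops rate, like the RPN 'latency,ops,/'."""
    lat_rate = counter_rate(lat_prev, lat_curr, seconds)
    ops_rate = counter_rate(ops_prev, ops_curr, seconds)
    if ops_rate == 0:
        return math.nan  # no operations in this interval; avoid dividing by zero
    return lat_rate / ops_rate


# Made-up raw counter readings taken one step apart.
print(per_op_latency(1_000_000, 1_900_000, 50_000, 200_000))  # 6.0
# The seconds cancel, so this equals the raw delta ratio 900_000 / 150_000.
```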

I'm curious if anyone else using these templates has noticed what I've noticed, or if I'm way out in left field here.


 Post subject:
Posted: Mon Mar 30, 2009 5:11 pm

Joined: Mon Mar 30, 2009 4:58 pm
Posts: 3
gheppner wrote:
... Ok, after some additional investigation I've concluded the following:

1) the units returned by the API are in milliseconds, not microseconds.
2) the value returned by a call to avg_latency is not representative of the average latency per operation, but the avg latency of the total ops in a given polling period.

I added total_ops as a data source to the lun latency graph template, and then used a CDEF to divide the latency by the total ops. I now get values in the 3 - 8 ms range that are consistent with what the filer shows with lun stats -o -i 5 <lun name>.

I'm curious if anyone else using these templates has noticed what I've noticed, or if I'm way out in left field here.


Hi gheppner,

I've been attacking the same problem with regards to volume latency numbers. They're just way out of range (like PetaMicroseconds) :o. From reading up on the ONTAPI docs it appears that you are on the right track but they mention taking 2 samples at time T1 and T2 and then calculating latency as:

(latency_T2 - latency_T1) / (total_ops_T2 - total_ops_T1)

I took the netapp-ontapsdk-perf.pl script and hacked up a version to do 2 samples of volume avg_latency 10 seconds apart using the method above and the number very closely matches the CLI "stats show" output (volume latency is in microseconds).
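
The two-sample calculation described above can also be written directly, as a script that takes its own two readings. This is a minimal Python sketch. `read_counters` is a hypothetical stand-in for whatever call fetches the raw volume avg_latency and total_ops counter values (for example, through the ONTAP SDK that netapp-ontapsdk-perf.pl uses). The 10-second interval matches the post.

```python
import time
from typing import Callable, Tuple


def two_sample_latency(read_counters: Callable[[], Tuple[int, int]],
                       interval: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> float:
    """(latency_T2 - latency_T1) / (total_ops_T2 - total_ops_T1).

    read_counters is a hypothetical callable returning the raw
    (avg_latency, total_ops) counter pair for one volume.
    """
    lat1, ops1 = read_counters()
    sleep(interval)
    lat2, ops2 = read_counters()
    delta_ops = ops2 - ops1
    if delta_ops <= 0:
        raise ValueError("no operations between samples (or counter reset)")
    return (lat2 - lat1) / delta_ops
```

Passing `sleep` as a parameter makes it easy to test the arithmetic without waiting.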


 Post subject:
Posted: Tue Mar 31, 2009 1:38 am

Joined: Tue Mar 31, 2009 1:01 am
Posts: 3
jlindberg wrote:
Hi gheppner,

I've been attacking the same problem with regards to volume latency numbers. They're just way out of range (like PetaMicroseconds) :o. From reading up on the ONTAPI docs it appears that you are on the right track but they mention taking 2 samples at time T1 and T2 and then calculating latency as:

(latency_T2 - latency_T1) / (total_ops_T2 - total_ops_T1)

I took the netapp-ontapsdk-perf.pl script and hacked up a version to do 2 samples of volume avg_latency 10 seconds apart using the method above and the number very closely matches the CLI "stats show" output (volume latency is in microseconds).


I think gheppner's method is a lot easier. Though my math is rusty, I think his method is also mathematically correct. To verify, I spent a couple of minutes in oocalc replicating both the math suggested by NetApp and what you get using an RRD with gheppner's suggestion, and it absolutely seems to yield the correct numbers.

I'm totally new to Cacti and this forum, btw. I've been using Munin to create some graphs for my filers, but now I'm trying Cacti because I think it would work and look much nicer. :)


 Post subject:
Posted: Tue Mar 31, 2009 9:35 am

Joined: Mon Mar 30, 2009 4:58 pm
Posts: 3
markdv wrote:
I think gheppner's method is a lot easier. Though my math is rusty, I think his method is also mathematically correct. To verify, I spent a couple of minutes in oocalc replicating both the math suggested by NetApp and what you get using an RRD with gheppner's suggestion, and it absolutely seems to yield the correct numbers.


Yeah, you're right. After I thought about it some more: since Cacti treats this as a counter, it already does the subtraction between intervals, so the CDEF method is much simpler than what I was contemplating.

I abandoned my idea and did what gheppner suggested and the numbers look good (although, as I indicated, volume latency is indeed in microseconds).


 Post subject:
Posted: Wed Apr 01, 2009 10:58 am

Joined: Thu Dec 04, 2008 5:10 pm
Posts: 20
jlindberg wrote:
Yeah, you're right. After I thought about it some more: since Cacti treats this as a counter, it already does the subtraction between intervals, so the CDEF method is much simpler than what I was contemplating.

I abandoned my idea and did what gheppner suggested and the numbers look good (although, as I indicated, volume latency is indeed in microseconds).


Curious how you determined volume latency was in microseconds. If I pass "volume counter-list" to the Perl script, it also returns the units as milliseconds:

netapp-ontapsdk-perf.pl myfilerhead "username-omitted" 'password-omitted' volume counter-list

Counter Name = avg_latency Base Counter = total_ops Privilege_level = basic Unit = millisec
Counter Name = total_ops Base Counter = none Privilege_level = basic Unit = per_sec
Counter Name = read_data Base Counter = none Privilege_level = basic Unit = b_per_sec
Counter Name = read_latency Base Counter = read_ops Privilege_level = basic Unit = millisec
Counter Name = read_ops Base Counter = none Privilege_level = basic Unit = per_sec
Counter Name = write_data Base Counter = none Privilege_level = basic Unit = b_per_sec
Counter Name = write_latency Base Counter = write_ops Privilege_level = basic Unit = millisec
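
Since this thread compares counter-list output from different filers, it can help to parse that output into a table. This is a minimal Python sketch. It assumes the line format printed by netapp-ontapsdk-perf.pl as shown above, and the field names are copied from that output.

```python
import re
from typing import NamedTuple


class CounterInfo(NamedTuple):
    base: str
    privilege: str
    unit: str


# One counter per line; column widths vary, so fields are separated by any whitespace.
LINE_RE = re.compile(
    r"Counter Name = (?P<name>\S+)\s+Base Counter = (?P<base>\S+)\s+"
    r"Privilege_level = (?P<priv>\S+)\s+Unit = (?P<unit>\S+)"
)


def parse_counter_list(text: str) -> dict[str, CounterInfo]:
    """Map each counter name to its base counter, privilege level and unit."""
    counters = {}
    for match in LINE_RE.finditer(text):
        counters[match.group("name")] = CounterInfo(
            match.group("base"), match.group("priv"), match.group("unit"))
    return counters
```

For example, `parse_counter_list(output)["avg_latency"].unit` gives the unit string that a particular filer reports for avg_latency.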


 Post subject:
Posted: Mon Apr 27, 2009 1:20 pm

Joined: Mon Mar 30, 2009 4:58 pm
Posts: 3
gheppner wrote:
Curious how you determined volume latency was in microseconds. If I pass "volume counter-list" to the Perl script, it also returns the units as milliseconds:

Hi again...

The "Unified Storage Performance Management Using Open Interfaces" design guide (3/7/2008, page 117), which I was originally using to work on the graph, says that avg_latency, read_latency and write_latency units are in "USECS".

Further, comparing the numbers I was seeing from the poll against "stats show ... volume" (also in microseconds) confirmed the documentation.

I've been graphing volume latency data over the past several weeks, and the graph tracks the "stats show ... volume" data.

Put another way, if it really IS milliseconds, our response time is sucking badly at 4,000 ms rather than 4,000 µs! :-)

I just thought to try the netapp-ontapsdk-perf.pl query that you did, and here are my results... not sure why mine is different from yours.
Code:
Counter Name = avg_latency   Base Counter = total_ops Privilege_level = basic Unit = microsec
Counter Name = total_ops     Base Counter = none      Privilege_level = basic Unit = per_sec
Counter Name = read_data     Base Counter = none      Privilege_level = basic Unit = b_per_sec
Counter Name = read_latency  Base Counter = read_ops  Privilege_level = basic Unit = microsec
Counter Name = read_ops      Base Counter = none      Privilege_level = basic Unit = per_sec
Counter Name = write_data    Base Counter = none      Privilege_level = basic Unit = b_per_sec
Counter Name = write_latency Base Counter = write_ops Privilege_level = basic Unit = microsec
Counter Name = write_ops     Base Counter = none      Privilege_level = basic Unit = per_sec
Counter Name = other_latency Base Counter = other_ops Privilege_level = basic Unit = microsec
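
The two filers report different units for the same counters (millisec vs. microsec), so a graph that assumes one unit will be off by a factor of 1,000 on the other. One way to guard against this is to normalize per-operation latency to milliseconds before graphing. The minimal sketch below assumes the unit strings are exactly the ones shown in the counter-list output in this thread.

```python
# Divisors that bring each counter-list unit string to milliseconds.
MS_DIVISOR = {"millisec": 1, "microsec": 1000}


def latency_in_ms(value: float, unit: str) -> float:
    """Convert a per-operation latency to milliseconds, refusing unknown units."""
    if unit not in MS_DIVISOR:
        raise ValueError(f"unexpected latency unit: {unit!r}")
    return value / MS_DIVISOR[unit]


print(latency_in_ms(4000, "microsec"))  # 4.0
print(latency_in_ms(7, "millisec"))     # 7.0
```

In an RRDtool CDEF, the same microsecond-to-millisecond step is a trailing `1000,/` on the expression.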


 Post subject: Data query returns 0 Rows
Posted: Tue Feb 16, 2010 4:07 pm

Joined: Thu Dec 11, 2008 11:00 am
Posts: 4
I'm trying to figure out what I'm doing wrong. When I run the script manually, everything works great, but when I try to create new graphs for my filer, it shows "This data query returned 0 rows", and when I run it in debug mode I get the following:

+ Running data query [16].
+ Found type = '4 '[script query].
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ XML file parsed ok.
+ Executing script for list of indexes 'perl /usr/share/cacti/site/scripts/netapp-ontapsdk-perf.pl fasprs02 "USERNAME" "PASSWORD" system index'
+ Executing script query 'perl /usr/share/cacti/site/scripts/netapp-ontapsdk-perf.pl fasprs02 "USERNAME" "PASSWORD" system query index'
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/query-netapp-ontapsdk-system.xml'


Any thoughts on what I might be doing wrong? This is a brand-new setup as well. Let me know if you need more information.
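
When a script query returns 0 rows even though the script runs fine by hand, it is worth checking whether the `index` and `query index` output has the shape Cacti expects. The `index` call should produce one index per line, and `query <field>` should produce `index<delimiter>value` lines, where the delimiter is the one configured in the query XML (often `:`). The thread does not establish whether this is the cause here. This minimal sketch assumes those conventions. `run_check` and the other names are hypothetical, and the command prefix would be copied from the debug log above, with the real credentials filled in. Run it as the same user the poller runs as.

```python
import shlex
import subprocess


def parse_indexes(output: str) -> list[str]:
    """Non-empty lines from the 'index' call."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def parse_query(output: str, delimiter: str = ":") -> dict[str, str]:
    """Parse 'index<delimiter>value' lines from a 'query <field>' call."""
    rows = {}
    for line in output.splitlines():
        if delimiter in line:
            index, value = line.split(delimiter, 1)
            rows[index.strip()] = value.strip()
    return rows


def diagnose(index_out: str, query_out: str, delimiter: str = ":") -> list[str]:
    """Return a list of shape problems; an empty list means the output looks sane."""
    problems = []
    indexes = parse_indexes(index_out)
    rows = parse_query(query_out, delimiter)
    if not indexes:
        problems.append("index call returned nothing")
    if not rows:
        problems.append(f"query output has no lines containing {delimiter!r}")
    missing = [i for i in indexes if i not in rows]
    if rows and missing:
        problems.append(f"indexes missing from query output: {missing}")
    return problems


def run_check(base_cmd: str, delimiter: str = ":") -> list[str]:
    """Run '<base_cmd> index' and '<base_cmd> query index', then diagnose both."""
    def run(suffix: str) -> str:
        return subprocess.run(shlex.split(f"{base_cmd} {suffix}"),
                              capture_output=True, text=True).stdout
    return diagnose(run("index"), run("query index"), delimiter)
```

For example, call `run_check('perl /usr/share/cacti/site/scripts/netapp-ontapsdk-perf.pl fasprs02 "USERNAME" "PASSWORD" system')` and read through any problems it reports.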


 Post subject:
Posted: Tue Feb 16, 2010 6:20 pm
Cacti User

Joined: Mon Dec 13, 2004 3:03 pm
Posts: 232
gheppner wrote:
... Ok, after some additional investigation I've concluded the following:

1) the units returned by the API are in milliseconds, not microseconds.
2) the value returned by a call to avg_latency is not representative of the average latency per operation, but the avg latency of the total ops in a given polling period.

I added total_ops as a data source to the lun latency graph template, and then used a CDEF to divide the latency by the total ops. I now get values in the 3 - 8 ms range that are consistent with what the filer shows with lun stats -o -i 5 <lun name>.

I'm curious if anyone else using these templates has noticed what I've noticed, or if I'm way out in left field here.


@gheppner:

Wow, I've been running these templates for months and had no idea the volume latencies were off by so much. Thanks for tracking this issue down. I only partially understand what you've done here, mostly because I haven't looked at this template in a long time... Is there a chance you can roll a new version of this template, or at least post some updated XML files reflecting the changes you've made? I'm also wondering how this will work with the old templates and RRDs I already have running.

Thanks!

