NetApp Filer: graphing Performance Stats and IO's (template)

pierre-luc
Posts: 8
Joined: Wed Aug 10, 2005 10:05 pm
Location: Montreal, Canada

#1 Post by pierre-luc » Mon Jun 02, 2008 11:45 am

Hello,
Here are the host template and scripts I wrote to graph storage performance for a NetApp filer using the Perl API of the Manage ONTAP SDK 3.0.

Graph list:
- LUN: IOPS, latency, data throughput
- Volume: IOPS, latency
- Target interfaces: IOPS
- Filer: total IOPS per protocol (FC/iSCSI/NFS/CIFS/...)

See screenshot.

Combined with the Network-Appliance SNMPv1 host template available on this forum, this makes gathering NetApp SAN performance statistics with Cacti quite complete.

Requirements:
- Manage ONTAP SDK 3.0 Perl API installed on the Cacti host
- NetApp filer: HTTP enabled

Tested on Cacti version 0.8.7b.
Attachments
netapp_OnTap_graph.jpg (321.02 KiB): graph sample of the Netapp-ontapsdk template.
NetApp_OnTap-SDK_cacti-20080602.tgz (256.95 KiB): Cacti template, scripts and .xml files.

pflaherty
Posts: 5
Joined: Mon Mar 17, 2008 3:06 pm

Good work

#2 Post by pflaherty » Mon Jun 02, 2008 2:41 pm

I've been playing with the SDK for a few weeks and had a half-working implementation of this when I saw your post. The templates all installed with no trouble. Everything seems to be graphing correctly. I wanted to make a dig at the color scheme, but it's growing on me -=]

Awesome work, you saved me a ton of time.

wazoqaz
Posts: 15
Joined: Wed May 24, 2006 9:40 am
Location: md, us

#3 Post by wazoqaz » Thu Jun 05, 2008 7:18 am

Beautiful!! This was the first that I'd heard of the SDK. The installation was simple and the results are great. It is nice to have another view of what is going on inside my filer.

Just a question: would it make sense to replace the other NetApp graphs done via SNMP with similar ones done via the SDK?

Thanks for your hard work.

evilensky
Posts: 1
Joined: Thu Jun 05, 2008 2:45 pm

#4 Post by evilensky » Thu Jun 05, 2008 2:50 pm

This looks great. Thank you!

o_dupuis
Posts: 18
Joined: Fri Mar 18, 2005 12:55 pm
Location: Paris/France

Problem

#5 Post by o_dupuis » Wed Jun 11, 2008 6:28 am

Hi,
These templates look great, but I must confess that I couldn't make them work.

First, I discovered that with cactid the full perl path has to be provided in the XML files (query-netapp-ontapsdk-lun.xml, etc.).

I fixed this, but I keep getting a partial result error:

Code:

06/11/2008 12:22:05 AM - CACTID: Poller[0] Host[42] DS[2160] WARNING: Result from SCRIPT not valid. Partial Result: ...
06/11/2008 12:22:05 AM - CACTID: Poller[0] Host[42] DS[2160] SCRIPT: /usr/bin/perl /opt/apache/php/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 10.12.2.3 "cacti" "cacti2008" volume get avg_latency voloracle_1_archive, output: U
06/11/2008 12:22:05 AM - CACTID: Poller[0] Host[42] DEBUG: The POPEN returned the following File Descriptor 16
06/11/2008 12:22:05 AM - CACTID: Poller[0] Host[42] ERROR: Empty result [10.12.2.3]: '/usr/bin/perl /opt/apache/php/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 10.12.2.3 "cacti" "cacti2008" volume get avg_latency voloracle_1_data'
But if I run the script manually, I get the correct answer:

Code:

# /usr/bin/perl /opt/apache/php/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 10.12.2.3 "cacti" "cacti2008" volume get avg_latency voloracle_1_data
8363761585
#

Thx for any help,

Olivier
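
A script that answers correctly from an interactive shell but comes back empty under the poller is often being run with a different user, working directory and environment (for example, without the PERL5LIB or PATH settings the SDK modules rely on). That is only a possible cause here, not a confirmed diagnosis. A minimal Python sketch for reproducing it: run the same command with a nearly empty environment and check that the output is a single number. The function names, the environment contents and the "one bare number" rule are assumptions for illustration, not part of Cacti or the SDK.

```python
import re
import subprocess

# Assumed shape of a single-value result, e.g. "8363761585" or "12.5".
# Anything else ("U", empty output, an error message) counts as invalid.
NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def is_valid_value(output: str) -> bool:
    """Return True if the script output is exactly one bare number."""
    return NUMERIC.fullmatch(output.strip()) is not None


def run_with_minimal_env(cmd, timeout=30):
    """Run cmd (a list of arguments) with an almost empty environment.

    Assumes a POSIX host. PERL5LIB, HOME and similar variables are
    deliberately left out so that a hidden dependency on them shows up.
    Returns (stdout, stderr, returncode).
    """
    env = {"PATH": "/usr/bin:/bin"}
    proc = subprocess.run(
        cmd, env=env, cwd="/", capture_output=True, text=True, timeout=timeout
    )
    return proc.stdout, proc.stderr, proc.returncode
```

Passing the exact argument list from the SCRIPT: log line to run_with_minimal_env and printing the returned stderr should show whether Perl fails to find a module or the script fails in some other way when it is started outside a login shell.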

[email protected]
Posts: 6
Joined: Wed Jan 25, 2006 7:06 pm

Cool NetAPP Query

#6 Post by [email protected] » Wed Jun 11, 2008 11:24 am

Hi,

This looks cool, and I have got the attached template working. Thank you for this great use of the SDK (which, according to one of their SEs, is apparently very little known in many NetApp circles). However, it does not seem to include a lot of the graphs I can see in your pictures. The script queries (system, etc.) seem to be present, but I only get the following graphs when I use the host template.

I could quite easily concede I am missing something.

All Nics+
cache age
CIFS Ops
CPU % Busy
NFS Ops

Many thanks for posting this info; it's nice to be able to put NetApp performance in our common dashboard, i.e. Cacti, and not just use DFM OppMan.


Kind regards,

Mark Kaye

[email protected]
Posts: 6
Joined: Wed Jan 25, 2006 7:06 pm

plz Ignore previous post - user error (mine)

#7 Post by [email protected] » Wed Jun 11, 2008 11:39 am

Sorry
Mark

pierre-luc
Posts: 8
Joined: Wed Aug 10, 2005 10:05 pm
Location: Montreal, Canada

#8 Post by pierre-luc » Mon Jun 16, 2008 1:00 pm

Hi Mark,
Regarding the missing graphs: protocol-specific graphs should be very easy to add, since NFS and CIFS IOPS are already provided for the "Per Protocol" graph.

As for All NICs and cache age, for now we monitor them using SNMP and another Cacti template provided somewhere in these forums.

It could be a good idea to add these features to the SDK template and use only the SDK to gather stats... a future project...


Thanks all for your comments.

eschoeller
Cacti User
Posts: 234
Joined: Mon Dec 13, 2004 3:03 pm

#9 Post by eschoeller » Tue Aug 05, 2008 12:22 pm

This is great data! It seems to come with a heavy cost for us however. Is anyone else noticing severe performance issues after using this template?

I initially added over 300 data sources using this template in my development environment. It ran fine for a little while, until I noticed that performance had degraded so badly that my poller was timing out. Before adding these data sources my poller runtime was around 15s, then it was timing out after 56s.

I trimmed this down to only 60 data sources, but I'm still seeing terrible performance. I am using a 1 minute poller so I don't have much flexibility to run a polling interval longer than 45s. I attached several charts to indicate the issues I am seeing.

This is running on a Dell Precision 450 desktop. It only has one CPU and one disk, so take that into account, but please don't completely blame it on the hardware. If I had added an additional 60 SNMP data sources I would have never seen such performance loss.

I've tried tweaking #threads, #processes, #script servers but it hasn't improved anything. I am already running the latest version of spine. From the logs I can always see that the netapp-ontapsdk-perf.pl script is running towards the end of the polling cycle, so I know that's what is prolonging the runtime.

Running the netapp-ontapsdk-perf.pl by hand while my poller isn't running usually takes about half a second. Then, while the poller is running it can take as long as 4-5 seconds to run. This leads me to believe it's possibly a system issue.
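
A rough back-of-the-envelope estimate shows why the per-call time dominates. The Python sketch below assumes that each data source triggers one separate script run and that runs are split evenly over a fixed number of parallel workers. Both are simplifications, and the worker count of 4 is a made-up figure for illustration.

```python
import math


def estimated_poll_seconds(data_sources: int, seconds_per_call: float,
                           workers: int = 1) -> float:
    """Estimate wall-clock time when each data source costs one script call.

    Calls are assumed to be split evenly across workers, so the busiest
    worker handles ceil(data_sources / workers) calls one after another.
    """
    calls_per_worker = math.ceil(data_sources / workers)
    return calls_per_worker * seconds_per_call


# Figures from the post: 300 or 60 data sources, about 0.5 s per call when
# the host is idle, 4-5 s per call while the poller is running.
print(estimated_poll_seconds(300, 0.5))            # 150.0
print(estimated_poll_seconds(60, 0.5))             # 30.0
print(estimated_poll_seconds(60, 4.0, workers=4))  # 60.0
```

Under these assumptions, even the trimmed set of 60 data sources exceeds a 45 s budget once each call slows to several seconds, unless the calls are spread over considerably more workers.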

But why is this data collection script so resource intensive?



I have also noticed that on many of the context menus, the Netapp graphs show up first in the list and not in alphabetical order, but this is probably an entirely different issue.
Attachments
cpu.png (31.34 KiB): CPU usage of the Cacti host before and after using this template
load.png (41.68 KiB): load of the Cacti host before and after using this template
objects.png (28.29 KiB): number of objects before and after using this template
runtime.png (24.92 KiB): poller runtime before and after using this template

eschoeller
Cacti User
Posts: 234
Joined: Mon Dec 13, 2004 3:03 pm

#10 Post by eschoeller » Thu Aug 07, 2008 11:15 am

I also noticed that Interface traffic went through the roof as well. It seems that this script is pulling in LOTS of data and doing some sort of computational work to come up with the figures it needs. I haven't had a chance to look closely at the script to see if there is any room for optimization.
Attachments
eth0.png (26.15 KiB): interface traffic before and after using this template

pierre-luc
Posts: 8
Joined: Wed Aug 10, 2005 10:05 pm
Location: Montreal, Canada

#11 Post by pierre-luc » Fri Aug 08, 2008 9:17 am

Hello eschoeller,

Yes, the script netapp-ontapsdk-perf.pl is not optimized! I hit a bug in the Manage ONTAP SDK while I was developing the template: the SDK was unable to return a specific value for a specific object (for example, avg_latency for a single LUN). So the approach that actually works is to query avg_latency for all LUNs and then pick out the selected one. This means that if there are 300 LUNs on your filer, the API will return 300 values to the netapp-ontapsdk-perf.pl script. That is the HUGE overhead of this template.

There is a very small thread on the NetApp forum regarding this issue: http://communities.netapp.com/thread/1405?tstart=0

So, because the API "perf-object-get-instances" wasn't working, I used the API "perf-object-get-instances-iter-*". It is almost like querying the universe to grab a mosquito.
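
Since every call already fetches the counter for all instances, one way to reduce the overhead (not something the posted script does) would be to cache the bulk result for roughly one polling cycle and answer the remaining per-instance lookups from that cache. The sketch below is in Python for brevity. `fetch_all` is a hypothetical stand-in for the expensive SDK query, and the cache file location and 50-second lifetime are assumptions.

```python
import json
import os
import tempfile
import time


def cached_bulk_query(cache_path, fetch_all, max_age=50.0):
    """Return {instance_name: value}, refetching only when the cache is stale.

    fetch_all is a callable that performs the expensive "all instances"
    query and returns a dict; it stands in for the real SDK call.
    """
    try:
        if time.time() - os.path.getmtime(cache_path) < max_age:
            with open(cache_path) as fh:
                return json.load(fh)
    except (OSError, ValueError):
        pass  # missing, unreadable or corrupt cache: fall through and refetch

    data = fetch_all()
    # Write to a temporary file and rename it into place, so a parallel
    # reader never sees a half-written cache.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp_path, cache_path)
    return data


def get_value(cache_path, fetch_all, instance):
    """Look up one instance; returns None if it is not in the result."""
    return cached_bulk_query(cache_path, fetch_all).get(instance)
```

With one cache file per object type and counter, 300 LUN lookups in a cycle would cost one bulk query plus 299 file reads. The sketch has no locking, so several processes that start at the same moment with a stale cache may each refetch once.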

I hope a future release of the SDK will fix this issue, which would improve performance...


Regarding the graph sorting, I didn't try to sort in alphabetical order. The current sorting is based on the index provided by the API, which is ordered by object creation date. I'm not sure whether changing the sort index in the query-netapp-ontapsdk-*.xml files would fix it or create another issue.

P-L

eschoeller
Cacti User
Posts: 234
Joined: Mon Dec 13, 2004 3:03 pm

#12 Post by eschoeller » Fri Aug 15, 2008 1:14 pm

I read the short post mentioned above. I have another member of our team looking into optimizing the code. In the meantime we upgraded to a Dell 2950 with a quad-core 3 GHz Xeon, 8 GB of RAM and 4-column RAID 10 disks.

Here are the performance metrics of the Cacti server before and after the upgrade, in case anyone is interested.

But, long story short, these templates will work OK with a fast enough system, despite the fact that there is a lot of room for performance improvement. I still have around 750 data sources and 550 RRDs.
Attachments
cpu.png (32.52 KiB): CPU usage before and after upgrade
load.png (33.14 KiB): load before and after upgrade
runtime.png (22.68 KiB): poller runtime before and after upgrade

kkoduru
Posts: 7
Joined: Wed Jun 25, 2008 3:48 pm

Not discovering objects

#13 Post by kkoduru » Wed Sep 17, 2008 6:03 pm

Hi Gurus

I was able to get the ONTAP SDK and import the template. The first issue I faced was with Perl, which complained about "\N", so I had to give the entire path with doubled backslashes:
use lib "C:\\manage-ontap-sdk-1.6\\lib\\perl\\NetApp";

Now, when I discover the filer, it cannot find any objects and shows the message below:
This data query returned 0 rows, perhaps there was a problem executing this data query. You can run this data query in debug mode to get more information.

Upon running in verbose mode, I get the output below:

+ Running data query [17].
+ Found type = '4 '[script query].
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ XML file parsed ok.
+ Executing script for list of indexes 'perl C:\Inetpub\wwwroot\cacti\scripts\netapp-ontapsdk-perf.pl nfiler2.rws.ad.ea.com "xxxx" "xxxx" system index'
+ Executing script query 'perl C:\Inetpub\wwwroot\cacti\scripts\netapp-ontapsdk-perf.pl nfiler2.rws.ad.ea.com "xxxx" "xxxx" system query index'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'

If I run the script manually, it works just fine, returning the information I expect to see (the LUN info, etc.).

Could you please point out what I am doing wrong?

thanks in advance
KK
Last edited by kkoduru on Thu Sep 18, 2008 3:58 pm, edited 2 times in total.

eschoeller
Cacti User
Posts: 234
Joined: Mon Dec 13, 2004 3:03 pm

#14 Post by eschoeller » Wed Sep 17, 2008 7:21 pm

This is what mine looks like:

Code:

+ Running data query [14].
+ Found type = '4 '[script query].
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ XML file parsed ok.
+ Executing script for list of indexes 'perl /usr/local/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 255.255.255.123 "USER" "PASSWORD" system index'
+ Executing script query 'perl /usr/local/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 255.255.255.123 "USER" "PASSWORD" system query index'
+ Found item [index='system'] index: system
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
I have 4 lines of data query XML; you only have 3. Another thing: since you're using Windows, you may have to specify the full path to your perl binary.
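
Comparing the two verbose outputs line by line, a possibly more telling difference is that the working one contains a "+ Found item [index='system'] index: system" line after the script query, while the failing one has none. That would be consistent with the script returning nothing when Cacti itself invokes it, although it does not prove it. A small Python sketch that pulls those lines out of a pasted debug log; the function name is illustrative, the sample strings are abbreviated, and the regular expression only assumes the line format shown above.

```python
import re

# Matches lines such as: + Found item [index='system'] index: system
FOUND_ITEM = re.compile(
    r"\+ Found item \[(?P<field>\w+)='(?P<value>[^']*)'\] index: (?P<index>.+)"
)


def found_items(debug_output: str):
    """Return (field, value, index) tuples for every 'Found item' line."""
    items = []
    for line in debug_output.splitlines():
        match = FOUND_ITEM.fullmatch(line.strip())
        if match:
            items.append((match["field"], match["value"], match["index"]))
    return items


# Abbreviated samples modelled on the two outputs in this thread.
working = """+ XML file parsed ok.
+ Found item [index='system'] index: system
"""
failing = """+ XML file parsed ok.
"""

print(found_items(working))  # [('index', 'system', 'system')]
print(found_items(failing))  # []
```

An empty list for a query that works from the command line points at how the web server or poller launches the script (user, PATH, perl location) rather than at the script's logic.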

Hope this helps!

kkoduru
Posts: 7
Joined: Wed Jun 25, 2008 3:48 pm

#15 Post by kkoduru » Thu Sep 18, 2008 1:02 pm

I am not sure where the 3-line vs. 4-line output is controlled from, as it should be coming from the script itself. Also, the perl path is embedded in the Cacti installer; I couldn't find a way to change it.
