Cisco cpus and memory pools -- update November 11, 2010

_CL
Posts: 39
Joined: Mon Feb 25, 2008 5:01 pm
Location: USA

#46 Post by _CL » Mon Mar 10, 2008 12:59 am

_CL wrote:
looc wrote:total have error :cry:
I have two Cacti installations on Ubuntu 7.10. With 0.8.6j, I don't see the Total error. With 0.8.7b, I do see it.

I couldn't figure out the cause of the problem with 0.8.7b, but I did notice that the erroneous Total was exactly twice the Used value. On a hunch, I created a CDEF that adds a and c (instead of a and b). Bingo, it worked. I have no idea why, but it's good enough for me for now.

I named my new CDEF function "Add A to C (Hack for Cisco Router Memory Usage in place of Add A to B)". The CDEF is simply cdef=a,c,+
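
For anyone unfamiliar with the RPN notation used in CDEFs, here is a minimal Python sketch of how an expression such as a,c,+ is evaluated. The evaluator, the sample values, and the assumption that b ended up carrying the same value as a are all illustrative; nothing here is Cacti or RRDtool code, and it does not explain why 0.8.7b behaved that way.

```python
def eval_cdef(expr, values):
    """Evaluate a tiny subset of RRDtool-style RPN: variables, numbers, '+'."""
    stack = []
    for token in expr.split(","):
        if token == "+":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif token in values:
            stack.append(values[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression: " + expr)
    return stack[0]

# Hypothetical sample values: a = used, c = free.
# b is ASSUMED to mirror a, which would reproduce the "Total is twice Used" symptom.
sample = {"a": 30.0, "b": 30.0, "c": 70.0}

total_doubled = eval_cdef("a,b,+", sample)  # 60.0, twice the used value
total_hack = eval_cdef("a,c,+", sample)     # 100.0, used plus free
```

With these made-up numbers, a,b,+ gives twice the used value while a,c,+ gives used plus free, which matches the symptom and the workaround described above.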

My 0.8.7b installation runs:

RRDtool 1.2.19
MySQL 5.0.45
Apache 2.2.4
PHP 5.2.3
I also had to correct an error in the original graph template. The contiguous (PoolLargest) item had its Graph Item Type set to STACK. I changed it to LINE1.
CL

Re: Cisco cpus and memory pools -- update Feb 10, 2007

#47 Post by _CL » Mon Mar 10, 2008 7:38 pm

ehall wrote:Cisco routers with IOS 12.1 and up have indexed SNMP tables for CPUs and memory pools. I've created the appropriate queries and graphs for these, to allow for tracking the utilization of individual CPUs and memory pools separately.
I applied the CPU template to a Cisco uBR10K and ran into an error on line 277 of cisco_cpu_usage.xml. This prevented me from getting the cpuDesc. I changed line 277 from

Code: Select all

			$oid[1][1] . $cmdoutput[$cmdoutputcount][1],
to

Code: Select all

			$oid[1][1] . "." . $cmdoutput[$cmdoutputcount][1],
The CPU usage data query is now working on Cacti 0.8.6j on Ubuntu with PHP 5. I have it graphing 8 CPUs on that one router.
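
To illustrate why the missing "." matters, here is a small Python sketch of the same concatenation. It assumes the base OID is entPhysicalName (the OID used for cpuDesc) and uses an index from a walk purely as an example; join_oid is a hypothetical helper, not part of the script.

```python
def join_oid(base, index):
    """Join a base OID and an index with exactly one dot between them."""
    return base.rstrip(".") + "." + str(index).lstrip(".")

base = ".1.3.6.1.2.1.47.1.1.1.1.7"  # entPhysicalName (assumed to be the base here)
index = "2017"                      # example entPhysicalIndex

broken = base + index               # no separator: the index fuses into the last sub-identifier
fixed = join_oid(base, index)       # separator added, as in the patched line
```

Without the separator the request goes to ...1.1.72017, an OID that does not exist, so the description lookup fails.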

I am now going to try it on 0.8.7b.
CL

Re: Cisco cpus and memory pools -- update Feb 10, 2007

#48 Post by _CL » Mon Mar 10, 2008 10:17 pm

_CL wrote:I am now going to try it on 0.8.7b.
I have multiple CPUs being graphed on a Cisco uBR10K with

Cacti 0.8.7b
Ubuntu 7.10
RRDtool 1.2.19
MySQL 5.0.45
Apache 2.2.4
PHP 5.2.3

I will post graphs tomorrow after there's a day's worth of data to fill up the graph.

I had to make a few changes to cisco_cpu_usage.script to get it to work with 0.8.7b. After I made the changes, I looked at skinty's update: we had found the same things that needed to be changed. Too bad I didn't look at skinty's update before I spent a couple of hours on it! Anyhow, I will post the updated script (my bug fix plus skinty's changes) tomorrow with the graphs.
CL

cisco_cpu_usage script for 0.8.7.b

#49 Post by _CL » Thu Mar 13, 2008 11:01 am

Attached is the exported data query and dependencies for ehall's "Cisco Router - CPU Usage" graphs with one bug fix and updates for 0.8.7b. For installation, follow ehall's instructions in the first post in this thread.

Also attached are two sets of sample graphs, one for a Cisco uBR10K running 12.3(17b)BC9 and one for a Cisco 7600 running 12.2(18)SXF12, both from a lab environment. The uBR is idle right now. The 7600 is our Internet gateway, pulling full routes plus 2.5G of multicast video to be distributed to our test beds.
Attachments
cacti_data_query_cisco_router_-_cpu_statistics.zip (2.72 KiB)
Multiple CPUs on 7600.png (109.62 KiB)
Multiple CPUs on uBR10K.png (84.35 KiB)
CL

Cisco Enhanced Memory Pools MIB

#50 Post by _CL » Mon Mar 17, 2008 10:27 pm

I created a script data query to retrieve memory pools on each CPU.

http://forums.cacti.net/viewtopic.php?t=26269
CL

zivley
Cacti User
Posts: 69
Joined: Tue Nov 13, 2007 6:22 am

#51 Post by zivley » Tue Jun 03, 2008 6:08 am

Hey, those are great templates!!
I only have a little problem. I'm sampling a 7206VXR NPE-G1 and everything looks right except the CPU description, which retrieves:
"No Such Object available on this agent at this OID"
Here's a screenshot of the device.
Can anyone help?
What is the exact OID used in this template? Perhaps I can do an snmpwalk and see what my router says.
Thanks
Ziv
Attachments
cisco-cpu.png: Device Settings Screenshot (24.77 KiB)

#52 Post by _CL » Tue Jun 03, 2008 8:13 am

The OID is .1.3.6.1.2.1.47.1.1.1.1.7, aka entPhysicalName in the ENTITY-MIB. It is an indexed value, so you need to walk it. Here is an example from a 7600 in my lab.

snmpwalk -v2c -c public 10.10.253.11 .1.3.6.1.2.1.47.1.1.1.1.7 | grep CPU
ENTITY-MIB::entPhysicalName.1007 = STRING: CPU of Sub-Module 2 DFC Card
ENTITY-MIB::entPhysicalName.2001 = STRING: CPU of Switching Processor 5
ENTITY-MIB::entPhysicalName.2017 = STRING: CPU of Routing Processor 5
ENTITY-MIB::entPhysicalName.3001 = STRING: CPU of Switching Processor 6
ENTITY-MIB::entPhysicalName.3017 = STRING: CPU of Routing Processor 6
ENTITY-MIB::entPhysicalName.4008 = STRING: CPU of Sub-Module 7 DFC Card
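
If you want to post-process that output instead of eyeballing it, a rough Python sketch like the following turns the walk lines into an index-to-name table. The regular expression assumes the exact net-snmp output format shown above (MIB name resolved, STRING type) and would need adjusting for numeric output; parse_ent_physical_names is a hypothetical helper.

```python
import re

WALK_OUTPUT = """\
ENTITY-MIB::entPhysicalName.1007 = STRING: CPU of Sub-Module 2 DFC Card
ENTITY-MIB::entPhysicalName.2001 = STRING: CPU of Switching Processor 5
ENTITY-MIB::entPhysicalName.2017 = STRING: CPU of Routing Processor 5
ENTITY-MIB::entPhysicalName.3001 = STRING: CPU of Switching Processor 6
ENTITY-MIB::entPhysicalName.3017 = STRING: CPU of Routing Processor 6
ENTITY-MIB::entPhysicalName.4008 = STRING: CPU of Sub-Module 7 DFC Card
"""

# Captures the entPhysicalIndex and the name string from one walk line.
LINE_RE = re.compile(r"entPhysicalName\.(\d+) = STRING: (.*)$")

def parse_ent_physical_names(text):
    """Return {entPhysicalIndex: entPhysicalName} for every matching line."""
    names = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            names[int(match.group(1))] = match.group(2).strip().strip('"')
    return names

cpu_names = parse_ent_physical_names(WALK_OUTPUT)
```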

Here is a little explanation taken from the comments in a new script I am testing. It does the same as ehall's script except that it is keyed on entPhysicalName. I am in a lab environment and we swap cards a lot, so the CPU indices change. My script will allow us to set up the graphs and leave them alone. It should also be fine for a production environment. I will post the templates after I finish testing it.

# cpmCPUTotalTable
# ...cpmCPUTotalEntry
# ......cpmCPUTotalIndex (1) <== CPU index
# ......cpmCPUTotalPhysicalIndex (2)
# ......cpmCPUTotal5secRev (6) <== CPU stats
# ......cpmCPUTotal1minRev (7) <==
# ......cpmCPUTotal5minRev (8) <==
#
# fortunately, the name of each CPU appears to be static, so we can use it
# the name is tied to cpmCPUTotalIndex by way of cpmCPUTotalPhysicalIndex
#
# cpmCPUTotalPhysicalIndex.cpmCPUTotalIndex = entPhysicalIndex
#
# entPhysicalTable
# ...entPhysicalEntry
# ......entPhysicalIndex (1)
# ......entPhysicalName (7)
#
# CPU name = entPhysicalName.cpmCPUTotalPhysicalIndex
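
The lookup described in those comments can be sketched in a few lines of Python. The index values and the "after" table below are made up to illustrate a reindexing event; only the names come from the walk above, and cpu_index_by_name is a hypothetical helper rather than code from the script.

```python
# cpmCPUTotalPhysicalIndex: cpmCPUTotalIndex -> entPhysicalIndex (hypothetical values)
physical_index_before = {1: 2017, 2: 2001, 3: 1007}
# The same CPUs after an assumed reindexing event such as a card swap
physical_index_after = {5: 2017, 6: 2001, 7: 1007}

# entPhysicalName: entPhysicalIndex -> name
ent_physical_name = {
    1007: "CPU of Sub-Module 2 DFC Card",
    2001: "CPU of Switching Processor 5",
    2017: "CPU of Routing Processor 5",
}

def cpu_index_by_name(physical_index, names):
    """Map each CPU name to its current cpmCPUTotalIndex."""
    return {
        names[ent_index]: cpu_index
        for cpu_index, ent_index in physical_index.items()
        if ent_index in names
    }

before = cpu_index_by_name(physical_index_before, ent_physical_name)
after = cpu_index_by_name(physical_index_after, ent_physical_name)
```

The set of names is the same before and after, while the cpmCPUTotalIndex behind each name differs, which is why keying the graphs on the name lets them survive the change.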
CL

#53 Post by zivley » Tue Jun 03, 2008 8:23 am

That will be great indeed!
In the meantime, I'm not so worried about the description, so I've found I can fix it just by removing the |query_cpuDesc| from the graph template title.
It's not a big problem: all my routers have only 1 CPU, and even if they had more, the query_cpuName is enough for me. I think it's enough to have them called CPU1, CPU2 and so on, isn't it?
Anyway, I like, as you say, to have a template you create once and leave alone, and it always works, even in the future, so I'm looking forward to your post of the new scripts.
I've made a few graphs and they seem to work, but with some issues. Please take a look at the screenshots: the memory graphs look weird. The values of the processor pool look OK, but both I/O and transient don't make sense. Look at the total value: it's smaller than the sum of the others!
Also, the GPRINTs seem to run off the graph. I can't see the Maximum: values on every line. This happens on all of the memory graphs.

Thank you all for the great job you do!
Ziv
Attachments
iomem-graph.png: I/O pool graph (60.43 KiB)
transient-mem-graph.png: Transient pool graph (49.66 KiB)

#54 Post by _CL » Tue Jun 03, 2008 4:38 pm

If you call them CPU1, CPU2, and so on, what does the number refer to? It can't be cpmCPUTotalIndex because the MIB says its value is NOT persistent. (In practice, I have seen it persist on the uBR10K, but on the 7600, even a simple processor failover caused cpmCPUTotalIndex to change for linecards.)

As for your graphs, for the Total value, look for my fix posted earlier in this thread (http://forums.cacti.net/viewtopic.php?p ... ht=#129105). For the Maximum problem, you can either go into the graph template and abbreviate the terms Minimum, Average, and Maximum, or you can go into the Cacti settings on the Visual tab and change the font size.
CL

#55 Post by zivley » Wed Jun 04, 2008 2:13 am

Hi
As for the memory graphs and the total problem, I tried your solution and it solved the problem indeed!

Regarding the CPU, I'm not worried about the ID of the CPU. As I said, I only have routers with a single CPU, so there's no room for confusion, whatever the router wants to call it. Anyway, do you have an idea what could solve that CPU query problem?
I did an snmpwalk from the server as you proposed and this is what I got:

Code: Select all

snmpwalk -v2c -c public 10.0.0.1 .1.3.6.1.2.1.47.1.1.1.1.7 | grep CPU
SNMPv2-SMI::mib-2.47.1.1.1.1.7.2 = STRING: "I/O and CPU Slot 0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.19 = STRING: "Flash Card Slot Container CPU"
I've found a workaround, though it still bothers me: I don't like to see errors in the queries, even if they don't appear on my graph right now.
Also, I still can't understand why it shows the black line like this (see attached screenshot),
so I removed the last item. There's no black line now, but I'd like to know what I should set for it to display correctly, according to the graph code.
Here's the debug code (without the black line):

Code: Select all

RRDTool Command:

/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-60 \
--title="Cisco 7200 template test 2 - CPU Usage - CPU1" \
--rigid \
--base=1000 \
--height=120 \
--width=500 \
--alt-autoscale-max \
--lower-limit=0 \
--vertical-label="Percent" \
--slope-mode \
--font TITLE:12: \
--font AXIS:8: \
--font LEGEND:8: \
--font UNIT:8: \
DEF:a="/var/lib/cacti/rra/cisco_7200_template_test_2_fivemin_1568.rrd":oneMin:AVERAGE \
DEF:b="/var/lib/cacti/rra/cisco_7200_template_test_2_fivemin_1568.rrd":oneMin:MAX \
DEF:c="/var/lib/cacti/rra/cisco_7200_template_test_2_fivemin_1568.rrd":fiveMin:AVERAGE \
DEF:d="/var/lib/cacti/rra/cisco_7200_template_test_2_fivemin_1568.rrd":fiveMin:MAX \
LINE1:a#0000FFFF:"1 Min Avg"  \
GPRINT:a:LAST:"Current\:%8.0lf"  \
GPRINT:a:AVERAGE:"Average\:%8.0lf"  \
GPRINT:b:MAX:"Maximum\:%8.0lf\n"  \
AREA:c#96E78AFF:"5 Min Avg"  \
GPRINT:c:LAST:"Current\:%8.0lf"  \
GPRINT:c:AVERAGE:"Average\:%8.0lf"  \
GPRINT:d:MAX:"Maximum\:%8.0lf\n" 
RRDTool Says:

OK
Please note that this code and the attached graph aren't the same: the graph contains the line, while the debug code is without it!

Thanks,
Ziv
Attachments
cpu1-graph.png (26.95 KiB)

#56 Post by _CL » Wed Jun 04, 2008 3:23 pm

zivley wrote:Hi
Anyway, do you have an idea what could solve that CPU query problem?
I am not sure what to tell you about this. Have you run a verbose query yet? If not, try it. Go to the Device page and, down in the Data Queries section, click on the Verbose Query link next to the CPU data query. If that doesn't fix it, copy and paste the verbose query output here.
CL

#57 Post by _CL » Wed Jun 04, 2008 4:07 pm

_CL wrote: Here is a little explanation taken from the comments in a new script I am testing.
It's available now.

http://forums.cacti.net/viewtopic.php?p=136993
CL

#58 Post by zivley » Thu Jun 05, 2008 1:45 am

I just took one of the routers as an example (I have a few with the same problem) and ran a verbose query. Here's the output:

Code: Select all

+ Running data query [12].
+ Found type = '4 '[script query].
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/cisco_cpu.xml'
+ XML file parsed ok.
+ Executing script for list of indexes '/usr/bin/php -q /usr/share/cacti/site/scripts/cisco_cpu_usage.php 10.0.0.1, public, 2, , , 161, 500 index'
+ Executing script query '/usr/bin/php -q /usr/share/cacti/site/scripts/cisco_cpu_usage.php 10.0.0.1, public, 2, , , 161, 500 query cpuIndex'
+ Found item [cpuIndex='1'] index: 1
+ Executing script query '/usr/bin/php -q /usr/share/cacti/site/scripts/cisco_cpu_usage.php 10.0.0.1, public, 2, , , 161, 500 query cpuName'
+ Found item [cpuName='CPU1'] index: 1
+ Executing script query '/usr/bin/php -q /usr/share/cacti/site/scripts/cisco_cpu_usage.php 10.0.0.1, public, 2, , , 161, 500 query cpuDesc'
+ Found item [cpuDesc='No Such Object available on this agent at this OID'] index: 1
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/cisco_cpu.xml'
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/cisco_cpu.xml'
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/cisco_cpu.xml'
+ Found data query XML file at '/usr/share/cacti/site/resource/script_queries/cisco_cpu.xml'
Pay attention to this line:

Code: Select all

+ Found item [cpuDesc='No Such Object available on this agent at this OID'] index: 1 
As for your new script, will it solve my problem or is it for other purposes?

And what about the black line of the last item?

Thanks,
Ziv

#59 Post by _CL » Thu Jun 05, 2008 12:19 pm

I can't explain why the query reports the OID as not found when your snmpwalk finds it, other than possible user error somewhere along the line. Sorry I don't have anything better than that.

As for my new script, I can't say whether it will solve your problem or not. My script queries the same OID that's not being found in your case. But, really, you don't need my new script or even ehall's CPU script if you just have one CPU to monitor. IIRC, Cacti has a data template built in (at least in the Synaptic package) for getting the CPU utilization. It uses an older MIB that tracks only one CPU.

For the black line, it's probably something left in the graph template. Look for an item whose Graph Item Type is LINE1 and whose Item Color is black (000000). If it's there, just delete it.
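
If the item is hard to spot in the template, one option is to scan the RRDTool debug output for it. Below is a rough Python sketch that assumes the TYPE:vname#RRGGBBAA:"legend" item layout seen in the debug output posted earlier; the black item in the sample list is invented for illustration and find_black_lines is a hypothetical helper.

```python
import re

# Matches graph items such as LINE1:a#0000FFFF:"1 Min Avg"
ITEM_RE = re.compile(r"^(LINE\d+|AREA|STACK):(\w+)#([0-9A-Fa-f]{6})([0-9A-Fa-f]{2})?")

def find_black_lines(graph_args):
    """Return the LINE items whose color is black (000000)."""
    hits = []
    for arg in graph_args:
        item = arg.strip()
        match = ITEM_RE.match(item)
        if match and match.group(1).startswith("LINE") and match.group(3) == "000000":
            hits.append(item)
    return hits

args = [
    'LINE1:a#0000FFFF:"1 Min Avg"',
    'AREA:c#96E78AFF:"5 Min Avg"',
    'LINE1:e#000000FF:"Leftover"',  # hypothetical leftover item, not from the thread
]
black = find_black_lines(args)
```

Any item it returns is a candidate to delete or recolor in the graph template.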
CL

#60 Post by zivley » Sun Jun 08, 2008 1:16 am

The black line was solved by deleting it, as I said in a previous post. I just wondered how it should work properly; it's not really necessary.
I like your templates better because they show all the average values (5 min, 1 min, 5 sec). The one included with Cacti is also fine, and we have used it until now, but this one can help us catch bursts in CPU usage. I have already seen some short high loads in a new graph using your template.

As for the OID, I've found that it has something to do with the routers' versions. ehall's template had that CPU name OID problem: it returned an error on a couple of my routers and a proper value on a few others. When I started using your template, some of the "problematic" routers started returning the proper value, and a couple of routers that worked OK with ehall's script started returning the error with your script. Go figure. I guess I could tweak the OIDs to match every single router, but it's too much work for me, so until I have a multi-CPU router, I'll keep using your script without the CPU name inserted in the graphs.
Thanks a lot!
Ziv
