CPU full

JiiPee
Posts: 10
Joined: Sun Oct 02, 2005 9:33 pm
Location: FINLAND

#1 Post by JiiPee » Wed Jun 04, 2008 3:35 pm

Hello,

I have been trying out different CPU templates for Linux and none of them seems to work as it should. So I created my own, which is also not working as it should. :D

As you can see from the picture, total CPU usage is around 440%, which is about 10% too much (I have a 4-core CPU in that box).

Does anyone have an idea why it shows too much? Could it be that the SNMP queries for the CPU stats are not made at the same time, so that values shift between counters (for example idle <-> wait) between queries?

And if that is a possible cause, is there any way to get those stats queried at almost the same time?

Oh, and if someone likes this template, I can share it once I know how... I'm quite a n00b at exporting templates.
Attachments
cpu-full.png (75.52 KiB)

dus001
Posts: 42
Joined: Sun Aug 07, 2005 6:07 am

#2 Post by dus001 » Wed Jun 04, 2008 6:07 pm

Hello,
JiiPee wrote:Could it be that the SNMP queries for the CPU stats are not made at the same time, so that values shift between counters (for example idle <-> wait) between queries?
Don't you rather think you're counting some amount of the CPU usage twice?
Maybe you should check the definition of each data source used. I cannot be more specific given the information provided.

JiiPee
Posts: 10
Joined: Sun Oct 02, 2005 9:33 pm
Location: FINLAND

#3 Post by JiiPee » Wed Jun 04, 2008 6:10 pm

I don't think so, because the values are all quite different.

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

#4 Post by gandalf » Thu Jun 05, 2008 1:47 pm

One of those items (Interrupt?) is already part of another one. I've hit this some months ago but can't remember which one it was
Reinhard

JiiPee
Posts: 10
Joined: Sun Oct 02, 2005 9:33 pm
Location: FINLAND

#5 Post by JiiPee » Thu Jun 05, 2008 3:35 pm

gandalf wrote:One of those items (Interrupt?) is already part of another one. I've hit this some months ago but can't remember which one it was
Reinhard
Okay, so it should be removed. Maybe there is some other value too that is already part of another one, because interrupt usage is very small.

id   hash                              name
4    cdfed2d401723d2f41fc239d4ce249c7  ucd/net - CPU System
5    a27e816377d2ac6434a87c494559c726  ucd/net - CPU User
6    c06c3d20eccb9598939dc597701ff574  ucd/net - CPU Nice
101  e3bda2b67dee47dac2a35610d5f22344  ucd/net - CPU Soft IRQ
97   616fe09f9ada7dba1d9cdd778a76827f  ucd/net - CPU Idle
98   01a50e5bc738d4cd28e7b4c489110046  ucd/net - CPU Wait
99   d57ad833b89965f8a6517bd99f8e4507  ucd/net - CPU Kernel
100  2adbc18456bb75a57e820003877c3727  ucd/net - CPU Interrupt


That's what I have in data_template.

If I want to change IDs 4-6 to 102-104, where else do I need to change the IDs?

I still think it might be related to the queries not being made at the same time.

Because now that the system has been idle for a while, I see a total value a little under 400 (average 382, max 476).

dus001
Posts: 42
Joined: Sun Aug 07, 2005 6:07 am

#6 Post by dus001 » Thu Jun 05, 2008 4:10 pm

JiiPee wrote:I still think it might be related to the queries not being made at the same time.
It would be very strange if only your template were affected by such a problem, no?

JiiPee wrote:That's what I have in data_template.
That's not really helpful for understanding where the values come from.

Maybe you could check this:
http://www.net-snmp.org/docs/mibs/ucdavis.html

about ssCpuRawSystem, you can read:
"This object may sometimes be implemented as the
combination of the 'ssCpuRawWait(54)' and
'ssCpuRawKernel(55)' counters, so care must be
taken when summing the overall raw counters."

What does the standard CPU graph that comes with Cacti look like for this server? What is the returned value for the total CPU?
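
To make the MIB note above concrete, here is a minimal Python sketch of the double counting it warns about. The per-state numbers are hypothetical (picked for a 4-core box where 100 per core gives an expected total of 400), and the assumption that the system counter already contains the kernel share is only one of the cases the MIB text allows.

```python
# Hypothetical per-state CPU values for a 4-core box, scaled so that one
# fully busy core equals 100 (illustrative numbers, not from the thread).
CORES = 4
EXPECTED_TOTAL = 100.0 * CORES

rates = {
    "user": 120.0,
    "nice": 20.0,
    "system": 60.0,    # assumed to already include the kernel share
    "kernel": 40.0,
    "wait": 10.0,
    "interrupt": 2.0,
    "softirq": 3.0,
    "idle": 185.0,
}


def total(values, already_in_system=()):
    """Sum all states, skipping those already counted inside 'system'."""
    return sum(v for name, v in values.items() if name not in already_in_system)


naive = total(rates)                                     # adds kernel twice
adjusted = total(rates, already_in_system=("kernel",))   # counts kernel once

print(f"naive total:    {naive:.0f} (expected {EXPECTED_TOTAL:.0f})")
print(f"adjusted total: {adjusted:.0f}")
```

Which counters, if any, are folded into system depends on the agent, so the tuple passed as already_in_system would have to be checked against the actual data sources.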

JiiPee
Posts: 10
Joined: Sun Oct 02, 2005 9:33 pm
Location: FINLAND

#7 Post by JiiPee » Thu Jun 12, 2008 4:04 pm

dus001 wrote:
JiiPee wrote:I still think it might be related to the queries not being made at the same time.
It would be very strange if only your template were affected by such a problem, no?
Nope. The normal template tracks ONLY nice, user and system.

dus001 wrote:
JiiPee wrote:That's what I have in data_template.
That's not really helpful for understanding where the values come from.
It was meant to show that the poller might have some delay between fetching nice, user and system and fetching all the other values, because their IDs are far apart.

dus001 wrote:
about ssCpuRawSystem, you can read:
"This object may sometimes be implemented as the
combination of the 'ssCpuRawWait(54)' and
'ssCpuRawKernel(55)' counters, so care must be
taken when summing the overall raw counters."
This doesn't explain why CPU usage is 470% when the box is under heavy load and under 370% when the box is mostly idle.

1 CPU = 100%
1 CPU with 4 cores = 400%, right?

dus001 wrote:
What does the standard CPU graph that comes with Cacti look like for this server? What is the returned value for the total CPU?
They were foobar. They didn't show the nice value at all when a lot of programs were running under nice.

JiiPee
Posts: 10
Joined: Sun Oct 02, 2005 9:33 pm
Location: FINLAND

#8 Post by JiiPee » Mon Jul 07, 2008 5:40 am

Is it possible to create a CDEF function that will remove wait and kernel from system?

If it's possible, how do I do it?

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

#9 Post by gandalf » Mon Jul 07, 2008 5:53 am

Yes, it is.
Please visit Graph Management and switch to debug mode. There you'll find the whole rrdtool graph statement with a CDEF in it. That CDEF adds up all the data to form the total. You can now change this CDEF to get what you expect.
Reinhard

JiiPee
Posts: 10
Joined: Sun Oct 02, 2005 9:33 pm
Location: FINLAND

#10 Post by JiiPee » Mon Jul 07, 2008 6:48 am

gandalf wrote:Yes, it is.
Please visit Graph Management and switch to debug. Then, you'll find the whole rrdtool graph statement and a CDEF in it. This one adds up all data to form the total. You now may change this CDEF to get what you've expected
Reinhard
Thanks, I'll take a look at it.

It seems that wait is already included in the system counters but kernel is not, so if I want separate stats for system and wait, then wait has to be subtracted from system.

I'll see if I can do that.

EDIT: Sorry, kernel is already in the system counters.

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

#11 Post by gandalf » Tue Jul 08, 2008 2:10 am

The "correct way" of doing things would then be to replace "system" with "system - kernel" and print "kernel" again as is. The total would then show "system" only and would NOT add "kernel" to the total.
Reinhard
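
As a rough illustration of the "system - kernel" idea, the sketch below evaluates rrdtool-style RPN in Python. The evaluator is a hypothetical toy that supports only + and -, and the sample values for a and d are made up. In an rrdtool graph the same subtraction would be written as a CDEF expression such as a,d,- with a and d bound to the system and kernel data sources.

```python
def eval_rpn(expr, values):
    """Evaluate a comma-separated RPN expression (toy: only + and -)."""
    stack = []
    for token in expr.split(","):
        if token in ("+", "-"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if token == "+" else left - right)
        elif token in values:
            stack.append(values[token])      # named data source
        else:
            stack.append(float(token))       # numeric literal
    if len(stack) != 1:
        raise ValueError(f"malformed RPN expression: {expr!r}")
    return stack[0]


# Hypothetical sample: a = system (including kernel), d = kernel.
sample = {"a": 60.0, "d": 40.0}

system_without_kernel = eval_rpn("a,d,-", sample)
print(system_without_kernel)
```

Stacking the result of a,d,- and then d on top of it gives back the original system value, so the graph keeps the same height while showing the two shares separately.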

JiiPee
Posts: 10
Joined: Sun Oct 02, 2005 9:33 pm
Location: FINLAND

#12 Post by JiiPee » Tue Jul 08, 2008 2:52 pm

gandalf wrote:The "correct way" of doing things then would be to replace "system" by "system - kernel" and printing "kernel" again as is. The total then would show "system" only and would NOT add "kernel" to the total.
Reinhard
Hmm, yeah. But how do I do that? I don't have much experience messing with Cacti. I know how to do it with an external script, but why should I use a script when all the values are available directly via SNMP?

Or do I just need to write a script for it?

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

#13 Post by gandalf » Tue Jul 08, 2008 3:52 pm

No, that's CDEF only.
Visit Graph Management, select this graph and switch to DEBUG to see the only CDEF currently applied to that graph. From that and some CDEF theory (see the rrdtool man pages), this should be doable.
Reinhard

JiiPee
Posts: 10
Joined: Sun Oct 02, 2005 9:33 pm
Location: FINLAND

#14 Post by JiiPee » Fri Jul 11, 2008 4:36 pm

OK, I think I managed to create a CDEF that removes kernel usage from system, but the total seems to be calculated wrong. What am I doing wrong?


RRDTool Command:

/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-300 \
--title="rep5 - misc - CPU Usage" \
--rigid \
--base=1000 \
--height=120 \
--width=500 \
--alt-autoscale-max \
--lower-limit=0 \
--vertical-label="percent" \
--slope-mode \
--font TITLE:10: \
--font AXIS:8: \
--font LEGEND:8: \
--font UNIT:8: \
DEF:a="/var/lib/cacti/rra/rep5_-_misc_cpu_system_256.rrd":cpu_system:AVERAGE \
DEF:b="/var/lib/cacti/rra/rep5_-_misc_cpu_system_256.rrd":cpu_system:LAST \
DEF:c="/var/lib/cacti/rra/rep5_-_misc_cpu_system_256.rrd":cpu_system:MAX \
DEF:d="/var/lib/cacti/rra/rep5_-_misc_cpu_kernel_253.rrd":cpu_kernel:AVERAGE \
DEF:e="/var/lib/cacti/rra/rep5_-_misc_cpu_kernel_253.rrd":cpu_kernel:LAST \
DEF:f="/var/lib/cacti/rra/rep5_-_misc_cpu_kernel_253.rrd":cpu_kernel:MAX \
DEF:g="/var/lib/cacti/rra/rep5_-_misc_cpu_wait_258.rrd":cpu_wait:AVERAGE \
DEF:h="/var/lib/cacti/rra/rep5_-_misc_cpu_wait_258.rrd":cpu_wait:LAST \
DEF:i="/var/lib/cacti/rra/rep5_-_misc_cpu_wait_258.rrd":cpu_wait:MAX \
DEF:j="/var/lib/cacti/rra/rep5_-_misc_cpu_user_257.rrd":cpu_user:AVERAGE \
DEF:ba="/var/lib/cacti/rra/rep5_-_misc_cpu_user_260.rrd":cpu_user:LAST \
DEF:bb="/var/lib/cacti/rra/rep5_-_misc_cpu_user_260.rrd":cpu_user:MAX \
DEF:bc="/var/lib/cacti/rra/rep5_-_misc_cpu_nice_254.rrd":cpu_nice:AVERAGE \
DEF:bd="/var/lib/cacti/rra/rep5_-_misc_cpu_nice_259.rrd":cpu_nice:LAST \
DEF:be="/var/lib/cacti/rra/rep5_-_misc_cpu_nice_259.rrd":cpu_nice:MAX \
DEF:bf="/var/lib/cacti/rra/rep5_-_misc_cpu_interrupt_252.rrd":cpu_interrupt:AVERAGE \
DEF:bg="/var/lib/cacti/rra/rep5_-_misc_cpu_interrupt_252.rrd":cpu_interrupt:LAST \
DEF:bh="/var/lib/cacti/rra/rep5_-_misc_cpu_interrupt_252.rrd":cpu_interrupt:MAX \
DEF:bi="/var/lib/cacti/rra/rep5_-_misc_cpu_softirq_255.rrd":cpu_softirq:AVERAGE \
DEF:bj="/var/lib/cacti/rra/rep5_-_misc_cpu_softirq_255.rrd":cpu_softirq:LAST \
DEF:ca="/var/lib/cacti/rra/rep5_-_misc_cpu_softirq_255.rrd":cpu_softirq:MAX \
DEF:cb="/var/lib/cacti/rra/rep5_-_misc_cpu_idle_251.rrd":cpu_idle:AVERAGE \
DEF:cc="/var/lib/cacti/rra/rep5_-_misc_cpu_idle_251.rrd":cpu_idle:LAST \
DEF:cd="/var/lib/cacti/rra/rep5_-_misc_cpu_idle_251.rrd":cpu_idle:MAX \
CDEF:cdefa=a,d,- \
CDEF:cdefb=a,d,- \
CDEF:cdefd=a,d,- \
CDEF:cdefdc=TIME,1215811495,GT,a,a,UN,0,a,IF,IF,TIME,1215811495,GT,d,d,UN,0,d,IF,IF,TIME,1215811495,GT,g,g,UN,0,g,IF,IF,TIME,1215811495,GT,j,j,UN,0,j,IF,IF,TIME,1215811495,GT,bc,bc,UN,0,bc,IF,IF,TIME,1215811495,GT,bf,bf,UN,0,bf,IF,IF,TIME,1215811495,GT,bi,bi,UN,0,bi,IF,IF,TIME,1215811495,GT,cb,cb,UN,0,cb,IF,IF,+,+,+,+,+,+,+ \
AREA:cdefa#FF0000FF:"System" \
GPRINT:cdefb:LAST:" Current\:%8.2lf %s" \
GPRINT:cdefa:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:cdefd:MAX:"Maximum\:%8.2lf %s\n" \
AREA:d#942D0CFF:"Kernel":STACK \
GPRINT:e:LAST:" Current\:%8.2lf %s" \
GPRINT:d:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:f:MAX:"Maximum\:%8.2lf %s\n" \
AREA:g#FF00FFFF:"Wait":STACK \
GPRINT:h:LAST:" Current\:%8.2lf %s" \
GPRINT:g:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:i:MAX:"Maximum\:%8.2lf %s\n" \
AREA:j#0000FFFF:"User":STACK \
GPRINT:ba:LAST:" Current\:%8.2lf %s" \
GPRINT:bb:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:bb:MAX:"Maximum\:%8.2lf %s\n" \
AREA:bc#00FF00FF:"Nice":STACK \
GPRINT:bd:LAST:" Current\:%8.2lf %s" \
GPRINT:be:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:be:MAX:"Maximum\:%8.2lf %s\n" \
AREA:bf#FFAB00FF:"Interrupt":STACK \
GPRINT:bg:LAST:"Current\:%8.2lf %s" \
GPRINT:bf:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:bh:MAX:"Maximum\:%8.2lf %s\n" \
AREA:bi#157419FF:"SoftIRQ":STACK \
GPRINT:bj:LAST:" Current\:%8.2lf %s" \
GPRINT:bi:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:ca:MAX:"Maximum\:%8.2lf %s\n" \
AREA:cb#FFFF00FF:"Idle":STACK \
GPRINT:cc:LAST:" Current\:%8.2lf %s" \
GPRINT:cb:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:cd:MAX:"Maximum\:%8.2lf %s\n" \
LINE1:cdefdc#000000FF:"Total" \
GPRINT:cdefdc:LAST:" Current\:%8.2lf %s" \
GPRINT:cdefdc:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:cdefdc:MAX:"Maximum\:%8.2lf %s"
RRDTool Says:

OK
Attachments
8cpu.png (36.09 KiB)
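
One possible reading of the command above, sketched in Python: the Total CDEF (cdefdc) still adds the raw system value a and the kernel value d, so if system already contains kernel, kernel is counted twice in the total. The guarded function mirrors the TIME,1215811495,GT,x,x,UN,0,x,IF,IF pattern from the command. The sample values and the alternative total that leaves kernel out are assumptions for illustration, not output from this server.

```python
import math

CUTOFF = 1215811495  # timestamp used in the cdefdc expression

# Order of the operands in cdefdc: a, d, g, j, bc, bf, bi, cb.
STATES = ("system", "kernel", "wait", "user", "nice",
          "interrupt", "softirq", "idle")


def guarded(value, time, cutoff=CUTOFF):
    """Mirror TIME,cutoff,GT,x,x,UN,0,x,IF,IF: after the cutoff the value
    passes through as is; before it, an unknown value (NaN) becomes 0."""
    if time > cutoff:
        return value
    return 0.0 if math.isnan(value) else value


def total_as_graphed(sample, time):
    """Sum every state, as the cdefdc expression does."""
    return sum(guarded(sample[name], time) for name in STATES)


def total_counting_kernel_once(sample, time):
    """Assumed fix: leave kernel out, because system already contains it."""
    return sum(guarded(sample[name], time) for name in STATES
               if name != "kernel")


# Hypothetical values on a 4-core box (100 per core, so 400 expected).
sample = {"system": 60.0, "kernel": 40.0, "wait": 10.0, "user": 120.0,
          "nice": 20.0, "interrupt": 2.0, "softirq": 3.0, "idle": 185.0}
when = 1215800000  # any time before the cutoff

print(total_as_graphed(sample, when))
print(total_counting_kernel_once(sample, when))
```

In CDEF terms this would amount to dropping the d operand (and one +) from the total, under the same assumption that system includes kernel.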

zlyZwierz
Posts: 1
Joined: Wed Jul 16, 2008 6:49 am

#15 Post by zlyZwierz » Wed Jul 16, 2008 6:59 am

Hello,

I had a similar problem recently. After some reading ( http://www.opennms.org/index.php/Net-sn ... ollections ) I successfully created a CPU graph (2 CPUs).
Attachments
cacti_graph_template_extended_cpu_usage.xml (52.26 KiB): Graph template.
ext-cpu.png (49.25 KiB): The graph.
