Some graphs disappeared

Muffinman
Posts: 17
Joined: Thu Apr 12, 2007 1:48 am

#1 Post by Muffinman » Thu Oct 02, 2008 8:40 am

Hello all,

I have been using Cacti for a year or so; the version is 0.8.7b. On 29 September I found that all graphs for one host had disappeared.

To analyse this I did the following:
- snmpwalk -c public -v 1 host
all the information scrolled across my screen :D
- rrdtool dump /var/www/cacti/rra/host_traffic_in.rrd
there are no new entries since 29 September :o
- compared the entry under Data Sources -> Host - Traffic: the correct traffic_in.rrd file is configured there. :-?

So my question is: where should I investigate further? The data seems to come in, but it does not seem to be written to the rrd file.
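
For anyone following along, the "no new entries since" check can be made a bit more precise by reading the last-update timestamp out of the dump instead of scanning the rows by eye. This is only a minimal sketch: it assumes the `rrdtool dump` output contains a `<lastupdate>` element holding a Unix timestamp, and both the function name and the sample dump (including its timestamp) are made up for illustration.

```python
import re
from datetime import datetime, timezone


def last_update_from_dump(dump_xml: str) -> datetime:
    """Return the <lastupdate> value of an `rrdtool dump` as a UTC datetime."""
    match = re.search(r"<lastupdate>\s*(\d+)\s*</lastupdate>", dump_xml)
    if match is None:
        raise ValueError("no <lastupdate> element found in dump")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


# Hypothetical, heavily shortened dump; the timestamp is invented.
SAMPLE_DUMP = """<rrd>
  <version>0003</version>
  <step>300</step>
  <lastupdate>1222646400</lastupdate>
</rrd>"""

if __name__ == "__main__":
    print(last_update_from_dump(SAMPLE_DUMP).isoformat())
```

In practice the dump text would come from running `rrdtool dump` on the file in question and capturing its output.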

TIA

Muffinman

TheWitness
Developer
Posts: 14834
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

#2 Post by TheWitness » Mon Oct 06, 2008 8:08 pm

Are there poller cache entries for the graphs? What happens when you repopulate the poller cache?

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of MacTrack, Boost, CLog, SpikeKill, Platform RTM, DSStats, maintainer of Spine, lots of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Central Plugin Repository
Central Templates Repository


I'm still out there people. Getting excited for Cacti 1.2. I think it will be a great release.

#3 Post by Muffinman » Tue Oct 07, 2008 9:54 am

TheWitness wrote: Are there poller cache entries for the graphs?
Yes, here they are:

server011 - Advanced Ping Script Server: /var/www/cacti/scripts/ss_fping.php ss_fping 20 UDP 80
RRD: /var/www/cacti/rra/server011_loss_6523.rrd
server011 - CPU Utilization - CPU0 Script Server: /var/www/cacti/scripts/ss_host_cpu.php ss_host_cpu server011.mydomain.com 256 2:161:5000:public:::MD5::DES: get usage 0
RRD: /var/www/cacti/rra/server011_cpu_6527.rrd
server011 - Free Space - /dev/cciss/c0d0 Script: perl /var/www/cacti/scripts/query_unix_partitions.pl get available /dev/cciss/c0d0p1
RRD: /var/www/cacti/rra/server011_hdd_free_6531.rrd
server011 - Free Space - /dev/cciss/c0d0 Script: perl /var/www/cacti/scripts/query_unix_partitions.pl get used /dev/cciss/c0d0p1
RRD: /var/www/cacti/rra/server011_hdd_free_6531.rrd
server011 - Free Space - /dev/cciss/c0d0 Script: perl /var/www/cacti/scripts/query_unix_partitions.pl get used /dev/cciss/c0d0p2
RRD: /var/www/cacti/rra/server011_hdd_free_7073.rrd
server011 - Free Space - /dev/cciss/c0d0 Script: perl /var/www/cacti/scripts/query_unix_partitions.pl get available /dev/cciss/c0d0p2
RRD: /var/www/cacti/rra/server011_hdd_free_7073.rrd
server011 - Free Space - |query_dskDevice| Script: perl /var/www/cacti/scripts/query_unix_partitions.pl get available /dev/cciss/c0d0p3
RRD: /var/www/cacti/rra/server011_hdd_free_6532.rrd
server011 - Free Space - |query_dskDevice| Script: perl /var/www/cacti/scripts/query_unix_partitions.pl get used /dev/cciss/c0d0p3
RRD: /var/www/cacti/rra/server011_hdd_free_6532.rrd
server011 - Load Average Script: perl /var/www/cacti/scripts/loadavg_multi.pl
RRD: /var/www/cacti/rra/server011_load_1min_6524.rrd
server011 - Logged in Users Script: perl /var/www/cacti/scripts/unix_users.pl
RRD: /var/www/cacti/rra/server011_users_6525.rrd
server011 - Processes Script: perl /var/www/cacti/scripts/unix_processes.pl
RRD: /var/www/cacti/rra/server011_proc_6526.rrd
server011 - Traffic - 10.133.253.71 - eth1 SNMP Version: 2, Community: public, OID: .1.3.6.1.2.1.2.2.1.10.3
RRD: /var/www/cacti/rra/server011_traffic_in_6529.rrd
server011 - Traffic - 10.133.253.71 - eth1 SNMP Version: 2, Community: public, OID: .1.3.6.1.2.1.2.2.1.16.3
RRD: /var/www/cacti/rra/server011_traffic_in_6529.rrd
server011 - Traffic - 10.133.253.97 - tap0 SNMP Version: 2, Community: public, OID: .1.3.6.1.2.1.2.2.1.10.5
RRD: /var/www/cacti/rra/server011_traffic_in_6530.rrd
server011 - Traffic - 10.133.253.97 - tap0 SNMP Version: 2, Community: public, OID: .1.3.6.1.2.1.2.2.1.16.5
RRD: /var/www/cacti/rra/server011_traffic_in_6530.rrd
server011 - Traffic - 192.168.1.145 - eth0 SNMP Version: 2, Community: public, OID: .1.3.6.1.2.1.2.2.1.10.2
RRD: /var/www/cacti/rra/server011_traffic_in_6528.rrd
server011 - Traffic - 192.168.1.145 - eth0 SNMP Version: 2, Community: public, OID: .1.3.6.1.2.1.2.2.1.16.2
RRD: /var/www/cacti/rra/server011_traffic_in_6528.rrd
TheWitness wrote: What happens when you repopulate the poller cache?
If you mean "Rebuild Poller Cache": I have already done that, because it helped someone else in the forum before. Unfortunately it did not help me.

Thanks for replying.

#4 Post by TheWitness » Tue Oct 07, 2008 9:54 pm

It could have broken if the time on your server moved ahead. The way to check whether this is the case is to run "php -q poller.php --force" from the command line and look for the "OK" result messages.
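
The clock problem can also be checked directly against an RRD file: rrdtool only accepts an update whose timestamp is later than the file's last update, so a last update that lies in the future blocks every new sample until the clock catches up. Here is a rough sketch of that comparison. The function name, the three labels and the 600-second default heartbeat are assumptions for illustration, not anything Cacti itself provides.

```python
def classify_last_update(last_update: int, now: int, heartbeat: int = 600) -> str:
    """Classify an RRD's last-update time (Unix seconds) relative to `now`.

    "future": last update is at or after `now`, so an update stamped `now`
              would be rejected by rrdtool.
    "stale":  nothing was written for longer than `heartbeat` seconds.
    "fresh":  the file was updated recently.
    """
    if last_update >= now:
        return "future"
    if now - last_update > heartbeat:
        return "stale"
    return "fresh"


if __name__ == "__main__":
    now = 1_223_456_789  # arbitrary example value
    for last in (now + 3600, now - 86_400, now - 300):
        print(last, classify_last_update(last, now))
```

A "future" result would point at the clock jump described above, while "stale" only says that updates stopped and leaves the cause open.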

TheWitness

#5 Post by Muffinman » Wed Oct 08, 2008 3:18 am

I ran "php -q poller.php --force".

The output said something about memory not being big enough, so I edited /etc/php.ini and changed the memory entry for scripts from 8M to 32M.

Then I ran the command again. The output looks like this (the graphs didn't come back):

Code:

OK u:0.03 s:0.06 r:13.25
10/08/2008 09:05:14 AM - SPINE: Poller[0] Host[42] DS[732] SS[1] WARNING: Result from SERVER not valid.  Partial Result: ...
10/08/2008 09:05:14 AM - SPINE: Poller[0] Host[42] DS[733] SS[0] WARNING: Result from SERVER not valid.  Partial Result: ...
10/08/2008 09:05:14 AM - SPINE: Poller[0] Host[42] DS[734] WARNING: Result from SNMP not valid. Partial Result: ...
10/08/2008 09:05:14 AM - SPINE: Poller[0] Host[42] DS[734] WARNING: Result from SNMP not valid. Partial Result: ...
OK u:0.03 s:0.06 r:14.28
OK u:0.03 s:0.06 r:14.28

(and so on and so forth)

OK u:0.08 s:0.14 r:45.17
OK u:0.08 s:0.14 r:45.17
10/08/2008 09:05:58 AM - SPINE: Poller[0] Host[78] DS[1051] WARNING: Result from SNMP not valid. Partial Result: ...
10/08/2008 09:05:58 AM - SPINE: Poller[0] Host[78] DS[1051] WARNING: Result from SNMP not valid. Partial Result: ...
10/08/2008 09:05:58 AM - SPINE: Poller[0] Host[57] DS[911] SS[5] WARNING: Result from SERVER not valid.  Partial Result: ...
10/08/2008 09:05:58 AM - SPINE: Poller[0] Host[57] DS[912] WARNING: Result from SNMP not valid. Partial Result: ...
10/08/2008 09:05:58 AM - SPINE: Poller[0] Host[57] DS[912] WARNING: Result from SNMP not valid. Partial Result: ...
OK u:0.08 s:0.14 r:58.26
OK u:0.08 s:0.14 r:58.26

(and so on and so forth)

OK u:0.09 s:0.15 r:59.28
10/08/2008 09:09:54 AM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
10/08/2008 09:09:54 AM - SYSTEM STATS: Time:294.6504 Method:spine Processes:2 Threads:25 Hosts:234 HostsPerProcess:117 DataSources:6210 RRDsProcessed:1957
I'm a bit worried about the partial results, but that is not my point. The point seems to be the time Spine needs to work with the snmpwalk results.
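
To put a number on the timing concern, the SYSTEM STATS line can be split into its fields and compared with the polling interval. This is just a sketch under stated assumptions: the helper names are invented, the 300-second default stands for the 5-minute interval, and the parser only relies on the `Key:value` layout visible in the log line above.

```python
import re


def _convert(value: str):
    """Turn a field into int or float where possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_system_stats(line: str) -> dict:
    """Parse the Key:value pairs that follow 'SYSTEM STATS:' in a log line."""
    _, _, tail = line.partition("SYSTEM STATS:")
    if not tail:
        raise ValueError("not a SYSTEM STATS line")
    return {key: _convert(value) for key, value in re.findall(r"(\w+):(\S+)", tail)}


def headroom_seconds(stats: dict, interval: float = 300.0) -> float:
    """Seconds left between the poller run time and the polling interval."""
    return interval - stats["Time"]


SAMPLE_LINE = (
    "10/08/2008 09:09:54 AM - SYSTEM STATS: Time:294.6504 Method:spine "
    "Processes:2 Threads:25 Hosts:234 HostsPerProcess:117 "
    "DataSources:6210 RRDsProcessed:1957"
)

if __name__ == "__main__":
    stats = parse_system_stats(SAMPLE_LINE)
    print(stats)
    print(round(headroom_seconds(stats), 4))
```

With the line from this run, the headroom comes out at only a few seconds, which fits the "Spine Timed Out" error logged just before it.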

Here are my entries in the Poller settings:
  • GENERAL
    Enabled: checked
    Poller Type: Spine
    Poller Interval: Every 5 Minutes
    Cron Interval: Every 5 Minutes
    Max Concurrent Poller Processes: 2

    SPINE SPECIFIC EXECUTION PARAMETERS
    Max Threads per Process: 25
    Number of PHP Script Servers: 10
    Script and Script Server Timeout Value: 500
    Max SNMP OIDs per Get Request: 10

    HOST AVAILABILITY SETTINGS
    Downed Host Detection: Ping and SNMP
    Ping Type: UDP Ping
    Ping Port: 23
    Ping Timeout Value: 5000
    Ping Retry Count: 5

    HOST UP/DOWN SETTINGS
    Failure Count: 2
    Recovery Count: 3

Thanks! And greetings from Hamburg!

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

#6 Post by gandalf » Wed Oct 08, 2008 1:55 pm

I'd like to ask you to follow the steps in the 2nd link of my sig. There's a logical sequence to it. Please report the findings of EACH AND EVERY step.
Reinhard

#7 Post by Muffinman » Fri Oct 10, 2008 8:59 am

Hello,

here is what I did (each and every step):
  • 1. Check Cacti Log File
    Yes, SNMP timeouts are detected. BUT: they do not always come from the same host, they never come from the host I am having trouble with, and they do not occur regularly. (I assume they occur when the host in question is too busy to answer.)

    For example:

    Code:

    grep "WARNING: SNMP timeout detected" /var/www/cacti/log/cacti.log.old
    10/05/2008 02:36:18 AM - SPINE: Poller[0] Host[6] DS[59] WARNING: SNMP timeout detected [500 ms], ignoring host 'host1.mydomain'
    10/05/2008 03:55:17 AM - SPINE: Poller[0] Host[62] DS[945] WARNING: SNMP timeout detected [500 ms], ignoring host 'host2.mydomain'
    10/05/2008 03:55:17 AM - SPINE: Poller[0] Host[62] DS[945] WARNING: SNMP timeout detected [500 ms], ignoring host 'host2.mydomain'
    10/05/2008 03:55:46 AM - SPINE: Poller[0] Host[109] DS[1507] WARNING: SNMP timeout detected [5000 ms], ignoring host 'host3.mydomain'
    
    2. Check Basic Data Gathering
    I have no special or custom scripts, so I tested only with snmpwalk and snmpget, and I got correct answers.

    For example:

    Code:

    snmpget -c community-string -v2c trouble-making-host.mydomain .1.3.6.1.2.1.2.2.1.16.5
    IF-MIB::ifOutOctets.5 = Counter32: 945930899
    
    snmpwalk -c community-string -v2c
    SNMPv2-MIB::sysDescr.0 = STRING: Linux trouble-making-host 2.4.34-grsec #1 SMP Wed Feb 28 20:51:52 EST 2007 i686
    SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.10
    DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (905373152) 104 days, 18:55:31.52
    ...
    
    3. Check Cacti's poller
    I think this is the crucial point.
    I ran

    Code:

    /usr/bin/spine --verbosity=5 256 256
    
    (256 is the ID of the host I am looking at). The output is fine (see below)! And the best part: I get correct data for the graphs (when I run spine by hand for my trouble host, data points show up in the graphs for that time [and of course in the rrd]).
    I attached the debug output from spine to this posting.

    4. Check MySQL updating
    I skipped this part, as you wrote, and went straight to checking rrd file updating.

    5. Check rrd file updating
    Another interesting point: there are "rrdtool update --template" lines in the debug logfile, but none of those lines contains trouble-making-host.mydomain. ;-(

    6. Check rrd file ownership
    The ownership of all rrd files is the same everywhere: cacti for user and cacti for group. The cacti user is also the one I used for the tests of spine, snmpwalk and so on.

    7. Check rrd file numbers
    I checked the numbers in the rrd file and they look like this:

    Code:

    ds[cpu].type = "GAUGE"
    ds[cpu].minimal_heartbeat = 600
    ds[cpu].min = 0.0000000000e+00
    ds[cpu].max = 1.0000000000e+02
    ds[cpu].last_ds = "5"
    ds[cpu].value = NaN
    rra[0].cf = "AVERAGE"
    rra[0].rows = 600
    rra[0].pdp_per_row = 1
    rra[0].xff = 5.0000000000e-01
    rra[0].cdp_prep[0].value = NaN
    rra[0].cdp_prep[0].unknown_datapoints = 0
    rra[1].cf = "AVERAGE"
    rra[1].rows = 700
    rra[1].pdp_per_row = 6
    rra[1].xff = 5.0000000000e-01
    rra[1].cdp_prep[0].value = NaN
    rra[1].cdp_prep[0].unknown_datapoints = 3
    
    So no problem with minimum and maximum, eh?

    8. Check rrdtool graph statement
    No problem with creating graphs, because it displays just what is in the rrd file: NaN.

I will skip the other points. I think I have pointed out where the (my) error is.
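
For step 1, the impression that the timeouts are spread over different hosts can be checked by tallying the grep output per host. A small sketch, assuming the log lines keep the exact "SNMP timeout detected [... ms], ignoring host '...'" wording shown in step 1; the function name is made up, and the sample is copied from the grep output there.

```python
import re
from collections import Counter

TIMEOUT_RE = re.compile(
    r"WARNING: SNMP timeout detected \[(\d+) ms\], ignoring host '([^']+)'"
)


def timeouts_per_host(lines) -> Counter:
    """Count 'SNMP timeout detected' warnings per host name."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


SAMPLE_LOG = """\
10/05/2008 02:36:18 AM - SPINE: Poller[0] Host[6] DS[59] WARNING: SNMP timeout detected [500 ms], ignoring host 'host1.mydomain'
10/05/2008 03:55:17 AM - SPINE: Poller[0] Host[62] DS[945] WARNING: SNMP timeout detected [500 ms], ignoring host 'host2.mydomain'
10/05/2008 03:55:17 AM - SPINE: Poller[0] Host[62] DS[945] WARNING: SNMP timeout detected [500 ms], ignoring host 'host2.mydomain'
10/05/2008 03:55:46 AM - SPINE: Poller[0] Host[109] DS[1507] WARNING: SNMP timeout detected [5000 ms], ignoring host 'host3.mydomain'
"""

if __name__ == "__main__":
    for host, count in timeouts_per_host(SAMPLE_LOG.splitlines()).most_common():
        print(host, count)
```

Run over the whole log file, a host that dominates the tally would stand out at once; an even spread would support the "busy host" guess.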

Thank you for your help and have a nice weekend!

Best greetings from Hamburg.
Attachments
cacti-debug-output.txt
The cacti debug output is attached here...

#8 Post by gandalf » Sat Oct 11, 2008 1:58 am

Thanks for posting this. Indeed, polling seems to be an issue. But from the spine output, I see the following:

- you seem to use scripts dedicated to localhost only for polling Host[256]. Is this the cacti localhost? If not, please use ucd/net templates instead
- much of the data is returned, but some is not. Please indicate again which data is troubling you (perhaps I skipped your statement about this)

Reinhard

#9 Post by Muffinman » Mon Oct 13, 2008 2:27 am

Good morning!
gandalf wrote: - you seem to use scripts dedicated to localhost only for polling Host[256]. Is this the cacti localhost?
No, Host[256] is not localhost. It is the troublemaking host. Localhost has ID 1.
gandalf wrote: If not, please use ucd/net templates instead.
I do not completely understand this: should I replace some of the templates I use with ucd/net templates? For some topics there are no ucd/net equivalents. ;-(
gandalf wrote: many data is returned, but some not. Please again indicate which one is troubling you (perhaps I did skip your statement concerning this)
The poller log I showed (cacti-debug-output.txt in my last posting) was only from the troublemaking client, because I ran "/usr/bin/spine --verbosity=5 256 256". Why some data is not returned is my big question.

Thanks for listening!

Greetings from Hamburg

#10 Post by gandalf » Mon Oct 13, 2008 10:22 am

Muffinman wrote: This I do not understand completely: I shall exchange some templates I use by ucd/net templates? But for some topics there are no ucd/net equivalents. ;-(
Those "Localhost" templates fetch the data from the local host (the Cacti server itself), not from Host[256]. You do not want this. For many functions there are ucd/net replacements. If not everything is covered, please search the Scripts and Templates forum for replacements.
Reinhard
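
One rough way to spot such local-only data sources is to look through the poller cache listing (as in post #3) for script entries whose command line never mentions the monitored host. This is a heuristic sketch only: the function name is invented, the check is a plain substring test, and a script that reaches the host by IP address or through some other argument would be flagged wrongly.

```python
def local_only_entries(entries, hostname):
    """Return descriptions of script entries whose command omits `hostname`.

    `entries` is an iterable of (description, action) pairs, where `action`
    is the text shown in the poller cache, e.g. "Script: perl /path/x.pl".
    SNMP entries are skipped because they always target the device itself.
    """
    flagged = []
    for description, action in entries:
        kind, _, command = action.partition(": ")
        if kind.startswith("Script") and hostname not in command:
            flagged.append(description)
    return flagged


# A few entries taken from the poller cache listing in post #3.
SAMPLE_ENTRIES = [
    ("server011 - Load Average",
     "Script: perl /var/www/cacti/scripts/loadavg_multi.pl"),
    ("server011 - CPU Utilization - CPU0",
     "Script Server: /var/www/cacti/scripts/ss_host_cpu.php ss_host_cpu "
     "server011.mydomain.com 256 2:161:5000:public:::MD5::DES: get usage 0"),
    ("server011 - Processes",
     "Script: perl /var/www/cacti/scripts/unix_processes.pl"),
    ("server011 - Traffic - 192.168.1.145 - eth0",
     "SNMP Version: 2, Community: public, OID: .1.3.6.1.2.1.2.2.1.10.2"),
]

if __name__ == "__main__":
    for name in local_only_entries(SAMPLE_ENTRIES, "server011.mydomain.com"):
        print(name)
```

Entries flagged this way are candidates for replacement by ucd/net templates, which fetch their values from the remote host over SNMP.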
