jcotton
Joined: 27 Jun 2005 Posts: 25
Posted: Wed Jul 13, 2005 11:40 am Post subject: polling performance problem
First things first:
RedHat 9
cacti-0.8.6e
cactid (compiled from source)
PHP 4.2.2
MySQL server 3.23.58
I arrived at work on Tuesday to find my Cacti system gasping for air and out of resources. All of a sudden, a polling cycle that used to take approximately 100 seconds was consistently timing out. So I compiled cactid and started using it late yesterday afternoon. The polling cycle dropped to 40 seconds, with no errors. I arrived at work this morning to find the system completely out of resources again, with 20+ rrdtool processes running (I could not post the process list because the system was unusable and had to be switched off). An excerpt from the Cacti log file is attached.
I should mention that this is all running on a P3 ThinkPad with 256 MB of RAM.
Any help would be much appreciated.
Justin
Attachment: cacti.log1.txt (28.53 KB)

TheWitness Developer
Joined: 14 May 2002 Posts: 9736 Location: MI, USA
Posted: Wed Jul 13, 2005 1:26 pm
JCotton,
In my other note, I did not ask if you were running rrdtool 1.2.x. Is this the case?
TheWitness

williem Cacti User
Joined: 08 Feb 2005 Posts: 59
Posted: Thu Jul 14, 2005 7:38 am Post subject: Cactid
Larry,
I have noticed that I am getting timeout errors with the production version of cactid. I did not get these in the last RC. Also, my polling time went from 100 sec to 150 sec between the last RC and the production version of cactid-0.8.6e.
Regards,
Willie

TheWitness Developer
Joined: 14 May 2002 Posts: 9736 Location: MI, USA
Posted: Thu Jul 14, 2005 11:10 am
Willie,
What was your last RC?
Larry

TheWitness Developer
Joined: 14 May 2002 Posts: 9736 Location: MI, USA
Posted: Thu Jul 14, 2005 11:17 am
Willie,
Also, when you say "timeout", what are you referring to: SNMP, SCRIPT, or MAX Runtime?
Larry

williem Cacti User
Joined: 08 Feb 2005 Posts: 59
Posted: Fri Jul 15, 2005 8:28 am Post subject: cactid
Larry,
An example of the logs is at the end of this message. I was out most of yesterday; sorry for the delay in answering your questions. My last RC before things went live was the one you sent on 6/13/05. It wasn't actually labeled as an RC.
Regards,
Willie
07/15/2005 08:22:08 AM - SYSTEM STATS: Time: 128.3748 s, Method: cactid, Processes: 4, Threads: 20, Hosts: 2067, Hosts/Process: 517, Data Sources 15047, RRDs Processed 0
07/15/2005 08:20:00 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts External
07/15/2005 08:20:00 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts External
07/15/2005 08:20:00 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts External
07/15/2005 08:17:07 AM - SYSTEM STATS: Time: 127.3601 s, Method: cactid, Processes: 4, Threads: 20, Hosts: 2067, Hosts/Process: 517, Data Sources 15047, RRDs Processed 0
07/15/2005 08:15:00 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts External
07/15/2005 08:15:00 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts External
07/15/2005 08:15:00 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts External
07/15/2005 08:12:07 AM - SYSTEM STATS: Time: 127.3503 s, Method: cactid, Processes: 4, Threads: 20, Hosts: 2067, Hosts/Process: 517, Data Sources 15047, RRDs Processed 0
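For anyone who wants to track these numbers over time, here is a minimal Python sketch that pulls the polling time out of the SYSTEM STATS lines and counts the "Cactid Timed Out" errors. It assumes the log line layout shown above; the function name `summarize` and the sample list are made up for illustration and are not part of Cacti.

```python
import re
from statistics import mean

# Matches the polling time in lines such as
# "... - SYSTEM STATS: Time: 128.3748 s, Method: cactid, ..."
STATS_RE = re.compile(r"SYSTEM STATS: Time: (?P<seconds>[\d.]+) s")


def summarize(lines):
    """Return (average poll time in seconds or None, number of cactid timeouts)."""
    times = []
    timeouts = 0
    for line in lines:
        match = STATS_RE.search(line)
        if match:
            times.append(float(match.group("seconds")))
        elif "Cactid Timed Out" in line:
            timeouts += 1
    average = mean(times) if times else None
    return average, timeouts


# Sample lines copied from the log excerpt above.
sample = [
    "07/15/2005 08:22:08 AM - SYSTEM STATS: Time: 128.3748 s, Method: cactid, "
    "Processes: 4, Threads: 20, Hosts: 2067, Hosts/Process: 517, "
    "Data Sources 15047, RRDs Processed 0",
    "07/15/2005 08:20:00 AM - CACTID: Poller[0] ERROR: Cactid Timed Out "
    "While Processing Hosts External",
    "07/15/2005 08:17:07 AM - SYSTEM STATS: Time: 127.3601 s, Method: cactid, "
    "Processes: 4, Threads: 20, Hosts: 2067, Hosts/Process: 517, "
    "Data Sources 15047, RRDs Processed 0",
]

avg, timeouts = summarize(sample)
if avg is not None:
    print(f"average poll time: {avg:.1f} s, cactid timeouts: {timeouts}")
```

To run it against a real log, pass an open file instead of the sample list, since the function only iterates over lines.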

TheWitness Developer
Joined: 14 May 2002 Posts: 9736 Location: MI, USA
Posted: Fri Jul 15, 2005 9:59 am
That looks more like a bug in that your hosts are polling and completing just fine... Will have to look at it. Have you got the latest cygwin and remade cacti from it? I hate those things...
Larry

williem Cacti User
Joined: 08 Feb 2005 Posts: 59
Posted: Wed Jul 27, 2005 8:59 am Post subject: Cactid
Larry,
I am going to move the Cacti box from my secondary switch to the primary switch to see if that will clear the problem. I am still seeing it regularly in my error log. I do have the latest cygwin installed.
Regards,
Willie

williem Cacti User
Joined: 08 Feb 2005 Posts: 59
Posted: Thu Jul 28, 2005 7:39 am Post subject: cactid
Larry,
Here is another log extract. What I noticed in this one that I didn't notice before is that I always get a maximum runtime timeout right before I get the "Cactid Timed Out While Processing Hosts Internal" errors. It corrects itself, but that takes several hours. Any idea where to look?
Regards,
Willie
07/28/2005 01:25:09 AM - CACTID: Poller[0] Host[9] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
07/28/2005 01:25:09 AM - CACTID: Poller[0] Host[5] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
07/28/2005 01:25:09 AM - CACTID: Poller[0] Host[4] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
07/28/2005 01:25:08 AM - CACTID: Poller[0] Host[3] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
07/28/2005 01:25:02 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts Internal
07/28/2005 01:25:02 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts Internal
07/28/2005 01:25:02 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts Internal
07/28/2005 01:24:57 AM - POLLER: Poller[0] Maximum runtime of 296 seconds exceeded. Exiting.
07/28/2005 01:24:48 AM - CACTID: Poller[0] Host[1506] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
07/28/2005 01:24:47 AM - CACTID: Poller[0] Host[1505] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
07/28/2005 01:17:43 AM - SYSTEM STATS: Time: 163.0167 s, Method: cactid, Processes: 4, Threads: 20, Hosts: 2066, Hosts/Process: 517, Data Sources 15173, RRDs Processed 6769
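One way to check that pattern across a whole log is sketched below in Python: it counts how many "Cactid Timed Out" errors come shortly after a "Maximum runtime" message. The timestamp layout is taken from the lines above; the 30-second window and the function names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the log lines above, e.g. "07/28/2005 01:25:02 AM".
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse(line):
    """Split a log line into (timestamp, message)."""
    stamp, _, message = line.partition(" - ")
    return datetime.strptime(stamp, TS_FORMAT), message


def timeouts_after_max_runtime(lines, window_seconds=30):
    """Return (timeouts preceded by a max-runtime exit, total timeouts).

    window_seconds is an arbitrary illustrative threshold, not a Cacti setting.
    """
    # The log is written newest first, so sort into chronological order.
    events = sorted((parse(line) for line in lines), key=lambda e: e[0])
    window = timedelta(seconds=window_seconds)
    last_max_runtime = None
    preceded = 0
    total = 0
    for when, message in events:
        if "Maximum runtime" in message:
            last_max_runtime = when
        elif "Cactid Timed Out" in message:
            total += 1
            if last_max_runtime is not None and when - last_max_runtime <= window:
                preceded += 1
    return preceded, total


# Sample lines copied from the log extract above.
sample = [
    "07/28/2005 01:25:02 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts Internal",
    "07/28/2005 01:25:02 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Hosts Internal",
    "07/28/2005 01:24:57 AM - POLLER: Poller[0] Maximum runtime of 296 seconds exceeded. Exiting.",
    "07/28/2005 01:17:43 AM - SYSTEM STATS: Time: 163.0167 s, Method: cactid, Processes: 4, Threads: 20, Hosts: 2066, Hosts/Process: 517, Data Sources 15173, RRDs Processed 6769",
]

preceded, total = timeouts_after_max_runtime(sample)
print(f"{preceded} of {total} cactid timeouts followed a maximum-runtime exit")
```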

TheWitness Developer
Joined: 14 May 2002 Posts: 9736 Location: MI, USA
Posted: Fri Aug 19, 2005 7:21 pm
Willie,
I have found a few bugs in 0.8.6e. I will be releasing 0.8.6f shortly. You can pull a pretty complete copy of it from SVN right now.
All I have left to add is bulk get functions for SNMPv2 and SNMPv3 hosts. Yes, SNMPv3!
The use of bulk functions will show a marked increase in performance once I figure out how I am going to do it. It's a day's work.
In addition, we have pulled the recache activities from the poller in Cacti 0.8.6g, which is also due out shortly (we still have a few bug reports to clean up). The recache events are still called from the poller, but run independently of it. You will have to change your scheduled task ID to SYSTEM though since the poller will end while recaching is still taking place. Without running as a SYSTEM task, there will be registry unload problems in Windows.
Larry

TheWitness Developer
Joined: 14 May 2002 Posts: 9736 Location: MI, USA
Posted: Mon Aug 22, 2005 10:01 am
Anyone interested, please see the following link. I have it in production now and three things have happened:
Polling times have gone down
System Load has gone down
No more random segfaults
TheWitness
http://forums.cacti.net/viewtopic.php?t=8974

chadd Cacti User
Joined: 24 Mar 2005 Posts: 194 Location: Ocoee, Florida
Posted: Thu Aug 25, 2005 8:12 am Post subject: Cactid problems completing poll
The following output is all I get after cactid runs for a while:
OK u:0.09 s:0.53 r:11.40
08/25/2005 09:02:08 AM - POLLER: Poller[0] Maximum runtime of 296 seconds exceeded. Exiting.
It seems to hang at r:11.40, but if I run the cmd.php poller, it runs all the way through to about 256 or so. I think I have about 106 interfaces to poll at the moment, which is about a third or less of the interfaces I have to put into Cacti. I really hope to get cactid working again, as my processor load has gone from 0-1 using cactid to ~6+ using cmd.php.
I am using the plugin arch by cigamit with the thold plugin, but I doubt that is the problem.
Any ideas?

chadd Cacti User
Joined: 24 Mar 2005 Posts: 194 Location: Ocoee, Florida
Posted: Thu Aug 25, 2005 8:14 am Post subject: follow up to previous post
Sorry, I forgot to include the cactid rev. I am currently using CACTID 0.8.6e, but have tried the latest 'f' version you sent as well. It had the same results. Thanks for any help you can give.

TheWitness Developer
Joined: 14 May 2002 Posts: 9736 Location: MI, USA
Posted: Thu Aug 25, 2005 8:56 am
Chadd,
Which RC of Cactid 0.8.6f are you running? Also, what is your mix of data sources:
#scripts (action = 1)
#snmp (action = 0)
#script server (action = 2)
You can obtain the Action information from your poller cache. Just copy and paste into Excel or extract directly from the poller_items database in Cacti.
Thanks,
Larry
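
The per-action counts requested above can be produced with a single GROUP BY query. The Python sketch below runs it against an in-memory SQLite stand-in so it works without a Cacti install. The table name `poller_item` and the trimmed column list are assumptions (the post calls the table poller_items), so check them against your own schema before running the query on the real MySQL database.

```python
import sqlite3

# Action codes as listed in the post above.
ACTION_NAMES = {0: "snmp", 1: "script", 2: "script server"}


def count_by_action(conn):
    """Count poller cache entries per action type."""
    rows = conn.execute(
        "SELECT action, COUNT(*) FROM poller_item "
        "GROUP BY action ORDER BY action"
    ).fetchall()
    return {ACTION_NAMES.get(action, f"unknown ({action})"): n for action, n in rows}


# In-memory stand-in with a hypothetical, trimmed-down table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE poller_item (local_data_id INTEGER, host_id INTEGER, action INTEGER)"
)
conn.executemany(
    "INSERT INTO poller_item VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 1, 0), (3, 2, 1), (4, 3, 2)],  # made-up rows
)

counts = count_by_action(conn)
print(counts)
```

The same SELECT statement can be pasted into a mysql client session if the table and column names match your installation.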

TheWitness Developer
Joined: 14 May 2002 Posts: 9736 Location: MI, USA
Posted: Thu Aug 25, 2005 8:57 am
Please note that there is also an issue with Cacti attempting to re-index hosts that no longer apply. Review your poller_reindex table for invalid host_ids.
TheWitness
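
A rough way to do that review is to look for host_ids in poller_reindex that have no matching row in the hosts table. The Python sketch below demonstrates the query against an in-memory SQLite stand-in. The `host` table with an `id` column and the trimmed `poller_reindex` layout are assumptions about the schema, so verify the names on your own database first.

```python
import sqlite3


def orphaned_reindex_host_ids(conn):
    """Return host_ids referenced by poller_reindex that have no host row."""
    rows = conn.execute(
        "SELECT DISTINCT pr.host_id FROM poller_reindex AS pr "
        "LEFT JOIN host AS h ON h.id = pr.host_id "
        "WHERE h.id IS NULL ORDER BY pr.host_id"
    ).fetchall()
    return [host_id for (host_id,) in rows]


# In-memory stand-in with hypothetical, trimmed-down table layouts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host (id INTEGER PRIMARY KEY, hostname TEXT)")
conn.execute("CREATE TABLE poller_reindex (host_id INTEGER, data_query_id INTEGER)")
conn.executemany(
    "INSERT INTO host VALUES (?, ?)",
    [(1, "router-a"), (2, "router-b")],  # made-up hosts
)
conn.executemany(
    "INSERT INTO poller_reindex VALUES (?, ?)",
    [(1, 1), (2, 1), (7, 1), (7, 2), (9, 1)],  # 7 and 9 have no host row
)

orphans = orphaned_reindex_host_ids(conn)
print(orphans)
```

The sketch only reports the stale ids. Whether to delete those rows is a separate decision, and backing up the database first is sensible.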