Cacti: official forums and support

All times are UTC - 5 hours




 Post subject: Trying to figure out proper polling settings for this server
Posted: Sun Feb 17, 2019 1:19 pm
Cacti User

Joined: Tue Mar 31, 2009 10:05 am
Posts: 278
We are using Cacti/Spine 1.2.1 on a server with 24 cores and 140 GB of RAM.

We keep getting the occasional "POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling cycle, please investigate" and "POLLER: Poller[1] WARNING: Poller Output Table not Empty" messages. We have Boost configured and sometimes see "BOOST SVR WARNING: Detected Poller Boost Overrun, Possible Boost Poller Crash". The device in the attached image, "bsianc-st-web-train", has a poll time of 0.17, a current poll time of 4.23, and an average of 3.42. The SNMP timeout is set to 3000 ms. I guess we could increase that to 5000 ms to accommodate the occasional longer run time, but it seems like doing that for all systems with issues would just cause more of a slowdown overall.

The server is currently set to use 6 processes and 32 threads, but no matter how we adjust those numbers (more processes/fewer threads, more processes/more threads), the messages keep popping up along with holes in the graphs. One thing I had read said to use 1-2 times the number of CPUs for processes, but 24 processes and 10 threads drove the load average and CPU usage so high that the server crashed. Based on the performance settings shown in the attachments, is there something there that we should change?

Attachment: gaps.png
Attachment: load.png
Attachment: boost.png


 Post subject: Re: Trying to figure out proper polling settings for this se
Posted: Tue Feb 26, 2019 9:01 am
Cacti User

Joined: Tue Mar 31, 2009 10:05 am
Posts: 278
Utilization and load average keep increasing over time on this server to the point where it has to be rebooted or it locks up.

Cacti is the only thing running on this server and I would expect utilization/load to stay level since we aren't adding any new systems to monitor. Any ideas as to what I can look into to try and pinpoint the cause?

We had to reboot it last night and are going to have to do the same thing again.

We are now on Cacti 1.2.2. Server has 24 cores and 140 GB of RAM (only around 30 GB are being used) and we are currently set to use 10 processes and 15 threads with Spine.


Attachment: util-st-01.png
 Post subject: Re: Trying to figure out proper polling settings for this se
Posted: Tue Feb 26, 2019 9:35 am
Cacti User

Joined: Wed Dec 07, 2011 9:19 am
Posts: 323
What are your system stats?
What kind of disks do you have?
Any errors on the DB?
Most of the time the problem is database access, since everything goes through one single table.


And by the way, since Cacti 1.x I have never been able to handle as much as I do with Cacti 0.8.x; I had the same trouble that you have.

_________________
CentOS
Production
Cacti 0.8.8h
Spine 0.8.8h
PIA 3.1
Aggregate 0.75
Monitor 1.3
Settings 0.71
Weathermap 0.98
Thold 0.5
rrdclean 0.41

Own plugin: LinkDiscovery 0.3, Map 0.4

Test
Cacti 1.2.1
Spine 1.2.1
thold 1.0.6
monitor 2.3.5
php 7.2.11
mariadb 5.5.56
Own plugin:
ExtendDB 1.1.2
LinkDiscovery 1.2.4
Map 1.2.5


 Post subject: Re: Trying to figure out proper polling settings for this se
Posted: Tue Feb 26, 2019 9:37 am
Cacti User

Joined: Fri Feb 10, 2017 2:45 pm
Posts: 237
Location: Long Island, New York, USA
I don't know what OS you're on, but I'm running Cacti on two separate VMs, each with only 3 cores, 4 GB RAM and CentOS Linux, and they have been up and stable forever, it seems. I'm not running Boost on either. Didn't see the need. I only have a couple hundred data sources and polling time is about 15 seconds for all of them.

_________________
See the Cacti 1.x templates I use at: http://www.camerabob.com/cacti

Live: Cacti 1.2.3 @ CentOS 7-6.1810.2.el7 & PHP 5.4.16-46.e17
Maint @ 1.2
Monitor @ 2.3.6
Thold @ 1.2.4

Test: Cacti 1.2.3 @ CentOS 7-6.1810.2.el7 & PHP 5.4.45-17.e17.remi
Maint @ 1.2
Monitor @ 2.3.4
Thold @ 1.0.4


 Post subject: Re: Trying to figure out proper polling settings for this se
Posted: Tue Feb 26, 2019 9:57 am
Cacti User

Joined: Tue Mar 31, 2009 10:05 am
Posts: 278
CentOS 6.10
We have 634 hosts and 16,749 data sources
Storage is on NFS mounted SATA disks on a FAS3220C
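
As a rough sanity check on the process/thread combinations tried so far in this thread, the sketch below works out how much concurrency each one gives for 634 hosts. It assumes hosts are divided evenly across spine processes, which is an assumption and not something confirmed here; the function name is made up for illustration.

```python
import math

def spine_concurrency(hosts, processes, threads):
    """Return (total_threads, hosts_per_process, hosts_per_thread)."""
    total_threads = processes * threads
    # Assumption: hosts are divided evenly across processes.
    hosts_per_process = math.ceil(hosts / processes)
    hosts_per_thread = hosts / total_threads
    return total_threads, hosts_per_process, hosts_per_thread

HOSTS = 634  # host count given in this post

# (processes, threads) combinations mentioned in this thread
for processes, threads in [(6, 32), (24, 10), (10, 15)]:
    total, per_proc, per_thread = spine_concurrency(HOSTS, processes, threads)
    print(f"{processes:>2} procs x {threads:>2} threads = {total:>3} threads, "
          f"{per_proc:>3} hosts/process, {per_thread:.1f} hosts/thread")
```

Every combination ends up with fewer than five hosts per thread on a 24-core box, so raising the thread count further may not be what is missing.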

I am seeing a fair number of these in the MySQL log (more "reading" related ones than writing):
2019-02-25T23:50:06.336656Z 224862 [Note] Aborted connection 224862 to db: 'cacti1x' user: 'cactiuser' host: 'localhost' (Got an error reading communication packets)
2019-02-25T23:50:06.342940Z 225168 [Note] Aborted connection 225168 to db: 'cacti1x' user: 'cactiuser' host: 'localhost' (Got an error writing communication packets)
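
To put a number on "more reading than writing", something like the following could tally the notes by type. It is only a sketch that assumes every entry follows the exact format of the two lines pasted above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches "Aborted connection" notes in the format pasted above.
ABORTED = re.compile(
    r"\[Note\] Aborted connection (?P<conn_id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' "
    r"\(Got an error (?P<kind>reading|writing) communication packets\)"
)

def count_aborted(lines):
    """Count aborted connections per (user, kind); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = ABORTED.search(line)
        if match:
            counts[(match.group("user"), match.group("kind"))] += 1
    return counts

sample = [
    "2019-02-25T23:50:06.336656Z 224862 [Note] Aborted connection 224862 to db: "
    "'cacti1x' user: 'cactiuser' host: 'localhost' "
    "(Got an error reading communication packets)",
    "2019-02-25T23:50:06.342940Z 225168 [Note] Aborted connection 225168 to db: "
    "'cacti1x' user: 'cactiuser' host: 'localhost' "
    "(Got an error writing communication packets)",
]
for (user, kind), n in sorted(count_aborted(sample).items()):
    print(user, kind, n)
```

Passing an open log file instead of `sample` works the same way, since the function only iterates over lines; where that file lives depends on the MySQL configuration.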

MySQL settings are below.
Attachment: mysql.png


 Post subject: Re: Trying to figure out proper polling settings for this se
Posted: Tue Feb 26, 2019 10:06 am
Cacti Guru User

Joined: Sun Aug 27, 2017 12:05 am
Posts: 2474
I am running Cacti 1.2.2 on a VM and my uptime stats are:

15:04:43 up 119 days, 4:02, 3 users, load average: 0.51, 1.00, 1.70

Looking at the graph, it's almost like you have processes which aren't terminating. I would take a look using ps -ef (maybe do that once a day to see the differences).
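
One way to do that daily comparison, as a rough sketch only: take a snapshot of the process table, take another later, and list the spine/php processes that are still there. It assumes a Linux procps-style ps (the `stat` column is not in POSIX), and the helper names and keyword filter are made up for illustration.

```python
import subprocess

def parse_ps(text):
    """Parse 'pid ppid stat args' lines into {pid: (ppid, stat, command)}."""
    procs = {}
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue
        command = parts[3] if len(parts) > 3 else ""
        procs[int(parts[0])] = (int(parts[1]), parts[2], command)
    return procs

def snapshot():
    """Capture the current process table (assumes a Linux procps-style ps)."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "stat=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)

def survivors(old, new, keywords=("spine", "php")):
    """PIDs in both snapshots whose command is unchanged and mentions a keyword."""
    return {
        pid: new[pid]
        for pid in old.keys() & new.keys()
        if old[pid][2] == new[pid][2]
        and any(k in new[pid][2] for k in keywords)
    }

# Tiny made-up snapshots, just to show the shape of the result.
yesterday = parse_ps("100 1 S /usr/local/spine_1x/spine -C spine.conf\n"
                     "101 100 Z [php] <defunct>\n"
                     "200 1 Ss crond\n")
today = parse_ps("100 1 S /usr/local/spine_1x/spine -C spine.conf\n"
                 "101 100 Z [php] <defunct>\n"
                 "200 1 Ss crond\n"
                 "300 1 R /usr/bin/php -q poller.php\n")
print(sorted(survivors(yesterday, today)))  # [100, 101]
```

On the real server, two `snapshot()` results taken a day apart would take the place of the made-up data. Comparing the command line as well as the PID is a cheap guard against PID reuse, not a guarantee.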

_________________
Official Cacti Developer

Cacti Resources:
Cacti Website (including releases)
Cacti Issues
Cacti Development Releases
Cacti Development Documentation

My resources:
How to submit Pull Requests
Development Wiki and How To's
Updated NetSNMP Memory template for Cacti 1.x
Cisco SFP template for Cacti 0.8.8


 Post subject: Re: Trying to figure out proper polling settings for this se
Posted: Wed Feb 27, 2019 1:27 am
Cacti User

Joined: Wed Dec 07, 2011 9:19 am
Posts: 323
I was never able to fix that kind of problem, but I had the same one that you have.
I had this kind of host/RRD count:
Hosts:1136 HostsPerProcess:57 DataSources:50514 RRDsProcessed:24119

I tried to use Cacti as an inventory, but when I added 4000 hosts (Cisco phones) in a disabled state, that hurt performance even more. They are not supposed to be polled, but somehow Cacti handles them badly.

And it never really worked; that's why I'm still on 0.8 in production.

You can take a look at this thread:
https://github.com/Cacti/cacti/issues/1060

And there is no real guideline on how to set up Cacti at large scale. It's always "try this or that", with no real rule: memory, memory, memory and fast disks, but even that doesn't really help.

 Post subject: Re: Trying to figure out proper polling settings for this se
Posted: Wed Feb 27, 2019 7:17 pm
Cacti User

Joined: Tue Mar 31, 2009 10:05 am
Posts: 278
Had a chance to look at the processes on this server today after the load spiked up and before it locked up. Definitely a lot of stale PHP processes. Not sure of the best way forward since we are on the newest versions of Cacti and Spine. Related PHP info is:

PHP Version 7.2.15
PHP OS Linux
PHP uname Linux util-st-01.imsweb.com 2.6.32-754.10.1.el6.x86_64 #1 SMP Tue Jan 15 17:07:28 UTC 2019 x86_64
PHP SNMP Installed
max_execution_time 120
memory_limit 2296M

Code:
1065     26431     1  0 Feb26 ?        00:00:01 /usr/local/spine_1x/spine -C /usr/local/spine_1x/spine.conf --poller 1 --first 724 --last 801 --mibs
1065     26432 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26447 26321  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26461 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26467 26321  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26486 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26488 26321  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26504 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26524 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26544 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26564 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26581 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26598 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26627 26321  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26890 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26999     1  0 Feb26 ?        00:00:03 /usr/local/spine_1x/spine -C /usr/local/spine_1x/spine.conf --poller 1 --first 682 --last 723 --mibs
1065     27000 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27023 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27067 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27086 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27110 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27130 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27152 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27170 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27191 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27378 26999  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27512     1 92 11:53 ?        06:18:44 /usr/bin/php -q /usr/local/cacti-1.2.2/script_server.php spine 1
1065     27650     1  0 14:43 ?        00:00:03 /usr/local/spine_1x/spine -C /usr/local/spine_1x/spine.conf --poller 1 --first 682 --last 723 --mibs
1065     27651 27650  0 14:43 ?        00:00:00 [php] <defunct>
postfix  27662  3997  0 18:17 ?        00:00:00 pickup -l -t fifo -u
1065     27670 27650  0 14:43 ?        00:00:00 [php] <defunct>
1065     27695 27650  0 14:43 ?        00:00:00 [php] <defunct>
1065     27712 27650  0 14:43 ?        00:00:00 [php] <defunct>
1065     27727 27650  0 14:43 ?        00:00:00 [php] <defunct>
1065     27748 27650  0 14:43 ?        00:00:00 [php] <defunct>
1065     27764     1 87 15:29 ?        02:49:19 /usr/bin/php -q /usr/local/cacti-1.2.2/script_server.php spine 1
1065     27769 27650  0 14:43 ?        00:00:00 [php] <defunct>
1065     27787 27650  0 14:43 ?        00:00:00 [php] <defunct>
1065     27806 27650  0 14:43 ?        00:00:00 [php] <defunct>
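
To see which parents are holding on to the defunct entries, a listing like the one above can be grouped by PPID. This is a minimal sketch that assumes the standard eight-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD); the function name is hypothetical.

```python
from collections import Counter

def defunct_by_parent(ps_ef_text):
    """Map parent PID -> number of <defunct> children in `ps -ef` output."""
    counts = Counter()
    for line in ps_ef_text.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and fields[7].rstrip().endswith("<defunct>"):
            counts[int(fields[2])] += 1
    return counts

# A few lines copied from the listing above.
sample = """\
1065     26432 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26447 26321  0 Feb26 ?        00:00:00 [php] <defunct>
1065     26461 26431  0 Feb26 ?        00:00:00 [php] <defunct>
1065     27512     1 92 11:53 ?        06:18:44 /usr/bin/php -q /usr/local/cacti-1.2.2/script_server.php spine 1
postfix  27662  3997  0 18:17 ?        00:00:00 pickup -l -t fifo -u
"""
for ppid, n in sorted(defunct_by_parent(sample).items()):
    print(ppid, n)
```

Counting the full listing by hand gives 10 defunct children each for spine PIDs 26431 and 26999, 9 for spine PID 27650, and 4 for PID 26321, whose own entry is not in the excerpt.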


 Post subject: Re: Trying to figure out proper polling settings for this se
Posted: Thu Feb 28, 2019 12:24 pm
Cacti Guru User

Joined: Sun Aug 27, 2017 12:05 am
Posts: 2474
This is strange, because it's showing as defunct, which means it has exited. But it would appear that it hasn't exited 100% properly, and it's not a complete zombie because the parent process (spine) still exists.
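
For context, on Linux a <defunct> entry is what ps shows for a child that has exited but whose parent has not yet collected its exit status with a wait call. The sketch below reproduces that state on purpose. It assumes Linux with /proc mounted, and it is a generic illustration, not spine's actual code.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]

def demo():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: give the child up to 5 seconds to exit, without reaping it yet.
    deadline = time.time() + 5
    while proc_state(pid) != "Z" and time.time() < deadline:
        time.sleep(0.01)
    before = proc_state(pid)  # exited but not reaped: shown as <defunct> by ps
    os.waitpid(pid, 0)        # reap: collect the exit status
    after = proc_state(pid)   # the process table entry is gone
    return before, after

before, after = demo()
print(before, after)
```

If a parent is stuck and never reaches its wait call, its exited children would stay in this state for as long as the parent lives, which would be consistent with the listing above.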
