 Post subject: Cacti: Spine 1.1.38 unable to handle large disabled device
PostPosted: Mon Jun 25, 2018 5:54 am 
Cacti User

Joined: Wed Dec 07, 2011 9:19 am
Posts: 312
I have a problem with my install; I'm not sure whether my Cacti config or my MariaDB config is wrong.
I have 997 enabled devices, which means DataSources:8618 RRDsProcessed:4796.

It is working almost correctly.

But I'm also using Cacti to manage some endpoints, like IP phones. I don't do anything with them; they are in a disabled state.
So far I have 5113 disabled devices.

But with that configuration I get plenty of errors from spine like these:
2018/06/25 11:17:36 - SYSTEM THOLD STATS: CPUTime:0 MaxRuntime:0 Tholds:0 TotalDevices:997 DownDevices:4 NewDownDevices:0 Processes: 0 completed, 0 running, 0 broken
2018/06/25 11:17:02 - SPINE: Poller[Main Poller] ERROR: Spine Timed Out While Waiting for Threads to End
2018/06/25 11:17:01 - SPINE: Poller[Main Poller] ERROR: Spine Timed Out While Waiting for Threads to End
2018/06/25 11:17:01 - POLLER: Poller[Main Poller] WARNING: There are '2' detected as overrunning a polling process, please investigate
2018/06/25 11:17:00 - SPINE: Poller[Main Poller] ERROR: Spine Timed Out While Processing Devices Internal
2018/06/25 11:17:00 - SPINE: Poller[Main Poller] ERROR: Spine Timed Out While Processing Devices Internal
2018/06/25 11:17:00 - SYSTEM STATS: Time:59.0163 Method:spine Processes:3 Threads:5 Hosts:997 HostsPerProcess:333 DataSources:8781 RRDsProcessed:3902
2018/06/25 11:17:00 - POLLER: Poller[Main Poller] Maximum runtime of 58 seconds exceeded. Exiting.
2018/06/25 11:16:03 - SPINE: Poller[Main Poller] ERROR: Spine Timed Out While Waiting for Threads to End
2018/06/25 11:16:03 - SPINE: Poller[Main Poller] ERROR: Spine Timed Out While Waiting for Threads to End
2018/06/25 11:16:02 - SPINE: Poller[Main Poller] ERROR: Spine Timed Out While Processing Devices Internal
2018/06/25 11:16:01 - POLLER: Poller[Main Poller] WARNING: Poller Output Table not Empty. Issues: 9, Graphs[se-ssi-13 - Traffic - Gi1/0/28 - (128 SE-SSI-14 / 2960S-24P / Int G1/0/25 ---), se-ssi-13 - Traffic - Gi1/0/28 - (128 SE-SSI-14 / 2960S-24P / Int G1/0/25 ---)] Graphs[se-ssi-13 Gi1/0/26 - Status - 126 SE-SSI-16 / 3560-24P / Int G0/1 --- , se-ssi-13 Gi1/0/26 - Status - 126 SE-SSI-16 / 3560-24P / Int G0/1 --- ] Graphs[se-ssi-13 - Traffic - Gi1/0/26 - (126 SE-SSI-16 / 3560-24P / Int G0/1 ---), se-ssi-13 - Traffic - Gi1/0/26 - (126 SE-SSI-16 / 3560-24P / Int G0/1 ---)] Graphs[se-ssi-13 - Traffic - Gi1/0/28 - (128 SE-SSI-14 / 2960S-24P / Int G1/0/25 ---), se-ssi-13 - Traffic - Gi1/0/28 - (128 SE-SSI-14 / 2960S-24P / Int G1/0/25 ---)] Graphs[se-ssi-13 - Traffic - Gi1/0/25 - (125 SE-SSI-12 / 2960S-24P / Int G1/0/28 ---), se-ssi-13 - Traffic - Gi1/0/25 - (125 SE-SSI-12 / 2960S-24P / Int G1/0/28 ---)] Graphs[se-ssi-13 Gi1/0/25 - Status - 125 SE-SSI-12 / 2960S-24P / Int G1/0/28 --- , se-ssi-13 Gi1/0/25 - Status - 125 SE-SSI-12 / 2960S-24P / Int G1/0/28 --- ] Graphs[se-ssi-13 Gi1/0/28 - Status - 128 SE-SSI-14 / 2960S-24P / Int G1/0/25 --- , se-ssi-13 Gi1/0/28 - Status - 128 SE-SSI-14 / 2960S-24P / Int G1/0/25 --- ] Graphs[se-ssi-13 - Traffic - Gi1/0/26 - (126 SE-SSI-16 / 3560-24P / Int G0/1 ---), se-ssi-13 - Traffic - Gi1/0/26 - (126 SE-SSI-16 / 3560-24P / Int G0/1 ---)] Graphs[se-ssi-13 - Traffic - Gi1/0/25 - (125 SE-SSI-12 / 2960S-24P / Int G1/0/28 ---), se-ssi-13 - Traffic - Gi1/0/25 - (125 SE-SSI-12 / 2960S-24P / Int G1/0/28 ---)] DS[se-ssi-13 - Traffic - Gi1/0/28 128 SE-SSI-14 / 2960S-24P / Int G1/0/25 ---, se-ssi-13 Gi1/0/26 - Status - 126 SE-SSI-16 / 3560-24P / Int G0/1 --- , se-ssi-13 - Traffic - Gi1/0/26 126 SE-SSI-16 / 3560-24P / Int G0/1 ---, se-ssi-13 - Traffic - Gi1/0/28 128 SE-SSI-14 / 2960S-24P / Int G1/0/25 ---, se-ssi-13 - Traffic - Gi1/0/25 125 SE-SSI-12 / 2960S-24P / Int G1/0/28 ---, se-ssi-13 Gi1/0/25 - Status - 125 SE-SSI-12 / 2960S-24P / Int G1/0/28 --- , se-ssi-13 Gi1/0/28 - Status - 128 SE-SSI-14 / 2960S-24P / Int G1/0/25 --- , se-ssi-13 - Traffic - Gi1/0/26 126 SE-SSI-16 / 3560-24P / Int G0/1 ---, se-ssi-13 - Traffic - Gi1/0/25 125 SE-SSI-12 / 2960S-24P / Int G1/0/28 ---]
2018/06/25 11:16:01 - POLLER: Poller[Main Poller] WARNING: There are '2' detected as overrunning a polling process, please investigate
2018/06/25 11:16:00 - SYSTEM STATS: Time:59.1470 Method:spine Processes:3 Threads:5 Hosts:997 HostsPerProcess:333 DataSources:8391 RRDsProcessed:4491


And just by removing those 5113 devices I get:
2018/06/25 12:48:30 - SYSTEM STATS: Time:28.2434 Method:spine Processes:3 Threads:5 Hosts:997 HostsPerProcess:333 DataSources:8618 RRDsProcessed:4796
2018/06/25 12:47:36 - SYSTEM THOLD STATS: CPUTime:0 MaxRuntime:0 Tholds:0 TotalDevices:997 DownDevices:3 NewDownDevices:0 Processes: 0 completed, 0 running, 0 broken


Any clue how to fix this?

_________________
CentOS
Production
Cacti 0.8.8h
Spine 0.8.8h
PIA 3.1
Aggregate 0.75
Monitor 1.3
Settings 0.71
Weathermap 0.98
Thold 0.5
rrdclean 0.41

Own plugin: LinkDiscovery 0.3, Map 0.4

Test
Cacti 1.2.1
Spine 1.2.1
thold 1.0.6
monitor 2.3.5
php 7.2.11
mariadb 5.5.56
Own plugin:
ExtendDB 1.1.2
LinkDiscovery 1.2.4
Map 1.2.5


Post subject: Re: Cacti: Spine 1.1.38 unable to handle large disabled device
PostPosted: Mon Jun 25, 2018 10:32 am 
Cacti Guru User

Joined: Sun Aug 27, 2017 12:05 am
Posts: 2083
So the bits that stand out to me are:
SYSTEM STATS: Time:28.2434 Method:spine Processes:3 Threads:5 Hosts:997 HostsPerProcess:333 DataSources:8618 RRDsProcessed:4796
vs
SYSTEM STATS: Time:59.0163 Method:spine Processes:3 Threads:5 Hosts:997 HostsPerProcess:333 DataSources:8781 RRDsProcessed:3902

The first thing is, you said you removed 1000+ devices when you only have 997 in total. The second is, you should increase your thread count for spine. I have mine at 30, which works OK for me, so you should try increasing that. It is basically the number of devices being polled at the same time within that process. So, if 5 consecutive hosts are timing out, it has to wait for all five to time out before it moves on to the next one.
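
To make that concrete, here is a rough back-of-envelope model (my own simplification for illustration, not spine's actual scheduler) of why 5 threads plus a few unresponsive hosts can push a 333-host process towards the 58 second limit, while 30 threads leaves plenty of headroom:

Code:
import math

# Back-of-envelope model (a simplification for illustration, not spine's
# real scheduler): a process's hosts are shared across its threads, and a
# thread stuck on an unresponsive host is blocked for roughly
# timeout * retries seconds before it can move on.

def estimated_runtime(hosts_per_process, threads,
                      per_host_seconds, stuck_hosts, stuck_seconds):
    """Rough seconds for one spine process to finish its host list."""
    normal = (hosts_per_process - stuck_hosts) * per_host_seconds
    stuck = stuck_hosts * stuck_seconds
    return math.ceil((normal + stuck) / threads)

# Host counts taken from the SYSTEM STATS lines above (997 hosts, 3
# processes, 333 hosts per process, 4 down devices); the per-host cost and
# the per-stuck-host delay are made-up illustrative values.
print(estimated_runtime(333, threads=5,  per_host_seconds=0.4,
                        stuck_hosts=4, stuck_seconds=30))   # 51 -> brushing the 58s limit
print(estimated_runtime(333, threads=30, per_host_seconds=0.4,
                        stuck_hosts=4, stuck_seconds=30))   # 9  -> plenty of headroom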

_________________
Official Cacti Developer

Cacti Resources:
Cacti Website (including releases)
Cacti Issues
Cacti Development Releases
Cacti Development Documentation

My resources:
How to submit Pull Requests
Development Wiki and How To's
Updated NetSNMP Memory template for Cacti 1.x
Cisco SFP template for Cacti 0.8.8


Post subject: Re: Cacti: Spine 1.1.38 unable to handle large disabled device
PostPosted: Tue Jun 26, 2018 12:05 am 
Cacti User

Joined: Wed Dec 07, 2011 9:19 am
Posts: 312
Well, I have 5113 devices in Cacti; only 997 are active, which is what I said. The others are disabled and used as inventory.

On the second point, I will try with 30 threads, but the last time I did that kind of testing it gave me problems, since all the processes try to update a single table, so the bottleneck was the database.
That's why I tried to go the other way around and find the smallest thread count that works, to avoid database lock issues.
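
To compare runs while I change the thread count, I just scrape the SYSTEM STATS lines out of cacti.log. A minimal sketch (the log path is an assumption, adjust it to your install):

Code:
import re
from collections import defaultdict

# Minimal sketch: group poller runtimes by (processes, threads) using the
# SYSTEM STATS lines in cacti.log. The log path is an assumption; adjust
# it to your install.
LOG = "/var/www/html/cacti/log/cacti.log"

stats_re = re.compile(
    r"SYSTEM STATS: Time:(?P<time>[\d.]+) .*?"
    r"Processes:(?P<procs>\d+) Threads:(?P<threads>\d+)"
)

times = defaultdict(list)
with open(LOG) as fh:
    for line in fh:
        m = stats_re.search(line)
        if m:
            times[(int(m["procs"]), int(m["threads"]))].append(float(m["time"]))

for (procs, threads), vals in sorted(times.items()):
    print(f"Processes:{procs} Threads:{threads} runs:{len(vals)} "
          f"avg:{sum(vals) / len(vals):.1f}s max:{max(vals):.1f}s")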

And by the way, the design of large-scale Cacti is a known problem (https://github.com/Cacti/cacti/issues/1060).

_________________
CentOS
Production
Cacti 0.8.8h
Spine 0.8.8h
PIA 3.1
Aggregate 0.75
Monitor 1.3
Settings 0.71
Weathermap 0.98
Thold 0.5
rrdclean 0.41

Own plugin: LinkDiscovery 0.3, Map 0.4

Test
Cacti 1.2.1
Spine 1.2.1
thold 1.0.6
monitor 2.3.5
php 7.2.11
mariadb 5.5.56
Own plugin:
ExtendDB 1.1.2
LinkDiscovery 1.2.4
Map 1.2.5


Post subject: Re: Cacti: Spine 1.1.38 unable to handle large disabled device
PostPosted: Tue Jun 26, 2018 3:18 am 
Cacti Guru User

Joined: Sun Aug 27, 2017 12:05 am
Posts: 2083
That design problem applies more when working with multiple remote pollers, since the remote pollers actually try to update the main database. The changes are designed to make the local pollers use their own local database and have the main server query the DB on the poller as needed (if I remember correctly).

I still think that if you implement the three things I suggested, you should see improvements. At the very least, try them out and post the SYSTEM STATS messages from before and after so we can see the results.

_________________
Official Cacti Developer

Cacti Resources:
Cacti Website (including releases)
Cacti Issues
Cacti Development Releases
Cacti Development Documentation

My resources:
How to submit Pull Requests
Development Wiki and How To's
Updated NetSNMP Memory template for Cacti 1.x
Cisco SFP template for Cacti 0.8.8


Post subject: Re: Cacti: Spine 1.1.38 unable to handle large disabled device
PostPosted: Wed Jan 30, 2019 3:07 pm 
Cacti User

Joined: Mon Oct 01, 2018 10:09 am
Posts: 88
Hello,
I have the same problem here; I have searched high and low with no real fix. I constantly get poller output table not empty warnings.
I also have Juniper MX-80s that ALWAYS give SNMP timeouts and gaps in graphs.
I have 60 disabled devices and 388 enabled devices. Most are routers and switches. Does anyone have a clue how these warnings can be minimized or eliminated? I'm using Cacti 1.1.38, Spine 1.1.38, a Cacti cron job, MySQL 5.3.3, PHP 5.3, and Red Hat 6.10.

I have attached a view of the warnings in the log:
2019/01/30 20:01:40 - SYSTEM THOLD STATS: Time:0.0280 Tholds:2 TotalDevices:388 DownDevices:42 NewDownDevices:0
2019/01/30 20:01:39 - SYSTEM STATS: Time:11.3700 Method:spine Processes:2 Threads:30 Hosts:388 HostsPerProcess:194 DataSources:2510 RRDsProcessed:1285
2019/01/30 20:01:28 - POLLER: Poller[Main Poller] WARNING: Poller Output Table not Empty. Issues: 17, Graphs[stl-73-307-rsw - Traffic 30sec - Te1/13 , stl-73-307-rsw - Traffic 30sec - Te1/13 ] Graphs[stl-73-307-rsw - Traffic 30sec - Te1/14 , stl-73-307-rsw - Traffic 30sec - Te1/14 ] Graphs[sea-2-25-asw8 - Traffic - Gi1/1/4, sea-2-25-asw8 - Traffic - Gi1/1/4] Graphs[sea-2-25-asw8 - Traffic - Gi1/1/4, sea-2-25-asw8 - Traffic - Gi1/1/4] Graphs[|host_description| - Traffic 30sec - |query_ifName| , |host_description| - Traffic 30sec - |query_ifName| ] Graphs[|host_description| - Traffic 30sec - |query_ifName| , |host_description| - Traffic 30sec - |query_ifName| ] Graphs[msa-50-531-rtr3 - Traffic - Gi0/1, msa-50-531-rtr3 - Traffic - Gi0/1] Graphs[msa-50-531-rtr3 - Traffic - Gi0/1, msa-50-531-rtr3 - Traffic - Gi0/1] Graphs[sea-2-31-asw7 - Traffic - Gi1/1/4, sea-2-31-asw7 - Traffic - Gi1/1/4] Graphs[sea-2-31-asw7 - Traffic - Gi1/1/4, sea-2-31-asw7 - Traffic - Gi1/1/4] Graphs[evt-40-40-asw2 - Traffic - Gi1/1, evt-40-40-asw2 - Traffic - Gi1/1] Graphs[evt-40-40-asw2 - Traffic - Gi1/1, evt-40-40-asw2 - Traffic - Gi1/1] Graphs[kor-99-d35-esw1 - Traffic - Te1/0/23, kor-99-d35-esw1 - Traffic - Te1/0/23] Graphs[kor-99-d35-esw1 - Traffic - Te1/0/23, kor-99-d35-esw1 - Traffic - Te1/0/23] Graphs[kor-99-d35-esw1 - Traffic - Te1/0/4, kor-99-d35-esw1 - Traffic - Te1/0/4] Graphs[kor-99-d35-esw1 - Traffic - Te1/0/4, kor-99-d35-esw1 - Traffic - Te1/0/4] Graphs[sbc-52-880-steven - Traffic 30sec - Vl11 , sbc-52-880-steven - Traffic 30sec - Vl11 ] DS[stl-73-307-rsw - Traffic 30sec - Te1/13, stl-73-307-rsw - Traffic 30sec - Te1/14, sea-2-25-asw8 - Traffic - Gi1/1/4, sea-2-25-asw8 - Traffic - Gi1/1/4, sea-9-53-rsw - Traffic 30sec - Te1/18, sea-9-53-rsw - Traffic 30sec - Te1/19, msa-50-531-rtr3 - Traffic - Gi0/1, msa-50-531-rtr3 - Traffic - Gi0/1, sea-2-31-asw7 - Traffic - Gi1/1/4, sea-2-31-asw7 - Traffic - Gi1/1/4, evt-40-40-asw2 - Traffic - |query_ifName|, evt-40-40-asw2 - Traffic - |query_ifName|, kor-99-d35-esw1 - Traffic - Te1/0/23, kor-99-d35-esw1 - Traffic - Te1/0/23, kor-99-d35-esw1 - Traffic - Te1/0/4, kor-99-d35-esw1 - Traffic - Te1/0/4, sbc-52-880-steven - Traffic 30sec - 10.48.11.7 - Vl11 ]
2019/01/30 20:01:11 - SYSTEM THOLD STATS: Time:0.0261 Tholds:2 TotalDevices:388 DownDevices:42 NewDownDevices:0
2019/01/30 20:01:10 - SYSTEM STATS: Time:11.3747 Method:spine Processes:2 Threads:30 Hosts:388 HostsPerProcess:194 DataSources:2510 RRDsProcessed:1298
2019/01/30 20:00:59 - POLLER: Poller[Main Poller] WARNING: Poller Output Table not Empty. Issues: 20, Graphs[stl-73-307-rsw - Traffic 30sec - Te1/16 , stl-73-307-rsw - Traffic 30sec - Te1/16 ] Graphs[stl-73-307-rsw - Traffic 30sec - Te1/15 , stl-73-307-rsw - Traffic 30sec - Te1/15 ] Graphs[sbc-52-880-steven - Traffic 30sec - Vl11 , sbc-52-880-steven - Traffic 30sec - Vl11 ] Graphs[hbc-37-22-asw16 - Traffic - Gi1/1/4, hbc-37-22-asw16 - Traffic - Gi1/1/4] Graphs[hbc-37-22-asw16 - Traffic - Gi1/1/4, hbc-37-22-asw16 - Traffic - Gi1/1/4] Graphs[|host_description| - Traffic 30sec - |query_ifName| , |host_description| - Traffic 30sec - |query_ifName| ] Graphs[evt-40-40-asw2 - Traffic - Gi0/1, evt-40-40-asw2 - Traffic - Gi0/1] Graphs[evt-40-40-asw2 - Traffic - Gi0/1, evt-40-40-asw2 - Traffic - Gi0/1] Graphs[kor-99-d35-esw1 - Traffic - Te1/0/22, kor-99-d35-esw1 - Traffic - Te1/0/22] Graphs[kor-99-d35-esw1 - Traffic - Te1/0/22, kor-99-d35-esw1 - Traffic - Te1/0/22] Graphs[kor-99-d35-esw1 - Traffic - Te1/0/24, kor-99-d35-esw1 - Traffic - Te1/0/24] Graphs[kor-99-d35-esw1 - Traffic - Te1/0/24, kor-99-d35-esw1 - Traffic - Te1/0/24] Graphs[|host_description| - Traffic 30sec - |query_ifName| , |host_description| - Traffic 30sec - |query_ifName| ] Graphs[sea-9-120-asw5 - Traffic - Gi0/24, sea-9-120-asw5 - Traffic - Gi0/24] Graphs[sea-9-120-asw5 - Traffic - Gi0/24, sea-9-120-asw5 - Traffic - Gi0/24] Graphs[msa-50-531-rtr3 - Traffic - Gi0/2, msa-50-531-rtr3 - Traffic - Gi0/2] Graphs[msa-50-531-rtr3 - Traffic - Gi0/2, msa-50-531-rtr3 - Traffic - Gi0/2] Graphs[msa-50-531-rtr3 - Traffic - Gi0/0, msa-50-531-rtr3 - Traffic - Gi0/0] Graphs[msa-50-531-rtr3 - Traffic - Gi0/0, msa-50-531-rtr3 - Traffic - Gi0/0] Graphs[knt-18-61-asw4 - Traffic 30sec - Gi1/1 , knt-18-61-asw4 - Traffic 30sec - Gi1/1 ] DS[stl-73-307-rsw - Traffic 30sec - Te1/16, stl-73-307-rsw - Traffic 30sec - Te1/15, sbc-52-880-steven - Traffic 30sec - 10.48.11.7 - Vl11 , hbc-37-22-asw16 - Traffic - Gi1/1/4, hbc-37-22-asw16 - Traffic - Gi1/1/4, sea-9-53-rsw - Traffic 30sec - Te1/18, evt-40-40-asw2 - Traffic - |query_ifName|, evt-40-40-asw2 - Traffic - |query_ifName|, kor-99-d35-esw1 - Traffic - Te1/0/22, kor-99-d35-esw1 - Traffic - Te1/0/22, kor-99-d35-esw1 - Traffic - Te1/0/24, kor-99-d35-esw1 - Traffic - Te1/0/24, sea-9-53-rsw - Traffic 30sec - Te1/19, sea-9-120-asw5 - Traffic - Gi0/24, sea-9-120-asw5 - Traffic - Gi0/24, msa-50-531-rtr3 - Traffic - |query_ifIP| - Gi0/2, msa-50-531-rtr3 - Traffic - |query_ifIP| - Gi0/2, msa-50-531-rtr3 - Traffic - |query_ifIP| - Gi0/0, msa-50-531-rtr3 - Traffic - |query_ifIP| - Gi0/0, knt-18-61-asw4 - Traffic 30sec - Gi1/1]
2019/01/30 20:00:42 - SYSTEM THOLD STATS: Time:0.0253 Tholds:2 TotalDevices:388 DownDevices:42 NewDownDevices:0
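
For what it's worth, I also count what is left sitting in the poller_output table right after a run, to see which hosts the stuck rows belong to. A minimal sketch, assuming local MySQL credentials, the pymysql module, and the standard Cacti schema (poller_output, poller_item, host):

Code:
import pymysql  # assumption: pymysql is installed; any MySQL client would do

# Rows left behind in poller_output right after a poller run are what
# trigger the "Poller Output Table not Empty" warning.
# Credentials below are placeholders.
conn = pymysql.connect(host="localhost", user="cactiuser",
                       password="cactiuser", database="cacti")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM poller_output")
        (leftover,) = cur.fetchone()
        print(f"rows left in poller_output: {leftover}")

        # Group the stuck rows by host (assumes the standard Cacti schema).
        cur.execute("""
            SELECT h.description, COUNT(*) AS stuck
              FROM poller_output po
              JOIN poller_item pi ON pi.local_data_id = po.local_data_id
              JOIN host h ON h.id = pi.host_id
             GROUP BY h.description
             ORDER BY stuck DESC
             LIMIT 10
        """)
        for description, stuck in cur.fetchall():
            print(f"  {description}: {stuck}")
finally:
    conn.close()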

eholz1 - not empty poller table!


Post subject: Re: Cacti: Spine 1.1.38 unable to handle large disabled device
PostPosted: Sun Feb 03, 2019 7:37 am 
Cacti Pro User

Joined: Mon Jan 05, 2015 10:10 am
Posts: 744
There is a small issue with reindexing in 1.2.x. It looks like there will be remediation in 1.2.2, I think.

_________________
Before history, there was a paradise, now dust.

