[Cacti 1.2.1] Either queries or sources but not both?!

Hotratz
Posts: 23
Joined: Thu Jun 01, 2017 12:44 pm

#1 Post by Hotratz » Thu Jan 31, 2019 6:29 pm

I am currently monitoring just over 60 remote Linux hosts. Many of them are queried both by custom scripts over ssh (think Percona ssh) and by standard SNMP (bandwidth) queries and data input methods.

Many sites are monitored via these data queries: UPS (ssh/script), bandwidth (SNMP)
Also using these data input methods: #users (SNMP), #processes (SNMP), and board temps (ssh/script)

The UPS data query gathers three values: battery %charge, line volts, and time left on battery

There is one site that I am having difficulty with. I can either monitor the UPS and board temps or I can monitor bandwidth, but not both. If I start by monitoring UPS and temps and then add bandwidth, monitoring of UPS and temps stops.

The log reveals that the poller is running the UPS data query but retrieving only two of the three values, so it doesn't plot. The poller does not execute the board temp script at all. However, the bandwidth query runs just fine and produces clean plots.

If I delete the bandwidth queries, UPS and temps resume.

There is nothing unusual about the setup for this particular host. There are several other hosts that have the same behavior. The network health is good, low latency, no bandwidth limitations. However, there are many other hosts that have all these data sources monitored without issue.

I have tried creating an entirely new instance of the device and adding new data sources and experience the same problem. I've made sure that the poller intervals/cron/spine are all in sync and have rebuilt the poller cache but to no avail. I can run the scripts manually without issue.

Cron is set to 5 min, and the spine poller runs at 1 min intervals, as do the data queries and input methods.

I appreciate any suggestions about how to sort out this problem.

Here is a snippet of the log showing UPS and temps executed successfully, before I added the network SNMP bandwidth query, followed by another snippet showing the log after I added the SNMP query:

>>> UPS and Temps only <<<
2019/01/31 12:59:03 - SPINE: Poller[1] Device[75] HT[2] DS[512] SCRIPT: /bin/perl /var/www/html/cacti-1.2.1/scripts/query_ups.pl '192.168.100.2' 'get' 'bcharge' 'Back-UPS_RS_1500G', output: 100.0
2019/01/31 12:59:03 - SPINE: Poller[1] Device[75] HT[2] Total Time: 46 Seconds
2019/01/31 12:59:03 - SPINE: Poller[1] Device[75] HT[3] DS[512] SCRIPT: /bin/perl /var/www/html/cacti-1.2.1/scripts/query_ups.pl '192.168.100.2' 'get' 'timeleft' 'Back-UPS_RS_1500G', output: 317.5
2019/01/31 12:59:03 - SPINE: Poller[1] Device[75] HT[3] Total Time: 46 Seconds
2019/01/31 12:59:03 - SPINE: Poller[1] Device[75] HT[1] DS[512] SCRIPT: /bin/perl /var/www/html/cacti-1.2.1/scripts/query_ups.pl '192.168.100.2' 'get' 'linev' 'Back-UPS_RS_1500G', output: 230.0
2019/01/31 12:59:03 - SPINE: Poller[1] Device[75] HT[1] Total Time: 47 Seconds
2019/01/31 12:59:04 - SPINE: Poller[1] Device[75] HT[4] DS[513] SCRIPT: /bin/perl /var/www/html/cacti/scripts/delta_temp.pl '192.168.100.2', output: boardtemp:68.3
2019/01/31 12:59:04 - SPINE: Poller[1] Device[75] HT[4] Total Time: 47 Seconds
2019/01/31 12:59:22 - SPINE: Poller[1] Device[75] SNMP Result: Device responded to SNMP
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[4] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403975169 < output: 403981505)
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[4] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403975169 < output: 403981505)
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[4] NOTE: There are '1' Polling Items for this Device
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[3] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403975169 < output: 403981532)
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[3] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403975169 < output: 403981532)
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[3] NOTE: There are '1' Polling Items for this Device
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[2] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403975169 < output: 403981532)
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[2] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403975169 < output: 403981532)
2019/01/31 12:59:23 - SPINE: Poller[1] Device[75] HT[2] NOTE: There are '1' Polling Items for this Device
2019/01/31 12:59:24 - SPINE: Poller[1] Device[75] HT[1] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403975169 < output: 403981606)
2019/01/31 12:59:24 - SPINE: Poller[1] Device[75] HT[1] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403975169 < output: 403981606)
2019/01/31 12:59:24 - SPINE: Poller[1] Device[75] HT[1] NOTE: There are '1' Polling Items for this Device

>>> UPS, Temps, AND Bandwidth (note, poller executes on two of three UPS queries and ignores Temps data input) <<<

2019/01/31 13:01:13 - SPINE: Poller[1] Device[75] HT[3] DS[514] SNMP: v3: 161.200.93.136, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 1842333913
2019/01/31 13:01:13 - SPINE: Poller[1] Device[75] HT[3] DS[514] SNMP: v3: 161.200.93.136, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 4265318135
2019/01/31 13:01:13 - SPINE: Poller[1] Device[75] HT[3] Total Time: 2.7 Seconds
2019/01/31 13:01:13 - SPINE: Poller[1] Device[75] HT[4] DS[515] SNMP: v3: 161.200.93.136, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.3, value: 4200248986
2019/01/31 13:01:13 - SPINE: Poller[1] Device[75] HT[4] DS[515] SNMP: v3: 161.200.93.136, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.3, value: 3595183233
2019/01/31 13:01:13 - SPINE: Poller[1] Device[75] HT[4] Total Time: 2.7 Seconds
2019/01/31 13:01:14 - SPINE: Poller[1] Device[75] HT[1] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403986138 < output: 403992582)
2019/01/31 13:01:14 - SPINE: Poller[1] Device[75] HT[1] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403986138 < output: 403992582)
2019/01/31 13:01:14 - SPINE: Poller[1] Device[75] HT[1] NOTE: There are '2' Polling Items for this Device
2019/01/31 13:01:56 - SPINE: Poller[1] Device[75] HT[2] DS[512] SCRIPT: /bin/perl /var/www/html/cacti-1.2.1/scripts/query_ups.pl '161.200.93.136' 'get' 'timeleft' 'Back-UPS_RS_1500G', output: 425.0
2019/01/31 13:01:57 - SPINE: Poller[1] Device[75] HT[1] DS[512] SCRIPT: /bin/perl /var/www/html/cacti-1.2.1/scripts/query_ups.pl '161.200.93.136' 'get' 'linev' 'Back-UPS_RS_1500G', output: 230.0
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] SNMP Result: Device responded to SNMP
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[2] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998672)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[2] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998672)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[2] NOTE: There are '2' Polling Items for this Device
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[4] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998672)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[4] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998672)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[4] NOTE: There are '2' Polling Items for this Device
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[6] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998679)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[6] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998679)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[6] Total Time: 1.7 Seconds
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[5] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998687)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[5] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998687)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[5] Total Time: 1.8 Seconds
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[3] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998691)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[3] DQ[7] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 403992582 < output: 403998691)
2019/01/31 13:02:15 - SPINE: Poller[1] Device[75] HT[3] NOTE: There are '2' Polling Items for this Device
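
To see at a glance which host threads are slow, the "Total Time" lines in a spine log like the one above can be scanned automatically. This is a minimal sketch that assumes the log format shown in this post; the function name slow_threads and the 10-second threshold are made up for illustration.

```python
import re

# Matches spine lines such as:
#   ... Device[75] HT[2] Total Time: 46 Seconds
TOTAL_TIME = re.compile(
    r"Device\[(?P<device>\d+)\] HT\[(?P<thread>\d+)\] "
    r"Total Time:\s+(?P<seconds>[\d.]+) Seconds"
)


def slow_threads(lines, threshold=10.0):
    """Return (device, thread, seconds) for threads at or above threshold."""
    hits = []
    for line in lines:
        match = TOTAL_TIME.search(line)
        if match and float(match["seconds"]) >= threshold:
            hits.append(
                (int(match["device"]), int(match["thread"]), float(match["seconds"]))
            )
    return hits


# Two lines copied from the log snippets in this post.
sample = [
    "2019/01/31 12:59:03 - SPINE: Poller[1] Device[75] HT[2] Total Time: 46 Seconds",
    "2019/01/31 13:01:13 - SPINE: Poller[1] Device[75] HT[3] Total Time: 2.7 Seconds",
]
print(slow_threads(sample))  # [(75, 2, 46.0)]
```

Feeding it the whole log file (for example, the lines of an open file object) would list every thread that crossed the threshold.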

Osiris
Cacti Pro User
Posts: 835
Joined: Mon Jan 05, 2015 10:10 am

Re: [Cacti 1.2.1] Either queries or sources but not both?!

#2 Post by Osiris » Sun Feb 03, 2019 8:05 am

40+ seconds per host thread is the thing you have to figure out; that is never going to scale. Maybe get the data asynchronously.
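
One way to read "get the data async" is to move the slow ssh collection out of the poller and let the poller read a local cache instead. The following is a minimal sketch, assuming a separate collector (for example a cron job, not shown) that runs the ssh script on its own schedule and stores the results in a JSON file. The names write_cache and read_cached, the file layout, and the 600-second staleness limit are all hypothetical and not part of Cacti. Stale or missing data is reported as U, RRDtool's marker for an unknown value.

```python
import json
import os
import tempfile
import time


def write_cache(path, values):
    """Collector side: atomically store values with a collection timestamp."""
    payload = {"collected_at": time.time(), "values": values}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    # os.replace is atomic, so the poller never sees a half-written file.
    os.replace(tmp_path, path)


def read_cached(path, field, max_age=600, now=None):
    """Poller side: return the cached value as a string, or 'U' if unusable."""
    now = time.time() if now is None else now
    try:
        with open(path) as handle:
            payload = json.load(handle)
    except (OSError, ValueError):
        return "U"  # cache missing or unreadable
    if not isinstance(payload, dict):
        return "U"
    if now - payload.get("collected_at", 0) > max_age:
        return "U"  # collector has not refreshed the cache recently
    value = payload.get("values", {}).get(field)
    return "U" if value is None else str(value)
```

With this split, the script the poller calls only reads a local file and returns almost immediately, while the 40+ second ssh round trip happens in the collector and cannot delay the other polling items for the device.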
