control for "downed host detection"

pvd
Posts: 30
Joined: Sat Aug 09, 2003 3:02 pm

control for "downed host detection"

#1 Post by pvd » Tue Oct 05, 2004 11:18 pm

I have several hosts, and the downed host detection feature is useful for some of them, but it needs to be possible to turn it off without having to find an obscure post in this forum about clearing SNMP community strings. So, three requests:

1) Put this information into the FAQ.

2) Add a control for each host/device so the method can be set per host rather than globally.

3) Make these settings available:
OFF
SNMP
SNMP + ping
ping
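To make request 3 concrete, here is a minimal sketch in C of what a per-host setting with these four options could look like. The enum, struct and function names are hypothetical (they are not cactid's), and treating "SNMP + ping" as "both probes must succeed" is an assumption; it could equally mean "either one".

```c
#include <stdbool.h>

/* Hypothetical per-host availability setting; not part of Cacti/cactid. */
typedef enum {
    AVAIL_SKETCH_OFF,       /* no downed-host detection: always poll */
    AVAIL_SKETCH_SNMP,      /* SNMP get only (e.g. an SNMP-only firewall) */
    AVAIL_SKETCH_SNMP_PING, /* assumption: both probes must succeed */
    AVAIL_SKETCH_PING       /* ping only (e.g. an ICMP-only host) */
} avail_method;

typedef struct {
    const char  *hostname;
    avail_method method;    /* chosen per host instead of globally */
} host_sketch;

/* Decide availability from the probe results that apply to this host.
 * Probe results the chosen method does not use are ignored. */
bool host_is_up(const host_sketch *h, bool snmp_ok, bool ping_ok)
{
    switch (h->method) {
    case AVAIL_SKETCH_OFF:       return true;
    case AVAIL_SKETCH_SNMP:      return snmp_ok;
    case AVAIL_SKETCH_PING:      return ping_ok;
    case AVAIL_SKETCH_SNMP_PING: return snmp_ok && ping_ok;
    }
    return true; /* unrecognised setting: fail open and keep polling */
}
```

With a setting like this, the SNMP-only router/firewall would use SNMP, the ICMP-only hosts would use ping, and the wget + Perl boxes would be set to OFF.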


Let me explain that a bit more.

I have several hosts that I can ping (ICMP only) but cannot query with SNMP. I have a router/firewall that, for security reasons, responds only to SNMP and not to ping (yes, it's the same rule on both ports). And I have some boxes where I can use neither SNMP nor ping, so I retrieve data from them with wget + Perl scripts.

Currently I have the option set to SNMP only and turn it off on all but one host, as that's the only way to get it to work. That seems like the wrong way to do it.
Phil

pvd
Posts: 30
Joined: Sat Aug 09, 2003 3:02 pm

the FAQ text ( starting point )

#2 Post by pvd » Tue Oct 05, 2004 11:29 pm

When I click on Devices, some of my devices are always DOWN, even though I can ping them. How do I fix this?

In the Settings menu, under the Poller tab, there is a setting to choose the method used to "ping" the hosts or devices. This check keeps the poller from sending repeated requests to a host that is down. It uses an SNMP get, a network ping, or both. If it is set to SNMP and you do not have SNMP running on the target device, or you do not have permission to read the sysDescr OID, the device will appear to be down. Likewise, if you have SNMP running on the target device but ping traffic is blocked by a firewall or turned off, the ping method will not work.

To work around this, use SNMP for the availability check and disable SNMP gets where they will not work. For example, set the poller to use SNMP only for "downed host detection" in the Poller tab of the Settings window. Then, for each device that does not support SNMP, go into the Devices window, click on the device and clear the text in the SNMP Community box.

Now wait 5 minutes and you should see your devices change to recovering and then to up.
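The "recovering" step between down and up can be pictured as a small state machine. This is a minimal sketch, not cactid's code: it assumes that one failed check marks a host down and that a hypothetical `recovery_needed` number of consecutive successful checks brings it back up. The thresholds cactid actually uses are not stated in this thread.

```c
/* Hypothetical host states, named after the statuses shown in the Devices list. */
typedef enum { ST_UNKNOWN, ST_DOWN, ST_RECOVERING, ST_UP } host_state;

typedef struct {
    host_state state;
    int        successes;  /* consecutive successful checks since going down */
} host_tracker;

/* Advance the tracker after one availability check (one poller cycle).
 * recovery_needed is a hypothetical threshold and should be >= 1. */
void host_tracker_update(host_tracker *t, int check_ok, int recovery_needed)
{
    if (!check_ok) {
        t->state = ST_DOWN;      /* assumption: a single failure marks it down */
        t->successes = 0;
        return;
    }
    switch (t->state) {
    case ST_UNKNOWN:
    case ST_UP:
        t->state = ST_UP;        /* first check, or still healthy */
        break;
    case ST_DOWN:
    case ST_RECOVERING:
        t->successes++;
        if (t->successes >= recovery_needed) {
            t->state = ST_UP;    /* enough good checks in a row */
            t->successes = 0;
        } else {
            t->state = ST_RECOVERING;
        }
        break;
    }
}
```

With a 5-minute poller interval, each call corresponds to one poll, so a host passes through "recovering" for `recovery_needed - 1` cycles before it shows as up.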

Regards, Phil

TheWitness
Developer
Posts: 14855
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

#3 Post by TheWitness » Fri Oct 08, 2004 8:42 pm

I am planning to incorporate a per-host availability setting in 0.8.x. The need became clear when we found users who could not access the sysDescr OID on some of their devices.

Otherwise, if you don't use SNMP for a host, simply clear its read community and it will not be polled via SNMP, even though your availability polling uses SNMP.
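The workaround amounts to the following decision, sketched in C. The names are hypothetical; the rule itself (global method is SNMP, an empty read community means no SNMP check and the host counts as up) follows the description above. Note that the empty-string test has to look at the string contents: in C, `community == ""` compares two addresses, not the text.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SNMP_AVAIL_SKIP_ASSUME_UP, /* no community: don't probe, count host as up */
    SNMP_AVAIL_RUN_CHECK       /* community set: the SNMP get decides */
} snmp_avail_action;

/* True when the read community is missing or the empty string.
 * Checks the first character rather than comparing pointers. */
bool community_is_empty(const char *community)
{
    return community == NULL || community[0] == '\0';
}

/* With the global availability method set to SNMP, a host whose read
 * community was cleared gets no SNMP availability probe at all. */
snmp_avail_action snmp_availability_action(const char *community)
{
    return community_is_empty(community) ? SNMP_AVAIL_SKIP_ASSUME_UP
                                         : SNMP_AVAIL_RUN_CHECK;
}
```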

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of MacTrack, Boost, CLog, SpikeKill, Platform RTM, DSStats, maintainer of Spine, lots of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Central Plugin Repository
Central Templates Repository


I'm still out there people. Getting excited for Cacti 1.2. I think it will be a great release.

Basilio Cat
Posts: 25
Joined: Sun Sep 12, 2004 1:13 pm

#4 Post by Basilio Cat » Mon Oct 11, 2004 1:51 pm

I've been having trouble with downed host detection. When I was using the PHP-based poller, the default UDP ping + SNMP worked fine, but when I switched to cactid, SOME of my devices went down. Most of them were Cisco 35xx devices, with no other similarities between them. However, when I switched host detection to SNMP only, they came back online.

TheWitness
Developer
Posts: 14855
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

#5 Post by TheWitness » Tue Oct 12, 2004 6:05 am

Basilio Cat,

UDP ping relies on the device sending an ICMP "port unreachable" message back to the server. We have found a few issues with it, namely:

a) some devices (firewalls, for example) ignore the probe and do not respond
b) some operating systems respond with different errors (yes, a UDP ping works by provoking an error), so your system may be returning yet another error number.
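For anyone comparing implementations, here is a sketch of how the result of a UDP ping might be classified. It assumes a connected UDP socket on Linux, where an ICMP "port unreachable" reply is reported as `ECONNREFUSED` on the next `recv()`. Which error numbers cmd.php/ping.php and cactid actually accept is not shown in this thread, and other systems (case b above) may report something else.

```c
#include <errno.h>
#include <sys/types.h>

typedef enum { UDP_PING_UP, UDP_PING_DOWN } udp_ping_status;

/* Classify the outcome of recv() on a connected UDP socket after a probe
 * was sent to a port that is expected to be closed.
 * rc is recv()'s return value, err is errno captured right after the call. */
udp_ping_status classify_udp_ping(ssize_t rc, int err)
{
    if (rc >= 0)
        return UDP_PING_UP;   /* something actually answered on that port */

    switch (err) {
    case ECONNREFUSED:        /* ICMP port unreachable: the host is alive */
        return UDP_PING_UP;
    case EAGAIN:              /* receive timeout expired: no reply (case a) */
    case EHOSTUNREACH:        /* a router reported the host unreachable */
    default:                  /* any other error (case b): treat as down */
        return UDP_PING_DOWN;
    }
}
```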

If you could, it would be helpful if you ran one pass of cmd.php in DEBUG mode and either posted the log output or e-mailed it to me, so that I can make sure the UDP error number is handled correctly in the cmd.php/ping.php code.

TheWitness

Basilio Cat
Posts: 25
Joined: Sun Sep 12, 2004 1:13 pm

#6 Post by Basilio Cat » Tue Oct 19, 2004 7:18 am

Uhm, well, the problem is that I cannot switch back to CMD mode; that would definitely result in gaps in my graphs (I'm graphing too many devices, and the poller in PHP mode takes too long and consumes too much CPU).
I'll try to extract the ping code from the PHP poller and from cactid and compare them.

Basilio Cat
Posts: 25
Joined: Sun Sep 12, 2004 1:13 pm

#7 Post by Basilio Cat » Tue Oct 19, 2004 7:40 am

Well, more debugging is needed, I suppose, but here is what I found from the differences in the source:

1. Socket timeouts
The PHP poller sets the timeout in seconds (erm? I suppose the value is stored in milliseconds):

Code:

                  socket_set_option($this->socket,
                           SOL_SOCKET,  // socket level
                           SO_RCVTIMEO, // timeout option
                           array(
                                   "sec"=>$this->timeout, // Timeout in seconds
                                   "usec"=>0  // I assume timeout in microsecond
                           ));
The C poller sets tv_usec to milliseconds * 1000 with tv_sec = 0. That seems like the correct way, but doesn't it only work for timeouts of less than 1 second?

Code:

        /* establish timeout value */
        timeout.tv_sec  = 0;
        timeout.tv_usec = set.ping_timeout * 1000;
...
        setsockopt(udp_socket, SOL_SOCKET, SO_RCVTIMEO, (char*)&timeout, sizeof(timeout));
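On the question above: `tv_usec` is only meant to hold 0 to 999999, so `tv_sec = 0` with `tv_usec = ping_timeout * 1000` breaks for timeouts of 1000 ms or more (on current Linux, `setsockopt()` rejects such an `SO_RCVTIMEO` value with `EDOM`). A sketch of a conversion that splits off whole seconds; the function name is hypothetical:

```c
#include <sys/time.h>

/* Convert a timeout in milliseconds to a struct timeval, moving whole
 * seconds into tv_sec so that tv_usec stays within 0..999999. */
struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;

    if (ms < 0)
        ms = 0;                          /* clamp negative input to zero */
    tv.tv_sec  = ms / 1000;              /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;     /* remaining milliseconds as usec */
    return tv;
}
```

For example, a 1500 ms timeout becomes 1 second plus 500000 microseconds instead of an out-of-range 1500000 microseconds.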
2. The main difference is in the receive timeouts:
No real timeout in PHP; the receive relies only on the SO_RCVTIMEO option set above:

Code:

      $this->start_time();

      socket_write($this->socket, $this->request, $this->request_len);
      $code = @socket_recv($this->socket, $this->reply, 256, 0);

      /* get the end time */
      $this->time = $this->get_time($this->precision);
The C poller enforces an unavoidable timeout via select() (the timeout value is the same one passed to setsockopt() earlier):

Code:

                        send(udp_socket, request, request_len, 0);

                        select(numfds, &socket_fds, NULL, NULL, &timeout);

                        if (FD_ISSET(udp_socket, &socket_fds)) {
                                return_code = read(udp_socket, socket_reply, 256);
                        } else {
                                return_code = -10;
                        }
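On point 2, here is a sketch of the select()-then-read() pattern with a few details the excerpt glosses over: the fd_set and timeval are rebuilt before every select() call (on Linux, select() writes the remaining time back into the timeval, so reusing it shrinks later timeouts), nfds is the descriptor plus one, and select()'s return value is checked before FD_ISSET. The function names are hypothetical; returning -10 on timeout just mirrors the excerpt's return_code.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * EINTR is retried with the full timeout, for simplicity. */
int wait_readable(int fd, long timeout_ms)
{
    for (;;) {
        fd_set readfds;
        struct timeval tv;

        FD_ZERO(&readfds);                 /* fresh set on every attempt */
        FD_SET(fd, &readfds);
        tv.tv_sec  = timeout_ms / 1000;    /* keep tv_usec below 1000000 */
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;                      /* interrupted by a signal */
        if (rc < 0)
            return -1;
        return FD_ISSET(fd, &readfds) ? 1 : 0;
    }
}

/* select() then read(), as in the cactid excerpt.
 * Returns bytes read, -10 on timeout, or -1 on error. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);

    if (ready == 0)
        return -10;
    if (ready < 0)
        return -1;
    return read(fd, buf, len);
}
```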

geraldocastro
Posts: 1
Joined: Wed Aug 24, 2005 2:57 pm

#8 Post by geraldocastro » Thu Sep 29, 2005 11:18 am

Availability has been a problem for us.
We edited poller.c and commented out the lines referring to availability.

    /* perform a check to see if the host is alive by polling its SysDesc;
     * if the host is down from an SNMP perspective, don't poll it.
     * function sets the ignore_host bit */
    /*
    if ((set.availability_method == AVAIL_SNMP) && (host->snmp_community == "")) {
        update_host_status(HOST_UP, host, ping, set.availability_method);

        if (set.verbose >= POLLER_VERBOSITY_MEDIUM) {
            snprintf(logmessage, LOGSIZE, "Host[%i] No host availability check possible for '%s'\n", host->id, host->hostname);
            cacti_log(logmessage);
        }
    }else{
        if (ping_host(host, ping) == HOST_UP) {
            update_host_status(HOST_UP, host, ping, set.availability_method);
        }else{
            host->ignore_host = 1;
            update_host_status(HOST_DOWN, host, ping, set.availability_method);
        }
    }
    */
    // LINE BELOW INSERTED: always report the host as up
    update_host_status(HOST_UP, host, ping, set.availability_method);


Recompiled cactid and it works!
Hosts are always UP.

zkenton
Posts: 47
Joined: Fri Jun 29, 2007 2:22 pm

#9 Post by zkenton » Thu Jul 12, 2007 8:51 am

What if the hosts are neither down nor up, but unknown?
