|
|
| Author |
Message |
frankfegert
Joined: 16 Nov 2007 Posts: 26 Location: Stuttgart, Germany
|
Posted: Sat Feb 02, 2008 6:57 pm Post subject: spine 0.8.7-SVN segfaults |
|
|
Hello,
I don't know if this is related to the other spine segfault reports, so I'm starting a new topic.
Running:
- On Solaris 10 w/ net-snmp-5.3.1
- spine-0.8.7 from SVN with the following patches:
| Code: |
--- ping.c.orig Sat Jan 19 13:48:42 2008
+++ ping.c Sat Jan 19 14:49:37 2008
@@ -226,7 +226,7 @@
struct sockaddr_in fromname;
char socket_reply[BUFSIZE];
int retry_count;
- char *cacti_msg = "cacti-monitoring-system";
+ char *cacti_msg = "cacti-monitoring-system\0";
int packet_len;
int fromlen;
int return_code;
@@ -750,8 +788,11 @@
sum += *w++;
nleft -= 2;
}
- if (nleft == 1)
- sum += *(unsigned char*)w;
+ if (nleft == 1) {
+ *(unsigned char *)(&answer) = *(unsigned char *)w ;
+ sum += answer;
+ }
+
sum = (sum >> 16) + (sum & 0xffff);
sum += (sum >> 16);
answer = ~sum; /* truncate to 16 bits */
--- poller.c.orig Sun Jan 6 18:40:14 2008
+++ poller.c Sun Jan 6 18:40:33 2008
@@ -126,7 +126,7 @@
char last_snmp_password[50];
char last_snmp_auth_protocol[5];
char last_snmp_priv_passphrase[200];
- char last_snmp_priv_protocol[6];
+ char last_snmp_priv_protocol[7];
char last_snmp_context[65];
/* reindex shortcuts to speed polling */
|
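For context on the second ping.c hunk: the Internet checksum sums 16-bit words, and when the packet has an odd length the last byte has to be treated as the first byte of a zero-padded word. Adding it as a plain integer, as the original line does, gives the same result on little-endian hosts but a different one on big-endian machines such as SPARC. The sketch below shows the same idea in a standalone helper. Note that `inet_checksum` is a hypothetical name, not a spine function, and the use of `memcpy` (which also avoids unaligned 16-bit loads) is my own choice rather than what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of spine): 16-bit one's-complement
 * Internet checksum over len bytes, independent of host byte order.
 * The result is in memory order, so store it with memcpy(). */
uint16_t inet_checksum(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t sum = 0;   /* ample for ping-sized packets */
    uint16_t word;

    assert(data != NULL || len == 0);

    /* Sum whole 16-bit words exactly as they sit in memory. */
    while (len > 1) {
        memcpy(&word, p, sizeof word);
        sum += word;
        p += 2;
        len -= 2;
    }

    /* Odd trailing byte: first byte of a zero-padded word. */
    if (len == 1) {
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }

    /* Fold the carries back into the low 16 bits, then complement. */
    sum = (sum >> 16) + (sum & 0xffff);
    sum += (sum >> 16);
    return (uint16_t)~sum;
}
```

A quick self-check: if the returned value is written into a 16-bit aligned checksum field with `memcpy` and the checksum is recomputed over the whole packet, the result should be 0.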
spine bombs out with the following truss output, but only if the ICMP&SNMP downed detection method is chosen and a script query is run. It only happens every other run, and not with SNMP queries. If I switch the downed detection method to SNMP-only, everything seems fine.
| Code: |
truss -f -wall -u ::snmp_shutdown /usr/local/bin/spine -R -S -f 6 -l 6
...
17479/1: waitid(P_PID, 17481, 0xFFBFF490, WEXITED|WTRAPPED) = 0
17479/1: brk(0x0003ADC8) = 0
17479/1: _exit(0)
17478/3: waitid(P_PID, 17479, 0xFE977E88, WEXITED|WTRAPPED) = 0
Host[6] DS[195] SCRIPT: <script> <host>:<port> <parameters> valid, output: status:1 qm:10
17478/3: write(1, 0xFECF7238, 162) = 162
17478/3: H o s t [ 6 ] D S [ 1 9 5 ] S C R I P T : <script>
17478/3: <host>:<port> <parameters>
17478/3: v a l i d , o u t p u t : s t a t u s : 1 q m : 1 0
17478/3: \n\n
17478/3: brk(0x000F63E0) = 0
17478/3: brk(0x000FE3E0) = 0
17478/3: close(6) = 0
17478/3: fcntl(7, F_SETFL, FWRITE|FNONBLOCK) = 0
17478/3: read(7, 0x000EEC70, 8192) Err#11 EAGAIN
17478/3: fcntl(7, F_SETFL, FWRITE) = 0
17478/3: write(7, "01\0\0\001", 5) = 5
17478/3: shutdown(7, SHUT_RDWR, SOV_DEFAULT) = 0
17478/3: close(7) = 0
17478/3: lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
17478/3: lwp_exit()
17478/1: nanosleep(0xFFBFF240, 0x00000000) = 0
17478/1: nanosleep(0xFFBFF240, 0x00000000) = 0
17478/1@1: -> libnetsnmp:snmp_shutdown(0x1f0e8, 0x0, 0xff000000, 0xff000000)
17478/1: Incurred fault #6, FLTBOUNDS %pc = 0xFEC56178
17478/1: siginfo: SIGSEGV SEGV_MAPERR addr=0x001775F8
17478/1: Received signal #11, SIGSEGV [caught]
17478/1: siginfo: SIGSEGV SEGV_MAPERR addr=0x001775F8
17478/1: lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
17478/1: sigaction(SIGSEGV, 0xFFBFE3E0, 0xFFBFE480) = 0
FATAL: Spine Encountered a Segmentation Fault (Spine parent)
17478/1: write(1, 0xFECF7238, 62) = 62
17478/1: F A T A L : S p i n e E n c o u n t e r e d a S e g m e
17478/1: n t a t i o n F a u l t ( S p i n e p a r e n t )\n\n
17478/1: llseek(3, 0, SEEK_CUR) = 2463
17478/1: _exit(11)
|
The spine processes started from cron keep hanging around and start piling up. Attaching to them with truss only shows them sleeping.
Has anyone else experienced this? The bugs related to net-snmp's snmp_shutdown all seem to be fixed in my version (5.3.1). I guess I'll try updating net-snmp tomorrow and report back.
Regards,
Frank |
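A note on the poller.c hunk in the patch above: growing `last_snmp_priv_protocol` from 6 to 7 bytes only makes sense if a six-character value has to fit together with its terminating NUL. The sketch below illustrates that off-by-one with a bounded copy that reports truncation. `copy_bounded` is a hypothetical helper, and "AES128" is only an assumed example of a six-character value; neither is taken from the spine source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PRIV_PROTO_LEN 7   /* six visible characters plus the NUL */

/* Hypothetical helper (not part of spine): copy src into a fixed-size
 * buffer. Returns 0 on success, -1 if src did not fit and was cut off.
 * dst is always NUL-terminated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int n;

    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf never writes more than dst_size bytes and returns the
     * length the full string would have needed. */
    n = snprintf(dst, dst_size, "%s", src);
    return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}
```

With a 6-byte destination, `copy_bounded(buf, 6, "AES128")` returns -1 and leaves "AES12" in the buffer; with `PRIV_PROTO_LEN` bytes the full string fits. An unchecked `strcpy` into the 6-byte buffer would instead write the NUL one byte past the end.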
|
frankfegert
Joined: 16 Nov 2007 Posts: 26 Location: Stuttgart, Germany
|
Posted: Sun Feb 03, 2008 5:47 pm Post subject: |
|
|
Hello,
Updated to net-snmp-5.4.1 today. The problem remains the same: the spine thread bombs out with a segfault from snmp_spine_close()/snmp_shutdown().
Does anyone have any idea what, besides a bug in net-snmp, could be the cause?
Regards,
Frank |
|
fmangeant Cacti Guru User
Joined: 19 Sep 2003 Posts: 2325 Location: Sophia-Antipolis, France
|
Posted: Mon Feb 04, 2008 2:48 am Post subject: |
|
|
| Moving to "Unstable Development Versions". |
|
frankfegert
Joined: 16 Nov 2007 Posts: 26 Location: Stuttgart, Germany
|
Posted: Wed Feb 06, 2008 2:35 pm Post subject: |
|
|
Rebuilt net-snmp-5.4.1 with debugging symbols and found this in the core dump:
# gdb /usr/local/bin/spine ../../core.spine.12453
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10"...
Reading symbols from /usr/local/lib/libnetsnmp.so.15...done.
Loaded symbols for /usr/local/lib/libnetsnmp.so.15
Reading symbols from /usr/local/lib/mysql/libmysqlclient_r.so.14...done.
Loaded symbols for /usr/local/lib/mysql/libmysqlclient_r.so.14
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libgen.so.1...done.
Loaded symbols for /lib/libgen.so.1
Reading symbols from /lib/libthread.so.1...
warning: Lowest section in /lib/libthread.so.1 is .dynamic at 00000074 done.
Loaded symbols for /lib/libthread.so.1
Reading symbols from /usr/local/lib/libssl.so.0.9.8...done.
Loaded symbols for /usr/local/lib/libssl.so.0.9.8
Reading symbols from /usr/local/lib/libcrypto.so.0.9.8...done.
Loaded symbols for /usr/local/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libkstat.so.1...done.
Loaded symbols for /lib/libkstat.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libpthread.so.1...
warning: Lowest section in /lib/libpthread.so.1 is .dynamic at 00000074 done.
Loaded symbols for /lib/libpthread.so.1
Reading symbols from /lib/libm.so.2...done.
Loaded symbols for /lib/libm.so.2
Reading symbols from /lib/libsocket.so.1...done.
Loaded symbols for /lib/libsocket.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libc.so.1...done.
Loaded symbols for /lib/libc.so.1
Reading symbols from /usr/local/lib/libgcc_s.so.1...done.
Loaded symbols for /usr/local/lib/libgcc_s.so.1
Reading symbols from /lib/libaio.so.1...done.
Loaded symbols for /lib/libaio.so.1
Reading symbols from /lib/libmd.so.1...done.
Loaded symbols for /lib/libmd.so.1
Reading symbols from /lib/libdl.so.1...
warning: Lowest section in /lib/libdl.so.1 is .hash at 000000b4 done.
Loaded symbols for /lib/libdl.so.1
Reading symbols from /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1...done.
Loaded symbols for /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1
Reading symbols from /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3...done.
Loaded symbols for /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
Reading symbols from /lib/ld.so.1...done.
Loaded symbols for /lib/ld.so.1
Core was generated by `/usr/local/bin/spine -R -S -f 6 -l 6 -V 5'.
Program terminated with signal 11, Segmentation fault.
#0 0xfec56178 in realfree () from /lib/libc.so.1
(gdb) bt
#0 0xfec56178 in realfree () from /lib/libc.so.1
#1 0xfec569a0 in _free_unlocked () from /lib/libc.so.1
#2 0xfec568dc in free () from /lib/libc.so.1
#3 0xff2a60dc in free_enums (spp=0xe19a0) at parse.c:5065
#4 0xff2a6258 in free_partial_tree (tp=0xe2d50, keep_label=0) at parse.c:853
#5 0xff2a63a4 in free_tree (Tree=0xe27c0) at parse.c:878
#6 0xff2a8880 in unload_module_by_ID (modID=48, tree_top=0xe2d50) at parse.c:3999
#7 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0xe2c70) at parse.c:3989
#8 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0xe2c00) at parse.c:3989
#9 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0xe2700) at parse.c:3989
#10 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0xe2880) at parse.c:3989
#11 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0x94420) at parse.c:3989
#12 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0x943b0) at parse.c:3989
#13 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0x94340) at parse.c:3989
#14 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0x942d0) at parse.c:3989
#15 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0x941c8) at parse.c:3989
#16 0xff2a88d0 in unload_module_by_ID (modID=48, tree_top=0x452d0) at parse.c:3989
#17 0xff2a8ba8 in unload_all_mibs () at parse.c:4071
#18 0xff29adb8 in shutdown_mib () at mib.c:2716
#19 0xff2bbbfc in snmp_shutdown (type=0x1eb08 "spine") at snmp_api.c:872
#20 0x0001586c in snmp_spine_close () at snmp.c:127
#21 0x000139e8 in main (argc=5000, argv=0xffbff320) at spine.c:608
(gdb) frame 3
#3 0xff2a60dc in free_enums (spp=0xe19a0) at parse.c:5065
warning: Source file is more recent than executable.
5065 free(pp->label);
(gdb) list
5060 *spp = NULL;
5061
5062 while (pp) {
5063 npp = pp->next;
5064 if (pp->label)
5065 free(pp->label);
5066 free(pp);
5067 pp = npp;
5068 }
5069 }
(gdb) print pp->next
$1 = (struct enum_list *) 0x0
(gdb) print pp->label
$2 = 0xe19b8 "readOnly"
Posted to the net-snmp-users ML. Maybe someone there has a clue why this is breaking.
Regards,
Frank |
|
TheWitness Developer
Joined: 14 May 2002 Posts: 9671 Location: MI, USA
|
Posted: Wed Feb 06, 2008 4:57 pm Post subject: |
|
|
I have some code to commit. This issue is platform specific. My apologies, I have simply been too busy.
TheWitness |
|
TheWitness Developer
Joined: 14 May 2002 Posts: 9671 Location: MI, USA
|
Posted: Wed Feb 06, 2008 5:16 pm Post subject: |
|
|
Just for laughs, comment out the two functions:
init_snmp() and snmp_shutdown() and see what happens.
TheWitness |
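One way to try this without deleting code is to put the two library calls behind a build switch. This is only a sketch under assumptions: `SPINE_SNMP_LIFECYCLE` and the two wrapper names are hypothetical and do not exist in spine. `init_snmp()` and `snmp_shutdown()` are the real net-snmp calls, and each takes the application type string. With the macro undefined, the wrappers compile to no-ops, which is the "commented out" case.

```c
#include <assert.h>
#include <stddef.h>

#ifdef SPINE_SNMP_LIFECYCLE   /* hypothetical switch, not a spine option */
#include <net-snmp/net-snmp-config.h>
#include <net-snmp/net-snmp-includes.h>
#endif

/* Hypothetical wrapper: returns 1 if net-snmp was initialised,
 * 0 if the call was compiled out. */
int snmp_lifecycle_open(const char *app)
{
    assert(app != NULL);
#ifdef SPINE_SNMP_LIFECYCLE
    init_snmp(app);       /* library-wide setup done by net-snmp */
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical wrapper: returns 1 if snmp_shutdown() ran,
 * 0 if the call was compiled out. */
int snmp_lifecycle_close(const char *app)
{
    assert(app != NULL);
#ifdef SPINE_SNMP_LIFECYCLE
    snmp_shutdown(app);   /* the call at the top of the crash path */
    return 1;
#else
    return 0;
#endif
}
```

Building once with `-DSPINE_SNMP_LIFECYCLE` (linked against net-snmp) and once without gives two binaries to compare against the same hosts.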
|
frankfegert
Joined: 16 Nov 2007 Posts: 26 Location: Stuttgart, Germany
|
Posted: Fri Feb 08, 2008 6:50 am Post subject: |
|
|
Commented out three occurrences of init_snmp() and one of snmp_shutdown(). The spine binary seems to work now, without dumping core.
Just in case I misinterpreted the changes: this effectively disables SNMP support within spine, doesn't it? |
|
TheWitness Developer
Joined: 14 May 2002 Posts: 9671 Location: MI, USA
|
Posted: Fri Feb 08, 2008 7:01 pm Post subject: |
|
|
No. It simply disables some of the internals that slow it down a bit. This could have been a permissions problem. However, please test using SNMP devices and provide feedback.
TheWitness |
|