Cacti: official forums and support



[HOWTO] Cacti's setup for really BIG environments

 
BorisL



Joined: 31 Mar 2007
Posts: 30

Posted: Mon Nov 03, 2008 11:47 am    Post subject: [HOWTO] Cacti's setup for really BIG environments

Here is how I built a dedicated Cacti server for a rather large environment.

Target:
  • 600+ hosts, 70 000+ data sources, 300 000+ data items
  • one week of 5-minute-resolution statistics in the RRA
  • 5-minute poller interval
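
To put a number on the one-week target, here is a rough sketch of the arithmetic. It is my own illustration, not something taken from Cacti; the function name rra_rows is made up. It computes how many rows a full-resolution RRA needs for a given step and retention period.

```shell
#!/bin/sh
# rra_rows STEP_SECONDS DAYS
# Hypothetical helper: number of rows an RRA must hold to keep
# DAYS of data at one sample every STEP_SECONDS.
rra_rows() {
    step=$1
    days=$2
    echo $(( days * 86400 / step ))
}

# One week at a 5-minute (300 s) step, as in the target above.
rra_rows 300 7
```

For a 300-second step and 7 days this prints 2016, so each data source item needs an RRA of at least 2016 rows at full resolution.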

Hardware:
My hardware configuration is:
  • 2 x Quad-core Intel Xeon 5440
  • 16 GB RAM
  • 4x147G SAS 15k (RAID10 or RAID6)
After the full tuning process, CPU speed becomes the bottleneck. Since RRD updates are going to be made asynchronous to polling, less expensive storage can be used. I use ZFS raidz2 built on top of four disks.

Software:
MySQL
0) Use a dedicated MySQL instance that serves only the Cacti database. I run the MySQL daemon on the same host as Cacti.
1) Migrate to InnoDB to get row-level locking.
2) Create indexes: Cacti's default schema is lacking indexes. Apply those mentioned in this tweak
3) Place the whole DB in RAM, that is, on a memory disk. This is feasible because the DB serves as configuration storage, which is roughly constant, and as volatile storage for polled values. It gives a considerable boost to both the web interface and polling. A 2...3 GB memory disk is adequate for 300k data source items.
For FreeBSD, the recipe is to add the following line to fstab:
Code:
md   /base   mfs   rw,-s3g,-m0,noatime   0   0

You'll have to set up two simple scripts:
  • backup
    Code:
    #!/bin/sh
    # Dump all databases plus the Cacti web tree and my.cnf into a dated folder.
    export LC_CTYPE=ru_RU.UTF-8
    cdate="$(date -j '+%H-%M')"
    folder_date="$(date -j '+%Y-%m-%d')"

    backupdir="/var/backup/cacti/$folder_date"
    mkdir -p "$backupdir"
    # The restore script reads this symlink, one level above the dated folder.
    latest_dump_filename="$backupdir/../db-sql-latest.tbz"
    backup_filename="$backupdir/db-$cdate.sql.bz2"
    /usr/local/bin/mysqldump --all-databases | /usr/local/bin/7z a "$backup_filename" -si -tbzip2 >/dev/null
    /usr/bin/tar cjf "$backupdir/www-$cdate.tbz" "$(realpath /www/cacti)" >/dev/null 2>&1
    cp /etc/my.cnf "$backupdir/my-$cdate.cnf"
    # Repoint the "latest" symlink at the fresh dump.
    [ -h "$latest_dump_filename" ] && rm "$latest_dump_filename"
    ln -s "$backup_filename" "$latest_dump_filename"

  • restore
    Code:
    #!/bin/sh
    # Reload the latest SQL dump once MySQL has created its socket.

    PATH="/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin:/usr/bin:/bin"

    backupdir=/var/backup/cacti
    latest_dump_filename="$backupdir/db-sql-latest.tbz"

    count=0
    echo -n " Waiting for MySQL start "
    while [ ! -S "/tmp/mysql.sock" ]; do
            sleep 1
            echo -n "."
            count=$((count+1))
            # Give up after about 5 seconds and report the failure to the caller.
            [ $count -gt 5 ] && { echo " timeout waiting for MySQL server, exit"; exit 1; }
    done
    echo " done"
    echo " Restoring last MySQL dump"
    bzcat "$latest_dump_filename" | mysql && mysqladmin flush-privileges


Have your rc.d (init.d) scripts:
  • restore the latest SQL dump just after bringing up the MySQL server during boot
  • back up the DB just before stopping the MySQL daemon

For FreeBSD this can be done using an extra rc.d script:
Code:
>cat /usr/local/etc/rc.d/cacti_mysql
#!/bin/sh
#

# PROVIDE: cacti_mysql
# REQUIRE: mysql

. /etc/rc.subr

name="cacti_mysql"
rcvar=`set_rcvar`

load_rc_config $name

: ${cacti_mysql_enable="NO"}

command="/path/to/cacti-mysql-unpack.sh"
command_args=""

run_rc_command "$1"

and some extra settings in rc.conf:
Code:
cacti_mysql_enable="YES"
[ "X$_name" = "Xmysql" ] && {
        stop_precmd="sh /path/to/backup-cacti.sh"
}

#wait till cacti's mysql is dumped from memory to disk
rcshutdown_timeout="120"

4) Tune MySQLd to something like this:
Code:
[mysqld]
skip-locking
key_buffer = 512M
query_cache_size = 128M
max_allowed_packet = 16M
table_cache = 512
sort_buffer_size = 128M
net_buffer_length = 8K
read_buffer_size = 1M
read_rnd_buffer_size = 32M
myisam_sort_buffer_size = 8M
max_heap_table_size = 4G
tmp_table_size = 1G
log_slow_queries
long_query_time = 2
log_long_format
innodb_buffer_pool_size = 256M

If you use binary logs, make sure the poller_output* tables are excluded from them and that the binlogs are not stored on the memory disk.
5) Convert poller_output to ENGINE=MEMORY (text fields can be converted to varchar(32...255); MEMORY tables do not support TEXT columns) and poller_output_boost to ENGINE=MyISAM ROW_FORMAT=FIXED. ROW_FORMAT has to be set in the CREATE TABLE statement of the SQL dump, so you will have to dump the table, edit the dump, and restore it. Make sure you have converted poller_output_boost.output to varchar(32...255), or ROW_FORMAT will be silently ignored by MySQL.
6) Put the backup script into cron with a 2-3 hour interval.
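
For the poller_output half of step 5, something along these lines should do. It is only a sketch under assumptions: the function name is made up, the column length of 255 is just the upper end of the range given above, and the database name cacti in the usage comment is a guess at your setup. The function only prints SQL, so you can read it before piping it anywhere. It deliberately leaves poller_output_boost alone, since that table goes through the dump-and-edit route described above.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that moves poller_output to the
# MEMORY engine. Nothing is executed against MySQL here.
poller_output_to_memory_sql() {
    cat <<'EOF'
-- MEMORY tables cannot hold TEXT columns, so shrink the column
-- and switch the engine in one statement.
ALTER TABLE poller_output
    MODIFY output VARCHAR(255) NOT NULL DEFAULT '',
    ENGINE=MEMORY;
-- Inspect the Engine and Row_format columns of both poller tables.
SHOW TABLE STATUS LIKE 'poller_output%';
EOF
}

# Usage (assumes the database is called "cacti"):
#   poller_output_to_memory_sql | mysql cacti
poller_output_to_memory_sql
```

The SHOW TABLE STATUS output is also a quick way to confirm that poller_output_boost really ended up with Row_format Fixed after the restore.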

Cacti & Spine
  • remove all max_execution_time limits and raise memory_limit to 1G or so in all of Cacti's scripts (`grep -R` will help)
  • install spine in place of cmd.php and let it use 1.2...1.4 x $no_cpus threads. In my case that is 10 threads.
  • install the plugin architecture
  • install the boost plugin from the archive in this thread. Skipping this will produce unstable boosting and, as a result, NaNs in graphs. A '1 hour' boost update interval is a good starting point.
  • apply patches patch1, patch2
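
The 1.2...1.4 x $no_cpus rule for spine threads can be worked out like this. It is a rough sketch with a made-up function name; it uses integer arithmetic only, rounding the lower bound up and the upper bound down.

```shell
#!/bin/sh
# spine_threads CPUS
# Hypothetical helper: print the lowest and highest thread count that
# fall inside 1.2 x CPUS .. 1.4 x CPUS.
spine_threads() {
    cpus=$1
    low=$(( (cpus * 12 + 9) / 10 ))    # 1.2 x CPUS, rounded up
    high=$(( cpus * 14 / 10 ))         # 1.4 x CPUS, rounded down
    # With very few CPUs no whole number fits the range; use the lower bound.
    [ "$high" -lt "$low" ] && high=$low
    echo "$low $high"
}

# Two quad-core CPUs, as in the hardware list above.
spine_threads 8
# On FreeBSD the core count can be read with: sysctl -n hw.ncpu
```

For 8 cores this prints "10 11", which matches the 10 threads used here.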

Results:
Code:
11/03/2008 06:36:17 PM - SYSTEM STATS: Time:76.5106 Method:spine Processes:1 Threads:10 Hosts:535 HostsPerProcess:535 DataSources:222959 RRDsProcessed:0
11/03/2008 06:31:14 PM - SYSTEM STATS: Time:73.7684 Method:spine Processes:1 Threads:10 Hosts:535 HostsPerProcess:535 DataSources:222959 RRDsProcessed:0
11/03/2008 06:26:05 PM - SYSTEM STATS: Time:64.7133 Method:spine Processes:1 Threads:10 Hosts:535 HostsPerProcess:535 DataSources:222959 RRDsProcessed:0
11/03/2008 06:21:52 PM - SYSTEM BOOST STATS: Time:1229.8073 RRDUpdates:2673120
11/03/2008 06:21:13 PM - SYSTEM STATS: Time:69.5011 Method:spine Processes:1 Threads:10 Hosts:535 HostsPerProcess:535 DataSources:222959 RRDsProcessed:0
11/03/2008 06:16:15 PM - SYSTEM STATS: Time:72.1974 Method:spine Processes:1 Threads:10 Hosts:535 HostsPerProcess:535 DataSources:222959 RRDsProcessed:0

Mean memory usage is about 9 GB; peak memory usage is 14-15 GB.
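
To read the boost line in the log above as a throughput figure, a small awk wrapper is enough. This is a sketch with a made-up function name, and the log path in the usage comment is a placeholder. It divides RRDUpdates by Time for every SYSTEM BOOST STATS line it sees.

```shell
#!/bin/sh
# boost_rate [FILE...]
# Hypothetical helper: print RRD updates per second for each
# "SYSTEM BOOST STATS" line in the given log files (or stdin).
boost_rate() {
    awk '/SYSTEM BOOST STATS/ {
        t = 0; u = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Time:/)       t = substr($i, 6) + 0
            if ($i ~ /^RRDUpdates:/) u = substr($i, 12) + 0
        }
        if (t > 0) printf "%.0f\n", u / t
    }' "$@"
}

# Usage (the path is a placeholder for wherever your Cacti log lives):
#   boost_rate /path/to/cacti.log
```

For the boost run shown above (2673120 updates in 1229.8073 seconds) that is roughly 2170 RRD updates per second.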


Last edited by BorisL on Thu Nov 20, 2008 2:52 am; edited 10 times in total
gandalf
Developer


Joined: 02 Dec 2004
Posts: 12642
Location: Muenster, Germany

Posted: Mon Nov 03, 2008 4:48 pm

Wow, I'm impressed.
A few words on rrdtool? I would assume you're using at least rrdtool 1.2.23 and a fadvise-capable kernel?
Thanks for posting
Reinhard

BorisL

Posted: Tue Nov 04, 2008 6:39 am

>rrdtool
RRDtool 1.2.26 Copyright 1997-2007 by Tobias Oetiker <tobi@oetiker.ch>
Compiled Oct 8 2008 16:05:30

>uname -a
FreeBSD blah-blah 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #3: Thu Oct 9 15:30:19 MSD 2008 blah-blah:/usr/obj/usr/src/sys/SMP7 amd64

BorisL

Posted: Tue Nov 04, 2008 8:40 am

Here's my version of the boost plugin.

Attachment: boost.patched.tgz (29.36 KB)
Description: reupload due to poller_output_snap bug


Last edited by BorisL on Wed Nov 19, 2008 1:27 am; edited 1 time in total

BorisL

Posted: Sat Nov 15, 2008 2:26 am

I found that poller_output_boost performs badly when it is an InnoDB table and output is a TEXT column: the poller spent up to 70 seconds (!!) copying data from poller_output to poller_output_boost.

Proper fix:
Convert poller_output_boost to ENGINE=MyISAM ROW_FORMAT=FIXED.
ROW_FORMAT has to be set in the CREATE TABLE statement of the SQL dump, so you will have to dump the table, edit the dump, and restore it.
Make sure you have converted poller_output_boost.output to varchar(32...255), or ROW_FORMAT will be silently ignored by MySQL.

Code:

11/15/2008 09:21:53 AM - SYSTEM STATS: Time:113.0797 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 09:26:52 AM - SYSTEM STATS: Time:108.1915 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 09:31:52 AM - SYSTEM STATS: Time:107.9162 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 09:36:58 AM - SYSTEM STATS: Time:115.4306 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 09:38:21 AM - SYSTEM BOOST STATS: Time:1281.4722 RRDUpdates:2733708
11/15/2008 09:41:37 AM - SYSTEM STATS: Time:95.7932 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 09:47:13 AM - SYSTEM STATS: Time:132.1431 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 09:51:59 AM - SYSTEM STATS: Time:119.0232 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 09:57:03 AM - SYSTEM STATS: Time:122.6083 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
<<======== altering poller_output_boost format ===========>>
11/15/2008 10:01:10 AM - SYSTEM STATS: Time:69.6723 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 10:13:53 AM - SYSTEM STATS: Time:67.4149 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 10:21:04 AM - SYSTEM STATS: Time:63.3600 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 10:26:12 AM - SYSTEM STATS: Time:71.8881 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
11/15/2008 10:31:20 AM - SYSTEM STATS: Time:76.0856 Method:spine Processes:1 Threads:10 Hosts:545 HostsPerProcess:545 DataSources:227815 RRDsProcessed:0
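
To compare the two halves of a log like the one above, the poller times can be averaged on each side of the marker line. This is a sketch with a made-up function name. It assumes the marker is a line starting with "<<", as in the excerpt, and it skips SYSTEM BOOST STATS lines.

```shell
#!/bin/sh
# poller_means [FILE...]
# Hypothetical helper: mean poller Time before and after a marker
# line that starts with "<<". Reads the given files or stdin.
poller_means() {
    awk '
        BEGIN { p = 1 }                 # 1 = before marker, 2 = after
        /^<</ { p = 2; next }
        /SYSTEM STATS:/ {               # does not match "SYSTEM BOOST STATS:"
            for (i = 1; i <= NF; i++)
                if ($i ~ /^Time:/) { sum[p] += substr($i, 6); n[p]++ }
        }
        END {
            if (n[1]) printf "before: %.1f s\n", sum[1] / n[1]
            if (n[2]) printf "after: %.1f s\n", sum[2] / n[2]
        }
    ' "$@"
}

# Usage: save the log excerpt to a file and run
#   poller_means excerpt.log
```

On the excerpt above this comes to about 114 s per polling cycle before the change and about 70 s after it.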

BorisL

Posted: Wed Nov 19, 2008 1:28 am

I have reuploaded boost.patched.tgz due to bug with poller_output_snap table.
star3am



Joined: 04 Aug 2008
Posts: 5
Location: Cape Town

Posted: Wed Nov 19, 2008 8:29 am    Post subject: Nice

Nice one! Thanks for the tips, I'm sure they will come in handy for many people.
koaps



Joined: 15 Feb 2007
Posts: 9

Posted: Mon Dec 08, 2008 2:01 pm

One thing we do in our large ass environment is try to use snmptable whenever possible.

This can speed things up greatly and put a lot less load on your polled devices.
All times are GMT - 5 Hours