rcaston Cacti User
Joined: 06 Jan 2004 Posts: 202 Location: US-Dallas, TX
|
Posted: Thu Apr 05, 2007 11:17 am Post subject: Distributed Cacti - Ideas |
|
|
I'd like to start a post to discuss how we could handle using cacti to poll extremely large enterprise environments which would require more performance from a Cacti system than a single server can provide.
My thoughts are the problem could be addressed by proceeding with one of the following two approaches:
1) Distributing out the data collection: Setting it up so a number of specific devices are polled from remote pollers that do not run on the cacti server itself. Since some of the load of the poller is the initial mysql calls; I would imagine moving to a mysql read-tier (multiple mysql replicated servers) for the many distributed pollers to hit would be mandatory here.
or
2) Integrate the data presentation: Either through a rewrite of cacti or potentially via a plugin. My thought here would be a simple plugin which could load the URL of the other Cacti box when clicked. Currently this could be done using a modification of the "ntop" plugin example to call a URL for exported cacti data; however, the presentation of the exported data would include some of the table wrappers (markup around the table data, images, etc.), which would not look right. Also, for this to work, the current bugs in graph_export.php would need to be fixed.
I'd like to invite people to share their thoughts and/or experiences with this issue. Hopefully this post goes sticky.
Last edited by rcaston on Mon Apr 16, 2007 3:59 pm; edited 2 times in total |
|
|
 |
gandalf Developer
Joined: 02 Dec 2004 Posts: 17009 Location: Muenster, Germany
|
Posted: Sat Apr 07, 2007 2:08 pm Post subject: Re: Distributed Cacti Project |
|
|
| rcaston wrote: | I'd like to start a post to discuss how we could handle using cacti to poll extremely large enterprise environments which would require more performance from a Cacti system than a single server can provide.
My thoughts are the problem could be addressed by proceeding with one of the following two approaches:
1) Distributing out the data collection: Setting it up so certain devices are polled from a remote poller that does not run on the cacti server itself. Since some of the load of the poller is the initial mysql calls; I would imagine moving to a mysql read-tier (multiple mysql replicated servers) for the distributed pollers to hit would be mandatory here. | This does not yet solve the issue with "rrdtool update". Where do you expect the rrd files to live?
If "distributed": you will get an issue with graphing based on distributed rrd files
Centralized, nfs mount: in my opinion a performance bottleneck. I'm aware of at least one very big installation that uses this method currently with success. But I'm quite sure it _will_ become a bottleneck
Centralized, using "rrdtool server": Nice one, but perhaps it will require some more features such as "security". I'm quite sure that this will work "only" with bulk updates (see: boost plugin). I bet it will be too slow for just-in-time updates.
Reinhard |
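To make the "bulk updates" idea concrete, here is a minimal Python sketch of how buffered samples might be grouped so that each rrd file receives one `rrdtool update` call carrying several `timestamp:value` arguments. This is not how the boost plugin is implemented; the function name, file names, and sample data are assumptions for illustration. The only rrdtool behaviour relied on is that `rrdtool update` accepts multiple `timestamp:value[:value...]` arguments in one invocation, in increasing time order.

```python
from collections import defaultdict


def build_bulk_updates(samples):
    """Group (rrd_path, timestamp, values) samples into one command per file.

    Hypothetical helper: it only builds argument lists, it does not run rrdtool.
    """
    by_file = defaultdict(list)
    for rrd_path, timestamp, values in samples:
        by_file[rrd_path].append((timestamp, values))

    commands = []
    for rrd_path in sorted(by_file):
        # rrdtool expects the updates for one file in increasing time order.
        points = sorted(by_file[rrd_path], key=lambda point: point[0])
        args = [f"{ts}:{':'.join(str(v) for v in vals)}" for ts, vals in points]
        commands.append(["rrdtool", "update", rrd_path] + args)
    return commands


# Made-up samples: (file, unix timestamp, (traffic_in, traffic_out))
samples = [
    ("router1_traffic.rrd", 1176000300, (1200, 3400)),
    ("router2_traffic.rrd", 1176000000, (10, 20)),
    ("router1_traffic.rrd", 1176000000, (1000, 3000)),
]

for command in build_bulk_updates(samples):
    print(" ".join(command))
```

The point of the grouping is that the number of rrdtool invocations grows with the number of files rather than with the number of samples, which is why a remote update path is more plausible for batched writes than for one call per poll.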
|
|
 |
rcaston Cacti User
Joined: 06 Jan 2004 Posts: 202 Location: US-Dallas, TX
|
Posted: Mon Apr 09, 2007 9:28 am Post subject: |
|
|
Yes, there would be an I/O bottleneck as the RRDs would end up existing on different systems; but given the idea that this is for the enterprise; I don't see that being an issue for those who need it. ie: if we have to we'll stick this on a high end NAS.
...
further thoughts on the "Distributed Collector" idea would be that a new field would be added to each device in the mysql database which would specify association between the device and a "poller template"
the 'poller template' would have the necessary information associated with it to determine which of several pollers will query and poll for that object.
example: 2 pollers -
Each would query his local replicated copy of the mysql cacti database for the list of objects which match his pollerid field. (ie: give me a list of stuff i am supposed to poll and collect rrds for)
this would keep the rrds and their pollers(various cactids) on different boxes.
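The device-to-poller association described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for a replicated MySQL copy, and the `host` table layout, the `poller_id` column, and the host names are hypothetical, not Cacti's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for the poller's local replicated database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host (id INTEGER PRIMARY KEY, hostname TEXT, poller_id INTEGER)"
)
conn.executemany(
    "INSERT INTO host (hostname, poller_id) VALUES (?, ?)",
    [
        ("dal-rtr-01", 1),
        ("dal-rtr-02", 1),
        ("nyc-rtr-01", 2),
    ],
)


def hosts_for_poller(connection, poller_id):
    """Return the hostnames this poller is responsible for (hypothetical query)."""
    rows = connection.execute(
        "SELECT hostname FROM host WHERE poller_id = ? ORDER BY hostname",
        (poller_id,),
    )
    return [row[0] for row in rows]


print(hosts_for_poller(conn, 1))  # ['dal-rtr-01', 'dal-rtr-02']
print(hosts_for_poller(conn, 2))  # ['nyc-rtr-01']
```

Each poller would run the same query with its own id, so adding a poller means reassigning rows rather than changing poller code.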
---
all this being said; it would still likely be easier to go with idea #2 - which is to integrate the data presentation / web interface of several stand alone cacti boxes. |
|
|
 |
rony Developer/Forum Admin
Joined: 17 Nov 2003 Posts: 5694 Location: Michigan, USA
|
Posted: Mon Apr 09, 2007 9:59 am Post subject: |
|
|
FYI, distributed polling doesn't mean that the polling machine needs access to the rrdtool files.
In the model proposed for 0.9.0, the poller is distributable, but the rrdtool updates occur on a central machine. |
|
|
 |
rcaston Cacti User
Joined: 06 Jan 2004 Posts: 202 Location: US-Dallas, TX
|
Posted: Mon Apr 09, 2007 12:03 pm Post subject: |
|
|
| rony wrote: | FYI, distributed polling doesn't mean that the polling machine needs access to the rrdtool files.
In the model proposed for 0.9.0, the poller is distributable, but the rrdtool updates occur on a central machine. |
True, that's another way of doing it; there are advantages and disadvantages to each method ...
If the rrd data collection remains local, you don't need to worry about network connectivity between pollers.
But if the rrd files and the poller are distributed across different datacenters/locations, you'd need to include some sort of local store and delayed update of the rrd files in the event of a network outage between poller and rrd repository. |
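The "local store and delayed update" idea can be sketched as a small store-and-forward queue. This is a rough Python illustration, not part of Cacti or boost: the `UpdateSpool` class, the `send_to_repository` function, and the simulated `link_up` flag are all hypothetical, and a real poller would persist the queue to disk rather than keep it in memory.

```python
from collections import deque


class UpdateSpool:
    """Hold samples locally and forward them in order once the link is back."""

    def __init__(self, send):
        self._send = send        # callable(sample); raises ConnectionError on failure
        self._pending = deque()  # in-memory only in this sketch

    def submit(self, sample):
        self._pending.append(sample)
        self.flush()

    def flush(self):
        """Try to deliver everything queued; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False
            self._pending.popleft()
        return True

    def __len__(self):
        return len(self._pending)


delivered = []
link_up = False  # simulated network state between poller and rrd repository


def send_to_repository(sample):
    if not link_up:
        raise ConnectionError("rrd repository unreachable")
    delivered.append(sample)


spool = UpdateSpool(send_to_repository)
spool.submit(("router1", 1176000000, 1000))  # queued, link is down
spool.submit(("router1", 1176000300, 1200))  # queued behind the first sample
print(len(spool))                            # 2

link_up = True
spool.flush()
print(len(spool), delivered)
```

Samples are delivered oldest first, which matters because rrd updates for a given file have to arrive in increasing time order.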
|
|
 |
rony Developer/Forum Admin
Joined: 17 Nov 2003 Posts: 5694 Location: Michigan, USA
|
Posted: Mon Apr 09, 2007 1:13 pm Post subject: |
|
|
With rrdtool files in a central store and boost plugin functionality being included in 0.9.0, I think it will resolve most of these problems.
There are many ways to skin this cat, but I believe that we can't rely on a distributed storage system. So, centralized rrdtool updates based upon access (boost) will vastly improve the polling system.
I should also note that the distributed polling system is being designed to be self healing.
For more information, we should really get Larry (TheWitness) involved in this post. |
|
|
 |
rcaston Cacti User
Joined: 06 Jan 2004 Posts: 202 Location: US-Dallas, TX
|
Posted: Mon Apr 09, 2007 3:36 pm Post subject: |
|
|
| rony wrote: | | For more information, we should really get Larry (TheWitness) involved in this post. |
I'd like to learn more about the features being worked on in 0.9.0 |
|
|
 |
wjm
Joined: 13 Oct 2006 Posts: 12
|
Posted: Fri Apr 13, 2007 9:33 am Post subject: |
|
|
I am very interested in a distributed model.
Changing cacti from a tool my team uses mostly for routers and the dozen or so servers I care about into a tool I can share with other teams would explode the number of devices being polled. |
|
|
 |
rcaston Cacti User
Joined: 06 Jan 2004 Posts: 202 Location: US-Dallas, TX
|
Posted: Fri Apr 13, 2007 10:44 am Post subject: yeah... |
|
|
Using cacti as a true large-enterprise solution requires multiple distributed pollers; that is the problem I am facing.
Any solution which results in only a single poller server cannot scale well.
Imagine trying to manage a pair of routers at each of about 30 physical sites, with about 3,000 interfaces per router.
One poller will not handle that (2 x 30 x 3000 = 180,000 interfaces).
Assuming you are only doing traffic in/out, that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is, has anyone ever had it poll 360,000 elements in a 5 minute cycle?
or... Anyone ever tried to open a directory with 360,000 files in it? 
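The sizing arithmetic above, worked through in a short Python sketch. The figures are the ones quoted in the post; the derived per-second rate is simply the total divided by the 5 minute cycle and is not a measured cactid number.

```python
ROUTERS_PER_SITE = 2
SITES = 30
INTERFACES_PER_ROUTER = 3000
VALUES_PER_INTERFACE = 2          # traffic in and traffic out
POLL_INTERVAL_SECONDS = 5 * 60    # one 5 minute polling cycle

interfaces = ROUTERS_PER_SITE * SITES * INTERFACES_PER_ROUTER
polled_values = interfaces * VALUES_PER_INTERFACE
values_per_second = polled_values / POLL_INTERVAL_SECONDS

print(f"interfaces:        {interfaces}")               # 180000
print(f"polled values:     {polled_values}")            # 360000
print(f"values per second: {values_per_second:.0f}")    # 1200
```

So a single poller would need to sustain roughly 1,200 collected values per second, every second, just to finish inside the cycle.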
Last edited by rcaston on Fri Apr 13, 2007 11:19 am; edited 1 time in total |
|
|
 |
Howie Cacti Guru User
Joined: 16 Sep 2004 Posts: 3772 Location: United Kingdom
|
Posted: Fri Apr 13, 2007 11:05 am Post subject: Re: yeah... |
|
|
| rcaston wrote: | | assuming you are only doing traffic in/out; that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is; has anyone ever had it to 360,000 elements in a 5 minute cycle? |
Small point - it's "only" 180,000 RRAs since the in/out go into the same file. Still a heck of a lot though  |
|
|
 |
rcaston Cacti User
Joined: 06 Jan 2004 Posts: 202 Location: US-Dallas, TX
|
Posted: Fri Apr 13, 2007 11:17 am Post subject: Re: yeah... |
|
|
| Howie wrote: | | rcaston wrote: | | assuming you are only doing traffic in/out; that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is; has anyone ever had it to 360,000 elements in a 5 minute cycle? |
Small point - it's "only" 180,000 RRAs since the in/out go into the same file. Still a heck of a lot though  |
hah, meant to add in traffic + errors ... ..
But yes, if the rra's and pollers were distributed; we could scale sideways until the Web or MySQL server becomes the bottleneck, at which point a potential solution could be using load balanced web servers with Replicated MySQL servers. |
|
|
 |
gandalf Developer
Joined: 02 Dec 2004 Posts: 17009 Location: Muenster, Germany
|
Posted: Mon Apr 16, 2007 5:59 am Post subject: Re: yeah... |
|
|
| rcaston wrote: |
one poller will not handle (2 x 30 x 3000 = 180,000 interfaces)
| I suppose that this amount will be handled by cacti/cactid using the boost plugin. Nevertheless, a distributed thingy would be a better solution.
Reinhard |
|
|
 |
TheWitness Developer
Joined: 14 May 2002 Posts: 13135 Location: MI, USA
|
Posted: Mon Apr 16, 2007 9:52 pm Post subject: |
|
|
Boost 1.x should scale to installations with approximately 200k data sources (directory access issue aside). My plans for boost 2.0 include a method to perform the following:
0) Associate a host with a poller.
1) Provide poller based directory structures
2) Create host subdirectories (aka each host's RRDs are in a single directory)
3) Ability to have RRD updates occur in a distributed fashion using boost servers (distributed as well).
I may pull 0 and 3 out till 0.9. It all depends on timing. In the meantime, items 1 and 2 will resolve the directory access issues.
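Items 1 and 2 in the list above (poller-based directories with one subdirectory per host) could look something like the following Python sketch. The layout, the `poller_`/`host_` prefixes, the base path, and the file name are assumptions for illustration only; the thread does not specify what structure boost 2.0 would actually use.

```python
from pathlib import PurePosixPath


def rrd_path(base_dir, poller_id, host_id, data_source_name):
    """Build a per-poller, per-host location for one rrd file (hypothetical layout)."""
    return (
        PurePosixPath(base_dir)
        / f"poller_{poller_id}"
        / f"host_{host_id}"
        / f"{data_source_name}.rrd"
    )


print(rrd_path("/var/lib/cacti/rra", 2, 1041, "traffic_in_out_eth0"))
# /var/lib/cacti/rra/poller_2/host_1041/traffic_in_out_eth0.rrd
```

With a layout like this, no single directory holds more files than one host has data sources, which sidesteps the problem of opening a directory with hundreds of thousands of entries.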
The real issue as I see it is that there will always be a site that is just too big for Cacti. Well, that applies to every other tool as well.
With Boost 1.2, you can however, do the following:
1) Primary Database Server with Either MYISAM or MEMORY based Boost
2) Primary Poller with Local Storage (lots of it) and Boost Server
3) Web Farm with one to many Web Servers using a smaller version of Cacti with sym links to the RRA and Boost Cache folders.
4) Load Balancer in front of the Web Farm.
With this configuration, you will have an enterprise system. For most enterprises anyway.
TheWitness |
|
|
 |
rcaston Cacti User
Joined: 06 Jan 2004 Posts: 202 Location: US-Dallas, TX
|
Posted: Tue Apr 17, 2007 9:46 am Post subject: |
|
|
| TheWitness wrote: |
The real issue as I see it is that there will always be a site that is just too big for Cacti. Well, that applies to every other tool as well.
|
Well, there is truth to that; I would say that if you do manage to implement the feature roadmap you've outlined above with boost; it will go a long way to increasing cacti's ability to manage large environments.
In fact; if you do get the ability to associate a poller machine with a target host; you'd bring cacti into the league of most commercial solutions which would really make cacti shine.
Consider that even a 'budget' commercial poller is going to retail for around $45,000 U.S. per server/software setup, and that is an actual quote for a poller appliance (running linux) which only handles about 30,000 devices.
If Cacti could scale sideways; it would attract a great deal more attention from the business community.
My only concern is this: while I agree with and would support the rrd's being broken out into subdirs, it may potentially create a small migration hurdle if you try to move from a non-boost to a boost setup and then back again, since the RRA's will need to be moved back and forth. However, I believe someone posted that boost may end up as a standard part of cacti 0.9.x; if so, this point is moot.
All that being said; I really like where boost is going. |
|
|
 |
gandalf Developer
Joined: 02 Dec 2004 Posts: 17009 Location: Muenster, Germany
|
Posted: Tue Apr 17, 2007 2:08 pm Post subject: |
|
|
| TheWitness wrote: | Boost 1.x should scale to installation with approximately 200k data sources (directory access issue aside). My plans for boost 2.0 include a method to perform the following:
0) Associate a host with a poller. | I would love this. IMHO, this feature would allow for different interesting structures:
a) Associate a poller with a "site" (location) to "localize" cacti polling traffic by assigning all hosts of this site to that very poller
b) Support for multiple polling intervals (in this case, all polling intervals of this host would be the same, but I suppose this is not a big issue)
| Quote: | | 1) Provide poller based directory structures | ? Please elaborate. I do not get the point why this would be useful, sorry
| Quote: | | 2) Create host subdirectories (aka each host's RRDs are in a single directory) | Yep, yep. IMHO that's a must even for mid-sized installations. I did not make any tests on directory search impact using cacti, but some months ago Tobi Oetiker published a testing script that shows some impact.
| Quote: | | 3) Ability to have RRD updates occur in a distributed fashion using boost servers (distributed as well). | But all rrd files still in the "same" location (different subdirectories, but same file system)?
just my 2 cents
Reinhard |
|
|
 |
|