TheWitness Developer
Joined: 14 May 2002 Posts: 9255 Location: MI, USA
|
Posted: Sun Dec 16, 2007 2:01 pm Post subject: New Cacti Architecture (0.8.8) - RFC Response Location |
|
|
All,
Please submit your RFC comments here. Thanks for your participation. I will attach newer versions of the RFC and provide feedback in this post.
Regards,
TheWitness
Attachment: Cacti Multiple Poller Design v1.0.pdf (338.71 KB, downloaded 510 times)
Last edited by TheWitness on Mon Dec 17, 2007 8:45 am; edited 1 time in total |
|
gandalf Developer
Joined: 02 Dec 2004 Posts: 11223 Location: Muenster, Germany
|
Posted: Sun Dec 16, 2007 3:11 pm Post subject: |
|
|
Hi Larry,
thank you for opening this long overdue discussion. Besides "automation" (yes, I know ...), this was the most deeply discussed topic of 2.CCC.eu.
But personally, I have a problem with this picture. It shows very nicely where everything is happening from the data point of view. (BTW: I suppose it would be better to use either poller_item _or_ poller_cache in both picture and text. Personally, I prefer poller_item, as this is the name of the real table.)
But I am not sure I fully understand where the application logic lies; in other words: what about the "workflow"?
I suppose that each Poller Group is governed by a local crontab (or a "real" daemon) that fetches data from the db server (either directly or via table replication). Output is stored in the local poller_output table and replicated to the db server? What would be the criteria for associating a host/data source with a specific Poller Group? I suppose that clock synchronization (ntp?) would be required, as the local time used for poller_output would be used for rrdtool update. What about timezones? Where should hooks like "poller_bottom" be executed?
And another part of the logic would lie with the http servers, for sure (console aka administration). But where lies the rrdtool update logic? With the database server? Or with the RRDTool servers? Would RRDfile Update Groups equal Poller Groups? If not, what would be the criteria this time?
The rrdfile data pipeline surely would be used for graphing. But a lot of plugins currently require access to rrd files as well. So there would be more than graphing only. When running from load-balanced http instances, either graph caching will fail or cache must reside on RRDfile Servers. As there are two different pipes to RRDfile Servers, perhaps synchronization between updates and graph (rrdtool fetch) is necessary.
Those are my first thoughts. Surely more will follow.
Reinhard
|
|
TheWitness Developer
Joined: 14 May 2002 Posts: 9255 Location: MI, USA
|
Posted: Sun Dec 16, 2007 3:54 pm Post subject: |
|
|
| gandalf wrote: | | thank you for opening this long overdue discussion. Besides "automation" (yes, I know ...), this was the most deeply discussed topic of 2.CCC.eu. |
Yes, long overdue.
| gandalf wrote: | | But personally, I have a problem with this picture. |
RFCs often start that way.
| gandalf wrote: | | It shows very nicely where everything is happening from the data point of view. (BTW: I suppose it would be better to use either poller_item _or_ poller_cache in both picture and text. Personally, I prefer poller_item, as this is the name of the real table.) |
I literally "slapped" this together. I will correct that in v2.
| gandalf wrote: | | But I am not sure I fully understand where the application logic lies; in other words: what about the "workflow"? |
Yes, after sending it out, I realized I left that out. Basically, the poller will, by default, use the main server's poller_item table for its list of poller items. If for some reason the main server is not reachable, it will use its local copy and store the poller_output table locally.
The same is intended for the poller_output table: central server first. The remote pollers will be given instructions to "update/synchronize" their local poller_item table periodically (i.e. when things change). Those synchronizations would not happen more often than every 5 minutes.
If the central server is not available, then each poller will cache the updates in its poller_output table until the remote connection is available again, then dump them sequentially, by date, to the central server. By doing so, no data will be lost.
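The "central server first, local cache on failure" workflow described above could be sketched roughly as follows. This is only an illustration in Python, not Cacti code: `RemotePoller`, `send_to_central`, `CentralUnavailable` and the sample tuple layout are all hypothetical names invented for this sketch, and an in-process list stands in for the local poller_output table.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One polled value: (unix_time, local_data_id, value). Hypothetical layout.
Sample = Tuple[int, int, float]


class CentralUnavailable(Exception):
    """Raised by the (hypothetical) sender when the central server is unreachable."""


@dataclass
class RemotePoller:
    # Callable that delivers a batch to the central poller_output table.
    send_to_central: Callable[[List[Sample]], None]
    # Stand-in for the poller's local poller_output table.
    local_buffer: List[Sample] = field(default_factory=list)

    def store(self, samples: List[Sample]) -> None:
        """Queue new samples behind any backlog, then try the central server."""
        self.local_buffer.extend(samples)
        self.flush()

    def flush(self) -> None:
        """Dump buffered samples to the central server sequentially, by date."""
        if not self.local_buffer:
            return
        pending = sorted(self.local_buffer, key=lambda s: s[0])
        try:
            self.send_to_central(pending)
        except CentralUnavailable:
            # Keep everything locally until the connection comes back.
            self.local_buffer = pending
        else:
            self.local_buffer = []
```

Because new samples are always queued behind the backlog, the central server receives data in time order even after an outage, which is what rrdtool updates would need.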
| gandalf wrote: | | What would be the criteria for associating a host/data source with a specific Poller Group? |
There would have to be modifications to the poller_output table, or another table to keep track of when it is time to poll things. That is more TBD, until I have more feedback.
| gandalf wrote: | | I suppose that clock synchronization (ntp?) would be required, as the local time used for poller_output would be used for rrdtool update. |
Of course...
| gandalf wrote: | | What about timezones? |
You tell me...
| gandalf wrote: | | Where should hooks like "poller_bottom" be executed? |
I need feedback from people like Howie and yourself to determine "where" that should be. So, see my comment above. It's an RFC you know
| gandalf wrote: | | And another part of the logic would lie with the http servers, for sure (console aka administration). But where lies the rrdtool update logic? With the database server? Or with the RRDTool servers? |
The RRDfile Services will be asynchronous and running as daemons. They will process all items in the poller_output table as they come in and handle other requests in other threads. The main poller_output table, with some minor modifications, will be used to achieve RRDupdates.
| gandalf wrote: | | Would RRDfile Update Groups equal Poller Groups? If not, what would be the criteria this time? |
No.
| gandalf wrote: | | The rrdfile data pipeline surely would be used for graphing. But a lot of plugins currently require access to rrd files as well. |
Need feedback from Plugin developers as to "how" they would like this to work.
| gandalf wrote: | | So there would be more than graphing only. When running from load-balanced http instances, either graph caching will fail or cache must reside on RRDfile Servers. |
I expect the RRDtool Services to handle this. Each graph will know, in advance, which server it needs to talk to.
Regards,
TheWitness
|
|
gandalf Developer
Joined: 02 Dec 2004 Posts: 11223 Location: Muenster, Germany
|
Posted: Sun Dec 16, 2007 4:05 pm Post subject: |
|
|
| TheWitness wrote: | | gandalf wrote: | | It shows very nicely where everything is happening from the data point of view. (BTW: I suppose it would be better to use either poller_item _or_ poller_cache in both picture and text. Personally, I prefer poller_item, as this is the name of the real table.) |
I literally "slapped" this together. I will correct that in v2. | I suppose separating data logic and workflow is the better way. Otherwise I fear that the picture will become too crowded.
From the current design, the database servers seem to define the limit of this architecture. While Poller and RRDfile Servers are scalable, as is http, the database server exists only once. As I understand it, the second one is for failover only.
So, if there's a central poller_item table as well as poller_output, their update/delete performance will be crucial. I suppose you're thinking of memory tables, as boost uses them. And then, like boost does with rrdtool bulk update, there's the SQL bulk insert that will yield some more performance, correct?
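To make the bulk-insert idea concrete, here is a small sketch. It is an assumption-laden illustration only: Python's built-in in-memory SQLite database stands in for a MySQL memory table, the column layout of poller_output is assumed for the example, and `open_output_db` and `bulk_insert` are hypothetical helper names.

```python
import sqlite3
from typing import Iterable, Tuple

# Assumed row layout: (local_data_id, rrd_name, time, output)
Row = Tuple[int, str, int, str]


def open_output_db() -> sqlite3.Connection:
    # In-memory SQLite is only a stand-in for a MySQL memory table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE poller_output ("
        " local_data_id INTEGER, rrd_name TEXT, time INTEGER, output TEXT)"
    )
    return conn


def bulk_insert(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Insert all rows in one transaction rather than one commit per row."""
    batch = list(rows)
    with conn:  # commits once on success, rolls back on error
        conn.executemany("INSERT INTO poller_output VALUES (?, ?, ?, ?)", batch)
    return len(batch)
```

The point of the sketch is only the shape of the operation: many poller results travel to the table in a single round trip and a single commit.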
Reinhard
|
|
TheWitness Developer
Joined: 14 May 2002 Posts: 9255 Location: MI, USA
|
Posted: Sun Dec 16, 2007 4:31 pm Post subject: |
|
|
Yes, memory tables have an I/O rate in excess of 40k updates per second, so even though they use a table lock mechanism, we are safe. I was thinking, though, that making this a separate database altogether would help other subsystems' performance and simplify backup.
What do you think about that?
TheWitness
|
|
melchandra Cacti User
Joined: 29 Jun 2004 Posts: 312 Location: Indiana
|
Posted: Sun Dec 16, 2007 4:33 pm Post subject: |
|
|
Forgive me if this seems to be a silly question.
Is there some reason why the pollers don't directly update the RRD Files? Does it have to go through the database?
Is there a way for the Pollers to be sent both the host and oid information to query, as well as host information for the Remote RRD Update Service so they could query, and then pass the data directly to the RRD storage devices?
|
|
Howie Cacti Guru User
Joined: 16 Sep 2004 Posts: 1958 Location: United Kingdom
|
Posted: Sun Dec 16, 2007 4:33 pm Post subject: |
|
|
| gandalf wrote: | | The rrdfile data pipeline surely would be used for graphing. But a lot of plugins currently require access to rrd files as well. So there would be more than graphing only. When running from load-balanced http instances, either graph caching will fail or cache must reside on RRDfile Servers. As there are two different pipes to RRDfile Servers, perhaps synchronization between updates and graph (rrdtool fetch) is necessary. |
Once updates have been made scalable, is there a requirement for actual load-balancing of HTTP frontends? I can see why you might want HA, but does anyone really have so many concurrent users that a single server can't cope? All I ever see are queries about the polling limitations...
With a HA setup instead, then cache-sharing isn't really necessary. Really, if you have 1000 users all hitting the same graph then you are still reducing the number of graph-drawing operations from 1000 to 2, which is probably enough. If they aren't all hitting the same graphs, then graph caching isn't going to help anyway.
(I don't hit any of these limitations with my own modest needs, so I'm just curious really. Despite what it says under my name on the left, I'm just a lowly user that talks a lot.)
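The 1000-to-2 arithmetic above can be checked with a toy simulation. This is a sketch under simplifying assumptions: each frontend keeps its own unshared cache, requests are spread round-robin, cache expiry is ignored, and `Frontend` and `simulate` are hypothetical names.

```python
from typing import Callable, Dict


class Frontend:
    """One HTTP frontend with its own, unshared graph cache."""

    def __init__(self, render: Callable[[int], bytes]) -> None:
        self._render = render
        self._cache: Dict[int, bytes] = {}
        self.renders = 0

    def get_graph(self, graph_id: int) -> bytes:
        # Only draw the graph when this frontend has not cached it yet.
        if graph_id not in self._cache:
            self._cache[graph_id] = self._render(graph_id)
            self.renders += 1
        return self._cache[graph_id]


def simulate(requests: int, frontends: int, graph_id: int = 1) -> int:
    """Spread requests round-robin and return the total number of renders."""
    nodes = [Frontend(lambda gid: b"png:%d" % gid) for _ in range(frontends)]
    for i in range(requests):
        nodes[i % frontends].get_graph(graph_id)
    return sum(node.renders for node in nodes)
```

With these assumptions, `simulate(1000, 2)` draws the graph once per frontend, so the lack of cache sharing costs one extra render per additional frontend and no more.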
|
|
TheWitness Developer
Joined: 14 May 2002 Posts: 9255 Location: MI, USA
|
Posted: Sun Dec 16, 2007 4:40 pm Post subject: |
|
|
| Howie wrote: | | With a HA setup instead, then cache-sharing isn't really necessary. Really, if you have 1000 users all hitting the same graph then you are still reducing the number of graph-drawing operations from 1000 to 2, which is probably enough. If they aren't all hitting the same graphs, then graph caching isn't going to help anyway. |
This sort of answers the "should we invent it ourselves" question on load balancing. HA for sure, but there are already technologies for that. So, I would suspect your answer would be to leave that out of the design. The capability will be there for HA, but it's always optional.
TheWitness
|
|
TheWitness Developer
Joined: 14 May 2002 Posts: 9255 Location: MI, USA
|
Posted: Sun Dec 16, 2007 4:41 pm Post subject: |
|
|
| melchandra wrote: | Forgive me if this seems to be a silly question.
Is there some reason why the pollers don't directly update the RRD Files? Does it have to go through the database?
Is there a way for the Pollers to be sent both the host and oid information to query, as well as host information for the Remote RRD Update Service so they could query, and then pass the data directly to the RRD storage devices? |
Yes, absolutely. When you have lots of RRD files, the disk I/O required is astounding. So, by batching the updates you can reduce I/O wait by 80-90% over time. The database provides that batching for us.
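A rough sketch of what such batching could look like: samples queued in the database are grouped per RRD file, and each file receives a single `rrdtool update` call carrying several `timestamp:value` pairs. The function name `batch_updates` and the row layout are hypothetical, a single data source per file is assumed, and the commands are only built here, not executed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (rrd_path, unix_time, value); one data source per file is assumed.
Row = Tuple[str, int, float]


def batch_updates(rows: Iterable[Row]) -> List[List[str]]:
    """Group queued samples per RRD file and build one update command per file."""
    per_file: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for path, ts, value in rows:
        per_file[path].append((ts, value))

    commands = []
    for path in sorted(per_file):
        # rrdtool expects timestamps in increasing order within one file.
        samples = sorted(per_file[path])
        args = [f"{ts}:{value}" for ts, value in samples]
        commands.append(["rrdtool", "update", path, *args])
    return commands
```

Each file is then opened and written once per batch instead of once per sample, which is where the reduction in disk I/O would come from.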
TheWitness
|
|
ben_c Cacti User
Joined: 14 May 2007 Posts: 177 Location: Melbourne, Australia.
|
Posted: Sun Dec 16, 2007 4:47 pm Post subject: |
|
|
Great document guys, need a little bit more time to analyze it.
But it is the direction Cacti needs to start heading. I know the current limitations all too well from using it in a large enterprise (50,000+ data sources).
|
|
Howie Cacti Guru User
Joined: 16 Sep 2004 Posts: 1958 Location: United Kingdom
|
Posted: Sun Dec 16, 2007 4:49 pm Post subject: |
|
|
| TheWitness wrote: | | This sort of answers the "should we invent it ourselves" question on load balancing. HA for sure, but there are already technologies for that. So, I would suspect your answer would be to leave that out of the design. The capability will be there for HA, but it's always optional. |
Indeed. I'd say stick to the architectural stuff required to support it (or at least not break it). HA solutions are usually either platform-specific (CARP, MS NLB, ultramonkey) or external (CSS, Alteon etc) anyway.
|
|
ben_c Cacti User
Joined: 14 May 2007 Posts: 177 Location: Melbourne, Australia.
|
Posted: Sun Dec 16, 2007 4:49 pm Post subject: |
|
|
| TheWitness wrote: | | Howie wrote: | | With a HA setup instead, then cache-sharing isn't really necessary. Really, if you have 1000 users all hitting the same graph then you are still reducing the number of graph-drawing operations from 1000 to 2, which is probably enough. If they aren't all hitting the same graphs, then graph caching isn't going to help anyway. |
This sort of answers the "should we invent it ourselves" question on load balancing. HA for sure, but there are already technologies for that. So, I would suspect your answer would be to leave that out of the design. The capability will be there for HA, but it's always optional.
TheWitness |
I tend to agree; built-in application-level load balancing is never an ideal situation in my opinion. If you really need it, buy some hardware to do the job properly (F5, Alteon etc).
|
|
Howie Cacti Guru User
Joined: 16 Sep 2004 Posts: 1958 Location: United Kingdom
|
Posted: Sun Dec 16, 2007 4:56 pm Post subject: |
|
|
| TheWitness wrote: |
| gandalf wrote: | | Where should hooks like "poller_bottom" be executed? |
I need feedback from people like Howie and yourself to determine "where" that should be. So, see my comment above. It's an RFC you know
|
If I understand the layout correctly, it would have to be on the 'master' poller, as the remote pollers report back in to that one. Of the 'big' plugins that I can think of that actually use cacti data * (reportit, thold, weathermap), thold and wmap can both already use poller_output instead of looking at rrd files directly. I don't really see any way around it for reportit though... if the rrdfiles were physically distributed onto different machines, then there would need to be some sort of aggregate-view or 'run this on all the rrd servers' API.
Actually thold probably would work either on the local pollers or the central location, since it works with one DS at a time, but it would be easier to maintain in the centre.
* I don't use Manage, MACTrack or Discovery, but as far as I know they don't deal with rrd data, do they?
|
|
Alice Cacti User
Joined: 28 Oct 2003 Posts: 88 Location: Bucharest, RO.
|
Posted: Sun Dec 16, 2007 8:21 pm Post subject: |
|
|
It's a little late, even for me, so please excuse all the aberrations I'm about to write.
WHY do we want multiple pollers?
a) Backup - have all the data available, even if one of the servers is not available, or can't request data from all the devices for some reason (network outage)
b) Speed - One server can't handle all the required devices
c) Combined (a+b)
IMHO it's a different approach for every one of the three situations listed.
I THINK that speed can be solved "local" (4 separate servers: Web, poller, DB and RRD Updater)
Facts: 10676 DS, 9742 graphs, 10669 RRD files totalling 1.3GB
Polling time: ~15 seconds
Two servers, one DB, one for the rest
Backup, on the other hand, is something else.
What if we run 2 somehow completely "independent" cacti instances, both querying the same hosts, and somehow synchronize them?
Something like replication for MySQL (I really don't know how this works) and rsync for RRD files?
Hmm, nice shit
| Quote: |
Error in posting
DEBUG MODE
SQL Error : 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server
version for the right syntax to use near 'WHERE forum_id = 7' at line 3
UPDATE forums SET forum_posts = forum_posts + 1, forum_last_post_id = WHERE forum_id = 7
Line : 423
File : functions_post.php
|
|
|
TheWitness Developer
Joined: 14 May 2002 Posts: 9255 Location: MI, USA
|
Posted: Sun Dec 16, 2007 10:37 pm Post subject: |
|
|
Yea, you get that when you sit on a post too long. Back button, copy, back button, repost, paste, post.
TheWitness
|
|