New Cacti Architecture (0.8.8) - RFC Response Location

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: New Cacti Architecture (0.8.8) - RFC Response Location

#61 Post by gandalf » Wed Oct 27, 2010 3:53 pm

AFAIK, no consensus has been reached and no code has been produced. I do know, however, that some small steps have been made, but only in non-public code.
R.

cewood
Posts: 13
Joined: Thu Mar 25, 2010 11:09 pm
Location: Sydney, Australia

#62 Post by cewood » Wed Oct 27, 2010 4:12 pm

gandalf wrote: AFAIK, no consensus has been reached and no code has been produced. I do know, however, that some small steps have been made, but only in non-public code.
R.

Thanks for the reply. I'll keep an eye out for this in future releases.


Cheers
Cameron.

appodictic
Posts: 44
Joined: Thu Jul 10, 2008 4:46 pm

#63 Post by appodictic » Sat May 21, 2011 9:56 am

I have read through this thread. Since it was started, some amazing open source software has appeared that can make this process much easier.

First, let me mention Apache Cassandra, a distributed, multi-master, active/active database. It will take some work to redesign the Cacti MySQL schema for Cassandra, but this is not impossible.

Next is the RRD store. There are already many examples of people using Cassandra for RRD-store-like applications:

http://nosql.mypopescu.com/post/3134031 ... ndra-based

However, we could also simply store the raw RRD files inside Cassandra and pull them out where needed.

As for poller failover, I think that is best handled with a tool like Linux-HA. Most people would only need one active/passive poller per datacenter, which is easily done with Linux-HA.

I really want to see this happen. Some of the newer tools are making strides in distributed storage and are pushing Cacti out of some use cases. Cacti was the first open source tool I installed at my job, and I love it to death. If we build it, they will come (back).

I do not want to step on anyone's toes if work is already ongoing, so I would like to know if I can hack at it.

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

#64 Post by gandalf » Sun May 22, 2011 10:27 am

Do you have any knowledge about performance when using Cassandra?
Could you provide some hints on how to start such a project?
R.

appodictic
Posts: 44
Joined: Thu Jul 10, 2008 4:46 pm

#65 Post by appodictic » Sun May 22, 2011 3:08 pm

I have worked with the project extensively. Cassandra is a distributed, sharded key-value store. Sharding is based on a user-supplied key. Writes are done without reading first, so writes are really fast. Reads are fast as well, because the underlying structures are sorted by key. It is linearly scalable and there is no SPOF.

As I mentioned, moving the RRD store to Cassandra is really trivial:
row key : server1/graph1
column key : 123452525 (timestamp)
column value : 1234

column key : 123452530 (timestamp)
column value : 3434
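
The layout above can be sketched in a few lines of Python. This is only an in-memory stand-in for illustration, not Cassandra client code: the class name WideRowStore and its methods are hypothetical, and nothing here talks to a real cluster. It shows one row per graph, with one column per timestamp, read back in timestamp order.

```python
from collections import defaultdict


class WideRowStore:
    """Hypothetical in-memory model of the layout: row key -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def insert(self, row_key, timestamp, value):
        # The caller never reads the row first; writing the same
        # timestamp again simply overwrites the earlier value.
        self._rows[row_key][timestamp] = value

    def slice(self, row_key, start, end):
        # Return (timestamp, value) pairs with start <= timestamp <= end,
        # sorted by timestamp, like a column range read on one row.
        columns = self._rows.get(row_key, {})
        return sorted((ts, v) for ts, v in columns.items() if start <= ts <= end)


store = WideRowStore()
store.insert("server1/graph1", 123452525, 1234)
store.insert("server1/graph1", 123452530, 3434)
print(store.slice("server1/graph1", 123452525, 123452530))
# prints [(123452525, 1234), (123452530, 3434)]
```

Note that this covers storage and retrieval only; it does nothing about normalization, consolidation, or graphing.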

Many people have used Cassandra to store this type of data (performance data).

The meta store information in Cacti takes a bit more effort, because you have to restructure your data (possibly denormalizing it and storing it multiple times). We may not need as much horizontal scalability here, so multi-master MySQL might fit the bill. But we could certainly do that in Cassandra as well.

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

#66 Post by gandalf » Sun May 22, 2011 3:25 pm

appodictic wrote: As I mentioned, moving the RRD store to Cassandra is really trivial.
row key : server1/graph1
column key : 123452525 (timestamp)
column value : 1234

column key : 123452530 (timestamp)
column value : 3434

Well, but you know that rrdtool does not only store data?
- It does normalization
- It does consolidation (to keep the footprint from growing indefinitely)
- It does graphing; that is the most important issue I currently see with your proposal
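
To make the consolidation point concrete, here is a deliberately simplified Python sketch. It is not rrdtool's actual algorithm (rrdtool also interpolates samples onto step boundaries and handles unknown values); the function consolidate and the class RoundRobinArchive are hypothetical names. It shows only the two ideas that keep the footprint bounded: averaging raw samples into coarser buckets, and a fixed-size archive that overwrites its oldest rows.

```python
from collections import deque


def consolidate(samples, step):
    """Average (timestamp, value) samples into buckets of `step` seconds."""
    buckets = {}
    for ts, value in samples:
        bucket_start = ts - (ts % step)
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


class RoundRobinArchive:
    """Fixed-size archive: once full, each new row replaces the oldest one."""

    def __init__(self, rows):
        self._rows = deque(maxlen=rows)

    def push(self, point):
        self._rows.append(point)

    def points(self):
        return list(self._rows)


raw = [(0, 10), (100, 20), (300, 30), (400, 50)]
archive = RoundRobinArchive(rows=2)
for point in consolidate(raw, step=300):
    archive.push(point)
print(archive.points())  # prints [(0, 15.0), (300, 40.0)]
```

Any store that replaces RRD files would need an equivalent of both steps, otherwise the stored data grows without limit.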
R.

appodictic
Posts: 44
Joined: Thu Jul 10, 2008 4:46 pm

#67 Post by appodictic » Sun May 22, 2011 3:58 pm

Yes. Those are challenges. I do not understand the RRD format in depth.

My thinking is that it will be much easier to use an already existing distributed data store and make RRD work with it, rather than trying to build a distributed RRD store from the ground up.

Since RRDTool works with local files, I do not believe the final distributed Cacti will use RRDTool as we know it. Is it a requirement to replace RRDTool with something that is completely transparent to Cacti?

The other, very simple way to do this relies on the fact that a column in Cassandra is just a byte[]. We could serialize the entire RRD file into a single column (or a list of sub-columns) and just use Cassandra like a distributed file system. Possibly we could break an RRD file across multiple Cassandra columns, let RRDTool work locally, and then intelligently detect the sections that changed and sync them to the distributed store.
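
The "detect the section that changed" idea could look roughly like the Python sketch below. The chunk size, the use of SHA-1 digests, and all function names are assumptions made for illustration; the upload to the distributed store is left out entirely. The file is split into fixed-size chunks (one chunk per column), and only chunks whose digest differs from the last synced version are returned for upload.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size; one chunk would map to one column


def split_chunks(blob, size=CHUNK_SIZE):
    """Split a bytes object into consecutive chunks of at most `size` bytes."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def chunk_digests(blob, size=CHUNK_SIZE):
    """Digest of every chunk, kept from the last sync for later comparison."""
    return [hashlib.sha1(chunk).hexdigest() for chunk in split_chunks(blob, size)]


def changed_chunks(old_digests, new_blob, size=CHUNK_SIZE):
    """Return {chunk_index: chunk_bytes} for chunks that are new or modified."""
    changed = {}
    for index, chunk in enumerate(split_chunks(new_blob, size)):
        digest = hashlib.sha1(chunk).hexdigest()
        if index >= len(old_digests) or old_digests[index] != digest:
            changed[index] = chunk
    return changed


old_file = bytes(10000)             # stand-in for an RRD file at the last sync
new_file = bytearray(old_file)
new_file[5000] = 1                  # simulate a local update touching one byte
to_sync = changed_chunks(chunk_digests(old_file), bytes(new_file))
print(sorted(to_sync))              # prints [1]
```

How well this works in practice depends on how many chunks a typical update touches, which this sketch does not measure.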

jerrison
Cacti User
Posts: 55
Joined: Fri Dec 29, 2006 4:02 am

#68 Post by jerrison » Mon May 23, 2011 3:12 am

Hi appo,
I'm very interested in a distributed Cacti system, but I think one thing is being overlooked with Cassandra: we have rrdtool actions happening on every poller run (be it 1 minute or 5 minutes). This can be delayed a bit by the Boost plugin (AFAIK), but every RRD file is still touched and updated with poller results.
Cacti was built "around" rrdtool; that might come with inherent weaknesses, but definitely a lot of strengths, too. I don't know of any other database that can store data for trend analysis as efficiently and compactly as RRD.
Do you have any data on I/O performance in Cassandra and how it will work over, e.g., WAN links? I'll google Cassandra, but I'd sure like to be pointed to the right information, if you don't mind :).

Cheers,
jerri

Edit:
Is this something I should be worried about: http://wiki.apache.org/cassandra/CassandraHardware ? Compared to the resources we currently need for Cacti, it would be a rather hefty upgrade.

appodictic
Posts: 44
Joined: Thu Jul 10, 2008 4:46 pm

#69 Post by appodictic » Mon May 23, 2011 9:27 am

@jerrison: The CassandraHardware page makes hardware recommendations for using Cassandra as a very large dedicated datastore. It describes the hardware you would need to run a large, heavily accessed multi-GB or multi-TB cluster. Most users would probably be able to get along with a single-node instance configured with about the same memory as MySQL (probably, no promises :)

@Gandalf: One nice thing about Cassandra is that it has Thrift bindings for many languages (C, C++, PHP, etc.). I was actually thinking about making this work with the fewest possible upstream changes. We could fork rrdtool into rrdtool-cassandra. The path argument to the RRDTool commands would be used as the Cassandra key, for example, and none of the rrd commands would work with local files; they would interact directly with Cassandra. I am not a crack C coder by any stretch, and I know this is a BIG task, but the payoff is a drop-in replacement for rrd that would be transparent to upstream Cacti.

jerrison
Cacti User
Posts: 55
Joined: Fri Dec 29, 2006 4:02 am

#70 Post by jerrison » Wed May 25, 2011 1:40 am

One thing that has turned out to be a bottleneck in quite a few Cacti setups is the I/O performance of read/write tasks when updating RRDs in larger environments (the Boost plugin can help up to a point).
It'd be awesome if Cassandra could tackle that as well, somehow.
Just my 2c :).

appodictic
Posts: 44
Joined: Thu Jul 10, 2008 4:46 pm

#71 Post by appodictic » Wed May 25, 2011 8:16 am

That is one of the neat parts of the Cassandra architecture: it is a scale-out, p2p architecture. So if n nodes cannot handle the load, adding nodes divides both the data and the requests per node. And this can be done on the fly with no downtime.
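
As a rough illustration of how adding a node can divide the data without reshuffling everything, here is a small consistent-hashing ring in Python. This is a generic textbook sketch, not Cassandra's partitioner; the class HashRing, the vnodes parameter, and the node and key names are all made up for the example. The point it shows is that when a node joins, keys either stay where they were or move to the new node.

```python
import bisect
import hashlib


def _token(name):
    # Map a string to a position on the ring.
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical hash ring: each node owns several token positions."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_token(f"{node}#{i}"), node))

    def node_for(self, key):
        # A key belongs to the first token at or after its own position,
        # wrapping around to the start of the ring.
        index = bisect.bisect_right(self._ring, (_token(key), ""))
        return self._ring[index % len(self._ring)][1]


keys = [f"server{i}/graph1" for i in range(1000)]
ring = HashRing(["node1", "node2", "node3"])
before = {key: ring.node_for(key) for key in keys}
ring.add_node("node4")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved, all of them to node4")
```

The existing nodes keep the rest of their keys, which is why the cluster can keep serving requests while the new node takes over its share.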

beenpricked
Posts: 6
Joined: Tue Sep 20, 2011 5:35 pm

#72 Post by beenpricked » Fri Sep 23, 2011 5:18 pm

appodictic wrote: That is one of the neat parts of the Cassandra architecture: it is a scale-out, p2p architecture. So if n nodes cannot handle the load, adding nodes divides both the data and the requests per node. And this can be done on the fly with no downtime.

Funny you mention this, because I have been having trouble with the new nodes trying to redirect back through the original N nodes. I wonder what I might be doing wrong. I am running pretty heavy Facebook application analytics, so that could have something to do with it. Does anybody have any good ideas?

rony
Developer/Forum Admin
Posts: 6016
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

#73 Post by rony » Sun Sep 25, 2011 11:50 am

Strange discussion to be having on this thread...

But I have to add that Cassandra, while a pretty cool idea (I have researched it), still doesn't provide the same interface and data storage that RRDtool does.

Until someone writes a replacement for RRDtool that uses Cassandra, I don't see this as a viable option for Cacti.

Howie
Cacti Guru User
Posts: 5330
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

#74 Post by Howie » Thu Apr 11, 2013 10:43 am

On the other side of things, with new tools, I've been wondering for a while about using some kind of message queue system between spine and cacti, both for poller_tasks and for the results - you'd get natural load-balancing across multiple pollers using the work queue, and you can either immediately consume the results or batch them up like Boost for results. You could also have other backends to spit the data out into things like Carbon/Graphite or other trendy tools.
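
A minimal sketch of the work-queue idea, using only Python's standard library in a single process. A real setup would presumably use a network message broker between spine and Cacti; the function names, poller names, and task strings below are invented for illustration, and poll() just returns a fake value. Pollers pull tasks from a shared queue, so load balancing happens naturally, and results are collected in one batch at the end, in the spirit of Boost.

```python
import queue
import threading


def poll(task):
    # Hypothetical stand-in for a real poll: returns the task and a fake value.
    return (task, len(task))


def poller_worker(name, tasks, results):
    # Each poller takes the next available task until it sees the None sentinel.
    while True:
        task = tasks.get()
        if task is None:
            break
        results.put((name, *poll(task)))


def run_pollers(task_list, poller_names):
    tasks = queue.Queue()
    results = queue.Queue()
    workers = [
        threading.Thread(target=poller_worker, args=(name, tasks, results))
        for name in poller_names
    ]
    for worker in workers:
        worker.start()
    for task in task_list:
        tasks.put(task)
    for _ in workers:
        tasks.put(None)  # one stop sentinel per poller
    for worker in workers:
        worker.join()
    # All pollers have finished, so the results can be drained as one batch.
    batch = []
    while not results.empty():
        batch.append(results.get())
    return batch


batch = run_pollers(["host1/cpu", "host2/cpu", "host3/if0"], ["poller-a", "poller-b"])
for poller, task, value in sorted(batch, key=lambda row: row[1]):
    print(poller, task, value)
```

Which poller handles which task is not fixed and can differ between runs; only the set of results is stable. A second consumer on the results side could forward the same rows to another backend.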

(I found this thread again while checking to see if it had moved forwards - I've spent the last couple of days playing with Nimsoft, which has a nice distributed poller architecture (with optional SSL VPNs between pollers to get into customer networks and across NATs), central configuration, but horrible UI)

If Cacti could distribute polling the same way, if Autom8 could apply Thold templates, and if Thold could have multiple templates per DS, I think I'd just stop looking :-)

gandalf
Developer
Posts: 22375
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

#75 Post by gandalf » Sat Apr 27, 2013 4:33 am

Howie wrote: On the other side of things, with new tools, I've been wondering for a while about using some kind of message queue system between spine and cacti, both for poller_tasks and for the results - you'd get natural load-balancing across multiple pollers using the work queue, and you can either immediately consume the results or batch them up like Boost for results. You could also have other backends to spit the data out into things like Carbon/Graphite or other trendy tools.

Do you see any chance to have both
- for a straightforward approach: an easy, one-system setup featuring as few components as possible at the very beginning
- for a complex, distributed approach: a scaling setup, where you can add components (message queues, pollers, plugins and so on) as the need arises

I'm not against heading for more complex use cases. But I don't want to make things unnecessarily complex for a simple use case.

Result: when adding a queueing system, it should be optional, not mandatory. And it should be able to cope with huge bulk peaks of messages ...
R.
