On Rails with Apache Cassandra
Monday, June 21, 2010 at 6:00AM | Derek Stainer
In the following presentation, Stu Hood, Technical Lead at Rackspace, discusses Cassandra and Ruby. Among the more interesting points are the reasons Hood gives for choosing a solution like Cassandra:
- Large dataset, specifically one larger than a single node can handle
- Volatile dataset, with writes making up more than 25% of operations
- Expensive; to quote Stu, "More than you can afford with a commercial solution"
Another interesting point, one that is known but hasn't really been discussed in detail, is Cassandra's lineage. It's widely known that Cassandra blends ideas from Amazon's Dynamo and Google's BigTable. Stu discusses what Cassandra has pulled from each data store.
Amazon's Dynamo:
- No node in the cluster is special
- Gossip technique
- Eventual consistency
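To make the Dynamo-style ideas above a little more concrete, here is a minimal toy sketch in Python. It is not Cassandra's implementation: the `Replica` class, the `exchange` and `gossip_round` helpers, and the last-write-wins rule based on caller-supplied timestamps are all assumptions made for illustration. It shows peers with no special node swapping state until every replica agrees.

```python
import random


class Replica:
    """A toy node: every replica runs the same code, none is special."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the entry with the newest timestamp.
        # (Assumption: timestamps for a given key are distinct.)
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def exchange(self, other):
        # Push-pull exchange: each side adopts whatever is newer on the other.
        for key, (ts, value) in list(other.data.items()):
            self.write(key, value, ts)
        for key, (ts, value) in list(self.data.items()):
            other.write(key, value, ts)


def gossip_round(replicas, rng):
    # Each node talks to one randomly chosen peer per round.
    for node in replicas:
        peer = rng.choice([r for r in replicas if r is not node])
        node.exchange(peer)


def converged(replicas):
    return all(r.data == replicas[0].data for r in replicas)
```

A read from a node that has not yet heard about the newest write can return a stale value; repeated calls to `gossip_round(replicas, random.Random())` spread the update until `converged(replicas)` holds, which is the "eventual" part of eventual consistency.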
Google's BigTable:
- "Column Family" Data Model
- Range queries for rows
- Scans rows in order
- Memtable/SSTable Structure
- Always write sequentially to disk
- Bloom filters to minimize random reads
- Trounces B-Tree for big data
- Linear insert performance
- Logarithmic growth for reads
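The Memtable/SSTable points above can be sketched roughly as follows. This is a simplified in-memory model under stated assumptions, not Cassandra's actual storage engine: `BloomFilter`, `SSTable`, `Store`, and the `memtable_limit` parameter are hypothetical names, and a sorted Python list stands in for a file written sequentially to disk.

```python
import bisect
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable, key-sorted table (stands in for a sequentially written file)."""

    def __init__(self, sorted_items):
        self.keys = [k for k, _ in sorted_items]
        self.values = [v for _, v in sorted_items]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: skip the (would-be random) read
        i = bisect.bisect_left(self.keys, key)  # binary search over sorted keys
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, start, end):
        # Range query: rows come back in key order.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


class Store:
    def __init__(self, memtable_limit=2):
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sort once, then "write" the whole table in a single sequential pass.
        if self.memtable:
            self.sstables.append(SSTable(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            value = table.get(key)
            if value is not None:
                return value
        return None
```

Writes only ever touch the in-memory table or append a new sorted table, while reads consult the Bloom filter before searching a table. A real system would also merge (compact) old tables and handle deletes, which this sketch leaves out.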
Lots of other goodies in the presentation.

