Entries in Amazon (5)

Tuesday, July 13, 2010

Introduction to Apache Cassandra

So in case you missed it, Cassandra has been in the news over the last couple of days, so I thought this would be a good opportunity to provide an introduction to Cassandra via a presentation by Gary Dusbabek of Rackspace. He gave this presentation at the Silicon Valley Cloud Computing Group in June of this year.

A couple of key points about Cassandra (not from the presentation):

  • Initially created at Facebook to power search over users' Inbox messages on the site.
  • The source code was open sourced and later donated to the Apache Software Foundation.
  • Its design was inspired by both Google's BigTable and Amazon's Dynamo.
  • It's considered a column-oriented data store, similar to Google's BigTable or Apache HBase.
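To make "column data store" a bit more concrete, here is a toy, in-memory sketch of the BigTable-style layout: a column family maps a row key to a set of columns, each a (name, value, timestamp) triple kept sorted by column name, and when two writes hit the same column the one with the newer timestamp wins. The class and method names (`ColumnFamily`, `insert`, `column_range`) and the inbox-style keys are hypothetical and are not Cassandra's client API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Column:
    name: str
    value: str
    timestamp: int  # write time supplied by the client; the newer write wins


class ColumnFamily:
    """Toy column family: row key -> {column name -> Column}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Column]] = {}

    def insert(self, row_key: str, name: str, value: str, timestamp: int) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(name)
        # Last write wins: ignore a write older than what is already stored.
        if current is None or timestamp > current.timestamp:
            row[name] = Column(name, value, timestamp)

    def column_range(self, row_key: str, start: str = "", finish: str = "") -> List[Column]:
        """Columns of one row, sorted by name, with start <= name <= finish
        (an empty finish means no upper bound)."""
        row = self.rows.get(row_key, {})
        return [row[n] for n in sorted(row)
                if n >= start and (finish == "" or n <= finish)]


inbox = ColumnFamily()
inbox.insert("user42", "msg:0002", "Lunch?", timestamp=2)
inbox.insert("user42", "msg:0001", "Hello", timestamp=1)
inbox.insert("user42", "msg:0001", "Hello again", timestamp=3)  # newer, replaces "Hello"
inbox.insert("user42", "msg:0002", "stale", timestamp=1)        # older, ignored
print([(c.name, c.value) for c in inbox.column_range("user42")])
# [('msg:0001', 'Hello again'), ('msg:0002', 'Lunch?')]
```

Keeping columns sorted by name is what makes range reads over a single row (for example, all messages in a time-ordered range) cheap in this model.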

So why Cassandra at all? As Dusbabek puts it in his presentation, "vertical scaling is hard." As the amount of data we create and analyze grows, our strategies for dealing with that data have to change. Dusbabek walks us through a number of topics, including scaling, the replication model, the data model, and practical considerations.
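As a rough illustration of scaling out and replication (not Cassandra's actual code), the sketch below places nodes on a hash ring, makes the first node at or after a key's token its coordinator, and uses the coordinator's successors on the ring as the other replicas. This mirrors the "Rack Unaware" placement described in the Cassandra paper, where the non-coordinator replicas are the N-1 successors of the coordinator. The MD5 tokens, node names, and replication factor of 3 are assumptions for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5, chosen here for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes: List[str]) -> None:
        # Each node gets a single token; the ring is the sorted list of tokens.
        self.tokens: List[Tuple[int, str]] = sorted((token(n), n) for n in nodes)

    def replicas(self, key: str, n: int = 3) -> List[str]:
        """Coordinator = first node whose token is >= the key's token (wrapping
        around to the start of the ring), followed by its n-1 successors."""
        positions = [t for t, _ in self.tokens]
        start = bisect.bisect_left(positions, token(key)) % len(self.tokens)
        count = min(n, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user42"))  # three distinct nodes, adjacent on the ring
```

Adding capacity means adding a node to the ring rather than buying a bigger machine, which is the horizontal alternative to the vertical scaling Dusbabek calls hard.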

So without any further interruptions...

Wednesday, June 9, 2010

Eventual Consistency - Revisited

Werner Vogels, the CTO of Amazon, wrote a fascinating piece on eventual consistency. If you recall, consistency is one of the three properties in the CAP theorem. In his post, Werner describes an ideal world where there is only a single consistency model:

when an update is made all observers would see that update

Werner provides a historical perspective on how people discovered that this is actually pretty hard to achieve. He then digs deeper and discusses how consistency looks from the client's perspective and from the server's perspective.

Werner describes eventual consistency as:

a specific form of weak consistency; the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value. If no failures occur, the maximum size of the inconsistency window can be determined based on factors such as communication delays, the load on the system, and the number of replicas involved in the replication scheme.

There are more details about various forms of weak consistency and a brief mention of Amazon's Dynamo key/value data store. It's a great read.
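On the server side, the article frames the trade-off in terms of N replicas, W replicas that must acknowledge a write, and R replicas contacted on a read: when W + R > N, every read set overlaps every write set, so a read always reaches at least one replica holding the latest acknowledged write. The small brute-force check below (a sketch; the function name is our own) confirms that overlap condition by enumerating every possible write set and read set.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every set of w replicas shares at least one replica with
    every set of r replicas, out of n replicas in total."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))


for w, r in [(2, 2), (3, 1), (1, 1)]:
    print(f"N=3, W={w}, R={r}: overlap={quorums_always_overlap(3, w, r)}")
# N=3, W=2, R=2: overlap=True
# N=3, W=3, R=1: overlap=True
# N=3, W=1, R=1: overlap=False
```

The last configuration (W = R = 1) is the fast, eventually consistent end of the spectrum: a read may hit a replica the write has not reached yet, and the inconsistency window described above is how long that can last.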

Read more: Eventually Consistent - Revisited

Saturday, May 29, 2010

The Origins of Cassandra

Cassandra's roots can be traced back to Facebook, specifically to an effort to solve the problem of searching users' Inbox messages. Inbox Search was ultimately released in 2008 and served around 100 million users at the time of its release. With Facebook now approaching 500 million users, one can assume that number has increased just a tad. Cassandra's roots can also be traced to two proprietary data stores, Amazon's Dynamo and Google's BigTable, both of which influenced Cassandra's design and implementation.

This paper, written by Avinash Lakshman and Prashant Malik of Facebook, dives into the specifics of Cassandra, including:

  • Influential Works 
  • Data Model 
  • Client API 
  • System Design 
  • Distributed Algorithms

Here is what Lakshman and Malik had to say specifically about the system design:

The architecture of a storage system that needs to operate in a production setting is complex. In addition to the actual data persistence component, the system needs to have the following characteristics; scalable and robust solutions for load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request marshalling, request routing, system monitoring and alarming, and configuration management.
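One item on that list, failure detection, is handled in the paper with a variant of the Φ accrual failure detector, with heartbeat inter-arrival times approximated by an exponential distribution. Instead of a yes/no answer, the detector reports a suspicion level φ. With mean inter-arrival time μ and time t since the last heartbeat, the chance that a heartbeat is merely late is e^(-t/μ), so φ = -log10(e^(-t/μ)) = t / (μ ln 10). The sketch below is a minimal, hypothetical illustration of that formula; the class name, method names, and the 1000-sample window are our own assumptions, not Cassandra's implementation.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Toy phi accrual detector assuming exponentially distributed
    heartbeat inter-arrival times."""

    def __init__(self, window: int = 1000) -> None:
        self.intervals: Deque[float] = deque(maxlen=window)  # recent gaps between heartbeats
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        if self.last_heartbeat is None or not self.intervals:
            return 0.0  # no history yet, so no basis for suspicion
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # P(heartbeat still to come) = exp(-elapsed / mean),
        # so phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in range(11):  # heartbeats at t = 0, 1, ..., 10 seconds
    detector.heartbeat(float(t))
print(round(detector.phi(11.0), 2))  # 0.43
print(round(detector.phi(20.0), 2))  # 4.34
```

Suspicion grows smoothly the longer a node stays silent, and each node can pick its own threshold for when to treat a peer as down, trading detection speed against false positives.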

Read more: Cassandra - A Decentralized Structured Storage System

Related Reading:

Introduction to the Cassandra Data Model
Understanding the Cassandra Code Base
Installing and using Cassandra in just five steps

Saturday, May 29, 2010

Stumped why folks are turning to NoSQL

Joe Stump, CTO and founder of SimpleGeo, offers a different perspective on why organizations such as his are using NoSQL. Most debates center on technical concerns such as scalability or performance requirements. In his post, however, Joe focuses on a different side of the debate: operations. Joe does not dispute that it's possible to get traditional RDBMSs to perform at the same rate as NoSQL data stores; rather, he points out the lengths, in terms of manpower and cost, required to do so. I think the questions Joe asks in his post are just as interesting as the answers to those questions.

  • Do you honestly think that the PhDs at Google, Amazon, Twitter, Digg, and Facebook created Cassandra, BigTable, Dynamo, etc. when they could have just used a RDBMS instead?
  • How much are you spending on those MS SQL servers with SSD drives that serve up 6,100 results a second?
  • How much time are your DBAs spending administering your RDBMSs?
  • How much time are they in the data centers?
  • How much do those data centers cost?
  • How much do DBAs cost a year?
  • How easy is it to add a new server to your cluster?
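Joe's last question, how easy it is to add a new server to your cluster, is partly a question of data placement. The hypothetical comparison below (MD5 hashing, node names, and key count are all made up for illustration) counts how many keys change owner when a fifth node joins a four-node cluster. With naive hash-mod-N placement, a key stays put only if its hash has the same remainder mod 4 and mod 5, which holds for 4 of every 20 residues, so roughly 80% of keys move. With consistent hashing, the approach used by Dynamo and Cassandra, only the keys on the new node's arc of the ring move, and all of them move to the new node.

```python
import bisect
import hashlib
from typing import Callable, List


def h(s: str) -> int:
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)


def owner_mod(key: str, nodes: List[str]) -> str:
    # Naive placement: hash modulo the number of nodes.
    return nodes[h(key) % len(nodes)]


def owner_ring(key: str, nodes: List[str]) -> str:
    # Consistent hashing: first node at or after the key's position, wrapping around.
    ring = sorted((h(n), n) for n in nodes)
    positions = [p for p, _ in ring]
    return ring[bisect.bisect_left(positions, h(key)) % len(ring)][1]


def moved(owner: Callable[[str, List[str]], str], keys: List[str],
          before: List[str], after: List[str]) -> List[str]:
    """Keys whose owning node changes when the cluster goes from `before` to `after`."""
    return [k for k in keys if owner(k, before) != owner(k, after)]


keys = [f"user{i}" for i in range(10_000)]
before = ["node-a", "node-b", "node-c", "node-d"]
after = before + ["node-e"]
print("hash mod N:", len(moved(owner_mod, keys, before, after)) / len(keys))
print("ring:      ", len(moved(owner_ring, keys, before, after)) / len(keys))
```

With a single token per node the new node's arc can be lopsided; Dynamo, for example, assigns several virtual nodes to each physical host to even out the load.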

Ultimately, Joe sums up his reasons to use NoSQL with the following quote:

I guess what I’m saying is that my decision to use NoSQL, and I’m guessing others’ decisions to do so, has less to do with the fact that we can’t squeeze a few thousand writes a second out of MySQL and more to do with management and cost overhead. NoSQL solutions allow us to serve absurd amounts of data for a really, really low price.

Read more: NoSQL vs. RDBMS: Let the flames begin!

Thursday, May 13, 2010

Dynamo: Amazon’s Highly Available Key-value Store

Amazon has published a whitepaper about its highly available key/value store, Dynamo. Dynamo, you may recall, has been cited as part of the inspiration for Apache's Cassandra data store.

The whitepaper covers many important details, such as requirements, design considerations, SLAs, and system architecture.

Definitely worth a look.
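One of the architecture details in the paper is versioning with vector clocks: each version of an object carries a list of (node, counter) pairs, and if neither version's clock descends from the other, the versions are in conflict and must be reconciled. Below is a minimal sketch that loosely follows the paper's Sx/Sy/Sz walkthrough; the function names are our own, and plain dictionaries stand in for the (node, counter) lists.

```python
from typing import Dict

Clock = Dict[str, int]  # node name -> counter


def increment(clock: Clock, node: str) -> Clock:
    """Return a copy of clock with node's counter bumped (a write handled by node)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: Clock, b: Clock) -> bool:
    """True if a has seen every event recorded in b (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: Clock, b: Clock) -> str:
    if a == b:
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # conflicting versions: keep both for reconciliation


def merge(a: Clock, b: Clock) -> Clock:
    """Component-wise maximum, used when a reconciled version is written back."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


d1 = increment({}, "Sx")
d2 = increment(d1, "Sx")
d3 = increment(d2, "Sy")  # two different nodes update d2 independently
d4 = increment(d2, "Sz")
print(compare(d2, d1))  # a supersedes b
print(compare(d3, d4))  # concurrent
d5 = increment(merge(d3, d4), "Sx")  # a client reconciles d3 and d4
print(d5 == {"Sx": 3, "Sy": 1, "Sz": 1})  # True
```

A version whose clock is superseded can simply be discarded; only truly concurrent versions need application-level reconciliation.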

Read more: Dynamo: Amazon’s Highly Available Key-value Store