Entries in Column Store (2)

Monday, Aug 16, 2010

The Importance of Idempotency for Cassandra Updates

An idempotent process is one that produces the same result no matter how many times it is repeated. So what does this have to do with Cassandra? Maxim Grinev explains. Specifically, he points out:

When you develop application for Cassandra you should be aware of the following fact. Even when client observes the failure of an update it is still possible that this update has been executed successfully. The cause of such anomalous behavior is that Cassandra does not support transactional rollback.

This "lack of support" for transactional rollback is actually by design, mainly because rolling back distributed transactions is both expensive and hard to scale. Therefore, as Maxim points out, our applications must deal with these intricacies themselves.
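To make the anomaly concrete, here is a minimal sketch in Python of what can go wrong when a non-idempotent update is retried. It does not use Cassandra at all: `FlakyCounterStore` and `increment_with_retry` are hypothetical names, and the store is an in-memory stand-in that applies a write and then loses the acknowledgement once, which is an assumption about how the failure might look from the client side.

```python
class FlakyCounterStore:
    """Hypothetical in-memory stand-in for a store without rollback."""

    def __init__(self):
        self.counters = {}
        self._lose_next_ack = True

    def increment(self, url):
        # The write is applied first...
        self.counters[url] = self.counters.get(url, 0) + 1
        # ...then the acknowledgement is lost once, simulating a timeout.
        # Nothing undoes the write that already happened.
        if self._lose_next_ack:
            self._lose_next_ack = False
            raise TimeoutError("no acknowledgement received")


def increment_with_retry(store, url, attempts=3):
    """Retry the update until the client sees a success."""
    for _ in range(attempts):
        try:
            store.increment(url)
            return
        except TimeoutError:
            continue
    raise RuntimeError("update failed after retries")


store = FlakyCounterStore()
increment_with_retry(store, "http://example.com/a")

# Only one tweet was seen, but the counter was bumped twice.
print(store.counters["http://example.com/a"])
```

The client did the reasonable thing by retrying, yet the counter now overstates the truth, because the first attempt had already been applied when the failure was reported.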

So how do you design a data model that accounts for the issues described above? Maxim describes a rather clever approach:

Instead of just storing the mapping of URLs to counters (i.e. column family URL_statistics where each record has an URL as a key and a single column having counter as its value) a solution can be to store the mapping of each URL to the IDs of the tweets which contains the URL (i.e. column family URL_Tweets where each record has an URL as a key, columns representing tweets, column names are the tweet IDs, and column values are not used). URL counters will then be computed on retrieval by counting tweet IDs.

It is a good idea to store tweet IDs as column names so that Cassandra automatically eliminates duplicates – repeated update will be ignored in this case (use this great Cassandra feature to make your updates idempotent!).
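The design Maxim describes can be sketched roughly as follows. This is an in-memory Python model, not Cassandra client code: `UrlTweetsStore`, `record_tweet` and `count` are hypothetical names, and a nested dictionary stands in for the URL_Tweets column family, on the assumption that writing the same column name twice simply overwrites the existing column.

```python
class UrlTweetsStore:
    """Hypothetical in-memory model of the URL_Tweets column family."""

    def __init__(self):
        # row key (URL) -> {column name (tweet ID): unused column value}
        self.rows = {}

    def record_tweet(self, url, tweet_id):
        # Writing an existing column name overwrites it, so repeating
        # this call for the same tweet changes nothing.
        self.rows.setdefault(url, {})[tweet_id] = b""

    def count(self, url):
        # The counter is computed on retrieval by counting tweet IDs.
        return len(self.rows.get(url, {}))


store = UrlTweetsStore()
url = "http://example.com/a"

store.record_tweet(url, 101)
store.record_tweet(url, 101)  # retry after a reported failure
store.record_tweet(url, 202)

print(store.count(url))
```

Because the tweet ID is the column name, the retried write lands on the same column and the count stays at two distinct tweets. The trade-off is that the count is computed at read time rather than stored.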

So how early should you design this into your application's data model?

It is common that Cassandra applications are not initially designed for idempotence. At first, small scale deployments do not exhibit these subtle problems and work fine. Only as time passes and their deployments expand the problems manifest and the applications respond to handle them. Do it right from the beginning.

Read more: Update Idempotency: Why It is Important in Cassandra Applications

Tuesday, Jun 15, 2010

High Performance Scalable Data Stores

Rick Cattell has written a paper that I think is a very good comparison of the various NoSQL data stores. In it he discusses the types of data stores available, i.e. key-value, document and column. For each type he describes the available implementations, such as Voldemort, Redis and Scalaris.

Data stores discussed in the paper:

Key/Value: Voldemort, Riak, Redis, Scalaris, Tokyo Tyrant and Enhanced Memcached
Document: SimpleDB, CouchDB and MongoDB
Column: HBase, HyperTable and Cassandra

At a minimum the paper allows the reader to get a basic understanding of each data store.

Read more: High Performance Scalable Data Stores