The Importance of Idempotency for Cassandra Updates
Monday, August 16, 2010 at 6:01AM | Derek Stainer

An idempotent process is one that produces the same result no matter how many times it is repeated. So what does this have to do with Cassandra? Maxim Grinev is going to tell us. Specifically, he points out:
When you develop applications for Cassandra, you should be aware of the following fact: even when a client observes the failure of an update, it is still possible that the update was executed successfully. The cause of this anomalous behavior is that Cassandra does not support transactional rollback.
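This "lack of support" for transactional rollback is actually by design, mainly because rolling back distributed transactions is both expensive and hard to scale. Therefore, as Maxim points out, our applications must deal with these intricacies themselves: an update that the client saw fail may in fact have been applied, so retrying it blindly can apply it twice.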
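To make the anomaly concrete, here is a minimal sketch using a hypothetical in-memory store (the names `FlakyCounterStore`, `TimeoutAfterWrite` and `increment_with_retry` are made up for illustration; this is not a Cassandra driver API). The store applies the update and only then "loses" the acknowledgment, with no rollback, which is the situation Maxim describes.

```python
from typing import Dict


class TimeoutAfterWrite(Exception):
    """Hypothetical error: the write was applied, but the client never got the ack."""


class FlakyCounterStore:
    """Hypothetical in-memory stand-in for a store without rollback (not a Cassandra API)."""

    def __init__(self, fail_first_n_acks: int) -> None:
        self.counters: Dict[str, int] = {}
        self._acks_to_drop = fail_first_n_acks

    def increment(self, key: str) -> None:
        # The update is applied first...
        self.counters[key] = self.counters.get(key, 0) + 1
        # ...then the acknowledgment may be lost. Nothing is rolled back.
        if self._acks_to_drop > 0:
            self._acks_to_drop -= 1
            raise TimeoutAfterWrite(key)


def increment_with_retry(store: FlakyCounterStore, key: str, attempts: int = 3) -> None:
    """Naive client: treat any observed failure as 'not applied' and retry."""
    for attempt in range(attempts):
        try:
            store.increment(key)
            return
        except TimeoutAfterWrite:
            if attempt == attempts - 1:
                raise  # give up after the last attempt


store = FlakyCounterStore(fail_first_n_acks=1)
increment_with_retry(store, "http://example.com")  # one tweet mentions this URL
print(store.counters["http://example.com"])  # prints 2
```

The retry loop is reasonable on its own; the trouble is that "add one" is not idempotent, so retrying after a lost acknowledgment counts the same tweet twice.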
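So how do you design a data model that accounts for these issues? Maxim's example is an application that counts how many tweets mention each URL: if a client retries a counter increment that actually succeeded, the URL is counted twice. Maxim describes a rather clever way of designing around this: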
Instead of just storing the mapping of URLs to counters (i.e. column family URL_statistics, where each record has a URL as its key and a single column holding the counter as its value), a solution can be to store the mapping of each URL to the IDs of the tweets that contain it (i.e. column family URL_Tweets, where each record has a URL as its key and one column per tweet: column names are the tweet IDs, and column values are not used). URL counters are then computed on retrieval by counting tweet IDs.
It is a good idea to store tweet IDs as column names so that Cassandra automatically eliminates duplicates: a repeated update will be ignored in this case (use this great Cassandra feature to make your updates idempotent!).
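The URL_Tweets design can be sketched in memory as a dictionary of rows, each row mapping column names (tweet IDs) to an unused value. This is an illustrative model only, assuming hypothetical helper names `record_mention` and `mention_count`; it does not talk to Cassandra.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

# Hypothetical in-memory model of the URL_Tweets column family:
# row key = URL, column name = tweet ID, column value unused (empty bytes).
url_tweets: DefaultDict[str, Dict[int, bytes]] = defaultdict(dict)


def record_mention(url: str, tweet_id: int) -> None:
    # Writing the same column name again leaves exactly one column,
    # so retrying this update any number of times is harmless.
    url_tweets[url][tweet_id] = b""


def mention_count(url: str) -> int:
    # The counter is derived on read by counting columns (tweet IDs).
    return len(url_tweets.get(url, {}))


record_mention("http://example.com", 1001)
record_mention("http://example.com", 1001)  # retry of the same update
record_mention("http://example.com", 1002)
print(mention_count("http://example.com"))  # prints 2
```

In Cassandra itself, a repeated write to the same column replaces it (the write with the newest timestamp wins) rather than being literally skipped; because the column value is unused, the row ends up identical either way. The trade-off is that reading a count means counting the columns in a row instead of reading a single value.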
So how early should you design this into your application's data model?
It is common that Cassandra applications are not initially designed for idempotence. At first, small-scale deployments do not exhibit these subtle problems and work fine. Only as time passes and the deployments grow do the problems manifest, and only then are the applications changed to handle them. Do it right from the beginning.
Read more: Update Idempotency: Why It is Important in Cassandra Applications

