Entries in Maxim Grinev (3)

Friday
Aug 20, 2010

Managing Indexes in Cassandra using Async Triggers

Yesterday we covered an extension to Cassandra that provides asynchronous triggers. Today we look at a use case in action: managing a secondary index with triggers. Maxim Grinev and Martin Hentschel are at it again, and they describe the use case here:

Cassandra does not support secondary indexes at first, but storing redundant data (in a different layout) will give you the same effect. The main drawback is that your application (the code that writes to the DB) needs to take care of managing the index. Every time you write to the DB, you also need to maintain your index.

So by using asynchronous triggers, you can maintain the secondary index without adding the cost of index maintenance to the response latency of each write.
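To make the idea concrete, here is a minimal in-memory sketch in Python. It does not use Cassandra or the actual trigger extension. The `users` and `users_by_city` dictionaries, and the functions `write_user`, `index_trigger`, and `run_pending_triggers`, are all hypothetical names invented for this illustration. The point is only that the write returns before the redundant index copy is updated.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins for two column families.
users = {}                          # row key -> {column name: value}
users_by_city = defaultdict(dict)   # index: city -> {user key: unused value}

pending_triggers = deque()          # triggers fired but not yet executed


def index_trigger(user_key, old_row, new_row):
    """Keep the redundant users_by_city copy in step with users."""
    old_city = old_row.get("city")
    new_city = new_row.get("city")
    if old_city is not None and old_city != new_city:
        users_by_city[old_city].pop(user_key, None)
    if new_city is not None:
        users_by_city[new_city][user_key] = ""


def write_user(user_key, columns):
    """Apply the write, queue the trigger, and return without running it."""
    old_row = dict(users.get(user_key, {}))
    users.setdefault(user_key, {}).update(columns)
    pending_triggers.append((user_key, old_row, dict(users[user_key])))


def run_pending_triggers():
    """Stand-in for the background execution of fired triggers."""
    while pending_triggers:
        index_trigger(*pending_triggers.popleft())
```

Between `write_user` returning and `run_pending_triggers` running, a lookup in `users_by_city` can miss the new row. That window is the eventual consistency you accept in exchange for the lower write latency.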

Check it out: Managing Indexes in Cassandra using Async Triggers

Follow Up: The Cassandra project is planning native support for secondary indexes. The JIRA ticket (CASSANDRA-749) contains the full discussion among the committers.

Thursday
Aug 19, 2010

Extending Cassandra with Async Triggers

Maxim Grinev and Martin Hentschel have written about an extension they built for Cassandra: asynchronous triggers. Specifically, they define an Async trigger as follows:

Like traditional database triggers, Cassandra Async trigger is a procedure that is automatically executed by the database in response to certain events on a particular database object (e.g. table or view). The distinguishing feature of Async trigger is that the database responds to the client on successful update execution without waiting for triggers to be executed, thus reducing response latency.

What are the attributes of these triggers?

  • "After" triggers - A trigger is executed after the update operation that fires the trigger and can see the results of the update
  • Trigger procedures are implemented in Java
  • Cassandra Async triggers are mutation-level triggers. A trigger is executed for each mutation issued to the column family.
  • Cassandra Async triggers are asynchronous. The database acknowledges update execution to the client after the update is executed and the fired triggers are submitted for execution. Actual execution of fired triggers happens after the acknowledgement to the client. It allows saving latency but leads to eventual consistency of data.
  • Triggers are guaranteed to be executed at least once
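The attributes above can be sketched as a toy model in Python. This is not the extension's real API (the real trigger procedures are written in Java). The class `AsyncTriggerStore` and its methods `update` and `drain` are hypothetical names, and retrying a failed trigger on the next pass is just one assumed way to get at-least-once execution.

```python
from collections import deque


class AsyncTriggerStore:
    """Toy model of the semantics listed above; not the real extension."""

    def __init__(self, trigger):
        self.rows = {}
        self.trigger = trigger      # callable(key, row), runs after the write
        self.fired = deque()        # triggers submitted but not yet executed

    def update(self, key, columns):
        # 1. Execute the update.
        self.rows.setdefault(key, {}).update(columns)
        # 2. Submit the fired trigger for later execution.
        self.fired.append(key)
        # 3. Acknowledge to the client before the trigger has run.
        return "ack"

    def drain(self):
        """Make one pass over the fired triggers.

        A trigger that raises is re-queued for the next pass, so every
        trigger runs at least once and possibly more than once.
        """
        for _ in range(len(self.fired)):
            key = self.fired.popleft()
            try:
                # "After" trigger: it sees the result of the update.
                self.trigger(key, self.rows[key])
            except Exception:
                self.fired.append(key)
```

Because a trigger may run more than once under this guarantee, the trigger procedure itself should be safe to repeat.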

Tomorrow's post will show what you can do with asynchronous triggers.

Read more: Extending Cassandra with Async Triggers

Monday
Aug 16, 2010

The Importance of Idempotency for Cassandra Updates

An idempotent process is one that produces the same result no matter how many times it is repeated. So what does this have to do with Cassandra? Maxim Grinev is going to tell us. Specifically, he points out:

When you develop application for Cassandra you should be aware of the following fact. Even when client observes the failure of an update it is still possible that this update has been executed successfully. The cause of such anomalous behavior is that Cassandra does not support transactional rollback.

This "lack of support" for transactional rollback is actually by design, mainly because rollback of distributed transactions is both expensive and hard to scale. Therefore, as Maxim points out, our applications must deal with these intricacies.

So how do you design a data model to account for the issues described above? Maxim describes a rather clever approach:

Instead of just storing the mapping of URLs to counters (i.e. column family URL_statistics where each record has an URL as a key and a single column having counter as its value) a solution can be to store the mapping of each URL to the IDs of the tweets which contains the URL (i.e. column family URL_Tweets where each record has an URL as a key, columns representing tweets, column names are the tweet IDs, and column values are not used). URL counters will then be computed on retrieval by counting tweet IDs.

It is a good idea to store tweet IDs as column names so that Cassandra automatically eliminates duplicates – repeated update will be ignored in this case (use this great Cassandra feature to make your updates idempotent!).
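Here is a small Python sketch of the difference between the two layouts. It uses plain dictionaries in place of the `URL_statistics` and `URL_Tweets` column families, models the counter as a simple increment, and simulates the "failed but actually applied" case with a hypothetical `write_with_retry` helper. All function names are invented for this illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two column families.
url_statistics = defaultdict(int)   # URL -> counter
url_tweets = defaultdict(dict)      # URL -> {tweet ID: unused value}


def record_with_counter(url):
    """Non-idempotent: every repeated call changes the result."""
    url_statistics[url] += 1


def record_with_tweet_id(url, tweet_id):
    """Idempotent: writing the same column name again changes nothing."""
    url_tweets[url][tweet_id] = ""


def count_tweets(url):
    """Compute the counter on retrieval by counting tweet IDs."""
    return len(url_tweets[url])


def write_with_retry(write, *args):
    """Simulate a client that observes a failure although the update was
    applied, and therefore issues the same update again."""
    write(*args)   # applied, but the client is told it failed
    write(*args)   # the client retries
```

With the counter layout, one tweet that is retried once is counted twice. With tweet IDs as column names, the retry overwrites the same column and the count stays at one.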

So how early should you design this into your application's data model?

It is common that Cassandra applications are not initially designed for idempotence. At first, small scale deployments do not exhibit these subtle problems and work fine. Only as time passes and their deployments expand the problems manifest and the applications respond to handle them. Do it right from the beginning.

Read more: Update Idempotency: Why It is Important in Cassandra Applications