Tuesday, September 28, 2010 at 12:00PM | Derek Stainer
Interesting article describing some performance statistics from VoltDB. VoltDB is not quite a NoSQL database, but it isn't quite 100% a traditional relational database either. Regardless, the numbers being discussed are damn impressive.
Wednesday, September 8, 2010 at 10:00PM | Derek Stainer
Lots of excitement in the Cassandra world at the moment. Is it Cassandra's fault or not? It's almost like our own little soap opera. That's part of the reason I find this presentation, Performance Tuning for Cassandra, to be very appropriate. The presentation is by Brandon Williams of Riptano, the commercial support company for Cassandra, and yes, they've been actively involved with Digg for several months now preparing for the go-live.
Joe Stump, CTO and founder of SimpleGEO, offers us a different perspective on why organizations such as his are using NoSQL. Most debates center on technical concerns such as scalability or performance requirements. In his post, however, Joe focuses on a different side of the debate: operations. Joe does not dispute that it's possible to get traditional RDBMS systems to perform at the same rate as NoSQL data stores, but rather points out the lengths, in terms of manpower and cost, that are required to do so. I think the questions that Joe asks in his post are just as interesting as the answers to those questions.
Do you honestly think that the PhDs at Google, Amazon, Twitter, Digg, and Facebook created Cassandra, BigTable, Dynamo, etc. when they could have just used a RDBMS instead?
How much are you spending on those MS SQL servers with SSD drives that serve up 6,100 results a second?
How much time are your DBAs spending administering your RDBMSs?
How much time are they in the data centers?
How much do those data centers cost?
How much do DBAs cost a year?
How easy is it to add a new server to your cluster?
Ultimately, Joe sums up his reasons to use NoSQL with the following quote:
I guess what I’m saying is that my decision to use NoSQL, and I’m guessing others’ decisions to do so, has less to do with the fact that we can’t squeeze a few thousand writes a second out of MySQL and more to do with management and cost overhead. NoSQL solutions allow us to serve absurd amounts of data for a really, really low price.
This is a slide presentation by Kevin Weil, Analytics Lead at Twitter, about NoSQL at Twitter. Kevin details how Twitter arrived at using NoSQL. For a majority of their applications, both front and back end, they were using MySQL with the standard scalability techniques. However, they were still running into issues, such as batch jobs not completing within a specified time and write limits, just to name a few.
Twitter uses several different pieces of the NoSQL ecosystem, employing the tool that is best suited for the job: Hadoop and HBase for batch-oriented tasks such as analytics, and Cassandra for storing tweets.
In addition, Twitter has not only been an active participant in patching the various tools it uses, it has also contributed back to the open source community with its release of FlockDB, a distributed graph database.
Definitely a great read to see how an application with high scaling and performance requirements uses NoSQL to solve its problems.
That is the level of performance that Eliot Horowitz has reached with MongoDB. The following are the slides from his presentation on sharding and a link to the video of the presentation as well.
The following blog post discusses why the Digg engineering team made the switch, dropping MySQL in favor of Apache's Cassandra. The post mentions a common motivation for why developers are looking at other ways to build applications.
Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead.
With that said, most sites are not going to be the size of Facebook or Digg, so their performance and scalability characteristics are going to be much different. However, that doesn't mean there isn't a place for technologies such as Cassandra in your technology stack.