Entries in Twitter (12)

Wednesday
Feb 09, 2011

Rainbird: Realtime Analytics at Twitter

The O'Reilly Strata Conference wrapped up last week. One of the talks to emerge from it was Kevin Weil's Realtime Analytics at Twitter. In it, Kevin discusses the importance of realtime data analysis and the changes that were necessary to make Cassandra a good fit.


Wednesday
Nov 10, 2010

The Hadoop Ecosystem at Twitter

We have more slides and video from this year's Hadoop World. This set comes from Kevin Weil's presentation on how Twitter is using Hadoop. Specifically, the presentation explores how Twitter uses both Hadoop and supporting frameworks and tools such as Pig, Hive and HBase to address critical business and engineering problems.


Tuesday
Jul 13, 2010

Links of the Day - 2010/07/13

Links of the day for July 13, 2010

Monday
Jul 12, 2010

Cassandra Migration Is Postponed

Today's Links of the Day post has three links that discuss Twitter's postponement of its migration from MySQL to Cassandra for storing tweets. Features, releases and technology rollouts get postponed all the time, and under normal circumstances most people would not even blink at this announcement. In this case, however, two very hot topics, Twitter and NoSQL, are in the limelight.

Unfortunately, the timing of Twitter's migration plans could not have been worse. The company struggled to keep the service up during the World Cup and had to resort to some extreme measures, including slashing API calls for third-party applications.

In the grand scheme of things this is probably just a bump in the road. All new technologies go through this, and I expect to see an announcement in the future that Twitter has fully migrated. In the meantime, Twitter is still using Cassandra, just for other features.

I've included a presentation that we've previously discussed here about Twitter using Cassandra.

Original Post: Scaling Twitter with Cassandra

Monday
Jul 12, 2010

Links of the Day - 2010/07/12

Links of the day for July 12, 2010

Admittedly this is a little Twitter and Cassandra specific, but it's a relatively big deal.

Friday
Jul 02, 2010

Polyglot Persistence, is it the future of application persistence?

In yesterday's post John Nunemaker discussed the future of persistence as it relates to NoSQL. His view was that application persistence would be hosted and would employ polyglot persistence. Today's post explores that last piece, polyglot persistence.

In his presentation at WindyCityDB, John P. Wood discusses polyglot persistence: what it is and how it helps.

Key points from the presentation:

  • RDBMS is no longer the default choice, but it's not dead either
  • We now have several choices in the NoSQL arena. Having choices is great. However, it means we must do the work to validate our tool of choice as the right one for the job. 

So what exactly is polyglot persistence?

The continued or prolonged existence of something using several databases.

Scott Leberknight is quoted in the presentation to reinforce this point:

Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand.

One could compare this to using Grails for the UI portion of a web application and perhaps Java for other backend processes. So what does this mean? It means that the default mode for large organizations will be to support multiple data stores.

Some organizations, such as Facebook and Twitter, are already doing this. In both cases these organizations use MySQL, Cassandra and HBase for various aspects of their applications.

John Wood beautifully summarizes why we do this:

Right tool for the job

Thursday
Jun 03, 2010

Twitter: Announcing Snowflake

With NoSQL in its relatively early days, organizations are having to solve problems that did not exist with the traditional relational data store. Take, for example, Twitter and its need to generate unique ids for each tweet. So what are the requirements for such a task?

We needed something that could generate tens of thousands of ids per second in a highly available manner. This naturally led us to choose an uncoordinated approach.

These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets.

Additionally, these numbers have to fit into 64 bits. We’ve been through the painful process of growing the number of bits used to store tweet ids before. It’s unsurprisingly hard to do when you have over 100,000 different codebases involved.

So what is the solution? Well it's project Snowflake of course!
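To make the three requirements quoted above concrete, here is a minimal sketch of an uncoordinated, roughly time-sortable, 64-bit id generator. It is not Twitter's code. The bit layout (41 bits of milliseconds, 10 bits of worker id, 12 bits of sequence), the epoch value, and the names `IdGenerator` and `next_id` are all assumptions made for illustration; the quoted announcement does not specify them.

```python
import threading
import time

# Assumed layout, not taken from the post: timestamp | worker id | sequence.
EPOCH_MS = 1262304000000  # arbitrary custom epoch (2010-01-01 UTC), an assumption
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class IdGenerator:
    """Hypothetical generator: each worker makes ids without talking to the others."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock          # injectable so the behaviour can be tested
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the sequence to keep ids unique.
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            # Timestamp in the high bits makes ids sort roughly by time.
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Because the worker id is baked into every id, two workers never collide even without coordination, and ids created in the same millisecond on different workers still land next to each other when sorted.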

Read more: Announcing Snowflake

Saturday
May 29, 2010

Stumped why folks are turning to NoSQL

Joe Stump, CTO and founder of SimpleGeo, offers a different perspective on why organizations such as his are using NoSQL. Most debates center on technical concerns such as scalability or performance requirements. In his post, however, Joe focuses on a different side of the debate: operations. He does not dispute that it's possible to get traditional RDBMS systems to perform at the same rate as NoSQL data stores, but rather points out the lengths, in terms of manpower and cost, that are required to do so. I think the questions Joe asks in his post are just as interesting as the answers to them.

  • Do you honestly think that the PhDs at Google, Amazon, Twitter, Digg, and Facebook created Cassandra, BigTable, Dynamo, etc. when they could have just used a RDBMS instead?
  • How much are you spending on those MS SQL servers with SSD drives that serve up 6,100 results a second?
  • How much time are your DBAs spending administering your RDBMSs?
  • How much time are they in the data centers?
  • How much do those data centers cost?
  • How much do DBAs cost a year?
  • How easy is it to add a new server to your cluster?

Ultimately, Joe sums up his reasons to use NoSQL with the following quote:

I guess what I’m saying is that my decision to use NoSQL, and I’m guessing others’ decisions to do so, has less to do with the fact that we can’t squeeze a few thousand writes a second out of MySQL and more to do with management and cost overhead. NoSQL solutions allow us to serve absurd amounts of data for a really, really low price.

Read more: NoSQL vs. RDBMS: Let the flames begin!

Tuesday
May 25, 2010

Hadoop, Pig and HBase at Twitter

Again the folks at Twitter provide the material for this next post. Specifically, Dmitriy Ryaboy, a member of the Analytics team at Twitter, discusses Twitter's usage of Hadoop, Pig and HBase. Technically, neither Hadoop nor Pig is pure NoSQL; they are ancillary components that interact with a NoSQL data store, HBase.

However, HBase is a big part of the Hadoop ecosystem and Pig provides a simplified query mechanism for HBase, so in my opinion they are worth the discussion.

Several important points are made throughout the discussion, but I found one slide particularly interesting: in it, Dmitriy explains how they use Cassandra and HBase and what they use them for.

Rough Analogy: Cassandra is OLTP and HBase is OLAP

Monday
May 24, 2010

Big Data in Real Time at Twitter

Another presentation from the folks over at Twitter, specifically from Nick Kallen. This presentation focuses on how Twitter deals with big data in real time. The presentation addresses how Twitter handles Tweets, Timelines, Social Graphs and Search Indices.

General principles that come from their various solutions to their problems are summarized with the following points in the presentation:

  • All engineering solutions are transient
  • Nothing’s perfect but some solutions are good enough for a while
  • Scalability solutions are not magic. They involve partitioning, indexing and replication
  • All data for real-time queries MUST be in memory. Disk is for writes only.
  • Some problems can be solved with pre-computation, but a lot can’t
  • Exploit locality where possible.
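As a rough illustration of the point that scalability comes from partitioning and replication rather than magic, the sketch below hashes a key to a partition and then picks the nodes that would hold its replicas. The function names, the node names, and the choice of MD5 with modulo placement are assumptions for the example; the presentation does not prescribe a scheme.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition number with a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(key, nodes, replication_factor):
    """Pick the primary node for a key plus the next nodes as replicas."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    start = partition_for(key, len(nodes))
    # Walk the node list from the primary, wrapping around at the end.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


# Hypothetical cluster of four nodes, three copies of every key.
nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("user:42", nodes, 3)
```

Simple modulo placement like this reshuffles most keys when a node is added, which is one reason such solutions are only "good enough for a while".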

 

Sunday
May 23, 2010

Scaling Twitter with Cassandra

Let's paint a little picture about how it was at Twitter prior to Cassandra.

  • Horizontally and vertically partitioned MySQL
  • Memcached (rows, indexes and fragments)
  • Application Managed

This obviously had drawbacks such as:

  • Many single points of failure
  • Hardware Intensive
  • Manpower Intensive
  • Tight Coupling
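To show what "application managed" means in the setup described above, and where the tight coupling comes from, here is a small sketch in which the application itself chooses the shard and keeps the cache in step with the database. Plain dictionaries stand in for the MySQL shards and for Memcached, and the class and method names are hypothetical; none of this is Twitter's actual code.

```python
class ShardedStore:
    """Hypothetical data layer: the app, not the database, routes every request."""

    def __init__(self, shards, cache):
        self.shards = shards  # list of dicts standing in for MySQL shards
        self.cache = cache    # dict standing in for Memcached

    def _shard(self, user_id):
        # Routing rule lives in application code, so changing it means a deploy.
        return self.shards[user_id % len(self.shards)]

    def get(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]          # cache hit
        row = self._shard(user_id).get(user_id)
        if row is not None:
            self.cache[key] = row           # fill the cache on a miss
        return row

    def put(self, user_id, row):
        self._shard(user_id)[user_id] = row
        # The app must remember to invalidate; forgetting this serves stale rows.
        self.cache.pop(f"user:{user_id}", None)
```

Every caller depends on the routing rule and the invalidation step being right, which is the kind of coupling and manual effort the list of drawbacks points to.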

The solution was Cassandra, the NoSQL data store originally developed by Facebook and donated to the Apache Software Foundation. The presentation below is from Ryan King, a member of the Storage Team at Twitter. It dives into the details of how Cassandra solved the problems listed above. Enjoy!

 

Sunday
May 23, 2010

NoSQL at Twitter

This is a slide presentation by Kevin Weil, Analytics Lead at Twitter, about NoSQL at Twitter. Kevin details how Twitter arrived at using NoSQL. For a majority of their applications, both front and back end, they were using MySQL with the standard scalability techniques. However, they were still running into issues, such as batch jobs not completing within a specified time and write limits, to name a few.

Twitter uses several different pieces of the NoSQL ecosystem, employing the tool that is best suited for the job: Hadoop and HBase for batch-oriented tasks such as analytics, and Cassandra for storing tweets.

In addition, Twitter has been an active participant in the open source community, not only patching the various tools it uses but also releasing FlockDB, a distributed graph database.

Definitely a great read to see how an application with high scaling and performance requirements uses NoSQL to solve its problems.