Recent NoSQL News



From MySQL to MongoDB at Wordnik

Tony Tam from Wordnik describes their migration from MySQL to MongoDB. So why migrate from MySQL? Inserts into their MyISAM tables were approaching 10 seconds each. The team kept producing workarounds, but that led to more and more system babysitting. Nothing like a fragile system to make those weeknights and weekends extra fun, right?

What are the results?

  • Moved 5 Billion rows from MySQL to MongoDB
    • Sustained 100,000 inserts/second
    • Migration tool was the bottleneck (CPU Bound)
  • Wordnik now reads from MongoDB very fast
    • Read + create java objects @ 250,000/second

What advice does Wordnik offer for going live with MongoDB?

  • Choose your use case carefully if migrating incrementally
  • Scary no matter what
  • Test your perf monitoring system first!
  • Use your DAOs from migration
  • Turn on MongoDB on one server, monitor, tune (rollback, repeat)
  • Full switch over when comfortable

As a follow-up, Wordnik discussed in a post that they are now hosting 9 billion documents. Read more at B is for Billion.


The Graph Traversal Programming Pattern (Part 3) - Graph Traversal

This is the third and final post discussing Marko Rodriguez's presentation on the Graph Traversal Programming Pattern.

Part 1: Graph Structures
Part 2: Graph Databases

In his presentation, Marko sets up an experiment to test a graph database, in this case Neo4j, against a relational data store, MySQL. The purpose of the experiment is to traverse the graph five levels deep. The graph in the experiment contains 1 million vertices and 4 million edges.

  • For the run of the experiment a traverser is placed on a single vertex
    • For each step, the traverser moves to its adjacent vertices
    • Repeat each step five times
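
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in Marko's experiment: the `traverse` function and the three-vertex `toy_graph` are made up for this example. Rather than tracking every individual traverser, it counts how many distinct paths reach each vertex after a given number of steps.

```python
from collections import Counter

def traverse(adjacency, start, steps):
    """Place a traverser on `start` and move it to adjacent vertices `steps` times.

    Returns a Counter mapping each reached vertex to the number of
    distinct paths of exactly `steps` hops that end there.
    """
    frontier = Counter({start: 1})
    for _ in range(steps):
        next_frontier = Counter()
        for vertex, paths in frontier.items():
            # Follow every outgoing edge of the current vertex.
            for neighbor in adjacency.get(vertex, ()):
                next_frontier[neighbor] += paths
        frontier = next_frontier
    return frontier

# Hypothetical toy graph: vertex -> list of adjacent vertices.
toy_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

five_levels_deep = traverse(toy_graph, "a", 5)
```

On a graph with millions of edges the frontier grows quickly with each step, which is why the cost of a single hop matters so much in the experiment.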

The results? Neo4j completed the experiment in 14 minutes, while MySQL did not complete the job. The full results of the experiment can be found here.

So why use a graph database? Marko provides us with three potential reasons:

  1. If the solution to your problem can be represented as a local process within a larger global structure
  2. If the solution to your problem can be expressed with respect to a set of root elements
  3. If the solution to your problem does not require a global analysis of your data

So this is all fine and dandy from a theoretical perspective, but what about real-life use cases? Marko provides the following examples:

  • Local Searches - What is in the neighborhood around A?
  • Local Recommendations - Given A, what should A include in their neighborhood?
  • Local Ranks - Given A, how would you rank B relative to A?
  • Collaborative Filtering - Find all the items that person A likes. Then find all the people that like those same items. Then find which items those people like that person A does not already like
  • Question expert identification - Find all the tags associated with question A. For all those tags, find all answers (for any questions) that are tagged by those tags. For those answers, find who created those answers
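
The collaborative filtering example is really just a three-hop traversal: person to items, items to people, people to items. Here is a minimal sketch of it, assuming the likes are held in plain in-memory dictionaries. The data and the names `likes`, `build_liked_by`, and `recommend` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical "likes" edges: person -> set of liked items.
likes = {
    "alice": {"neo4j", "mongodb"},
    "bob": {"neo4j", "cassandra"},
    "carol": {"mongodb", "couchdb", "cassandra"},
    "dave": {"riak"},
}

def build_liked_by(likes):
    """Build the reverse edges: item -> set of people who like it."""
    liked_by = defaultdict(set)
    for person, items in likes.items():
        for item in items:
            liked_by[item].add(person)
    return liked_by

def recommend(likes, liked_by, person):
    """Score items liked by people who share at least one item with `person`."""
    own_items = likes[person]
    scores = Counter()
    for item in own_items:                  # hop 1: person -> items
        for peer in liked_by[item]:         # hop 2: items -> people
            if peer == person:
                continue
            for candidate in likes[peer]:   # hop 3: people -> items
                if candidate not in own_items:
                    scores[candidate] += 1
    return scores

suggestions = recommend(likes, build_liked_by(likes), "alice")
```

Note that the traversal never touches "dave" or "riak": only the neighborhood around the starting person is visited, which is the local analysis the list above describes.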

In conclusion, Marko offers these final points:

  • Graph databases are efficient with respect to local data analysis
  • Locality is defined by direct referent structures
  • Frame all solutions to problems as a traversal over local regions of the graph

This is the Graph Traversal Pattern.


Links of the Day - 2010/07/21

Links of the Day for July 21, 2010

  • Installing CouchDB on a VM - Dennis Delimarsky provides a nice tutorial for installing CouchDB on a VM so that you can give CouchDB 1.0 a try
  • Riptano Packages Cassandra for the Enterprise - Matt Pfeil, co-founder and CEO of Riptano, discusses Cassandra. A little bit of background here: Riptano was created as a commercial entity for Cassandra. Pfeil and Jonathan Ellis, who is the project chair for Apache Cassandra, co-founded Riptano back in March of 2010
  • NoSQL The Dawn of Polyglot Persistence - We discussed Polyglot Persistence a few weeks back. Stephan Schmidt of Code Monkeyism provides some more ideas about the topic.

The Graph Traversal Programming Pattern (Part 2) - Graph Databases

In last Friday's post we explored a presentation by Marko Rodriguez about the Graph Traversal Programming Pattern. Specifically we explored the various graph structures that exist. In this post we are going to explore graph databases.

Most databases can model a graph. But how do you define a graph database?

  • A graph database is any storage system that provides index-free adjacency
  • Every element (i.e. vertex or edge) has a direct pointer to its adjacent element
  • No O(log2(n)) index lookup required to determine which vertex is adjacent to which other vertex
  • If the graph is connected, the graph as a whole is treated as an atomic data structure

Marko proceeds to demonstrate the difference between a graph database and a non-graph database with respect to index adjacency.

Graph Database

  • Direct references to its adjacent vertices
  • Constant time cost to navigate between vertices

Non-Graph Database

  • Must look at an index to locate adjacent vertices
  • log2(n) time cost to move between vertices

More about index adjacency:

  • While any database can implicitly represent a graph, only a graph database makes the graph structure explicit
  • In a graph database, each vertex serves as a “mini index” of its adjacent elements
  • As the graph grows in size, the cost of a local step remains the same
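
To make the contrast concrete, here is a small sketch of the two lookup styles. It is an in-memory toy, not how Neo4j or MySQL are actually implemented: the sorted `edge_table` stands in for an indexed edge table, and the `Vertex` class stands in for index-free adjacency. All names are made up for this example.

```python
from bisect import bisect_left

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]

# Non-graph style: all edges live in one global table sorted by source vertex.
edge_table = sorted(edges)

def neighbors_via_index(table, vertex):
    """Binary-search the global table; the search cost grows with log2(n)."""
    position = bisect_left(table, (vertex,))
    found = []
    while position < len(table) and table[position][0] == vertex:
        found.append(table[position][1])
        position += 1
    return found

# Graph style: each vertex holds direct references to its adjacent vertices.
class Vertex:
    def __init__(self, name):
        self.name = name
        self.adjacent = []  # the vertex acts as a "mini index" of its neighbors

vertices = {name: Vertex(name) for name in "abc"}
for source, target in edges:
    vertices[source].adjacent.append(vertices[target])

def neighbors_direct(vertex):
    """Follow the references; no global lookup, whatever the graph size."""
    return [neighbor.name for neighbor in vertex.adjacent]
```

Both functions return the same neighbors. The difference is that `neighbors_via_index` has to search a structure that grows with the whole graph, while `neighbors_direct` only touches the vertex it is standing on.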

Finally, with regard to graph databases and indices, Marko makes the following points:

  • Graph databases allow you to explicitly model indices endogenous to your domain model. Your indices and domain model are one atomic entity: a graph
  • This has benefits in designing special-purpose index structures for your data.
    • Think about all the numerous types of indices in the geo-spatial community
    • Think about all the indices that you have yet to think about

In the final post of this series we will explore graph traversals with both artificial and real life examples.


Links of the Day - 2010/07/19

Links of the Day for July 19, 2010


The Graph Traversal Programming Pattern (Part 1) - Graph Structures

In his presentation at WindyCityDB, Marko Rodriguez discusses graph traversal patterns. This is the first part of a multi-part series that will discuss this presentation. Specifically, in this post we are going to discuss the various types of graph structures. We will be discussing graph databases and graph traversals in the following posts.

So what types of graph structures are there? It's an interesting question and one I did not know the answer to until this presentation. Before we can begin discussing the various graph structures we need a small primer.

Graph Primer

  • Dots are vertices
  • Lines are edges
  • Dots and Lines make a Graph

Undirected Graph

  • Vertices
    • All denote the same type of object
  • Edges
    • All edges denote the same type of relationship
    • All edges denote a symmetric relationship
  • Examples
    • Collaborator graph
    • Road graph

Directed Graph

  • Vertices
    • All denote the same type of object
  • Edges
    • All edges denote the same type of relationship
    • All edges denote an asymmetric relationship
  • Primer
    • Directed edge is a line with an arrow
  • Examples
    • Twitter follow graph
    • Web href-citation graph
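
The only structural difference between the two graph types above is whether an edge can be followed in both directions. A minimal sketch, using a made-up `build_graph` helper and hypothetical people "ann" and "bo":

```python
from collections import defaultdict

def build_graph(edges, directed):
    """Build an adjacency map; undirected edges are stored in both directions."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        if not directed:
            # A symmetric relationship: target is also adjacent to source.
            adjacency[target].add(source)
    return adjacency

# Collaboration is symmetric: if ann works with bo, bo works with ann.
collaborators = build_graph([("ann", "bo")], directed=False)

# Following is asymmetric: ann follows bo, but bo need not follow ann.
follows = build_graph([("ann", "bo")], directed=True)
```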

Single-Relational Graphs

Both undirected and directed graphs are considered single relational graphs.

  • All edges have the same meaning/type
  • Perhaps the most common type of graph

Limitations of a single relational graph:

  • Can only express a single type of vertex
  • Can only express a single type of edge
  • In general, are very limiting graph types

Multi-Relational Graphs

Obviously the opposite of a single-relational graph.

What are the gains with multi-relational graphs?

  • Allows for explicit typing of edges
  • Explicit typing allows for
    • edges to have different meanings
    • vertices to have different types

Property Graph

  • Specialized graph which extends a multi-relational graph by adding a key/value map to both edges and vertices
  • Properties useful for expressing non-relational data
  • Allows further refinement of the meaning of an edge
  • Property graphs are the basis for other types of graphs
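
As a rough illustration of the definition above, a property graph can be sketched as vertices and labeled edges that each carry a key/value map. The class and field names below are assumptions made for this example and do not come from the presentation or from any particular graph database. The edge list is scanned linearly to keep the sketch short, so it does not provide index-free adjacency.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)  # key/value map on the vertex

@dataclass
class Edge:
    source: str
    label: str   # explicit edge type, as in a multi-relational graph
    target: str
    properties: dict = field(default_factory=dict)  # key/value map on the edge

class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, id, **properties):
        self.vertices[id] = Vertex(id, properties)

    def add_edge(self, source, label, target, **properties):
        self.edges.append(Edge(source, label, target, properties))

    def out(self, source, label):
        """Return the targets of edges leaving `source` with the given label."""
        return [e.target for e in self.edges
                if e.source == source and e.label == label]

# Hypothetical data: two vertex types and two edge types in one graph.
graph = PropertyGraph()
graph.add_vertex("alice", kind="person")
graph.add_vertex("bob", kind="person")
graph.add_vertex("project_x", kind="software")
graph.add_edge("alice", "knows", "bob", weight=0.5)
graph.add_edge("alice", "created", "project_x", year=2010)
```

Here the `weight` and `year` properties refine the meaning of each edge without introducing new edge types.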

The entire presentation is shown below. We'll discuss graph databases on Monday.


Links of the Day - 2010/07/16

Links of the Day for July 16, 2010


Ship It! CouchDB turns 1.0

Congratulations are in order for the folks over at CouchIO, the commercial entity which supports CouchDB development. They have just released the first production release of CouchDB.

A couple of features/enhancements since the 0.11 release include:

  • Writes are now 300% faster for large documents
  • Support for Windows
  • New authentication system

Get the new release here


Links of the Day - 2010/07/15

Links of the Day for July 15, 2010


NoSQL - Channeling the Data Explosion

In this presentation Dwight Merriman, of 10Gen, explains NoSQL's role in dealing with the explosion of data.

So why NoSQL?

  • One size no longer fits all
  • Computing power is now considered a commodity
  • Horizontal scaling

An interesting point that Merriman makes is his comparison of how the various NoSQL data stores are alike and how they differ. Outside of the type of data store (i.e. Key/Value, Column, Graph, etc.), I haven't put a lot of thought into this point. So what does Merriman have to say?

  • What's the same between NoSQL products?
    • No joins
    • No complex transactions
  • What's different?
    • Scale out model
    • Consistency model
    • Data model

Merriman continues his discussion by talking about data models, influences, CAP and consistency. Finally, he provides some predictions about NoSQL.

  • JSON will be the most popular building block for non-relational data models
  • Tunable consistency
  • Some SQL in NoSQL data stores.


Links of the Day - 2010/07/14

Links of the Day for July 14, 2010



Introduction to Apache Cassandra

In case you missed it, Cassandra has been in the news the last couple of days. So I thought this would be a good opportunity to provide an introduction to Cassandra via Gary Dusbabek from Rackspace. This presentation was given at the Silicon Valley Cloud Computing Group back in June of this year.

Couple of key points about Cassandra (not from the presentation):

  • Initially created by Facebook to power search over users' inbox mail on the site.
  • The source code was open sourced and released to the Apache Software Foundation.
  • Its design was inspired by both Google's BigTable and Amazon's Dynamo.
  • It's considered to be a column data store, similar to a Google BigTable or Apache HBase.

So why Cassandra at all? As Dusbabek mentions in his presentation, "vertical scaling is hard". So as the amount of data we create and analyze increases, our strategies for dealing with that data change. Dusbabek walks us through a number of topics in his discussion, including scaling, the replication model, the data model and practical considerations.

So without any further interruptions...


Links of the Day - 2010/07/13

Links of the day for July 13, 2010


Cassandra Migration Is Postponed

In today's Links of the Day post we have three links that discuss Twitter's postponement of their migration from MySQL to Cassandra for storage of tweets. Features, releases and technologies get postponed all the time, and under normal circumstances most people would not even blink at this announcement. In this particular case, however, you have two very hot topics, Twitter and NoSQL, in the limelight.

Unfortunately, the timing of Twitter's migration plans could not have been worse. They've struggled during the World Cup to keep the service up, and they've had to resort to some extreme measures, including slashing API calls for third party applications.

In the grand scheme of things this is probably more of a bump in the road. All new technologies go through this, and I expect to see an announcement in the future that Twitter has fully migrated. Regardless, Twitter is still using Cassandra, just for other features at the moment.

I've included a presentation that we've previously discussed here about Twitter using Cassandra.

Original Post: Scaling Twitter with Cassandra


Links of the Day - 2010/07/12

Links of the day for July 12, 2010

Admittedly this is a little Twitter and Cassandra specific, but it's a relatively big deal.