Thursday, July 29, 2010

Building a content repository on top of NoSQL

On Tuesday's Links of the Day, we featured a link about Lily, a content repository built on top of HBase and Solr. In today's post we dive deeper and look at how OuterThought came to choose HBase, and how they are using it to solve their problems.

OuterThought was having trouble scaling in three areas of their application:

  1. Access Control
  2. Facet Browsing
  3. Anything that required Random Access

Their previous architecture consisted of MySQL, Lucene and the file system itself. They knew they needed a solution that delivered scalability, availability and performance. So how did they try to get there? At first, with traditional approaches: they pushed more logic into the database, scaled out the database and added message queues, among other things. Ultimately, NoSQL entered the picture.

So what were the requirements for the migration away from MySQL, and which NoSQL store would they migrate to? They took a phased approach.

Phase 1

  • Automatic scaling to large data sets
  • Fault tolerance
  • Flexible data model for sparse data
  • Efficient access to random data
  • Open source
  • Java (not a hard requirement)
  • Commodity hardware

Phase 2

  • Integration with Hadoop (nice but not necessary)
  • Consistency
  • Atomic Updates

What did the selection of HBase provide?

  1. HDFS, which is good for storing large blobs of data
  2. A data model that was flexible and fit their CMS document model
  3. Ordered tables, which allow for range scans among other things
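
To make the third point more concrete, here is a minimal sketch of why sorted row keys make range scans cheap. It is a toy in-memory model in Python, not the HBase client API: the OrderedTable class, its put and scan methods, and the row keys are all made up for illustration. The idea it shows is the general one: when rows are kept in key order, every row in a key range sits next to its neighbours, so a scan is a seek to the start key followed by a sequential read.

```python
import bisect


class OrderedTable:
    """Toy in-memory stand-in for a table whose rows are kept sorted by key.

    Hypothetical class for illustration only; this is not the HBase API.
    """

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, key, columns):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def scan(self, start, stop):
        """Yield (key, columns) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = OrderedTable()
# Made-up row keys: a type prefix, then a date, then a short name.
table.put("doc/2010-08-04/meetup", {"title": "San Diego NoSQL Meetup"})
table.put("user/alice", {"name": "Alice"})
table.put("doc/2010-07-29/lily", {"title": "Building a content repository"})
table.put("doc/2010-07-27/links", {"title": "Links of the Day"})

# All documents from July 2010, found without touching any other row.
july = [key for key, _ in table.scan("doc/2010-07", "doc/2010-08")]
print(july)
```

The key design does most of the work here: putting the date right after the type prefix is what turns "all documents from July" into one contiguous range.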

So what does Lily use HBase for?

  • Storage of underlying content
  • Storage of forward/backward link index tables
  • Storage of various secondary indexes
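
As a rough illustration of the link index idea, the sketch below keeps two sorted key lists, one keyed by source then target and one keyed by target then source, so that both "what does this document link to?" and "what links to this document?" become prefix scans. This is an assumption about how such an index could be laid out on an ordered store, not Lily's actual schema; the LinkIndex class, its methods, the separator and the document names are all hypothetical.

```python
import bisect

SEP = "\x00"  # assumed key separator; sorts below any printable character


class LinkIndex:
    """Toy forward/backward link index built on two sorted key lists.

    Hypothetical layout for illustration; not Lily's real table design.
    """

    def __init__(self):
        self._forward = []   # sorted keys of the form source + SEP + target
        self._backward = []  # sorted keys of the form target + SEP + source

    def add_link(self, source, target):
        # Write the link into both tables so either direction can be scanned.
        for table, key in ((self._forward, source + SEP + target),
                           (self._backward, target + SEP + source)):
            pos = bisect.bisect_left(table, key)
            if pos == len(table) or table[pos] != key:
                table.insert(pos, key)

    @staticmethod
    def _prefix_scan(table, prefix):
        # Seek to the first key with this prefix, then read while it matches.
        start = prefix + SEP
        pos = bisect.bisect_left(table, start)
        found = []
        while pos < len(table) and table[pos].startswith(start):
            found.append(table[pos][len(start):])
            pos += 1
        return found

    def links_from(self, source):
        return self._prefix_scan(self._forward, source)

    def links_to(self, target):
        return self._prefix_scan(self._backward, target)


index = LinkIndex()
index.add_link("home", "about")
index.add_link("home", "news")
index.add_link("news", "about")
index.add_link("blog", "about")

print(index.links_from("home"))
print(index.links_to("about"))
```

The trade-off is the usual one for secondary indexes on a key-ordered store: each link is written twice so that each kind of lookup stays a single contiguous read.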

Reader Comments (1)

It’s great to hear others talking about “content repositories” and alternatives to relational databases. Search as a first class citizen is also a critical aspect. With MarkLogic Server, we’ve built search into the DNA of the product from the ground up. This allows us to have full ACID transactions and real-time search—no asynchronous index updates or queuing. With a shared-nothing cluster architecture, it’s possible to build sophisticated search applications over hundreds of TB of content on commodity hardware. We’ve got customers doing this in production today.

Full disclosure: I’m a Product Manager at MarkLogic

July 29, 2010 | Unregistered Commenter Justin Makeig
