On Tuesday's Links of the Day, we featured a link that discussed Lily, a content repository built on top of HBase and Solr. In today's post we are going to dive deeper and look at how OuterThought came to the conclusion to use HBase, and how they are using it to solve their problems.
OuterThought was having trouble scaling in three areas of their application:
- Access Control
- Facet Browsing
- Anything that required Random Access
Their previous architecture consisted of MySQL, Lucene and the file system itself. They knew they needed a solution that offered scalability, availability and performance. So how did they try to get there? At first, with traditional approaches: they pushed more logic into the database, scaled out the database and added message queues, among other things. Ultimately, NoSQL began entering the picture.
So what were the requirements for the migration from MySQL, and which NoSQL store would they migrate to? They took a phased approach, and the requirements were:
- Automatic scaling to large data sets
- Fault tolerance
- Flexible data model for sparse data
- Efficient access to random data
- Open source
- Java (not a hard requirement)
- Commodity hardware
- Integration with Hadoop (nice but not necessary)
- Atomic Updates
What did the selection of HBase provide?
- HDFS, which is good for storing large blobs of data
- Data model that was flexible and fit their CMS document model
- Ordered tables which allowed for scan ranges among other things
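To make the last two points a bit more concrete, here is a minimal sketch of why an ordered table makes range scans cheap and how a sparse row can look. It is a toy in-memory stand-in written in plain Python, not the HBase client API, and the `OrderedTable` class and the row keys are made up for illustration.

```python
import bisect


class OrderedTable:
    """Toy in-memory stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> {column: value}; rows may have different columns

    def put(self, row_key, columns):
        # Insert the key at its sorted position the first time we see it.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def scan(self, start_key, stop_key):
        """Yield (key, columns) for start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = OrderedTable()
# Hypothetical row keys; each row only stores the columns it actually has.
table.put("doc:2010-03-02:b", {"title": "Second"})
table.put("doc:2010-03-01:a", {"title": "First", "author": "x"})
table.put("doc:2010-04-01:c", {"title": "Third"})

# Because keys are sorted, "everything in 2010-03" is one contiguous slice.
march = [key for key, _ in table.scan("doc:2010-03", "doc:2010-04")]
```

The scan locates both ends of the range with a binary search and then reads a contiguous run of keys, which is the property that ordered row keys give you. As in an HBase scan, the start key is inclusive and the stop key is exclusive.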
So what does Lily use HBase for?
- Storage of underlying content
- Storage of forward/backward link index tables
- Storage of various secondary indexes
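The forward/backward link index idea can be sketched on top of the same ordered-key property. What follows is only an illustration of the general technique (composite row keys plus a prefix scan). It is not Lily's actual schema and does not use the HBase client. The `LinkIndex` class, the separator and the document ids are all hypothetical.

```python
import bisect

SEP = "\x00"  # assumed separator; sorts before any printable character


class LinkIndex:
    """Toy forward/backward link index built from two sorted key lists."""

    def __init__(self):
        self._forward = []   # sorted keys of the form source + SEP + target
        self._backward = []  # sorted keys of the form target + SEP + source

    @staticmethod
    def _insert(keys, key):
        # Keep the list sorted and skip keys that are already present.
        pos = bisect.bisect_left(keys, key)
        if pos == len(keys) or keys[pos] != key:
            keys.insert(pos, key)

    @staticmethod
    def _prefix_scan(keys, prefix):
        # Jump to the first key >= prefix, then read while the prefix matches.
        pos = bisect.bisect_left(keys, prefix)
        found = []
        while pos < len(keys) and keys[pos].startswith(prefix):
            found.append(keys[pos][len(prefix):])
            pos += 1
        return found

    def add_link(self, source, target):
        # Every link is written twice, once per direction.
        self._insert(self._forward, source + SEP + target)
        self._insert(self._backward, target + SEP + source)

    def links_from(self, source):
        return self._prefix_scan(self._forward, source + SEP)

    def links_to(self, target):
        return self._prefix_scan(self._backward, target + SEP)


index = LinkIndex()
index.add_link("doc-a", "doc-b")
index.add_link("doc-a", "doc-c")
index.add_link("doc-d", "doc-b")
index.add_link("doc-a", "doc-b")  # duplicate link, stored only once
```

With this layout, `index.links_from("doc-a")` and `index.links_to("doc-b")` are each a single short scan over adjacent keys. The trade-off is that every link is stored twice. A secondary index can follow the same pattern, with the indexed value in front of the row key instead of the link source or target.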