Hive - A Petabyte Scale Data Warehouse Using Hadoop
Monday, June 14, 2010 at 6:00AM | Derek Stainer

Now, technically, Hadoop itself does not belong in the NoSQL discussion. However, components of the Hadoop ecosystem, such as HBase and Hive, are definitely candidates for it. In this presentation, the Facebook Data Team discusses how they use Hive in combination with Hadoop to meet Facebook's data warehousing and analytics needs.
I found it interesting how Facebook blended traditional SQL data stores, such as Oracle and MySQL, with NoSQL solutions, such as Hive, as part of its overall approach.
Other interesting statistics:
- 10 TB of compressed new data added per day
- 135 TB of compressed data scanned per day
- 7500+ Hive jobs per day
- 80K compute hours per day
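To get a rough feel for that scale, the daily figures can be turned into average rates. This is a back-of-the-envelope sketch, not anything from the presentation: it assumes decimal units (1 TB = 10^12 bytes), assumes the load is spread evenly over 24 hours, and keeps the numbers as compressed bytes, exactly as quoted.

```python
# Back-of-the-envelope conversion of the quoted daily statistics into average rates.
# Assumptions (not from the presentation): decimal units (1 TB = 10**12 bytes),
# load spread evenly across the day, figures are compressed bytes as quoted.

SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_DAY = 24

new_data_tb_per_day = 10          # compressed new data added per day
scanned_tb_per_day = 135          # compressed data scanned per day
compute_hours_per_day = 80_000    # compute hours per day


def tb_per_day_to_mb_per_sec(tb_per_day):
    """Convert a daily volume in TB into an average rate in MB/s."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6


ingest_mb_s = tb_per_day_to_mb_per_sec(new_data_tb_per_day)
scan_mb_s = tb_per_day_to_mb_per_sec(scanned_tb_per_day)

# How many times more data is read than written each day.
scan_to_ingest_ratio = scanned_tb_per_day / new_data_tb_per_day

# Compute hours divided by wall-clock hours gives the average number of
# compute units kept busy around the clock.
avg_busy_units = compute_hours_per_day / HOURS_PER_DAY

print(f"Average ingest rate: {ingest_mb_s:.1f} MB/s")
print(f"Average scan rate:   {scan_mb_s:,.1f} MB/s")
print(f"Scanned vs. added:   {scan_to_ingest_ratio}x")
print(f"Average busy units:  {avg_busy_units:,.0f}")
```

Under these assumptions, the cluster absorbs roughly 116 MB/s of new compressed data while scanning about 13.5 times that volume, and keeps on the order of 3,300 compute units busy at all times. Real traffic is certainly burstier than a flat daily average, so peak rates would be higher.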

