PM3: Hadoop and HBase: Motivations, Use Cases, and Trade-offs

This tutorial session will discuss the motivation, real-world use cases, and choices to make when deploying two related and popular open-source NoSQL systems: Apache Hadoop and Apache HBase.

These systems enable enterprises to store, analyze, and profit from all of their data. To handle petabyte-scale volumes of raw data, these distributed systems make different assumptions and design decisions from those of traditional relational databases.

These design decisions enable new applications and can make cloud deployments practical. We will also discuss some of the trade-offs between public cloud deployments and private cluster deployments.

During the session, we will discuss the following topics:

Evolution of the Apache Hadoop and Apache HBase projects.
Telltale signs that it is time to seriously consider Hadoop and HBase.
A high-level introduction to HBase's data access interfaces and abstractions.
Examples of real world HBase and Hadoop use cases and application architectures.
How the high-level architecture of HBase and Hadoop enables on-the-fly scaling to deal with increased workloads.
Using tools like Apache Whirr to deploy Hadoop and HBase in public clouds.
Hadoop and HBase tradeoffs: Public cloud vs private cluster.

Jonathan Hsieh is a software engineer at Cloudera, where he works on several distributed systems in the Apache Hadoop ecosystem. He is a contributor to the Apache HBase project, a founding member of the Apache Flume project, and a committer on the Apache Sqoop project. Before joining Cloudera, Jonathan earned an MS in Computer Science from the University of Washington, worked for the Department of Defense, and earned an MS/BS in ECE from Carnegie Mellon University.