Tuesday, May 10, 2016

Introduction to HBase


HBase is a distributed column-oriented database built on top of HDFS. HBase is the
Hadoop application to use when you require real-time read/write random-access to
very large datasets.

The canonical HBase use case is the webtable, a table of crawled web pages and their
attributes (such as language and MIME type) keyed by the web page URL. The webtable
is large, with row counts that run into the billions. Batch analytic and parsing
MapReduce jobs are continuously run against the webtable deriving statistics and
adding new columns of verified MIME type and parsed text content for later indexing
by a search engine.

Excerpts from Hadoop: The Definitive Guide by Tom White, published by O'Reilly.


Data Model
Applications store data into labeled tables. Tables are made of rows and columns. Table
cells—the intersection of row and column coordinates—are versioned. By default, their
version is a timestamp auto-assigned by HBase at the time of cell insertion. A cell’s
content is an uninterpreted array of bytes.
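To make the versioned-cell idea concrete, here is a minimal in-memory sketch. It is a toy model and not the HBase client API: the class name `VersionedTable`, its methods, and the sample row key and column name are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class VersionedTable:
    """Toy model of versioned cells (hypothetical; not the HBase API)."""

    def __init__(self):
        # (row, column) -> {timestamp: value bytes}
        self._cells = defaultdict(dict)

    def put(self, row, column, value, timestamp=None):
        # With no explicit version, use the insertion time in milliseconds.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        # The value is stored as raw bytes and never interpreted.
        self._cells[(row, column)][timestamp] = bytes(value)
        return timestamp

    def get(self, row, column, max_versions=1):
        # Return up to max_versions (timestamp, value) pairs, newest first.
        versions = self._cells.get((row, column), {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]


table = VersionedTable()
# Explicit timestamps keep the example deterministic.
table.put(b"http://example.com/", b"language", b"en", timestamp=1)
table.put(b"http://example.com/", b"language", b"fr", timestamp=2)

latest = table.get(b"http://example.com/", b"language")
history = table.get(b"http://example.com/", b"language", max_versions=2)
```

Reading a cell without asking for older versions returns only the newest value, while the earlier write stays available under its own timestamp.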

Tables are automatically partitioned horizontally by HBase into regions. Each region
comprises a subset of a table’s rows. A region is denoted by the table it belongs to, its
first row, inclusive, and last row, exclusive.
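The region boundaries described above (first row inclusive, last row exclusive) can be sketched as a lookup over sorted start keys. This is an illustrative model only: the `Region` class, the `find_region` helper, the split points, and the use of `None` for an open-ended last region are assumptions of this sketch, not HBase internals.

```python
import bisect


class Region:
    """A contiguous slice of a table's rows (hypothetical model)."""

    def __init__(self, table, start_row, end_row):
        self.table = table
        self.start_row = start_row  # inclusive; b"" marks the start of the table
        self.end_row = end_row      # exclusive; None marks the end (sketch convention)

    def contains(self, row):
        below_end = self.end_row is None or row < self.end_row
        return row >= self.start_row and below_end


def find_region(regions, row):
    """Find the region holding `row`.

    `regions` must be sorted by start_row and cover the whole key space.
    """
    starts = [region.start_row for region in regions]
    # The owning region is the last one whose start_row is <= row.
    index = bisect.bisect_right(starts, row) - 1
    return regions[index]


regions = [
    Region("webtable", b"", b"g"),
    Region("webtable", b"g", b"p"),
    Region("webtable", b"p", None),
]

owner = find_region(regions, b"g")
```

Because the last row is exclusive, the key `b"g"` belongs to the second region, not the first, so every row has exactly one owner.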


Row updates are atomic, no matter how many row columns constitute the row-level
transaction. This keeps the locking model simple.

HBase is modeled with an HBase master node orchestrating a cluster of one or more
regionserver slaves. The HBase master is responsible for bootstrapping
a virgin install, for assigning regions to registered regionservers, and for recovering
from regionserver failures. The master node is lightly loaded.

HBase depends on ZooKeeper, and by default it manages a ZooKeeper
instance as the authority on cluster state.
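The row-level atomicity mentioned above can be pictured with one lock per row: a multi-column update is applied entirely or not at all from a reader's point of view. The sketch below is a hedged illustration of that locking model, not how HBase implements it; `RowStore` and its methods are hypothetical names.

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy store with one lock per row (hypothetical; not HBase code)."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, row):
        # Guard the lock table itself so two threads never create
        # different locks for the same row.
        with self._locks_guard:
            return self._locks[row]

    def put_row(self, row, columns):
        # All columns of the update become visible together.
        with self._lock_for(row):
            self._rows[row].update(columns)

    def get_row(self, row):
        # Return a copy taken while holding the row lock.
        with self._lock_for(row):
            return dict(self._rows.get(row, {}))


store = RowStore()


def writer(tag):
    # Each writer sets every column of the row to its own tag.
    for _ in range(200):
        store.put_row(b"row1", {b"a": tag, b"b": tag, b"c": tag})


threads = [threading.Thread(target=writer, args=(tag,)) for tag in (b"x", b"y")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_row = store.get_row(b"row1")
```

Whichever writer finishes last, all three columns in `final_row` carry the same tag, because no update is ever applied partially. Locking only at row granularity is what keeps the model simple: there is no need to coordinate locks across rows.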

Download a stable release from an Apache Download Mirror and unpack it on your
local filesystem.

HBase, like Hadoop, is written in Java.

HBase classes and utilities in the org.apache.hadoop.hbase.mapreduce package facilitate
using HBase as a source and/or sink in MapReduce jobs.

Avro, REST, and Thrift
HBase ships with Avro, REST, and Thrift interfaces. These are useful when the interacting
application is written in a language other than Java. In all cases, a Java server
hosts an instance of the HBase client brokering application Avro, REST, and Thrift
requests in and out of the HBase cluster.
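As an illustration of what a non-Java client deals with, the sketch below decodes a JSON row of the kind the REST interface is commonly described as returning, with row keys, column names, and values base64-encoded. The exact response shape (`Row`, `key`, `Cell`, `column`, `timestamp`, `$`) is an assumption here and should be checked against the documentation for your HBase version; the sample payload is built locally and no request is sent.

```python
import base64
import json


def b64(raw):
    """Base64-encode bytes into an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


def decode_cells(payload):
    """Turn an (assumed) REST JSON row payload into plain tuples.

    Returns a list of (row_key, column, timestamp, value) tuples.
    """
    document = json.loads(payload)
    cells = []
    for row in document.get("Row", []):
        row_key = base64.b64decode(row["key"])
        for cell in row.get("Cell", []):
            cells.append((
                row_key,
                base64.b64decode(cell["column"]),
                cell.get("timestamp"),
                base64.b64decode(cell["$"]),
            ))
    return cells


# Hand-built sample payload; the row key, column, and value are made up.
sample = json.dumps({
    "Row": [{
        "key": b64(b"http://example.com/"),
        "Cell": [{
            "column": b64(b"language"),
            "timestamp": 1,
            "$": b64(b"en"),
        }],
    }]
})

decoded = decode_cells(sample)
```

Cell content is an uninterpreted byte array, so an encoding such as base64 is needed to carry it safely inside a text format like JSON.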
