Deploying HBase on a Cluster

After you have HBase running in pseudo-distributed mode, you can extend the same configuration to run HBase on a distributed cluster.

  Note: Before you start

This section assumes that you have already installed the HBase Master and the HBase RegionServer and gone through the steps for standalone and pseudo-distributed configuration. You are now about to distribute the processes across multiple hosts; see Choosing Where to Deploy the Processes.

Choosing Where to Deploy the Processes

For small clusters, Cloudera recommends designating one node in your cluster as the HBase Master node. On this node, you will typically run the HBase Master and a ZooKeeper quorum peer. On small clusters, these master processes may be co-located with the Hadoop NameNode and JobTracker.

Designate the remaining nodes as RegionServer nodes. On each node, Cloudera recommends running a RegionServer, which may be co-located with a Hadoop TaskTracker (MRv1) and a DataNode. When co-locating a RegionServer with a TaskTracker, be sure that the machine's resources are not oversubscribed; it is safest to start with a small number of MapReduce slots and increase them gradually.

The HBase Thrift service is lightweight and can run on any node in the cluster.
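
On a package-based CDH installation, a typical way to start the Thrift service is its init script; this is a sketch, and the service name may vary by platform:

sudo service hbase-thrift start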

Configuring for Distributed Operation

After you have decided which machines will run each process, edit the configuration so that the nodes can locate each other. To do so, make sure the configuration files are synchronized across the cluster. Cloudera strongly recommends using a configuration management system to synchronize the configuration files, although you can use a simpler solution such as rsync to get started quickly.
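
As a quick start, an rsync invocation to push the HBase configuration directory from the node where you edited it to a RegionServer host might look like the following sketch; the hostname regionserver1.example.com is illustrative, and /etc/hbase/conf is the default configuration directory on package-based installations:

# Copy the HBase configuration to one RegionServer host; repeat for each node.
rsync -av /etc/hbase/conf/ regionserver1.example.com:/etc/hbase/conf/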

The only configuration change necessary to move from pseudo-distributed operation to fully distributed operation is the addition of the ZooKeeper Quorum address in hbase-site.xml. Insert the following XML property to configure the nodes with the address of the node where the ZooKeeper quorum peer is running:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>mymasternode</value>
</property>

The hbase.zookeeper.quorum property is a comma-separated list of hosts on which ZooKeeper servers are running. If one of the ZooKeeper servers is down, HBase uses another server from the list. By default, the ZooKeeper service is bound to port 2181. To change the port, add the hbase.zookeeper.property.clientPort property to hbase-site.xml and set its value to the port you want ZooKeeper to use (see the example after the following listing). In CDH 5.7.0 and higher, you do not need to use hbase.zookeeper.property.clientPort. Instead, you can specify the port along with the hostname for each ZooKeeper host:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:20000,zk3.example.com:31111</value>
</property>
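
For releases older than CDH 5.7.0, a minimal hbase-site.xml entry for the client port might look like the following sketch; the port value 2182 is illustrative:

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2182</value>
</property>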

For more information, see the Apache HBase Reference Guide.

To start the cluster, start the services in the following order (a sample command sequence follows the list):

  1. The ZooKeeper Quorum Peer
  2. The HBase Master
  3. Each of the HBase RegionServers
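
On a package-based CDH installation that uses the standard init scripts, the sequence might look like the following sketch; run each command on the appropriate host, and note that service names can vary by release and platform:

# On the ZooKeeper host(s):
sudo service zookeeper-server start

# On the HBase Master host:
sudo service hbase-master start

# On each RegionServer host:
sudo service hbase-regionserver start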

After the cluster is fully started, you can view the HBase Master web interface on port 60010 and verify that each of the RegionServer nodes has registered properly with the master.
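
As a quick command-line check that the master UI is responding, you can request its status page; mymasternode is the hostname from the earlier example:

# Expect an HTTP 200 response if the master web UI is up.
curl -sI http://mymasternode:60010/master-status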

See also Accessing HBase by using the HBase Shell, Using MapReduce with HBase, and Troubleshooting HBase. For instructions on improving the performance of local reads, see Optimizing Performance in CDH.
