This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Storing Medium Objects (MOBs) in HBase

Data comes in many sizes, and saving all of your data in HBase, including binary data such as images and documents, is convenient. HBase can technically handle binary objects with cells that are up to 10MB in size. However, HBase's normal read and write paths are optimized for values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB, referred to here as medium objects, or MOBs, performance is degraded due to write amplification caused by splits and compactions.

Traditionally, the way to solve this problem has been to store objects larger than 100KB directly in HDFS, and store references to their locations in HBase. CDH 5.4 introduces optimizations for storing objects up to 10MB in size, termed medium objects or MOBs, directly in HBase, using the work done in HBASE-11339.

To take advantage of MOB, you need to use HFile version 3. Optionally, you can configure the MOB file reader's cache settings for each RegionServer, then configure specific columns to hold MOB data. Client code does not need to change to take advantage of HBase MOB support. The feature is transparent to the client.

Enabling HFile Version 3 Using Cloudera Manager

Minimum Required Role: Full Administrator

To enable HFile version 3 using Cloudera Manager, edit the HBase Service advanced configuration snippet.
  1. Go to the HBase service.
  2. Click the Configuration tab.
  3. Search for the property HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
  4. Paste the following XML into the Value field and save your changes.
    <property>
      <name>hfile.format.version</name>
      <value>3</value>
    </property>
Changes will take effect after the next major compaction.

Enabling HFile Version 3 Using the Command Line

  Important:
  • If you use Cloudera Manager, do not use these command-line instructions.
  • This information applies specifically to CDH 5.8.x. If you use a lower version of CDH, see the documentation for that version located at Cloudera Documentation.
Paste the following XML into hbase-site.xml.
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>

Restart HBase. Changes will take effect for a given region during its next major compaction.
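Before restarting, it can help to confirm that the property actually landed in the file. The following is a minimal sketch, not part of HBase or Cloudera Manager: the class HBaseSiteCheck and its readProperty method are hypothetical names, and the sketch assumes the usual layout in which property elements sit inside a configuration root element. It uses only the JDK's built-in XML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: looks up one property in hbase-site.xml content. */
final class HBaseSiteCheck {

    /** Returns the value of the named property, or null if it is absent. */
    static String readProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (name.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            // Malformed XML is reported as an unchecked exception.
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    /** Returns the trimmed text of the first child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Calling readProperty with the file's contents and "hfile.format.version" should return "3" once the snippet above is in place.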

Configuring Columns to Store MOBs

Two configuration options are provided to configure a column to store MOBs:
  • IS_MOB is a Boolean option, which specifies whether or not the column can store MOBs.
  • MOB_THRESHOLD configures the number of bytes at which an object is considered to be a MOB. If you do not specify a value for MOB_THRESHOLD, the default is 100 KB. If you write a value larger than this threshold, it is treated as a MOB.
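
The threshold rule above can be sketched in a few lines. This is an illustration of the comparison only, not HBase code: the class MobThresholdSketch and its isMob method are hypothetical names, and the sketch assumes 1 KB = 1024 bytes, which matches the value 102400 used in the examples on this page.

```java
/** Hypothetical illustration of the MOB_THRESHOLD rule described above. */
final class MobThresholdSketch {

    // 100 KB expressed in bytes (assumes 1 KB = 1024 bytes).
    static final long DEFAULT_MOB_THRESHOLD = 100L * 1024L;

    /** A value is treated as a MOB only when it is larger than the threshold. */
    static boolean isMob(long valueLengthBytes, long mobThresholdBytes) {
        return valueLengthBytes > mobThresholdBytes;
    }
}
```

Under this rule a value of exactly 102400 bytes is stored through the normal path, and a value of 102401 bytes is treated as a MOB.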

You can configure a column to store MOBs using the HBase Shell or the Java API.

Using HBase Shell:

hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}

Using the Java API:

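import org.apache.hadoop.hbase.HColumnDescriptor;

// Enable MOB storage on column family "f" with a 100 KB (102400-byte) threshold.
HColumnDescriptor hcd = new HColumnDescriptor("f");
hcd.setMobEnabled(true);
hcd.setMobThreshold(102400L);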

HBase MOB Cache Properties

Because there can be a large number of MOB files at any time, as compared to the number of HFiles, MOB files are not always kept open. The MOB file reader cache is an LRU cache that keeps the most recently used MOB files open.

The following properties are available for tuning the HBase MOB cache.
Table 1. HBase MOB Cache Properties
Property | Default | Description
hbase.mob.file.cache.size | 1000 | The number of opened file handlers to cache. A larger value benefits reads by providing more file handlers per MOB file cache and reduces frequent opening and closing of files. However, if the value is too high, errors such as "Too many opened file handlers" may be logged.
hbase.mob.cache.evict.period | 3600 | The amount of time in seconds after a file is opened before the MOB cache evicts cached files. The default value is 3600 seconds.
hbase.mob.cache.evict.remain.ratio | 0.5f | The ratio, expressed as a float between 0.0 and 1.0, that controls how many files remain cached after an eviction is triggered because the number of cached files exceeds hbase.mob.file.cache.size. The default value is 0.5f.
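
To make the interaction between the cache size and the remain ratio concrete, here is a simplified model of an LRU cache of open files. It is a sketch under stated assumptions, not HBase's implementation: the class MobFileCacheSketch and its methods are hypothetical, a plain Object stands in for a file reader, and the periodic eviction governed by hbase.mob.cache.evict.period is not modeled. The sketch assumes that when the number of cached files exceeds the maximum, the least recently used entries are dropped until (maximum × ratio) entries remain.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of an LRU cache of open MOB files. */
final class MobFileCacheSketch {
    private final int maxSize;        // plays the role of hbase.mob.file.cache.size
    private final float remainRatio;  // plays the role of hbase.mob.cache.evict.remain.ratio
    // accessOrder=true orders entries from least to most recently used.
    private final LinkedHashMap<String, Object> handles =
            new LinkedHashMap<>(16, 0.75f, true);

    MobFileCacheSketch(int maxSize, float remainRatio) {
        this.maxSize = maxSize;
        this.remainRatio = remainRatio;
    }

    /** Returns the cached handle for the file, "opening" it on a cache miss. */
    Object openFile(String fileName) {
        Object handle = handles.get(fileName); // a hit marks the entry as recently used
        if (handle == null) {
            handle = new Object();             // stand-in for a real file reader
            handles.put(fileName, handle);
            if (handles.size() > maxSize) {
                evict();
            }
        }
        return handle;
    }

    /** Drops least recently used entries until maxSize * remainRatio remain. */
    private void evict() {
        int keep = (int) (maxSize * remainRatio);
        Iterator<Map.Entry<String, Object>> it = handles.entrySet().iterator();
        while (handles.size() > keep && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    int size() { return handles.size(); }

    boolean isOpen(String fileName) { return handles.containsKey(fileName); }
}
```

In this model, a cache with a maximum of 4 and a ratio of 0.5f that receives a fifth file shrinks to the 2 most recently used files. A lower ratio frees more handles per eviction at the cost of more reopening afterwards.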

Configuring the MOB Cache Using Cloudera Manager

To configure the MOB cache within Cloudera Manager, edit the HBase Service advanced configuration snippet for the cluster. Cloudera recommends testing your configuration with the default settings first.
  1. Go to the HBase service.
  2. Click the Configuration tab.
  3. Search for the property HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
  4. Paste your configuration into the Value field and save your changes. The following example sets the hbase.mob.cache.evict.period property to 5000 seconds. See HBase MOB Cache Properties for a full list of configurable properties for HBase MOB.
    <property>
      <name>hbase.mob.cache.evict.period</name>
      <value>5000</value>
    </property>
  5. Restart your cluster for the changes to take effect.

Configuring the MOB Cache Using the Command Line

  Important:
  • If you use Cloudera Manager, do not use these command-line instructions.
  • This information applies specifically to CDH 5.8.x. If you use a lower version of CDH, see the documentation for that version located at Cloudera Documentation.
To customize the configuration of the MOB file reader's cache on each RegionServer, configure the MOB cache properties in the RegionServer's hbase-site.xml. Customize the configuration to suit your environment, then restart the RegionServer or perform a rolling restart. Cloudera recommends testing your configuration with the default settings first. The following example sets the hbase.mob.cache.evict.period property to 5000 seconds. See HBase MOB Cache Properties for a full list of configurable properties for HBase MOB.
<property>
  <name>hbase.mob.cache.evict.period</name>
  <value>5000</value>
</property>

Testing MOB Storage and Retrieval Performance

HBase provides the Java utility org.apache.hadoop.hbase.IntegrationTestIngestMOB to assist with testing the MOB feature and deciding on appropriate configuration values for your situation. The utility is run as follows:
$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
            -threshold 102400 \
            -minMobDataSize 512 \
            -maxMobDataSize 5120
  • threshold is the threshold at which cells are considered to be MOBs. The default is 1 kB, expressed in bytes.
  • minMobDataSize is the minimum value for the size of MOB data. The default is 512 B, expressed in bytes.
  • maxMobDataSize is the maximum value for the size of MOB data. The default is 5 kB, expressed in bytes.

Compact MOB Files Manually

You can trigger compaction of MOB files manually, rather than waiting for it to be triggered by your configuration, using the HBase Shell commands compact_mob and major_compact_mob. Each of these commands requires the table name as the first parameter and takes an optional column family name as the second parameter. If the column family is provided, only that column family's files are compacted. Otherwise, all MOB-enabled column families' files are compacted.
hbase> compact_mob 't1'
hbase> compact_mob 't1', 'f1'
hbase> major_compact_mob 't1'
hbase> major_compact_mob 't1', 'f1'

This functionality is also available through the Java API, using the Admin.compact and Admin.majorCompact methods.

Page generated July 8, 2016.