Storing Medium Objects (MOBs) in HBase
Data comes in many sizes, and saving all of your data in HBase, including binary data such as images and documents, is convenient. HBase can technically handle binary objects with cells that are up to 10MB in size. However, HBase's normal read and write paths are optimized for values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB, referred to here as medium objects, or MOBs, performance is degraded due to write amplification caused by splits and compactions.
Traditionally, the way to solve this problem has been to store objects larger than 100KB directly in HDFS, and store references to their locations in HBase. CDH 5.4 introduces optimizations for storing MOBs directly in HBase, based on the work done in HBASE-11339.
To take advantage of MOB, you must use HFile version 3. Optionally, you can configure the MOB file reader's cache settings for each RegionServer. You then configure specific columns to hold MOB data. Client code does not need to change to take advantage of HBase MOB support; the feature is transparent to the client.
Enabling HFile Version 3 Using Cloudera Manager
Minimum Required Role: Full Administrator
- Go to the HBase service.
- Click the Configuration tab.
- Search for the property HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
- Paste the following XML into the Value field and save your changes.
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
Enabling HFile Version 3 Using the Command Line
- If you use Cloudera Manager, do not use these command-line instructions.
- This information applies specifically to CDH 5.8.x. If you use a lower version of CDH, see the documentation for that version located at Cloudera Documentation.
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
Restart HBase. Changes will take effect for a given region during its next major compaction.
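Before restarting, it can help to confirm that the snippet is well-formed and carries the intended value. The following is a minimal sketch using only the JDK's XML parser; the class name HBaseSiteCheck and the helper readProperty are hypothetical and are not part of HBase or Cloudera Manager. It assumes the property sits inside a <configuration> root element, as in a typical hbase-site.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not an HBase or Cloudera class.
final class HBaseSiteCheck {

    // Returns the <value> of the named <property>, or null if it is absent.
    static String readProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            // Malformed XML ends up here, which is itself a useful signal.
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hfile.format.version</name><value>3</value></property>"
                + "</configuration>";
        System.out.println("hfile.format.version = "
                + readProperty(xml, "hfile.format.version"));
    }
}
```

The same helper can be pointed at any other property name from this page, such as hbase.mob.cache.evict.period.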
Configuring Columns to Store MOBs
- IS_MOB is a Boolean option, which specifies whether or not the column can store MOBs.
- MOB_THRESHOLD configures the number of bytes at which an object is considered to be a MOB. If you do not specify a value for MOB_THRESHOLD, the default is 100 KB. If you write a value larger than this threshold, it is treated as a MOB.
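The threshold rule above can be sketched as a simple size comparison. This is an illustration of the rule as stated here, not HBase's internal code; the class name MobThresholdSketch and the method isMob are hypothetical. It assumes that a value exactly equal to the threshold is not treated as a MOB, following the "larger than" wording.

```java
// Illustrative only: mirrors the rule described in the text, not HBase internals.
final class MobThresholdSketch {

    // Default MOB_THRESHOLD from the text: 100 KB, expressed in bytes.
    static final long DEFAULT_MOB_THRESHOLD = 102400L;

    // A value is treated as a MOB when its size is larger than the threshold.
    static boolean isMob(long valueSizeBytes, long thresholdBytes) {
        return valueSizeBytes > thresholdBytes;
    }

    public static void main(String[] args) {
        long[] sizes = {512L, 102400L, 102401L, 5L * 1024 * 1024};
        for (long size : sizes) {
            System.out.println(size + " bytes -> MOB? "
                    + isMob(size, DEFAULT_MOB_THRESHOLD));
        }
    }
}
```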
You can configure a column to store MOBs using the HBase Shell or the Java API.
Using HBase Shell:
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
Using the Java API:
HColumnDescriptor hcd = new HColumnDescriptor("f");
hcd.setMobEnabled(true);
hcd.setMobThreshold(102400L);
HBase MOB Cache Properties
Because there can be a large number of MOB files at any time, as compared to the number of HFiles, MOB files are not always kept open. The MOB file reader cache is an LRU cache that keeps the most recently used MOB files open.
Property | Default | Description |
---|---|---|
hbase.mob.file.cache.size | 1000 | The number of opened file handlers to cache. A larger value benefits reads by providing more file handlers per MOB file cache and reduces frequent opening and closing of files. However, if the value is too high, errors such as "Too many opened file handlers" may be logged. |
hbase.mob.cache.evict.period | 3600 | The amount of time in seconds after a file is opened before the MOB cache evicts cached files. The default value is 3600 seconds. |
hbase.mob.cache.evict.remain.ratio | 0.5f | The ratio, expressed as a float between 0.0 and 1.0, that controls how many files remain cached after an eviction is triggered because the number of cached files exceeds hbase.mob.file.cache.size. The default value is 0.5f. |
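To make the interaction between the cache size and the remain ratio concrete, here is a minimal sketch of an LRU cache built on the JDK's LinkedHashMap. It is not HBase's implementation: the class MobFileCacheSketch, its methods, and the string "handles" are all hypothetical, and the periodic eviction controlled by hbase.mob.cache.evict.period is not modeled. It assumes that when the number of open files exceeds the maximum, the least recently used files are closed until maxSize * remainRatio files remain.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of size-triggered LRU eviction; not HBase code.
final class MobFileCacheSketch {
    private final int maxSize;        // stands in for hbase.mob.file.cache.size
    private final float remainRatio;  // stands in for hbase.mob.cache.evict.remain.ratio

    // accessOrder=true keeps entries ordered from least to most recently used.
    private final LinkedHashMap<String, String> openFiles =
            new LinkedHashMap<>(16, 0.75f, true);

    MobFileCacheSketch(int maxSize, float remainRatio) {
        this.maxSize = maxSize;
        this.remainRatio = remainRatio;
    }

    // Returns the cached handle for a file, "opening" it on a cache miss.
    String open(String fileName) {
        String handle = openFiles.get(fileName); // a hit marks the file as recently used
        if (handle == null) {
            handle = "handle:" + fileName;
            openFiles.put(fileName, handle);
            if (openFiles.size() > maxSize) {
                evict();
            }
        }
        return handle;
    }

    // Closes least recently used files until maxSize * remainRatio remain.
    private void evict() {
        int target = (int) (maxSize * remainRatio);
        Iterator<Map.Entry<String, String>> it = openFiles.entrySet().iterator();
        while (openFiles.size() > target && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    int size() {
        return openFiles.size();
    }

    boolean isOpen(String fileName) {
        return openFiles.containsKey(fileName); // does not change the access order
    }

    public static void main(String[] args) {
        MobFileCacheSketch cache = new MobFileCacheSketch(4, 0.5f);
        for (String f : new String[] {"f1", "f2", "f3", "f4", "f1", "f5"}) {
            cache.open(f);
        }
        System.out.println("open files after eviction: " + cache.size());
    }
}
```

In this sketch, a ratio of 0.5 with a cache size of 4 means that opening a fifth file closes the three least recently used files and leaves two open. A higher ratio evicts fewer files per pass but triggers eviction more often.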
Configuring the MOB Cache Using Cloudera Manager
- Go to the HBase service.
- Click the Configuration tab.
- Search for the property HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
- Paste your configuration into the Value field and save your changes. The following example sets the hbase.mob.cache.evict.period property to 5000 seconds. See HBase MOB Cache Properties for a full list of configurable properties for HBase MOB.
<property>
  <name>hbase.mob.cache.evict.period</name>
  <value>5000</value>
</property>
- Restart your cluster for the changes to take effect.
Configuring the MOB Cache Using the Command Line
- If you use Cloudera Manager, do not use these command-line instructions.
- This information applies specifically to CDH 5.8.x. If you use a lower version of CDH, see the documentation for that version located at Cloudera Documentation.
<property>
  <name>hbase.mob.cache.evict.period</name>
  <value>5000</value>
</property>
Testing MOB Storage and Retrieval Performance
$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
    -threshold 102400 \
    -minMobDataSize 512 \
    -maxMobDataSize 5120
- threshold is the threshold at which cells are considered to be MOBs. The default is 1 kB, expressed in bytes.
- minMobDataSize is the minimum value for the size of MOB data. The default is 512 B, expressed in bytes.
- maxMobDataSize is the maximum value for the size of MOB data. The default is 5 kB, expressed in bytes.
Compact MOB Files Manually
hbase> compact 't1'
hbase> compact 't1', 'f1'
hbase> major_compact 't1'
hbase> major_compact 't1', 'f1'
This functionality is also available through the API, using the Admin.compact and Admin.majorCompact methods.
©2016 Cloudera, Inc. All rights reserved