
Decommissioning DataNodes Using the Command Line

Decommissioning a DataNode excludes it from the cluster after its data has been replicated to active nodes. To decommission a DataNode:
  1. Create a file named dfs.exclude in the HADOOP_CONF_DIR (default is /etc/hadoop/conf).
  2. Add the name of each DataNode host to be decommissioned, one hostname per line (see the example after this procedure).
  3. Stop the TaskTracker on the DataNode to be decommissioned.
  4. Add the following property to hdfs-site.xml on the NameNode host.
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>
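
For example, with two DataNode hosts to retire (the hostnames below are illustrative, not part of this procedure), the exclude file contains one hostname per line:

    datanode-03.example.com
    datanode-04.example.com

The NameNode does not normally act on changes to the exclude file until its host lists are refreshed; the usual way to trigger this is the same command shown under Stopping the Decommissioning Process below:

    $ hdfs dfsadmin -refreshNodes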

When a DataNode is marked for decommissioning, all of the blocks on that DataNode are marked as under-replicated. In the NameNode UI, under Decommissioning DataNodes, you can see the total number of under-replicated blocks; this number decreases over time, indicating decommissioning progress.
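
If you prefer to follow progress from the command line rather than the NameNode UI, the dfsadmin report is one option; its output typically includes a cluster-wide count of under-replicated blocks and a per-DataNode Decommission Status field:

    $ hdfs dfsadmin -report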

Cloudera recommends that you decommission no more than two DataNodes at one time.

Stopping the Decommissioning Process

To stop the decommissioning process for a DataNode using the command line:
  1. Remove the DataNode name from /etc/hadoop/conf/dfs.exclude.
  2. Run the command $ hdfs dfsadmin -refreshNodes.
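
As a sketch, assuming the host being returned to service is datanode-03.example.com (an illustrative hostname) and that the exclude file is edited with sed rather than a text editor, the two steps look like this:

    $ sed -i '/datanode-03.example.com/d' /etc/hadoop/conf/dfs.exclude
    $ hdfs dfsadmin -refreshNodes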