Configuring HDFS Trash
The Hadoop trash feature helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory in the user's home directory instead of being deleted. Deleted files are initially moved to the Current sub-directory of the .Trash directory, and their original path is preserved. If trash checkpointing is enabled, the Current directory is periodically renamed using a timestamp. Files in .Trash are permanently removed after a user-configurable time delay. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory.
- The trash feature is disabled by default. Cloudera recommends that you enable it on all production clusters.
- The trash feature works by default only for files and directories deleted using the Hadoop shell. Files or directories deleted programmatically using other interfaces (WebHDFS or the
Java APIs, for example) are not moved to trash, even if trash is enabled, unless the program has implemented a call to the trash functionality. (Hue, for example, implements trash as of CDH 4.4.)
Users can bypass trash when deleting files using the shell by specifying the -skipTrash option to the hadoop fs -rm -r command. This can be useful when it is necessary to delete files that are too large for the user's quota.
Configuring HDFS Trash Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
Enabling and Disabling Trash
- Go to the HDFS service.
- Click the Configuration tab.
- Select .
- Select or clear the Use Trash checkbox.
If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Restart the cluster and deploy the cluster client configuration.
Setting the Trash Interval
- Go to the HDFS service.
- Click the Configuration tab.
- Select .
- Specify the Filesystem Trash Interval property, which controls the number of minutes after which a trash checkpoint directory is deleted and the number of
minutes between trash checkpoints. For example, to enable trash so that deleted files are deleted after 24 hours, set the value of the Filesystem Trash Interval property
to 1440.
Note: The trash interval is measured from the point at which the files are moved to trash, not from the last time the files were modified.
If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Restart all NameNodes.
Configuring HDFS Trash Using the Command Line
See Enabling Trash.
<< Configuring Short-Circuit Reads | ©2016 Cloudera, Inc. All rights reserved | HDFS Balancers >> |
Terms and Conditions Privacy Policy |