This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Managing Encryption Keys and Zones

Interacting with the KMS and creating encryption zones require two new CLI commands: hadoop key and hdfs crypto. The following sections describe how to create encryption keys and set up encryption zones.

Validating Hadoop Key Operations

  Warning: If you are using or plan to use Cloudera Navigator Key HSM in conjunction with Cloudera Navigator Key Trustee Server, ensure that key names begin with alphanumeric characters and do not use special characters other than hyphen (-), period (.), or underscore (_). Using other special characters can prevent you from migrating your keys to an HSM. See Integrating Key HSM with Key Trustee Server for more information.
Use hadoop key create to create a test key, and then use hadoop key list to retrieve the key list:
$ sudo -u <key_admin> hadoop key create keytrustee_test
$ hadoop key list
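
The naming rule in the warning above can be checked before a key is created. The following is a minimal sketch in POSIX shell; valid_key_name is a hypothetical helper, not part of the hadoop key tool, and it encodes only the rule stated in the warning (alphanumeric first character; otherwise only hyphen, period, or underscore as special characters).

```shell
# Hypothetical helper (not part of Hadoop): returns 0 if the key name
# follows the Key HSM naming rule described in the warning above.
valid_key_name() {
  case "$1" in
    "")                return 1 ;;  # empty names are rejected
    [!A-Za-z0-9]*)     return 1 ;;  # must begin with an alphanumeric character
    *[!A-Za-z0-9._-]*) return 1 ;;  # only hyphen, period, underscore allowed
    *)                 return 0 ;;
  esac
}
```

For example, valid_key_name keytrustee_test succeeds, so it could guard the create command: valid_key_name keytrustee_test && sudo -u <key_admin> hadoop key create keytrustee_test.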

Creating Encryption Zones

  Important: Cloudera does not currently support configuring the root directory as an encryption zone. Nested encryption zones are also not supported.

Once a KMS has been set up and the NameNode and HDFS clients have been correctly configured, use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.

  • Create an encryption key for your zone as the application user that will be using the key. For example, if you are creating an encryption zone for HBase, create the key as the hbase user as follows:
    $ sudo -u hbase hadoop key create <key_name>
  • Create a new empty directory and make it an encryption zone using the key created above:
    $ sudo -u hdfs hadoop fs -mkdir /encryption_zone
    $ sudo -u hdfs hdfs crypto -createZone -keyName <key_name> -path /encryption_zone
    You can verify creation of the new encryption zone by running the -listZones command. You should see the encryption zone along with its key listed as follows:
    $ sudo -u hdfs hdfs crypto -listZones
    /encryption_zone    <key_name>
      Warning: Do not delete an encryption key while it is still in use by an encryption zone. Deleting the key results in loss of access to the data in that zone.
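
The directory and zone steps above can be combined into one function. This is a sketch under stated assumptions: create_zone and the RUN variable are hypothetical names introduced here. By default RUN is echo, so the commands are only printed for review; setting RUN to an empty string executes them. The key is assumed to have been created already by the application user.

```shell
# Sketch only. RUN is a hypothetical switch: "echo" (the default) prints
# the commands; an empty value (RUN=) runs them for real.
RUN="${RUN-echo}"

create_zone() {
  key_name="$1"   # key already created by the application user
  zone_path="$2"  # new, empty directory that will become the zone

  # The root directory is not supported as an encryption zone.
  if [ "$zone_path" = "/" ]; then
    echo "the root directory cannot be an encryption zone" >&2
    return 1
  fi

  $RUN sudo -u hdfs hadoop fs -mkdir "$zone_path" &&
  $RUN sudo -u hdfs hdfs crypto -createZone -keyName "$key_name" -path "$zone_path" &&
  $RUN sudo -u hdfs hdfs crypto -listZones
}
```

Calling create_zone <key_name> /encryption_zone with the default setting prints the three commands in the order they would run. The function does not check for nested zones; that remains the administrator's responsibility.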

For more information and recommendations on creating encryption zones for each CDH component, see Configuring CDH Services for HDFS Encryption.

Adding Files to an Encryption Zone

Existing data can be encrypted by copying it into the new encryption zones using tools like DistCp.

You can add files to an encryption zone by copying them to the encryption zone using distcp. For example:
$ sudo -u hdfs hadoop distcp /user/dir /encryption_zone

DistCp Considerations

A common use case for DistCp is to replicate data between clusters for backup and disaster recovery purposes. This is typically performed by the cluster administrator, who is an HDFS superuser. To retain this workflow when using HDFS encryption, a new virtual path prefix, /.reserved/raw/, has been introduced that gives superusers direct access to the underlying block data in the filesystem. This allows superusers to distcp data without requiring access to encryption keys, and avoids the overhead of decrypting and re-encrypting data. It also means the source and destination data will be byte-for-byte identical, which would not be true if the data were re-encrypted with a new EDEK.

  Warning: When using /.reserved/raw/ to distcp encrypted data, make sure you preserve extended attributes with the -px flag. This is because encrypted attributes such as the EDEK are exposed through extended attributes and must be preserved to be able to decrypt the file.

This means that if the distcp is initiated at or above the encryption zone root, it will automatically create a new encryption zone at the destination if it does not already exist. Hence, Cloudera recommends you first create identical encryption zones on the destination cluster to avoid any potential mishaps.
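
As an illustration, a raw copy between clusters might be assembled as follows. This is a sketch, not a tested procedure: the NameNode addresses and the raw_distcp_cmd helper are hypothetical, and the function only prints the command so it can be reviewed before a superuser runs it. It assumes the zone uses the same path on both clusters.

```shell
# Hypothetical NameNode addresses; replace with the values for your clusters.
SRC_NN="hdfs://source-nn.example.com:8020"
DST_NN="hdfs://dest-nn.example.com:8020"

# Prints a distcp command that copies a zone through /.reserved/raw
# and preserves extended attributes (-px), which carry the EDEK.
raw_distcp_cmd() {
  zone_path="$1"  # absolute path of the zone, assumed identical on both clusters
  printf 'hadoop distcp -px %s/.reserved/raw%s %s/.reserved/raw%s\n' \
    "$SRC_NN" "$zone_path" "$DST_NN" "$zone_path"
}
```

Running raw_distcp_cmd /encryption_zone prints a single hadoop distcp -px line with both /.reserved/raw/ paths filled in.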

Copying between encrypted and unencrypted locations

By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination. When copying between an unencrypted and encrypted location, the filesystem checksums will not match since the underlying block data is different.

In this case, you can specify the -skipcrccheck and -update flags to avoid verifying checksums.
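
A minimal sketch of such a copy, using the paths from the earlier example: copy_unverified and the DISTCP variable are hypothetical names introduced here. DISTCP exists only so the command can be dry-run (for example, DISTCP=echo); when it is unset, the function calls hadoop distcp directly.

```shell
# Sketch only. DISTCP is a hypothetical override for dry runs;
# when it is unset or empty, the real "hadoop distcp" is invoked.
copy_unverified() {
  src="$1"  # unencrypted source path
  dst="$2"  # destination inside an encryption zone (or the reverse)
  ${DISTCP:-hadoop distcp} -update -skipcrccheck "$src" "$dst"
}
```

For example, copy_unverified /user/dir /encryption_zone copies the directory without comparing filesystem checksums. Because checksum verification is skipped, you may want to validate the copied data by other means.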

Deleting Encryption Zones

To remove an encryption zone, delete the encrypted directory:
  Warning: This command deletes the entire directory and all of its contents. Ensure that the data is no longer needed before running this command.
$ sudo -u hdfs hadoop fs -rm -r -skipTrash /encryption_zone
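
Because the delete is irreversible, it can help to confirm first that the path really is an encryption zone root. The sketch below assumes the -listZones output format shown earlier (path in the first column, key name in the second) and paths without whitespace; is_zone_root is a hypothetical helper.

```shell
# Hypothetical helper: reads `hdfs crypto -listZones` output on stdin and
# succeeds only if the given path appears in the first column.
is_zone_root() {
  awk -v p="$1" '$1 == p { found = 1 } END { exit !found }'
}
```

It could then guard the delete command: sudo -u hdfs hdfs crypto -listZones | is_zone_root /encryption_zone && sudo -u hdfs hadoop fs -rm -r -skipTrash /encryption_zone.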

Backing Up Encryption Keys

  Warning: It is very important that you regularly back up your encryption keys. Failure to do so can result in irretrievable loss of encrypted data.

If you are using the Java KeyStore KMS, make sure you regularly back up the Java KeyStore that stores the encryption keys. If you are using the Key Trustee KMS and Key Trustee Server, see Backing Up and Restoring Key Trustee Server and Clients for instructions on backing up Key Trustee Server and Key Trustee KMS.

Page generated July 8, 2016.