Requirements and Restrictions for Data Migration between CDH 4 and CDH 5
- The CDH 5 cluster must have a MapReduce service running on it (MRv1 or YARN (MRv2)).
- All the MapReduce nodes in the CDH 5 cluster should have full network access to all the nodes of the source cluster. This allows you to perform the copy in a distributed manner.
- To copy data between a secure and an insecure cluster, you must run the distcp command on the secure cluster.
- To copy data from a CDH 4 to a CDH 5 cluster, you can do one of the following:
Note:
The term source in this case refers to the CDH 4 (or other Hadoop) cluster you want to migrate or copy data from, and destination refers to the CDH 5 cluster.
- Run commands on the destination cluster, using the Hftp protocol for the source cluster and HDFS for the destination. (Hftp is read-only, so you must run DistCp on the destination cluster and pull the data from the source cluster.) See Copying Data Between Two Clusters Using Distcp.
Note:
Do not use this method if one of the clusters is secure and the other is not.
- Run commands on the source cluster, using the HDFS or webHDFS protocol for the source cluster and webHDFS for the destination. See Copying Data between a Secure and an Insecure Cluster using DistCp and WebHDFS.
- Run commands on the destination cluster, using webHDFS for both the source cluster and the destination. See Copying Data between a Secure and an Insecure Cluster using DistCp and WebHDFS.
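As a rough illustration of the options listed above, the following shell sketch only prints the DistCp command for each one and does not run anything. The host names (cdh4-nn.example.com, cdh5-nn.example.com), the example path, and the ports (50070 for the NameNode web port, 8020 for NameNode RPC) are assumptions made for this example; substitute the values used by your own clusters.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the DistCp command for each option instead of running it.
# Hosts, ports, and paths are placeholders (assumptions), not values from your clusters.
SRC_NN="cdh4-nn.example.com"   # hypothetical CDH 4 (source) NameNode
DST_NN="cdh5-nn.example.com"   # hypothetical CDH 5 (destination) NameNode
HTTP_PORT=50070                # assumed NameNode web port (Hftp and webHDFS)
RPC_PORT=8020                  # assumed NameNode RPC port (HDFS)

# Prints a DistCp command line for the given option and absolute path.
#   hftp-pull     run on the destination: Hftp source, HDFS destination
#   webhdfs-push  run on the source: HDFS source, webHDFS destination
#   webhdfs-pull  run on the destination: webHDFS source and destination
distcp_command() {
  local option="$1" path="$2"
  case "$option" in
    hftp-pull)
      echo "hadoop distcp hftp://${SRC_NN}:${HTTP_PORT}${path} hdfs://${DST_NN}:${RPC_PORT}${path}" ;;
    webhdfs-push)
      echo "hadoop distcp hdfs://${SRC_NN}:${RPC_PORT}${path} webhdfs://${DST_NN}:${HTTP_PORT}${path}" ;;
    webhdfs-pull)
      echo "hadoop distcp webhdfs://${SRC_NN}:${HTTP_PORT}${path} webhdfs://${DST_NN}:${HTTP_PORT}${path}" ;;
    *)
      echo "unknown option: ${option}" >&2
      return 1 ;;
  esac
}

# Example: show the command for pulling /user/alice/data over Hftp.
distcp_command hftp-pull /user/alice/data
```

Printing the command first makes it easy to check which cluster it should be run on before any data is copied.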
The following restrictions currently apply (see Apache Hadoop Known Issues):
- DistCp does not work between a secure cluster and an insecure cluster in some cases.
As of CDH 5.1.3, DistCp does work between a secure and an insecure cluster if you use the webHDFS protocol and run the command from the secure cluster side after setting ipc.client.fallback-to-simple-auth-allowed to true, as described under Copying Data between a Secure and an Insecure Cluster using DistCp and WebHDFS.
- To use DistCp with Hftp from a secure cluster that uses SPNEGO, you must set the dfs.https.port property on the client to the HTTP port (50070 by default).
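The two workarounds above might look roughly like the following. This is a dry-run sketch that only prints the commands. The host names and the 8020 RPC port are hypothetical, and passing the properties per command with Hadoop's generic -D option is an assumption; they can equally be set in the client configuration files.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints DistCp commands for the two restrictions; nothing is executed.
# Host names are hypothetical. Setting properties with -D is an assumption;
# the same properties can be placed in the client configuration instead.
SECURE_NN="secure-nn.example.com"
INSECURE_NN="insecure-nn.example.com"
HTTP_PORT=50070   # default HTTP port named in the text
RPC_PORT=8020     # assumed NameNode RPC port

# Intended to be run from the secure cluster: webHDFS on both ends,
# with fallback to simple authentication allowed.
secure_insecure_command() {
  local path="$1"
  echo "hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true" \
       "webhdfs://${INSECURE_NN}:${HTTP_PORT}${path}" \
       "webhdfs://${SECURE_NN}:${HTTP_PORT}${path}"
}

# Hftp read from a secure cluster that uses SPNEGO: point dfs.https.port
# at the HTTP port on the client. The destination NameNode is the second argument.
hftp_spnego_command() {
  local path="$1" dest_nn="$2"
  echo "hadoop distcp -D dfs.https.port=${HTTP_PORT}" \
       "hftp://${SECURE_NN}:${HTTP_PORT}${path}" \
       "hdfs://${dest_nn}:${RPC_PORT}${path}"
}

secure_insecure_command /user/alice/data
hftp_spnego_command /user/alice/data dest-nn.example.com
```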
Page generated July 8, 2016.
©2016 Cloudera, Inc. All rights reserved