Installing the Latest CDH 5 Release
This page explains how to do an unmanaged deployment of CDH 5 from the command line. For a managed deployment, see Cloudera Manager Deployment.
CDH 5 Installation Options
- Automatically install CDH 5 with a Cloudera Manager Deployment. This is the simplest and preferred method.
- Manually install the CDH 5 package or repository in one of three ways:
- Install the CDH 5 "1-click" package (preferred manual method) OR
- Add the CDH 5 repository OR
- Build your own CDH 5 repository.
- Manually install the CDH 5 tarball. See "Package and Tarball Binaries" below.
Package and Tarball Binaries
Installing from Packages
- To install and deploy YARN, see Deploying MapReduce v2 (YARN) on a Cluster.
- To install and deploy MRv1, see Deploying MapReduce v1 (MRv1) on a Cluster.
Installing from a Tarball
- The CDH 5 tarball deploys YARN and includes the MRv1 binaries. There is no separate tarball for MRv1. The MRv1 scripts are in the bin-mapreduce1 directory, and the examples are in examples-mapreduce1.
Before You Begin Installing CDH 5 Manually
- This page explains new installations. To upgrade from an earlier release, see Upgrading from CDH 4 to CDH 5.
- To migrate from MRv1 to YARN, see Migrating from MapReduce (MRv1) to MapReduce (MRv2).
- For a list of supported operating systems, see CDH 5 Requirements and Supported Versions.
- Installing CDH 5 requires sudo privileges. If necessary, use the root user (superuser) to configure sudo privileges.
- CDH 5 requires the Oracle Java Development Kit (JDK). See Java Development Kit Installation.
- In CDH 5, both the NameNode and the Resource Manager (or JobTracker) can be configured for High Availability.
- Use the service(8) command to start, stop, and restart CDH components, rather than running scripts in /etc/init.d directly. The service command creates a predictable environment by setting the current working directory to / and removing most environment variables (passing only LANG and TERM). When scripts in /etc/init.d are run directly, existing environment variables remain in force and can produce unpredictable results. When you install CDH from packages, service is installed as part of the Linux Standard Base (LSB).
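The environment described above can be approximated by hand, which may help when working out why a daemon behaves differently under service than under a direct /etc/init.d call. The following is a minimal sketch only: run_clean is a hypothetical helper, not part of CDH or the LSB, and it mimics the behavior as described on this page (working directory /, only LANG and TERM passed) rather than reproducing the real service script.

```shell
# Hypothetical helper: run a command the way this page says "service" does,
# from / and with only LANG and TERM left in the environment.
run_clean() {
    (
        # Subshell, so the caller's working directory and variables are untouched.
        cd / || exit 1
        env -i LANG="${LANG:-C}" TERM="${TERM:-dumb}" "$@"
    )
}

# Example (not run here): compare what a child process sees in each case.
#   MY_VAR=1 /bin/sh -c 'echo "${MY_VAR:-unset}"'             # inherits MY_VAR
#   MY_VAR=1 run_clean /bin/sh -c 'echo "${MY_VAR:-unset}"'   # MY_VAR is removed
```

On a real cluster you would still start and stop daemons with service itself; the helper is only meant to show why stray environment variables stop mattering.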
Steps to Install CDH 5 Manually
Step 1: Add or Build the CDH 5 Repository or Download the "1-click Install" package.
- To install CDH 5 on a RHEL system, download packages with yum or use a web browser.
- To install CDH 5 on a SLES system, download packages with zypper or YaST or use a web browser.
- To install CDH 5 on an Ubuntu or Debian system, download packages with apt or use a web browser.
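When the same install script has to run on mixed hosts, the choice of package tool above can be made from the operating system ID. The sketch below assumes the ID strings that /etc/os-release typically reports (rhel, centos, ol, sles, ubuntu, debian); cdh_pkg_tool is a hypothetical name, and older releases may not ship /etc/os-release at all, in which case the ID has to be supplied some other way.

```shell
# Map an os-release style ID to the package tool used on this page.
# The ID strings are assumptions about what the host reports.
cdh_pkg_tool() {
    case "$1" in
        rhel|centos|ol) echo yum ;;
        sles)           echo zypper ;;
        ubuntu|debian)  echo apt-get ;;
        *)              echo "unsupported OS id: $1" >&2; return 1 ;;
    esac
}

# Example (not run here):
#   . /etc/os-release && cdh_pkg_tool "$ID"
```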
On RHEL-compatible Systems
Use one of the following methods to install CDH 5 on RHEL-compatible systems.
- Download and install the CDH 5 "1-click Install" package OR
- Add the CDH 5 repository OR
- Build a Yum Repository
Do this on all the systems in the cluster.
To download and install the CDH 5 "1-click Install" package:
- Download the CDH 5 "1-click Install" package (or RPM).
Click the appropriate RPM and Save File to a directory with write access (for example, your home directory).
OS Version | Link to CDH 5 RPM |
---|---|
RHEL/CentOS/Oracle 5 | RHEL/CentOS/Oracle 5 link |
RHEL/CentOS/Oracle 6 | RHEL/CentOS/Oracle 6 link |
RHEL/CentOS/Oracle 7 | RHEL/CentOS/Oracle 7 link |
- Install the RPM for all RHEL versions:
$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
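Before you continue, clean cached packages and headers so that your system repositories are up to date:
$ sudo yum clean all
Then continue with Step 2: Optionally Add a Repository Key. After that, choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.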
OR: To add the CDH 5 repository:
Download the repo file. Click the link for your RHEL or CentOS system in the table, find the appropriate repo file, and save it in /etc/yum.repos.d/.
For OS Version | Link to CDH 5 Repository |
---|---|
RHEL/CentOS/Oracle 5 | |
RHEL/CentOS/Oracle 6 | |
RHEL/CentOS/Oracle 7 | |
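Before you continue, clean cached packages and headers so that your system repositories are up to date:
$ sudo yum clean all
Then continue with Step 2: Optionally Add a Repository Key. After that, choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.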
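For scripted installs, the repo file can be fetched from the command line instead of a browser. The sketch below builds the URL for a given RHEL-compatible major version. The URL layout is an assumption, inferred from the RPM-GPG-KEY URLs in Step 2 and from the SLES repo file name cloudera-cdh5.repo on this page; check it against the link in the table before relying on it. cdh5_rhel_repo_url is a hypothetical helper name.

```shell
# Build the assumed URL of the CDH 5 repo file for a RHEL-compatible
# major version (5, 6, or 7). The path layout is inferred, not confirmed.
cdh5_rhel_repo_url() {
    case "$1" in
        5|6|7)
            printf 'https://archive.cloudera.com/cdh5/redhat/%s/x86_64/cdh/cloudera-cdh5.repo\n' "$1"
            ;;
        *)
            echo "unsupported RHEL major version: $1" >&2
            return 1
            ;;
    esac
}

# Example (not run here):
#   sudo wget "$(cdh5_rhel_repo_url 6)" -O /etc/yum.repos.d/cloudera-cdh5.repo
#   sudo yum clean all
```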
OR: To build a Yum repository:
- Download the appropriate repo file
- Create the repo
- Distribute the repo and set up a web server.
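Before you continue, clean cached packages and headers so that your system repositories are up to date:
$ sudo yum clean all
Then continue with Step 2: Optionally Add a Repository Key. After that, choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.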
On SLES Systems
Use one of the following methods to download the CDH 5 repository or package on SLES systems.
- Download and install the CDH 5 "1-click Install" Package OR
- Add the CDH 5 repository OR
- Build a SLES Repository
To download and install the CDH 5 "1-click Install" package:
- Download the CDH 5 "1-click Install" package.
Download the rpm file, choose Save File, and save it to a directory to which you have write access (for example, your home directory).
- Install the RPM:
$ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm
- Update your system package index by running:
$ sudo zypper refresh
Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.
OR: To add the CDH 5 repository:
- Run the following command:
$ sudo zypper addrepo -f https://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/cloudera-cdh5.repo
- Update your system package index by running:
$ sudo zypper refresh
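Before you continue, clean cached packages and headers so that your system repositories are up to date:
$ sudo zypper clean --all
Then continue with Step 2: Optionally Add a Repository Key. After that, choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.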
OR: To build a SLES repository:
To create your own SLES repository, create a mirror of the CDH SLES directory, and then follow these instructions to create a SLES repository from the mirror.
Before you continue, clean cached packages and headers so that your system repositories are up to date:
$ sudo zypper clean --all
Then continue with Step 2: Optionally Add a Repository Key. After that, choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.
On Ubuntu or Debian Systems
Use one of the following methods to download the CDH 5 repository or package.
- Download and install the CDH 5 "1-click Install" Package OR
- Add the CDH 5 repository OR
- Build a Debian Repository
To download and install the CDH 5 "1-click Install" package:
- Download the CDH 5 "1-click Install" package:
OS Version | Package Link |
---|---|
Jessie | Jessie package |
Wheezy | Wheezy package |
Precise | Precise package |
Trusty | Trusty package |
- Install the package by doing one of the following:
- Choose Open with in the download window to use the package manager.
- Choose Save File, save the package to a directory to which you have write access (for example, your home directory), and install it from the command line.
For example:
sudo dpkg -i cdh5-repository_1.0_all.deb
sudo apt-get update
Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.
OR: To add the CDH 5 repository:
- Download the appropriate cloudera.list file by issuing one of the following commands. You can use another HTTP client if wget is not available, but the syntax may be different.
Important: Ubuntu 14.04 (Trusty)
For Ubuntu Trusty systems, you must perform an extra step after adding the repository. See "Additional Step for Ubuntu Trusty and Debian Jessie" below.
OS Version | Command |
---|---|
Debian Jessie | $ sudo wget 'https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh/cloudera.list' -O /etc/apt/sources.list.d/cloudera.list |
Debian Wheezy | $ sudo wget 'https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/cloudera.list' -O /etc/apt/sources.list.d/cloudera.list |
Ubuntu Precise | $ sudo wget 'https://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/cloudera.list' -O /etc/apt/sources.list.d/cloudera.list |
Ubuntu Lucid | $ sudo wget 'https://archive.cloudera.com/cdh5/ubuntu/lucid/amd64/cdh/cloudera.list' -O /etc/apt/sources.list.d/cloudera.list |
Ubuntu Trusty | $ sudo wget 'https://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh/cloudera.list' -O /etc/apt/sources.list.d/cloudera.list |
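- Update your system package index by running:
sudo apt-get update
Additional Step for Ubuntu Trusty and Debian Jessie
This step ensures that you get the right ZooKeeper package for the current CDH release. You need to prioritize the Cloudera repository you have just added, so that you install the CDH version of ZooKeeper rather than the version that is bundled with Ubuntu Trusty or Debian Jessie. To do so, add the following stanza to an apt preferences file (a file under /etc/apt/preferences.d/):
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501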
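The pin stanza above can be written from a script rather than typed by hand. The following is a minimal sketch, assuming you stage the file in a temporary location first; write_cloudera_pin is a hypothetical helper, and the target file name cloudera.pref is an assumption (apt reads files in /etc/apt/preferences.d/ whose names have no extension or end in .pref). A priority of 501 is just above apt's default of 500, which is why the Cloudera packages win.

```shell
# Write the pin stanza for the Cloudera repository to the file given as $1.
write_cloudera_pin() {
    cat > "$1" <<'EOF'
Package: *
Pin: release o=Cloudera, l=Cloudera
Pin-Priority: 501
EOF
}

# Example (not run here; the destination file name is an assumption):
#   write_cloudera_pin /tmp/cloudera.pref
#   sudo install -m 644 /tmp/cloudera.pref /etc/apt/preferences.d/cloudera.pref
#   apt-cache policy zookeeper   # inspect which repository is now preferred
```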
Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.
OR: To build a Debian repository:
If you want to create your own apt repository, create a mirror of the CDH Debian directory and then create an apt repository from the mirror.
Continue with Step 2: Optionally Add a Repository Key. Then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps to install both implementations.
Step 2: Optionally Add a Repository Key
Before installing YARN or MRv1, you can optionally add a repository key on each system in the cluster. Add the Cloudera Public GPG Key to your repository by running the command for your operating system:
- For RHEL/CentOS/Oracle 5 systems:
$ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
- For RHEL/CentOS/Oracle 6 systems:
$ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
- For RHEL/CentOS/Oracle 7 systems:
$ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/RPM-GPG-KEY-cloudera
- For all SLES systems:
$ sudo rpm --import https://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
- For Ubuntu or Debian systems, run the commands for your OS version:
Debian Jessie
$ wget https://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh/archive.key -O archive.key
$ sudo apt-key add archive.key
Debian Wheezy
$ wget https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key -O archive.key
$ sudo apt-key add archive.key
Ubuntu Precise
$ wget https://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key -O archive.key
$ sudo apt-key add archive.key
Ubuntu Lucid
$ wget https://archive.cloudera.com/cdh5/ubuntu/lucid/amd64/cdh/archive.key -O archive.key
$ sudo apt-key add archive.key
Ubuntu Trusty
$ wget https://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh/archive.key -O archive.key
$ sudo apt-key add archive.key
This key enables you to verify that you are downloading genuine packages.
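The key URLs in this step follow a regular pattern, so a provisioning script can derive the right one from the OS family and version. The sketch below reproduces only the URLs listed above; cdh5_key_url and its argument names are hypothetical, and it does not check that a given release is actually published.

```shell
# Print the Cloudera GPG key URL for an OS family ($1) and release ($2).
# Families: redhat (release 5, 6, or 7), sles, debian or ubuntu (codename).
cdh5_key_url() {
    base='https://archive.cloudera.com/cdh5'   # plain variable; POSIX sh has no "local"
    case "$1" in
        redhat)        printf '%s/redhat/%s/x86_64/cdh/RPM-GPG-KEY-cloudera\n' "$base" "$2" ;;
        sles)          printf '%s/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera\n' "$base" ;;
        debian|ubuntu) printf '%s/%s/%s/amd64/cdh/archive.key\n' "$base" "$1" "$2" ;;
        *)             echo "unknown OS family: $1" >&2; return 1 ;;
    esac
}

# Example (not run here):
#   sudo rpm --import "$(cdh5_key_url redhat 6)"
#   wget "$(cdh5_key_url ubuntu trusty)" -O archive.key && sudo apt-key add archive.key
```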
Step 3: Install CDH 5 with YARN
To install CDH 5 with YARN:
- Install and deploy ZooKeeper.
Important: Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This is a requirement if you are deploying high availability (HA) for the NameNode.
Follow instructions under ZooKeeper Installation.
- Install each type of daemon package on the appropriate system(s), as follows.
Where to install | Install commands |
---|---|
Resource Manager host (analogous to MRv1 JobTracker) running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-yarn-resourcemanager |
SLES | sudo zypper clean --all; sudo zypper install hadoop-yarn-resourcemanager |
Ubuntu or Debian | sudo apt-get update; sudo apt-get install hadoop-yarn-resourcemanager |
NameNode host running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-hdfs-namenode |
SLES | sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode |
Ubuntu or Debian | sudo apt-get install hadoop-hdfs-namenode |
Secondary NameNode host (if used) running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode |
SLES | sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode |
Ubuntu or Debian | sudo apt-get install hadoop-hdfs-secondarynamenode |
All cluster hosts except the Resource Manager running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce |
SLES | sudo zypper clean --all; sudo zypper install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce |
Ubuntu or Debian | sudo apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce |
One host in the cluster running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver |
SLES | sudo zypper clean --all; sudo zypper install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver |
Ubuntu or Debian | sudo apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver |
All client hosts running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-client |
SLES | sudo zypper clean --all; sudo zypper install hadoop-client |
Ubuntu or Debian | sudo apt-get install hadoop-client |
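If you drive the installation from a script, the YARN table above reduces to two lookups: which packages a host role needs, and how the host's package tool installs them. The sketch below is one possible way to encode that. The role labels (resourcemanager, worker, history, and so on) and both function names are hypothetical; the package names and command forms are taken from the table. The functions only print the command, so you can review it before running it.

```shell
# Print the CDH 5 YARN packages for a host role (role labels are hypothetical).
yarn_role_packages() {
    case "$1" in
        resourcemanager)   echo hadoop-yarn-resourcemanager ;;
        namenode)          echo hadoop-hdfs-namenode ;;
        secondarynamenode) echo hadoop-hdfs-secondarynamenode ;;
        worker)            echo hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce ;;
        history)           echo hadoop-mapreduce-historyserver hadoop-yarn-proxyserver ;;
        client)            echo hadoop-client ;;
        *)                 echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the install command for a package tool ($1) and packages.
cdh_install_cmd() {
    tool=$1
    shift
    case "$tool" in
        yum)     echo "sudo yum clean all; sudo yum install $*" ;;
        zypper)  echo "sudo zypper clean --all; sudo zypper install $*" ;;
        apt-get) echo "sudo apt-get update; sudo apt-get install $*" ;;
        *)       echo "unknown package tool: $tool" >&2; return 1 ;;
    esac
}

# Example (not run here):
#   cdh_install_cmd yum $(yarn_role_packages worker)
```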
Step 4: Install CDH 5 with MRv1
Before you install, decide:
- Whether to configure High Availability (HA) for the NameNode or JobTracker; see High Availability for more information and instructions.
- Where to deploy the NameNode, Secondary NameNode, and JobTracker daemons. As a general rule:
- The NameNode and JobTracker run on the same "master" host unless the cluster is large (more than a few tens of nodes), and the master host (or hosts) should not run the Secondary NameNode (if used), DataNode or TaskTracker services.
- In a large cluster, it is especially important that the Secondary NameNode (if used) runs on a separate machine from the NameNode.
- Each node in the cluster except the master host(s) should run the DataNode and TaskTracker services.
If you decide to configure HA for the NameNode, do not install hadoop-hdfs-secondarynamenode. After completing the HA software configuration, follow the installation instructions under Deploying HDFS High Availability.
Install and deploy ZooKeeper by following the instructions under ZooKeeper Installation. If you are starting a ZooKeeper ensemble after a fresh install, make sure you create the myid file in the data directory, as instructed.
Next, install packages.
Where to install | Install commands |
---|---|
JobTracker host running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-jobtracker |
SLES | sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-jobtracker |
Ubuntu or Debian | sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-jobtracker |
NameNode host running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-hdfs-namenode |
SLES | sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode |
Ubuntu or Debian | sudo apt-get install hadoop-hdfs-namenode |
Secondary NameNode host (if used) running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode |
SLES | sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode |
Ubuntu or Debian | sudo apt-get install hadoop-hdfs-secondarynamenode |
All cluster hosts except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode |
SLES | sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode |
Ubuntu or Debian | sudo apt-get install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode |
All client hosts running: | |
RHEL/CentOS compatible | sudo yum clean all; sudo yum install hadoop-client |
SLES | sudo zypper clean --all; sudo zypper install hadoop-client |
Ubuntu or Debian | sudo apt-get install hadoop-client |
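The MRv1 table, together with the rule that hadoop-hdfs-secondarynamenode must not be installed when NameNode HA is configured, can be encoded as a small guard in a provisioning script. The following is a sketch under stated assumptions: mrv1_role_packages, its role labels, and the yes/no HA argument are all hypothetical; only the package names come from the table.

```shell
# Print the CDH 5 MRv1 packages for a host role ($1).
# $2 is "yes" when NameNode HA is configured (default "no").
mrv1_role_packages() {
    role=$1
    ha=${2:-no}
    case "$role" in
        jobtracker) echo hadoop-0.20-mapreduce-jobtracker ;;
        namenode)   echo hadoop-hdfs-namenode ;;
        secondarynamenode)
            # With NameNode HA there is no Secondary NameNode to install.
            if [ "$ha" = yes ]; then
                echo "NameNode HA configured: skip hadoop-hdfs-secondarynamenode" >&2
                return 1
            fi
            echo hadoop-hdfs-secondarynamenode
            ;;
        worker)     echo hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode ;;
        client)     echo hadoop-client ;;
        *)          echo "unknown role: $role" >&2; return 1 ;;
    esac
}

# Example (not run here):
#   pkgs=$(mrv1_role_packages worker) && sudo yum install $pkgs
```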
Step 5: (Optional) Install LZO
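If an older version of hadoop-lzo is already installed, remove it first; for example, on a RHEL-compatible system:
yum remove hadoop-lzo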
- Add the repository on each host in the cluster. Follow the instructions for your OS version:
RHEL/CentOS/Oracle 5: Go to this link and save the file in the /etc/yum.repos.d/ directory.
RHEL/CentOS/Oracle 6: Go to this link and save the file in the /etc/yum.repos.d/ directory.
RHEL/CentOS/Oracle 7: Go to this link and save the file in the /etc/yum.repos.d/ directory.
SLES:
- Run the following command:
$ sudo zypper addrepo -f https://archive.cloudera.com/gplextras5/sles/11/x86_64/gplextras/cloudera-gplextras5.repo
- Update your system package index by running:
$ sudo zypper refresh
Ubuntu or Debian: Go to this link and save the file as /etc/apt/sources.list.d/gplextras.list.
Important: Make sure you do not let the file name default to cloudera.list, as that will overwrite your existing cloudera.list.
- Run the following command:
- Install the package on each host as follows:
For OS version | Install commands |
---|---|
RHEL/CentOS compatible | sudo yum install hadoop-lzo |
SLES | sudo zypper install hadoop-lzo |
Ubuntu or Debian | sudo apt-get install hadoop-lzo |
- Continue with installing and deploying CDH. As part of the deployment, you will need to do some additional configuration for LZO, as shown under Configuring LZO.
Important: Be sure to do this configuration after you have copied the default configuration files to a custom location and set alternatives to point to it.
Step 6: Deploy CDH and Install Components
Proceed with:
©2016 Cloudera, Inc. All rights reserved