This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Installing CDH 5 with YARN on a Single Linux Host in Pseudo-distributed mode

  Important:
  • If you use Cloudera Manager, do not use these command-line instructions.
  • This information applies specifically to CDH 5.8.x. If you use a lower version of CDH, see the documentation for that version located at Cloudera Documentation.

Before you start, uninstall MRv1 if necessary

If you have already installed MRv1 following the steps in the previous section, you now need to uninstall hadoop-0.20-conf-pseudo before running YARN. Proceed as follows.

  1. Stop the daemons:
    $ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x stop ; done 
    $ for x in `cd /etc/init.d ; ls hadoop-0.20-mapreduce-*` ; do sudo service $x stop ; done
  2. Remove hadoop-0.20-conf-pseudo:
    • On Red Hat-compatible systems:
      $ sudo yum remove hadoop-0.20-conf-pseudo hadoop-0.20-mapreduce-*
    • On SLES systems:
      $ sudo zypper remove hadoop-0.20-conf-pseudo hadoop-0.20-mapreduce-*
    • On Ubuntu or Debian systems:
      $ sudo apt-get remove hadoop-0.20-conf-pseudo hadoop-0.20-mapreduce-*

    In this case (after uninstalling hadoop-0.20-conf-pseudo) you can skip the package download steps below.
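The stop loops in step 1 can also be written as small shell functions. This is only a sketch: the names list_services and stop_services are hypothetical helpers, not part of CDH. Unlike the `ls` form, the glob-based version prints nothing when no matching init script exists.

```shell
# Hypothetical helper: print the names of init scripts in directory $1
# whose file names start with prefix $2. Prints nothing if none match.
list_services() {
  dir=$1
  prefix=$2
  for path in "$dir"/"$prefix"*; do
    [ -e "$path" ] || continue   # the glob matched nothing
    basename "$path"
  done
}

# Hypothetical helper: stop every service under /etc/init.d that matches
# the given prefix, using the same `service ... stop` call as step 1.
stop_services() {
  for svc in $(list_services /etc/init.d "$1"); do
    sudo service "$svc" stop
  done
}
```

For example, `stop_services hadoop-hdfs-` followed by `stop_services hadoop-0.20-mapreduce-` corresponds to the two loops in step 1.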

  Important:

If you have not already done so, install the Oracle Java Development Kit (JDK) before deploying CDH 5. Follow these instructions.

On Red Hat/CentOS/Oracle 5 or Red Hat 6 systems, do the following:

Download the CDH 5 Package

  1. Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).
    OS Version              Link to CDH 5 RPM
    RHEL/CentOS/Oracle 5    RHEL/CentOS/Oracle 5 link
    RHEL/CentOS/Oracle 6    RHEL/CentOS/Oracle 6 link
    RHEL/CentOS/Oracle 7    RHEL/CentOS/Oracle 7 link
  2. Install the RPM.

    For Red Hat/CentOS/Oracle 5:

    $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm 

    For Red Hat/CentOS/Oracle 6 (64-bit):

    $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm

    For instructions on how to add a CDH 5 yum repository or build your own CDH 5 yum repository, see Installing CDH 5 On Red Hat-compatible systems.

Install CDH 5

  1. (Optionally) add a repository key. Add the Cloudera Public GPG Key to your repository by running the following command:
    • For Red Hat/CentOS/Oracle 5 systems:
      $ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
    • For Red Hat/CentOS/Oracle 6 systems:
      $ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
  2. Install Hadoop with YARN in pseudo-distributed mode:
    $ sudo yum install hadoop-conf-pseudo

On SLES systems, do the following:

Download and install the CDH 5 package

  1. Download the CDH 5 "1-click Install" package.

    Download the rpm file, choose Save File, and save it to a directory to which you have write access (for example, your home directory).

  2. Install the RPM:
    $ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm

    For instructions on how to add a CDH 5 SLES repository or build your own CDH 5 SLES repository, see Installing CDH 5 On SLES systems.

Install CDH 5

  1. (Optionally) add a repository key. Add the Cloudera Public GPG Key to your repository by running the following command:
    • For all SLES systems:
      $ sudo rpm --import https://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
  2. Install Hadoop with YARN in pseudo-distributed mode:
    $ sudo zypper install hadoop-conf-pseudo 

On Ubuntu and other Debian systems, do the following:

Download and install the package

  1. Download the CDH 5 "1-click Install" package:
    OS Version    Package Link
    Jessie        Jessie package
    Wheezy        Wheezy package
    Precise       Precise package
    Trusty        Trusty package
  2. Install the package by doing one of the following:
    • Choose Open with in the download window to use the package manager.
    • Choose Save File, save the package to a directory to which you have write access (for example, your home directory), and install it from the command line. For example:
      $ sudo dpkg -i cdh5-repository_1.0_all.deb
  Note:

For instructions on how to add a CDH 5 Debian repository or build your own CDH 5 Debian repository, see Installing CDH 5 On Ubuntu or Debian systems.

Install CDH 5

  1. (Optionally) add a repository key. Add the Cloudera Public GPG Key to your repository by running the following command:
    • For Ubuntu Lucid systems:
      $ curl -s https://archive.cloudera.com/cdh5/ubuntu/lucid/amd64/cdh/archive.key | sudo apt-key add -
    • For Ubuntu Precise systems:
      $ curl -s https://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
    • For Debian Squeeze systems:
      $ curl -s https://archive.cloudera.com/cdh5/debian/squeeze/amd64/cdh/archive.key | sudo apt-key add -
  2. Install Hadoop with YARN in pseudo-distributed mode:
    $ sudo apt-get update 
    $ sudo apt-get install hadoop-conf-pseudo
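The three key URLs in step 1 follow one pattern, which the sketch below captures in a function. The name cdh5_apt_key_url is a hypothetical helper. Only the distribution and codename pairs listed in step 1 are confirmed by this page; whether a key exists at the same path for any other codename is an assumption you should check before relying on it.

```shell
# Hypothetical helper: build the archive.key URL for a distribution
# ($1, for example ubuntu or debian) and codename ($2, for example precise),
# following the URL pattern used in step 1.
cdh5_apt_key_url() {
  distro=$1
  codename=$2
  echo "https://archive.cloudera.com/cdh5/${distro}/${codename}/amd64/cdh/archive.key"
}
```

For example, `curl -s "$(cdh5_apt_key_url ubuntu precise)" | sudo apt-key add -` is equivalent to the Ubuntu Precise command in step 1.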

Starting Hadoop and Verifying it is Working Properly

For YARN, a pseudo-distributed Hadoop installation consists of one host running all five Hadoop daemons: namenode, secondarynamenode, resourcemanager, datanode, and nodemanager.

  • To view the files installed by the hadoop-conf-pseudo package on Red Hat or SLES systems:
    $ rpm -ql hadoop-conf-pseudo
  • To view the files on Ubuntu or Debian systems:
    $ dpkg -L hadoop-conf-pseudo

The new configuration is self-contained in the /etc/hadoop/conf.pseudo directory.

The Cloudera packages use the alternatives framework to manage which Hadoop configuration is active. All Hadoop components search for the Hadoop configuration in /etc/hadoop/conf.
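To see which configuration directory is currently active, you can follow the symbolic links behind /etc/hadoop/conf. The sketch below assumes a Linux host with GNU readlink. The function name active_conf_dir is a hypothetical helper.

```shell
# Hypothetical helper: print the real directory that a configuration path
# resolves to after following all symbolic links.
# Defaults to /etc/hadoop/conf when no argument is given.
active_conf_dir() {
  readlink -f "${1:-/etc/hadoop/conf}"
}
```

After installing hadoop-conf-pseudo, `active_conf_dir` would be expected to print /etc/hadoop/conf.pseudo. On Red Hat systems, `alternatives --display hadoop-conf` lists the registered choices, and `update-alternatives --display hadoop-conf` does the same on Debian-based systems. The alternative name hadoop-conf is an assumption here, so verify it on your host.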

To start Hadoop, proceed as follows.

Step 1: Format the NameNode.

Before starting the NameNode for the first time you must format the file system.

$ sudo -u hdfs hdfs namenode -format

Make sure you format the NameNode as user hdfs. You can do this as part of the command string, using sudo -u hdfs as in the command above.

  Important:

In earlier releases, the hadoop-conf-pseudo package automatically formatted HDFS on installation. In CDH 5, you must do this explicitly.
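Because formatting erases existing HDFS metadata, a script that automates this step may want to skip it when the NameNode is already formatted. The sketch below treats the presence of current/VERSION in the NameNode storage directory as the sign of a formatted NameNode. The function names are hypothetical, and the storage directory must be supplied by you (it is the value of dfs.namenode.name.dir in hdfs-site.xml). The directory may be readable only by user hdfs, in which case the check has to run with suitable privileges.

```shell
# Hypothetical helper: succeed if the NameNode storage directory $1
# already contains a VERSION file, which is written when it is formatted.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Hypothetical helper: format the NameNode only if directory $1 does not
# look formatted yet. Uses the same format command as Step 1.
format_namenode_once() {
  name_dir=$1
  if namenode_is_formatted "$name_dir"; then
    echo "NameNode already formatted: $name_dir"
  else
    sudo -u hdfs hdfs namenode -format
  fi
}
```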

Step 2: Start HDFS

$ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done

To verify that the services have started, check the web console. The NameNode provides a web console at http://localhost:50070/ for viewing your Distributed File System (DFS) capacity, number of DataNodes, and logs. In this pseudo-distributed configuration, you should see one live DataNode named localhost.
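The web console can take a few seconds to become reachable after the services start. In a script, one way to wait for it is a small retry loop, sketched below. The function name retry is a hypothetical helper, and the attempt count and delay in the usage example are arbitrary values.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Succeeds as soon as the command succeeds.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry 30 2 curl -sf -o /dev/null http://localhost:50070/` polls the NameNode web console every 2 seconds for up to 30 attempts.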

Step 3: Create the directories needed for Hadoop processes.

Issue the following command to create the directories needed for all installed Hadoop processes with the appropriate permissions.
$ sudo /usr/lib/hadoop/libexec/init-hdfs.sh

Step 4: Verify the HDFS File Structure

Run the following command:

$ sudo -u hdfs hadoop fs -ls -R /

You should see output similar to the following excerpt:

...
drwxrwxrwt - hdfs supergroup 0 2012-05-31 15:31 /tmp
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /tmp/hadoop-yarn
drwxrwxrwt - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging
drwxr-xr-x - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var/log
drwxr-xr-x - yarn mapred 0 2012-05-31 15:31 /var/log/hadoop-yarn
...
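Instead of reading the listing by eye, you can check it for a few of the paths shown in the excerpt. The sketch below reads the output of the ls command from standard input and prints every expected path that is missing. The function name missing_hdfs_paths is hypothetical, and the three paths checked are taken from the excerpt above, not from a complete list of what init-hdfs.sh creates.

```shell
# Hypothetical helper: read `hadoop fs -ls -R /` output on stdin and print
# each expected path that does not appear as the last field of any line.
missing_hdfs_paths() {
  listing=$(cat)
  for path in /tmp /tmp/hadoop-yarn/staging /var/log/hadoop-yarn; do
    printf '%s\n' "$listing" |
      awk -v p="$path" '$NF == p { found = 1 } END { exit !found }' ||
      echo "$path"
  done
}
```

For example, `sudo -u hdfs hadoop fs -ls -R / | missing_hdfs_paths` prints nothing when all three paths exist.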

Step 5: Start YARN

$ sudo service hadoop-yarn-resourcemanager start
$ sudo service hadoop-yarn-nodemanager start 
$ sudo service hadoop-mapreduce-historyserver start
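Once YARN is started, all five daemons of the pseudo-distributed installation should be running. One way to check is the jps tool from the JDK, which prints one line per Java process in the form "<pid> <main class>". The sketch below reads that output from standard input and prints the daemons that are missing. The function name missing_daemons is hypothetical. It also assumes that jps is on the PATH of the user who runs it and that this user may see processes owned by other users, which usually means running it with sudo.

```shell
# Hypothetical helper: read jps output ("<pid> <class>") on stdin and print
# the name of each expected Hadoop daemon that is not listed.
missing_daemons() {
  running=$(awk '{ print $2 }')
  for daemon in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode
    printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
  done
}
```

For example, `sudo jps | missing_daemons` prints nothing when all five daemons are running.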

Step 6: Create User Directories

Create a home directory on the NameNode for each MapReduce user. For example:

$ sudo -u hdfs hadoop fs -mkdir /user/<user>
$ sudo -u hdfs hadoop fs -chown <user> /user/<user>

where <user> is the Linux username of each user.

Alternatively, you can log in as each Linux user (or write a script to do so) and create the home directory as follows:

$ sudo -u hdfs hadoop fs -mkdir /user/$USER
$ sudo -u hdfs hadoop fs -chown $USER /user/$USER
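If you have several users, the two commands can be generated for each of them. The sketch below only prints the commands so that you can review them before running anything. The function name home_dir_commands is a hypothetical helper.

```shell
# Hypothetical helper: print the mkdir and chown commands from this step
# for every user name given as an argument. Nothing is executed.
home_dir_commands() {
  for user in "$@"; do
    echo "sudo -u hdfs hadoop fs -mkdir /user/$user"
    echo "sudo -u hdfs hadoop fs -chown $user /user/$user"
  done
}
```

For example, `home_dir_commands joe alice` prints four commands. After reviewing them, you can run them with `home_dir_commands joe alice | sh`.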

Running an example application with YARN

  1. Create a home directory on HDFS for the user who will be running the job (for example, joe):
    $ sudo -u hdfs hadoop fs -mkdir /user/joe 
    $ sudo -u hdfs hadoop fs -chown joe /user/joe

    Do the following steps as the user joe.

  2. Make a directory in HDFS called input and copy some XML files into it by running the following commands in pseudo-distributed mode:
    $ hadoop fs -mkdir input
    $ hadoop fs -put /etc/hadoop/conf/*.xml input
    $ hadoop fs -ls input
    Found 3 items
    -rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
    -rw-r--r-- 1 joe supergroup 1913 2012-02-13 12:21 input/hdfs-site.xml
    -rw-r--r-- 1 joe supergroup 1001 2012-02-13 12:21 input/mapred-site.xml
  3. Set HADOOP_MAPRED_HOME for user joe:
    $ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  4. Run an example Hadoop job to grep with a regular expression in your input data.
    $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
  5. After the job completes, you can find the output in the HDFS directory named output23 because you specified that output directory to Hadoop.
    $ hadoop fs -ls 
    Found 2 items
    drwxr-xr-x - joe supergroup 0 2009-08-18 18:36 /user/joe/input
    drwxr-xr-x - joe supergroup 0 2009-08-18 18:38 /user/joe/output23

    You can see that there is a new directory called output23.

  6. List the output files.
    $ hadoop fs -ls output23 
    Found 2 items
    -rw-r--r-- 1 joe supergroup 0 2009-02-25 10:33 /user/joe/output23/_SUCCESS
    -rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output23/part-r-00000
  7. Read the results in the output file.
    $ hadoop fs -cat output23/part-r-00000 | head
    1 dfs.safemode.min.datanodes
    1 dfs.safemode.extension
    1 dfs.replication
    1 dfs.permissions.enabled
    1 dfs.namenode.name.dir
    1 dfs.namenode.checkpoint.dir
    1 dfs.datanode.data.dir
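
As a sanity check, a similar count can be produced locally, without Hadoop, from the same XML files that were copied into the input directory. The sketch below counts every match of the regular expression and prints "count, tab, match" with the most frequent match first. The function name local_grep_count is a hypothetical helper. It reads local files rather than HDFS, so it approximates the job's result but is not a replacement for running the job.

```shell
# Hypothetical helper: count occurrences of each string matching the
# extended regular expression $1 in the given local files.
# Output: "<count><TAB><match>", highest count first.
local_grep_count() {
  pattern=$1
  shift
  grep -ohE -- "$pattern" "$@" | sort | uniq -c | sort -k1,1nr -k2 |
    awk '{ printf "%s\t%s\n", $1, $2 }'
}
```

For example, `local_grep_count 'dfs[a-z.]+' /etc/hadoop/conf/*.xml | head` can be compared with the output in step 7.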
Page generated July 8, 2016.