
Upgrading from CDH 4 Packages to CDH 5 Packages

Minimum Required Role: Full Administrator

If you originally used Cloudera Manager to install CDH from packages, you can upgrade to CDH 5 using either packages or parcels. Parcels are the recommended method, because the upgrade wizard provided for parcels automates almost the entire upgrade process.

The steps to upgrade a CDH installation managed by Cloudera Manager using packages are as follows.

  1. Before You Begin
  2. Stop All Services
  3. Perform Service-Specific Prerequisite Actions
  4. Uninstall CDH 4
  5. Remove CDH 4 Repository Files
  6. Install CDH 5 Components
  7. Run the Upgrade Wizard
  8. Recover from Failed Steps
  9. Restart the Reports Manager Role
  10. Recompile JARs
  11. Finalize the HDFS Metadata Upgrade
  12. Upgrade Wizard Actions
    1. Upgrade HDFS Metadata
    2. Upgrade HBase
    3. Upgrade the Hive Metastore Database
    4. Upgrade Oozie
    5. Upgrade Sqoop
    6. Start Cluster Services
    7. Deploy Client Configuration Files

Before You Begin

  • Read the CDH 5 Release Notes.
  • Read the Cloudera Manager 5 Release Notes.
  • Upgrade to Cloudera Manager 5 before upgrading to CDH 5.
  • Ensure that Java 1.7 is installed on every host in the cluster. For installation instructions and recommendations, see Upgrading to Oracle JDK 1.7 in a Cloudera Manager Deployment. Also read Known Issues and Workarounds in Cloudera Manager 5 before you proceed with the upgrade.
  • Ensure that the Cloudera Manager minor version is equal to or greater than the CDH minor version. For example:
    Target CDH Version    Minimum Cloudera Manager Version
    5.0.5                 5.0.x
    5.1.4                 5.1.x
    5.4.1                 5.4.x
  • Make sure there are no Oozie workflows in RUNNING or SUSPENDED status; otherwise the Oozie database upgrade fails and you will have to reinstall CDH 4 to complete or kill those workflows. (For one way to check, see the Oozie CLI sketch at the end of this list.)
  • Delete Symbolic Links in HDFS

    If there are symbolic links in HDFS when you upgrade from CDH 4 to CDH 5, the upgrade will fail and you will have to downgrade to CDH 4, delete the symbolic links, and start over. To prevent this, proceed as follows.

    1. cd to the directory on the NameNode that contains the latest fsimage. The location of this directory is specified as the value of dfs.namenode.name.dir (or dfs.name.dir) in hdfs-site.xml.
    2. Use a command such as the following to write out the path names in the fsimage:
      $ hdfs oiv -i FSIMAGE -o /tmp/YYYY-MM-DD_FSIMAGE.txt
    3. Use a command such as the following to find the path names of any symbolic links listed in /tmp/YYYY-MM-DD_FSIMAGE.txt and write them out to the file /tmp/symlinks.txt:
      $ grep -- "->" /tmp/YYYY-MM-DD_FSIMAGE.txt > /tmp/symlinks.txt
    4. Delete any symbolic links listed in /tmp/symlinks.txt.
  • When upgrading from CDH 4 to CDH 5, Oozie upgrade can take a very long time. For upgrades from CDH 4.3 and higher, you can reduce this time by reducing the amount of history Oozie retains. To reduce Oozie history:
    1. Go to the Oozie service.
    2. Click the Configuration tab.
    3. Click Category > Advanced.
    4. In Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml, enter the following:
      <property>
        <name>oozie.service.PurgeService.older.than</name>
        <value>7</value>
      </property>
      <property>
        <name>oozie.service.PurgeService.purge.limit</name>
        <value>1000</value>
      </property>
    5. For CDH lower than 5.2, enable DEBUG level logging:
      1. Click Category > Logs.
      2. Set Oozie Server Logging Threshold to DEBUG.
    6. Click Save Changes to commit the changes.
    7. Restart the Oozie Server role.
    8. Wait for the purge service to run and finish. By default, the service runs every hour. The purge service emits the following messages in the Oozie server log:
      STARTED Purge to purge Workflow Jobs older than [7] days, Coordinator Jobs older than [7] days, and Bundlejobs older than [7] days.
      ENDED Purge deleted [x] workflows, [y] coordinatorActions, [z] coordinators, [w] bundles
    9. Revert the purge service and log level settings to the default.
  • When upgrading from CDH 4 to CDH 5, the Hue upgrade can take a very long time if the beeswax_queryhistory, beeswax_savedquery, and oozie_job tables contain more than 1000 records. You can reduce the upgrade time by running a script that reduces the size of the Hue database:
    1. Stop the Hue service.
    2. Back up the Hue database.
    3. Download the history cleanup script to the host running the Hue Server.
    4. Run the following as root:
      • Parcel installation
        export HUE_CONF_DIR="/var/run/cloudera-scm-agent/process/`ls -1 /var/run/cloudera-scm-agent/process | grep HUE | sort -n | tail -1`"
        /opt/cloudera/parcels/CDH/share/hue/build/env/bin/hue shell
      • Package installation
        export HUE_CONF_DIR="/var/run/cloudera-scm-agent/process/`ls -1 /var/run/cloudera-scm-agent/process | grep HUE | sort -n | tail -1`"
        /usr/share/hue/build/env/bin/hue shell
    5. Run the downloaded script in the Hue shell.
  • If Using MySQL as Hue Backend: You may face issues after the upgrade if the default engine for MySQL does not match the engine used by the Hue tables. To confirm the match:
    1. Open the my.cnf file for MySQL, search for "default-storage-engine" and note its value.
    2. Connect to MySQL and run the following commands:
      use hue;
      show create table auth_user;
    3. Search for the "ENGINE=" line and confirm that its value matches the one for the "default-storage-engine" above.

      If the default engines do not match, Hue displays a warning on its start-up page (http://$HUE_HOST:$HUE_PORT/about). Work with your database administrator to convert the current Hue MySQL tables to the engine in use by MySQL, as noted by the "default-storage-engine" property. (A hedged ALTER TABLE example appears at the end of this list.)

  • Whenever you upgrade Impala, whether as part of CDH or as a standalone parcel or package, check your SQL against the newest reserved words listed in incompatible changes. If you are upgrading across multiple versions, or if you encounter any problems, check against the full list of Impala keywords.
  • Run the Host Inspector and fix every issue.
  • If using security, run the Security Inspector.
  • Run hdfs fsck / and hdfs dfsadmin -report, and fix every issue before upgrading (see the HDFS health-check sketch at the end of this list).
  • If using HBase:
    • Run hbase hbck.
    • Before you can upgrade HBase from CDH 4 to CDH 5, your HFiles must be upgraded from HFile v1 to HFile v2 format, because CDH 5 no longer supports HFile v1. The procedure differs depending on whether you use Cloudera Manager or the command line, but the results are the same. The first step is to check for instances of HFile v1 and mark them to be upgraded to HFile v2, and to check for and report corrupted files or files with unknown versions, which must be removed manually. The next step is to rewrite the HFiles during the next major compaction. After the HFiles are upgraded, you can continue the upgrade. After the upgrade is complete, you must recompile custom coprocessors and JARs. To check and upgrade the files:
      1. In the Cloudera Manager Admin Console, go to the HBase service and run Actions > Check HFile Version.
      2. Check the output of the command in the stderr log.
        Your output should be similar to the following:
        Tables Processed:
        hdfs://localhost:41020/myHBase/.META.
        hdfs://localhost:41020/myHBase/usertable
        hdfs://localhost:41020/myHBase/TestTable
        hdfs://localhost:41020/myHBase/t
        
        Count of HFileV1: 2
        HFileV1:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
        hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512
        
        Count of corrupted files: 1
        Corrupted Files:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
        Count of Regions with HFileV1: 2
        Regions to Major Compact:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
        hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af
        In the example above, the script has detected two HFile v1 files, one corrupted file, and the regions that need a major compaction.
      3. Trigger a major compaction on each of the reported regions. This major compaction rewrites the files from HFile v1 to HFile v2 format. To run the major compaction, start HBase Shell and issue the major_compact command.
        $ /usr/lib/hbase/bin/hbase shell
        hbase> major_compact 'usertable'
        You can also do this in a single step by using the echo shell built-in command.
        $ echo "major_compact 'usertable'" | /usr/lib/hbase/bin/hbase shell
  • Review the upgrade procedure and reserve a maintenance window with enough time allotted to perform all the steps. For production clusters, Cloudera recommends allocating up to a full-day maintenance window, depending on the number of hosts, your experience with Hadoop and Linux, and the particular hardware you are using.
  • To avoid a flood of alerts during the upgrade, you can enable maintenance mode on your cluster before you start. Maintenance mode stops email alerts and SNMP traps from being sent, but does not suppress checks or configuration validations. Be sure to exit maintenance mode when you have finished the upgrade to re-enable Cloudera Manager alerts.
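
To check the Oozie prerequisite above from the command line, you can list jobs by status with the Oozie CLI. This is a hedged sketch: the server URL and port are placeholders for your own Oozie server, and on a Kerberos-enabled cluster you must authenticate first.
    # Both lists should be empty before you upgrade.
    $ oozie jobs -oozie http://oozie-server.example.com:11000/oozie -jobtype wf -filter status=RUNNING
    $ oozie jobs -oozie http://oozie-server.example.com:11000/oozie -jobtype wf -filter status=SUSPENDED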
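
For the Hue MySQL storage-engine check above, a mismatched Hue table can typically be converted with a single ALTER TABLE statement. This is only an illustrative sketch: InnoDB stands in for whatever your default-storage-engine is set to, and your database administrator should review and run the conversion.
    use hue;
    -- Repeat for every Hue table whose ENGINE differs from the server default.
    ALTER TABLE auth_user ENGINE=InnoDB;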
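
For the HDFS health checks above, the following sketch runs both commands as the hdfs superuser and saves the reports so you can compare them after the upgrade. The sudo -u hdfs form assumes a cluster without Kerberos; on a secured cluster, authenticate as the HDFS superuser first.
    $ sudo -u hdfs hdfs fsck / > /tmp/fsck-before-upgrade.txt
    $ sudo -u hdfs hdfs dfsadmin -report > /tmp/dfsadmin-before-upgrade.txt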

Stop All Services

  1. Stop the cluster.
    1. On the Home > Status tab, click the drop-down arrow to the right of the cluster name and select Stop.
    2. Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services.

      When All services successfully stopped appears, the task is complete and you can close the Command Details window.

  2. Stop the Cloudera Management Service:
    1. Do one of the following:
        • Select Clusters > Cloudera Management Service > Cloudera Management Service, and then select Actions > Stop.
        • On the Home > Status tab, click the drop-down arrow to the right of Cloudera Management Service and select Stop.
    2. Click Stop to confirm. The Command Details window shows the progress of stopping the roles.
    3. When Command completed with n/n successful subcommands appears, the task is complete. Click Close.

Perform Service-Specific Prerequisite Actions

  • HDFS - Back up HDFS metadata on the NameNode:
    1. Go to the HDFS service.
    2. Click the Configuration tab.
    3. In the Search field, search for "NameNode Data Directories" and note the value.
    4. On the active NameNode host, back up the directory listed in the NameNode Data Directories property. If more than one is listed, make a backup of one directory, since each directory is a complete copy. For example, if the NameNode data directory is /data/dfs/nn, do the following as root:
      # cd /data/dfs/nn
      # tar -cvf /root/nn_backup_data.tar .

      You should see output like this:

      ./
      ./current/
      ./current/fsimage
      ./current/fstime
      ./current/VERSION
      ./current/edits
      ./image/
      ./image/fsimage
      If there is a file with a .lock extension in the NameNode data directory, the NameNode is most likely still running. Stop the NameNode role and repeat the backup steps.
  • Back up the Hive and Sqoop metastore databases.
    1. For each affected service:
      1. If not already stopped, stop the service.
      2. Back up the database. See Backing Up Databases.
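
For example, if the Hive and Sqoop metastores use MySQL, a dump per database is usually sufficient, as sketched below. The database names metastore and sqoop are common defaults and may differ in your deployment; other backends (PostgreSQL, Oracle) have their own dump tools.
    # Run on the database host; you are prompted for the MySQL password.
    $ mysqldump -u root -p metastore > /root/hive_metastore_backup.sql
    $ mysqldump -u root -p sqoop > /root/sqoop_metastore_backup.sql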

Uninstall CDH 4

Uninstall CDH 4 on each host as follows:

Operating System    Command
RHEL                $ sudo yum remove bigtop-jsvc bigtop-utils bigtop-tomcat hue-common sqoop2-client hbase-solr-doc solr-doc
SLES                $ sudo zypper remove bigtop-jsvc bigtop-utils bigtop-tomcat hue-common sqoop2-client hbase-solr-doc solr-doc
Ubuntu or Debian    $ sudo apt-get purge bigtop-jsvc bigtop-utils bigtop-tomcat hue-common sqoop2-client hbase-solr-doc solr-doc

Remove CDH 4 Repository Files

Remove all Cloudera CDH 4 repository files. For example, on a Red Hat or similar system, remove all files in /etc/yum.repos.d that have cloudera as part of the name.
  Important:
  • Before removing the files, make sure you have not added any custom entries that you want to preserve. (To preserve custom entries, back up the files before removing them.)
  • Make sure you remove Impala and Search repository files, as well as the CDH repository file.
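
For example, on a RHEL-compatible system you might back up and then remove the Cloudera repository files as follows. This is a sketch: the file names depend on how the CDH 4, Impala, and Search repositories were originally added, so review the backed-up list before deleting anything.
  $ sudo mkdir -p /root/cdh4-repo-backup
  $ sudo cp /etc/yum.repos.d/cloudera*.repo /root/cdh4-repo-backup/
  $ sudo rm /etc/yum.repos.d/cloudera*.repo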

Install CDH 5 Components

  • Red Hat
    1. Download and install the "1-click Install" package.
      1. Download the CDH 5 "1-click Install" package (or RPM).

        Click the appropriate RPM, choose Save File, and save it to a directory to which you have write access (for example, your home directory).

        OS Version              Link to CDH 5 RPM
        RHEL/CentOS/Oracle 5    RHEL/CentOS/Oracle 5 link
        RHEL/CentOS/Oracle 6    RHEL/CentOS/Oracle 6 link
        RHEL/CentOS/Oracle 7    RHEL/CentOS/Oracle 7 link
      2. Install the RPM for all RHEL versions:
        $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm 
    2. (Optional) Add a repository key:
      • Red Hat/CentOS/Oracle 5
        $ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
      • Red Hat/CentOS/Oracle 6
        $ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
    3. Install the CDH packages:
      $ sudo yum clean all
      $ sudo yum install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-core spark-master spark-worker spark-history-server spark-python sqoop sqoop2 whirr
        Note: Installing these packages also installs all the other CDH packages required for a full CDH 5 installation.
  • SLES
    1. Download and install the "1-click Install" package.
      1. Download the CDH 5 "1-click Install" package.

        Download the rpm file, choose Save File, and save it to a directory to which you have write access (for example, your home directory).

      2. Install the RPM:
        $ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm
      3. Update your system package index by running:
        $ sudo zypper refresh
    2. (Optional) Add a repository key:
      $ sudo rpm --import https://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
    3. Install the CDH packages:
      $ sudo zypper clean --all
      $ sudo zypper install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-core spark-master spark-worker spark-history-server spark-python sqoop sqoop2 whirr
        Note: Installing these packages also installs all the other CDH packages required for a full CDH 5 installation.
  • Ubuntu and Debian
    1. Download and install the "1-click Install" package
      1. Download the CDH 5 "1-click Install" package:
        OS Version    Package Link
        Jessie        Jessie package
        Wheezy        Wheezy package
        Precise       Precise package
        Trusty        Trusty package
      2. Install the package by doing one of the following:
        • Choose Open with in the download window to use the package manager.
        • Choose Save File, save the package to a directory to which you have write access (for example, your home directory), and install it from the command line. For example:
          sudo dpkg -i cdh5-repository_1.0_all.deb
    2. (Optional) Add a repository key:
      • Debian Wheezy
        $ curl -s https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -
      • Ubuntu Precise
        $ curl -s https://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
    3. Install the CDH packages:
      $ sudo apt-get update
      $ sudo apt-get install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-core spark-master spark-worker spark-history-server spark-python sqoop sqoop2 whirr
        Note: Installing these packages also installs all the other CDH packages required for a full CDH 5 installation.
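
After installing the packages, you can spot-check a host to confirm that the CDH 5 versions are in place before running the upgrade wizard. This is an optional, hedged check; the exact package names and versions reported depend on the CDH 5 release you installed.
  $ hadoop version
  $ rpm -qa 'hadoop*'     # RHEL or SLES; on Ubuntu or Debian use: dpkg -l 'hadoop*'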

Run the Upgrade Wizard

  1. Log in to the Cloudera Manager Admin Console.
  2. On the Home > Status tab, click the drop-down arrow next to the cluster name and select Upgrade Cluster. The Upgrade Wizard starts.
  3. If the option to choose between packages and parcels appears, select the Use Parcels option.
  4. In the Choose CDH Version (Parcels) field, select the CDH version. If there are no qualifying parcels, click the Modify the Remote Parcel Repository URLs link to go to the Parcel Configuration Settings page where you can add the locations of parcel repositories. Click Continue.
  5. Read the notices for steps you must complete before upgrading, click the Yes, I ... checkboxes after completing the steps, and click Continue.
  6. Cloudera Manager checks that hosts have the correct software installed. Click Continue.
  7. The selected parcels are downloaded and distributed. Click Continue.
  8. The Host Inspector runs and displays the CDH version on the hosts. Click Continue.
  9. Click Continue. Cloudera Manager performs all service upgrades and restarts the cluster.
  10. The wizard reports the result of the upgrade. Choose one of the following:
    • Leave "OK, set up YARN and import existing configuration from my MapReduce service" checked.
      1. Click Continue to proceed. Cloudera Manager stops the YARN service (if running) and its dependencies.
      2. Click Continue to proceed. The next page indicates some additional configuration required by YARN.
      3. Verify or modify the configurations and click Continue. The Switch Cluster to MR2 step proceeds.
      4. When all steps have completed, click Continue.
    • Clear "OK, set up YARN and import existing configuration from my MapReduce service".
  11. Click Finish to return to the Home > Status tab.
  12. (Optional) Remove the MapReduce service.
    1. In the MapReduce row, right-click and select Delete. Click Delete to confirm.

Recover from Failed Steps

  Note: If you encounter errors during these steps:
  • If the step that converts configuration parameters fails, Cloudera Manager rolls back all configurations to CDH 4. Fix any reported problems and retry the upgrade.
  • If the upgrade command fails at any point after the configuration conversion step, Cloudera Manager does not support retrying the full upgrade. Correct the error, then manually rerun the remaining individual commands. You can view the remaining commands on the Recent Commands page.
  • If the HDFS metadata upgrade step fails, you cannot revert to CDH 4 unless you restore a backup of Cloudera Manager.
The actions performed by the upgrade wizard are listed in Upgrade Wizard Actions. If any of the steps in the Command Progress screen fails, complete the step as described in that section before proceeding.

Restart the Reports Manager Role

  1. Do one of the following:
    • Select Clusters > Cloudera Management Service > Cloudera Management Service.
    • On the Home > Status tab, in the Cloudera Management Service table, click the Cloudera Management Service link.
  2. Click the Instances tab.
  3. Check the checkbox next to Reports Manager.
  4. Select Actions for Selected > Restart and then Restart to confirm.

Recompile JARs

Recompile any custom coprocessors and JARs (for example, HBase coprocessors) against the CDH 5 libraries, as noted in the HBase prerequisites above, before using them with the upgraded cluster.

Finalize the HDFS Metadata Upgrade

Finalize the HDFS metadata upgrade. To determine when finalization is warranted, run important workloads and ensure that they are successful. After you finalize the metadata upgrade, you can no longer roll HDFS back to its pre-upgrade state, so wait until you are confident that the upgraded cluster is stable.
  1. Go to the HDFS service.
  2. Click the Instances tab.
  3. Click the NameNode instance.
  4. Select Actions > Finalize Metadata Upgrade and click Finalize Metadata Upgrade to confirm.

Upgrade Wizard Actions

Do the steps in this section only if the upgrade wizard reports a failure.

Upgrade HDFS Metadata

  1. Start the ZooKeeper service.
  2. Go to the HDFS service.
  3. Select Actions > Upgrade HDFS Metadata and click Upgrade HDFS Metadata to confirm.

Upgrade HBase

  1. Go to the HBase service.
  2. Select Actions > Upgrade HBase and click Upgrade HBase to confirm.

Upgrade the Hive Metastore Database

  1. Go to the Hive service.
  2. Select Actions > Upgrade Hive Metastore Database Schema and click Upgrade Hive Metastore Database Schema to confirm.
  3. If you have multiple instances of Hive, perform the upgrade on each metastore database.

Upgrade Oozie

  1. Go to the Oozie service.
  2. Select Actions > Upgrade Database and click Upgrade Database to confirm.
  3. Start the Oozie service.
  4. Select Actions > Install Oozie ShareLib and click Install Oozie ShareLib to confirm.

Upgrade Sqoop

  1. Go to the Sqoop service.
  2. Select Actions > Upgrade Sqoop and click Upgrade Sqoop to confirm.

Start Cluster Services

  1. On the Home > Status tab, click the drop-down arrow to the right of the cluster name and select Start.
  2. Click Start in the confirmation screen. The Command Details window shows the progress of starting services.

    When All services successfully started appears, the task is complete and you can close the Command Details window.

Deploy Client Configuration Files

  1. On the Home page, click the drop-down arrow to the right of the cluster name and select Deploy Client Configuration.
  2. Click the Deploy Client Configuration button in the confirmation pop-up that appears.
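
To confirm that the redeployed client configuration is active on a gateway host, you can inspect the alternatives entry that Cloudera Manager maintains for /etc/hadoop/conf. This is an optional, hedged check; on SLES, Ubuntu, and Debian, use update-alternatives instead of alternatives.
  $ sudo alternatives --display hadoop-conf
  $ ls -ld /etc/hadoop/conf     # Shows the symlink into /etc/alternatives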