
Upgrading CDH 4 to CDH 5

The instructions in this topic describe how to upgrade from a CDH 4 release to a CDH 5 release. You can upgrade to CDH 5 within the Cloudera Manager Admin Console using either parcels or packages. Using parcels vastly simplifies the upgrade process; electing to upgrade using packages means that all future upgrades must be done manually.

  Important:
  • You cannot perform a rolling upgrade from CDH 4 to CDH 5. There are incompatibilities between the major versions, so a rolling restart is not possible. Rolling upgrade is also not supported from CDH 5 Beta 2 to CDH 5.
  • If you have just upgraded to Cloudera Manager 5, you must hard restart the Cloudera Manager Agents as described in (Optional) Deploy a Cloudera Manager Agent Upgrade.
  • HBase - After you upgrade, you must recompile all HBase coprocessor and custom JARs.
  • Impala
    • If you upgrade to CDH 5.1, Impala will be upgraded to 1.4.1. See New Features in Impala for information about Impala 1.4.x features.
    • If you upgrade to CDH 5.0, Impala will be upgraded to 1.3.2. If you have CDH 4 installed with Impala 1.4.0, Impala will be downgraded to Impala 1.3.2. See New Features in Impala for information about Impala 1.3 features.
  • Hive and Parquet
    • When upgrading from CDH 4 to CDH 5, upgrade scripts may modify your schemas. For example, if you used Parquet with CDH 4, CDH 5 changes the input and output formats: the upgrade script changes parquet.hive.DeprecatedParquetInputFormat and parquet.hive.DeprecatedParquetOutputFormat to org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat and org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat in the schema. This may cause errors such as Table already exists but schema doesn't match, and you may need to modify MapReduce jobs to use the newer Parquet SerDes.
  • MapReduce and YARN
    • In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. In CDH 5, the MapReduce service is deprecated; however, it remains fully supported for backward compatibility through the CDH 5 lifecycle.
    • In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. You can create a YARN service in a CDH 4 cluster, but it is not considered production-ready.
    • For production use, Cloudera recommends running only one MapReduce framework at any given time. If development needs or another use case require switching between MapReduce and YARN, both services can be configured at the same time, but only one should be running (to make full use of the available hardware resources).
    For information on migrating from MapReduce to YARN, see Managing YARN (MRv2) and MapReduce (MRv1).
  Warning: You can use Cloudera Manager to roll back an upgrade from CDH 4 to CDH 5 as long as you back up certain configuration files, databases, and other artifacts before beginning the upgrade. However, after you have finalized the HDFS upgrade, you can no longer roll back the CDH upgrade. See Rolling Back a CDH 4-to-CDH 5 Upgrade for the backup and rollback procedures.

Before You Begin

  • Read the CDH 5 Release Notes.
  • Read the Cloudera Manager 5 Release Notes.
  • Upgrade to Cloudera Manager 5 before upgrading to CDH 5.
  • Ensure Java 1.7 is installed across the cluster. For installation instructions and recommendations, see Upgrading to Oracle JDK 1.7 in a Cloudera Manager Deployment, and make sure you have read Known Issues and Workarounds in Cloudera Manager 5 before you proceed with the upgrade.
  • Ensure that the Cloudera Manager minor version is equal to or greater than the CDH minor version. For example:
    Target CDH Version    Minimum Cloudera Manager Version
    5.0.5                 5.0.x
    5.1.4                 5.1.x
    5.4.1                 5.4.x
  • Make sure there are no Oozie workflows in RUNNING or SUSPENDED status; otherwise the Oozie database upgrade will fail and you will have to reinstall CDH 4 to complete or kill those running workflows.
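    To check, you can list active jobs with the Oozie command-line client before you begin. The following is a minimal sketch, assuming the Oozie server listens on its default port (11000); OOZIE_SERVER_HOST is a placeholder for your Oozie server host:
      # Point the Oozie CLI at the Oozie server (OOZIE_SERVER_HOST is a placeholder)
      $ export OOZIE_URL=http://OOZIE_SERVER_HOST:11000/oozie
      # List workflow jobs that are still RUNNING or SUSPENDED
      $ oozie jobs -jobtype wf -filter status=RUNNING
      $ oozie jobs -jobtype wf -filter status=SUSPENDED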
  • Delete Symbolic Links in HDFS

    If there are symbolic links in HDFS when you upgrade from CDH 4 to CDH 5, the upgrade will fail and you will have to downgrade to CDH 4, delete the symbolic links, and start over. To prevent this, proceed as follows.

    1. cd to the directory on the NameNode that contains the latest fsimage. The location of this directory is specified as the value of dfs.namenode.name.dir (or dfs.name.dir) in hdfs-site.xml.
    2. Use a command such as the following to write out the path names in the fsimage:
      $ hdfs oiv -i FSIMAGE -o /tmp/YYYY-MM-DD_FSIMAGE.txt
    3. Use a command such as the following to find the path names of any symbolic links listed in /tmp/YYYY-MM-DD_FSIMAGE.txt and write them out to the file /tmp/symlinks.txt:
      $ grep -- "->" /tmp/YYYY-MM-DD_FSIMAGE.txt > /tmp/symlinks.txt
    4. Delete any symbolic links listed in /tmp/symlinks.txt.
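      If many links are listed, you can script the deletion. The following is a minimal sketch, not a definitive procedure: it assumes each line of /tmp/symlinks.txt shows the link path as the whitespace-separated token immediately before "->" and that the paths contain no spaces. Verify the extracted paths, and test on a single link, before deleting anything.
      # Extract the token immediately preceding "->" on each line (the link path)
      $ awk '{for (i = 1; i < NF; i++) if ($(i+1) == "->") print $i}' /tmp/symlinks.txt > /tmp/symlink_paths.txt
      # Review /tmp/symlink_paths.txt, then remove each link
      $ while read -r link; do hdfs dfs -rm "$link"; done < /tmp/symlink_paths.txt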
  • When upgrading from CDH 4 to CDH 5, Oozie upgrade can take a very long time. For upgrades from CDH 4.3 and higher, you can reduce this time by reducing the amount of history Oozie retains. To reduce Oozie history:
    1. Go to the Oozie service.
    2. Click the Configuration tab.
    3. Click Category > Advanced.
    4. In Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml, enter the following:
      <property>
        <name>oozie.service.PurgeService.older.than</name>
        <value>7</value>
      </property>
      <property>
        <name>oozie.service.PurgeService.purge.limit</name>
        <value>1000</value>
      </property>
    5. For CDH lower than 5.2, enable DEBUG level logging:
      1. Click Category > Logs.
      2. Set Oozie Server Logging Threshold to DEBUG.
    6. Click Save Changes to commit the changes.
    7. Restart the Oozie Server role.
    8. Wait for the purge service to run and finish. By default, the service runs every hour. The purge service writes the following messages to the Oozie server log (see the example command after this list for one way to check for them):
      STARTED Purge to purge Workflow Jobs older than [7] days, Coordinator Jobs older than [7] days, and Bundlejobs older than [7] days.
      ENDED Purge deleted [x] workflows, [y] coordinatorActions, [z] coordinators, [w] bundles
    9. Revert the purge service and log level settings to the default.
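    Before reverting the settings in step 9, you can confirm that the purge ran by searching the Oozie server log for the messages shown in step 8. This is a minimal sketch, assuming the logs are under the default /var/log/oozie directory; adjust the path for your deployment:
      # Look for the purge start/end messages in the Oozie server logs
      $ grep -h -E "STARTED Purge|ENDED Purge" /var/log/oozie/*.log*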
  • When upgrading from CDH 4 to CDH 5, the Hue upgrade can take a very long time if the beeswax_queryhistory, beeswax_savedquery, and oozie_job tables contain more than 1000 records. You can reduce the upgrade time by running a script to reduce the size of the Hue database:
    1. Stop the Hue service.
    2. Back up the Hue database.
    3. Download the history cleanup script to the host running the Hue Server.
    4. Run the following as root:
      • parcel installation
        export HUE_CONF_DIR="/var/run/cloudera-scm-agent/process/`ls -1 /var/run/cloudera-scm-agent/process | grep HUE| sort -n | tail -1 `"
        /opt/cloudera/parcels/CDH/share/hue/build/env/bin/hue shell
      • package installation
        export HUE_CONF_DIR="/var/run/cloudera-scm-agent/process/`ls -1 /var/run/cloudera-scm-agent/process | grep HUE| sort -n | tail -1 `"
        /usr/share/hue/build/env/bin/hue shell
    5. Run the downloaded script in the Hue shell.
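      For example, if the script was saved to /tmp/hue_history_cleanup.py (a hypothetical path and filename; substitute the actual location of the script you downloaded in step 3), you can feed it to the Hue shell on standard input after exporting HUE_CONF_DIR as shown in step 4. The sketch below shows the parcel path; use the package path from step 4 for a package installation:
      # /tmp/hue_history_cleanup.py is a placeholder for the downloaded cleanup script
      $ /opt/cloudera/parcels/CDH/share/hue/build/env/bin/hue shell < /tmp/hue_history_cleanup.py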
  • If Using MySQL as Hue Backend: You may face issues after the upgrade if the default engine for MySQL does not match the engine used by the Hue tables. To confirm the match:
    1. Open the my.cnf file for MySQL, search for "default-storage-engine" and note its value.
    2. Connect to MySQL and run the following commands:
      use hue;
      show create table auth_user;
    3. Search for the "ENGINE=" line and confirm that its value matches the one for the "default-storage-engine" above.

      If the default engines do not match, Hue will display a warning on its start-up page (http://$HUE_HOST:$HUE_PORT/about). Work with your database administrator to convert the current Hue MySQL tables to the engine in use by MySQL, as noted by the "default-storage-engine" property.
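      If you need to convert tables, a typical approach is to alter each mismatched Hue table to the server's default engine. The following is a minimal sketch, not a definitive procedure: it assumes the default engine is InnoDB, the Hue database is named hue, and you have already backed up the database. Adjust the user, database, and engine names for your deployment.
      # Convert one table to the default engine; repeat for each mismatched Hue table
      $ mysql -u root -p -e "ALTER TABLE hue.auth_user ENGINE=InnoDB;"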

  • Whenever you upgrade Impala, whether as part of CDH or as a standalone parcel or package, check your SQL against the newest reserved words listed in incompatible changes. If upgrading across multiple versions, or if you encounter any problems, check against the full list of Impala keywords.
  • Run the Host Inspector and fix every issue.
  • If using security, run the Security Inspector.
  • Run hdfs fsck / and hdfs dfsadmin -report and fix every issue.
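    For example, you can run both checks as the hdfs superuser and save the output for review (the hdfs user and the use of sudo assume a default, non-kerberized Cloudera Manager installation):
      # Check HDFS for inconsistencies and capture a cluster-wide storage report
      $ sudo -u hdfs hdfs fsck / > /tmp/pre-upgrade-fsck.txt
      $ sudo -u hdfs hdfs dfsadmin -report > /tmp/pre-upgrade-dfsadmin-report.txt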
  • If using HBase:
    • Run hbase hbck.
    • Before you can upgrade HBase from CDH 4 to CDH 5, your HFiles must be upgraded from the HFile v1 format to HFile v2, because CDH 5 no longer supports HFile v1. The upgrade procedure differs depending on whether you use Cloudera Manager or the command line, but the results are the same. The first step is to check the HFiles for instances of HFile v1 and mark them to be upgraded to HFile v2, and to check for and report corrupted files or files with unknown versions, which must be removed manually. The next step is to rewrite the HFiles during the next major compaction. After the HFiles are upgraded, you can continue the upgrade. After the upgrade is complete, you must recompile custom coprocessors and JARs. To check and upgrade the files:
      1. In the Cloudera Manager Admin Console, go to the HBase service and run Actions > Check HFile Version.
      2. Check the output of the command in the stderr log.
        Your output should be similar to the following:
        Tables Processed:
        hdfs://localhost:41020/myHBase/.META.
        hdfs://localhost:41020/myHBase/usertable
        hdfs://localhost:41020/myHBase/TestTable
        hdfs://localhost:41020/myHBase/t
        
        Count of HFileV1: 2
        HFileV1:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
        hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512
        
        Count of corrupted files: 1
        Corrupted Files:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
        Count of Regions with HFileV1: 2
        Regions to Major Compact:
        hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
        hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af
        In the example above, the script has detected two HFile v1 files, one corrupted file, and two regions that require a major compaction.
      3. Trigger a major compaction on each of the reported regions. This major compaction rewrites the files from HFile v1 to HFile v2 format. To run the major compaction, start HBase Shell and issue the major_compact command.
        $ /usr/lib/hbase/bin/hbase shell
        hbase> major_compact 'usertable'
        You can also do this in a single step by using the echo shell built-in command.
        $ echo "major_compact 'usertable'" | /usr/lib/hbase/bin/hbase shell
  • Review the upgrade procedure and reserve a maintenance window with enough time allotted to perform all steps. For production clusters, Cloudera recommends allocating up to a full-day maintenance window to perform the upgrade, depending on the number of hosts, your level of experience with Hadoop and Linux, and the particular hardware you are using.
  • To avoid excessive alerts during the upgrade process, enable maintenance mode on your cluster before you start the upgrade. Maintenance mode stops email alerts and SNMP traps from being sent, but does not stop checks and configuration validations. Be sure to exit maintenance mode when you have finished the upgrade to re-enable Cloudera Manager alerts.