This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Upgrading to CDH 5.8

Use the instructions in this section to upgrade from an earlier version of CDH 5 to CDH 5.8.

If you use parcels, have a Cloudera Enterprise license, and have enabled HDFS high availability, you can perform a rolling upgrade that lets you avoid cluster downtime.

To upgrade from CDH 4 to CDH 5, use the instructions under Upgrading CDH 4 to CDH 5.

Before You Begin

  • Read the CDH 5 Release Notes.
  • Read the Cloudera Manager 5 Release Notes.
  • Ensure Java 1.7 or 1.8 is installed across the cluster. For installation instructions and recommendations, see Upgrading to Oracle JDK 1.7 in a Cloudera Manager Deployment or Upgrading to Oracle JDK 1.8, and make sure you have read Known Issues and Workarounds in Cloudera Manager 5 before you proceed with the upgrade.
  • Ensure that the Cloudera Manager minor version is equal to or greater than the CDH minor version. For example:
    Target CDH Version Minimum Cloudera Manager Version
    5.0.5 5.0.x
    5.1.4 5.1.x
    5.4.1 5.4.x
  • Date partition columns: as of Hive version 13, implemented in CDH 5.2, Hive validates the format of dates in partition columns, if they are stored as dates. A partition column with a date in invalid form can neither be used nor dropped once you upgrade to CDH 5.2 or higher. To avoid this problem, do one of the following:
    • Fix any invalid dates before you upgrade. Hive expects dates in partition columns to be in the form YYYY-MM-DD.
    • Store dates in partition columns as strings or integers.
    You can use the following SQL query to find any partition-column values stored as dates:
    SELECT "DBS"."NAME", "TBLS"."TBL_NAME", "PARTITION_KEY_VALS"."PART_KEY_VAL"
    FROM "PARTITION_KEY_VALS"
      INNER JOIN "PARTITIONS" ON "PARTITION_KEY_VALS"."PART_ID" = "PARTITIONS"."PART_ID"
      INNER JOIN "PARTITION_KEYS" ON "PARTITION_KEYS"."TBL_ID" = "PARTITIONS"."TBL_ID"
      INNER JOIN "TBLS" ON "TBLS"."TBL_ID" = "PARTITIONS"."TBL_ID"
      INNER JOIN "DBS" ON "DBS"."DB_ID" = "TBLS"."DB_ID"
        AND "PARTITION_KEYS"."INTEGER_IDX" ="PARTITION_KEY_VALS"."INTEGER_IDX"
        AND "PARTITION_KEYS"."PKEY_TYPE" = 'date';
  • Whenever upgrading Impala, whether in CDH or a standalone parcel or package, check your SQL against the newest reserved words listed in incompatible changes. If upgrading across multiple versions or in case of any problems, check against the full list of Impala keywords.
  • Run the Host Inspector and fix every issue.
  • If using security, run the The Security Inspector.
  • Run hdfs fsck / and hdfs dfsadmin -report and fix every issue.
  • Run hbase hbck.
  • Review the upgrade procedure and reserve a maintenance window with enough time allotted to perform all steps. For production clusters, Cloudera recommends allocating up to a full day maintenance window to perform the upgrade, depending on the number of hosts, the amount of experience you have with Hadoop and Linux, and the particular hardware you are using.
  • To avoid lots of alerts during the upgrade process, you can enable maintenance mode on your cluster before you start the upgrade. This will stop email alerts and SNMP traps from being sent, but will not stop checks and configuration validations from being made. Be sure to exit maintenance mode when you have finished the upgrade to re-enable Cloudera Manager alerts.
  • Hue validates CA certificates and needs a truststore. To create one, follow the instructions in Hue as a TLS/SSL Client.
Page generated July 8, 2016.