This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Upgrading Hive

Upgrade Hive on all the hosts on which it is running: servers and clients.

  Warning: Because of concurrency and security issues, HiveServer1 and the Hive CLI are deprecated in CDH 5 and will be removed in a future release. Cloudera recommends you migrate to Beeline and HiveServer2 as soon as possible. The Hive CLI is not needed if you are using Beeline with HiveServer2.
  Note: To see which version of Hive is shipping in CDH 5, check the Version and Packaging Information. For important information on new and changed components, see the CDH 5 Release Notes.

Checklist to Help Ensure Smooth Upgrades

The following best practices for configuring and maintaining Hive will help ensure that upgrades go smoothly.
  • Configure periodic backups of the metastore database. Use mysqldump, or the equivalent for your vendor if you are not using MySQL.
  • Make sure datanucleus.autoCreateSchema is set to false (in all types of database) and datanucleus.fixedDatastore is set to true (for MySQL and Oracle) in all hive-site.xml files. See the configuration instructions for more information about setting the properties in hive-site.xml.

  • Insulate the metastore database from users by running the metastore service in Remote mode. If you do not follow this recommendation, make sure you remove DROP, ALTER, and CREATE privileges from the Hive user configured in hive-site.xml. See Configuring the Hive Metastore for complete instructions for each type of supported database.
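
The backup recommendation in the checklist above could be scripted along the following lines. This is a minimal sketch, not a Cloudera-provided tool: it assumes a MySQL metastore, a database named metastore, and a MySQL account named hive, and the function names backup_name and backup_metastore are hypothetical.

```shell
# Hypothetical helpers; not part of Hive or CDH.
# Assumptions: MySQL metastore, database "metastore", MySQL user "hive".

# Build a dated file name, for example metastore-backup-20160708.sql.
backup_name() {
  # $1 = database name, $2 = date stamp
  printf '%s-backup-%s.sql\n' "$1" "$2"
}

# Dump the metastore database to a dated file in the current directory.
backup_metastore() {
  db="${1:-metastore}"
  user="${2:-hive}"
  out="$(backup_name "$db" "$(date +%Y%m%d)")"
  # -p prompts for the password; --single-transaction takes a consistent
  # snapshot of InnoDB tables without locking them.
  mysqldump --single-transaction -u "$user" -p "$db" > "$out" && echo "Wrote $out"
}
```

Calling backup_metastore with no arguments dumps the assumed metastore database; running it from cron is one way to make the backup periodic.
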
  Warning:

Make sure you have read and understood all incompatible changes and known issues before you upgrade Hive.

Upgrading Hive from CDH 4 to CDH 5

  Note:

If you have already performed the steps to uninstall CDH 4 and all components, as described under Upgrading from CDH 4 to CDH 5, you can skip Step 1 below and proceed with installing the new CDH 5 version of Hive.

Step 1: Remove Hive

  Warning:

You must make sure no Hive processes are running. If Hive processes are running during the upgrade, the new version will not work correctly.

  1. Exit the Hive console and make sure no Hive scripts are running.
  2. Stop any HiveServer processes that are running. If HiveServer is running as a daemon, use the following command to stop it:
    $ sudo service hive-server stop

    If HiveServer is running from the command line, stop it with <CTRL>-c.

  3. Stop the metastore. If the metastore is running as a daemon, use the following command to stop it:
    $ sudo service hive-metastore stop

    If the metastore is running from the command line, stop it with <CTRL>-c.

  4. Remove Hive. To remove Hive on RHEL-compatible systems:
    $ sudo yum remove hive

    To remove Hive on SLES systems:

    $ sudo zypper remove hive

    To remove Hive on Ubuntu and Debian systems:

    $ sudo apt-get remove hive

Step 2: Install the new Hive version on all hosts (Hive servers and clients)

See Installing Hive.

  Important: Configuration files
  • If you install a newer version of a package that is already on the system, configuration files that you have modified will remain intact.
  • If you uninstall a package, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. If you then re-install the package (probably to install a new version) the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades, you will be prompted if you have made changes to a file for which there is a new version. For details, see Automatic handling of configuration files by dpkg.

Step 3: Configure the Hive Metastore

You must configure the Hive metastore and initialize the service before you can use Hive. See Configuring the Hive Metastore for detailed instructions.

Step 4: Upgrade the Metastore Schema

  Important:
  • Cloudera strongly encourages you to make a backup copy of your metastore database before running the upgrade scripts. You will need this backup copy if you run into problems during the upgrade or need to downgrade to a previous version.
  • You must upgrade the metastore schema to the version corresponding to the new version of Hive before starting Hive after the upgrade. Failure to do so may result in metastore corruption.
  • To run a script, you must first cd to the directory that script is in: that is /usr/lib/hive/scripts/metastore/upgrade/<database>.

In CDH 5, you can upgrade the metastore schema in either of two ways: use Hive's schematool, or use the schema upgrade scripts provided with the Hive package.

Using schematool (Recommended):

The Hive distribution includes an offline tool for Hive metastore schema manipulation called schematool. This tool can be used to initialize the metastore schema for the current Hive version. It can also upgrade the schema from an older version to the current one.

To upgrade the schema, use the upgradeSchemaFrom option to specify the version of the schema you are currently using (see table below) and the compulsory dbType option to specify the database you are using. The example that follows shows an upgrade from Hive 0.10.0 (CDH 4) for an installation using the Derby database.
$ schematool -dbType derby -upgradeSchemaFrom 0.10.0
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting upgrade metastore schema from version 0.10.0 to <new_version>
Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
Completed upgrade-0.10.0-to-0.11.0.derby.sql
Upgrade script upgrade-0.11.0-to-<new_version>.derby.sql
Completed upgrade-0.11.0-to-<new_version>.derby.sql
schemaTool completed

Possible values for the dbType option are mysql, postgres, derby or oracle. The following table lists the Hive versions corresponding to the older CDH releases.

CDH Releases              Hive Version
CDH 3                     0.7.0
CDH 4.0                   0.8.0
CDH 4.1                   0.9.0
CDH 4.2 and higher 4.x    0.10.0
CDH 5.0, 5.1              0.12.0
CDH 5.2                   0.13.0

See Using the Hive Schema Tool for more details on how to use schematool.
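
As an illustration of how the table above feeds the upgradeSchemaFrom option, the following sketch maps a CDH release to its Hive schema version and prints the matching schematool command. The function names hive_version_for_cdh and schematool_cmd are hypothetical; only the -dbType and -upgradeSchemaFrom options come from the text above.

```shell
# Hypothetical helper: map a CDH release to the Hive schema version
# listed in the table above.
hive_version_for_cdh() {
  case "$1" in
    3|3.*)               echo 0.7.0 ;;
    4.0|4.0.*)           echo 0.8.0 ;;
    4.1|4.1.*)           echo 0.9.0 ;;
    4.*)                 echo 0.10.0 ;;  # CDH 4.2 and higher 4.x
    5.0|5.0.*|5.1|5.1.*) echo 0.12.0 ;;
    5.2|5.2.*)           echo 0.13.0 ;;
    *) echo "unknown CDH release: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the schematool command for a given release.
schematool_cmd() {
  # $1 = dbType (mysql, postgres, derby, or oracle), $2 = CDH release
  ver="$(hive_version_for_cdh "$2")" || return 1
  echo "schematool -dbType $1 -upgradeSchemaFrom $ver"
}
```

Printing the command instead of executing it leaves room to review the version before anything touches the metastore database.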

Using Schema Upgrade Scripts:

Run the appropriate schema upgrade script(s); they are in /usr/lib/hive/scripts/metastore/upgrade/. Start with the script for your database and Hive version, and run all subsequent scripts.

For example, if you are currently running Hive 0.10 with MySQL, and upgrading to Hive 0.13.1, start with the script for Hive 0.10 to 0.11 for MySQL, then run the script for Hive 0.11 to 0.12 for MySQL, then run the script for Hive 0.12 to 0.13.1.

For more information about upgrading the schema, see the README in /usr/lib/hive/scripts/metastore/upgrade/.
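
The "start with your version and run all subsequent scripts" rule can be sketched as a small helper that prints the script names for a chain of versions. The naming pattern upgrade-<from>-to-<to>.<dbType>.sql is inferred from the sample schematool output earlier in this step, and script_chain is a hypothetical name; confirm the actual file names in the upgrade directory before running anything.

```shell
# Hypothetical helper: print the upgrade script names for a chain of versions.
# The naming pattern is an assumption inferred from the schematool output.
script_chain() {
  # $1 = dbType, remaining arguments = versions in upgrade order
  [ "$#" -ge 3 ] || { echo "usage: script_chain <dbType> <from> <to>..." >&2; return 1; }
  db="$1"; shift
  prev="$1"; shift
  for next in "$@"; do
    echo "upgrade-$prev-to-$next.$db.sql"
    prev="$next"
  done
}
```

For example, script_chain mysql 0.10.0 0.11.0 0.12.0 prints two script names in the order they should be run. Each script would then be run from the directory for that database with the database's own client.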

Step 5: Configure HiveServer2

HiveServer2 is an improved version of the original HiveServer (HiveServer1, no longer supported). Some configuration is required before you initialize HiveServer2; see Configuring HiveServer2 for details.

Step 6: Upgrade Scripts for HiveServer2 (if necessary)

If you have been running HiveServer1, you may need to make some minor modifications to your client-side scripts and applications when you upgrade:

  • HiveServer1 does not support concurrent connections, so many customers run a dedicated instance of HiveServer1 for each client. These can now be replaced by a single instance of HiveServer2.
  • HiveServer2 uses a different connection URL and driver class for the JDBC driver. If you have existing scripts that use JDBC to communicate with HiveServer1, you can modify these scripts to work with HiveServer2 by changing the JDBC driver URL from jdbc:hive://hostname:port to jdbc:hive2://hostname:port, and by changing the JDBC driver class name from org.apache.hadoop.hive.jdbc.HiveDriver to org.apache.hive.jdbc.HiveDriver.
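
For scripts that keep the JDBC URL and driver class in plain text, the change could be applied with a filter like the following. This is a sketch: to_hiveserver2 is a hypothetical name, and it assumes the HiveServer1 driver class is org.apache.hadoop.hive.jdbc.HiveDriver.

```shell
# Hypothetical helper: rewrite HiveServer1 JDBC settings for HiveServer2.
# Reads text on stdin and writes the converted text to stdout.
to_hiveserver2() {
  sed -e 's|jdbc:hive://|jdbc:hive2://|g' \
      -e 's|org\.apache\.hadoop\.hive\.jdbc\.HiveDriver|org.apache.hive.jdbc.HiveDriver|g'
}
```

The filter leaves text that already uses jdbc:hive2:// unchanged, so it is safe to run over a file more than once. Review the output before replacing the original script.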

Step 7: Start the Metastore, HiveServer2, and Beeline

See the instructions for starting the metastore, starting HiveServer2, and using Beeline.

Step 8: Upgrade the JDBC driver on the clients

The driver used for CDH 4.x does not work with CDH 5.x. Install the new version, following these instructions.

Upgrading Hive from a Lower Version of CDH 5

The instructions that follow assume that you are upgrading Hive as part of a CDH 5 upgrade, and have already performed the steps under Upgrading from an Earlier CDH 5 Release to the Latest Release.

  Important:
  • If you are currently running Hive under MRv1, check for the following property and value in /etc/mapred/conf/mapred-site.xml:
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property> 
    Remove this property before you proceed; otherwise Hive queries spawned from MapReduce jobs will fail with a null pointer exception (NPE).
  • If you have installed the hive-hcatalog-server package in the past, you must remove it before you proceed; otherwise the upgrade will fail.
  • If you are upgrading Hive from CDH 5.0.5 to CDH 5.4, 5.3 or 5.2 on Debian 7.0, and a Sentry version higher than 5.0.4 and lower than 5.1.0 is installed, you must upgrade Sentry before upgrading Hive; otherwise the upgrade will fail. See Apache Hive Known Issues for more details.
  • CDH 5.2 and higher clients cannot communicate with CDH 5.1 and lower servers. This means that you must upgrade the server before the clients.
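
The mapred-site.xml check in the first item above could be automated roughly as follows. This is a sketch: sets_yarn_framework is a hypothetical name, and the check assumes the <value> element sits on the line immediately after <name>, as in the snippet above.

```shell
# Hypothetical check: succeed if the file sets mapreduce.framework.name to yarn.
# Assumes <value> is on the line right after <name>; the -A option is
# available in GNU and BSD grep.
sets_yarn_framework() {
  grep -A1 '<name>mapreduce.framework.name</name>' "$1" 2>/dev/null |
    grep -q '<value>yarn</value>'
}
```

A call such as sets_yarn_framework /etc/mapred/conf/mapred-site.xml && echo "remove the property before upgrading" flags the configuration that needs attention; it does not edit the file.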

To upgrade Hive from a lower version of CDH 5, proceed as follows.

Step 1: Stop all Hive Processes and Daemons

  Warning:

You must make sure no Hive processes are running. If Hive processes are running during the upgrade, the new version will not work correctly.

  1. Stop any HiveServer processes that are running:
    $ sudo service hive-server stop 
  2. Stop any HiveServer2 processes that are running:
    $ sudo service hive-server2 stop 
  3. Stop the metastore:
    $ sudo service hive-metastore stop 

Step 2: Install the new Hive version on all hosts (Hive servers and clients)

See Installing Hive.

Step 3: Verify that the Hive Metastore is Properly Configured

See Configuring the Hive Metastore for detailed instructions.

Step 4: Upgrade the Metastore Schema

  Important:
  • Cloudera strongly encourages you to make a backup copy of your metastore database before running the upgrade scripts. You will need this backup copy if you run into problems during the upgrade or need to downgrade to a previous version.
  • You must upgrade the metastore schema to the version corresponding to the new version of Hive before starting Hive after the upgrade. Failure to do so may result in metastore corruption.
  • To run a script, you must first cd to the directory that script is in: that is /usr/lib/hive/scripts/metastore/upgrade/<database>.

In CDH 5, you can upgrade the metastore schema in either of two ways: use Hive's schematool, or use the schema upgrade scripts provided with the Hive package.

Using schematool (Recommended):

The Hive distribution includes an offline tool for Hive metastore schema manipulation called schematool. This tool can be used to initialize the metastore schema for the current Hive version. It can also upgrade the schema from an older version to the current one.

To upgrade the schema, use the upgradeSchemaFrom option to specify the version of the schema you are currently using (see table below) and the compulsory dbType option to specify the database you are using. The example that follows shows an upgrade from Hive 0.10.0 (CDH 4) for an installation using the Derby database.
$ schematool -dbType derby -upgradeSchemaFrom 0.10.0
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting upgrade metastore schema from version 0.10.0 to <new_version>
Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
Completed upgrade-0.10.0-to-0.11.0.derby.sql
Upgrade script upgrade-0.11.0-to-<new_version>.derby.sql
Completed upgrade-0.11.0-to-<new_version>.derby.sql
schemaTool completed

Possible values for the dbType option are mysql, postgres, derby or oracle. The following table lists the Hive versions corresponding to the older CDH releases.

CDH Releases              Hive Version
CDH 3                     0.7.0
CDH 4.0                   0.8.0
CDH 4.1                   0.9.0
CDH 4.2 and higher 4.x    0.10.0
CDH 5.0, 5.1              0.12.0
CDH 5.2                   0.13.0

See Using the Hive Schema Tool for more details on how to use schematool.

Using Schema Upgrade Scripts:

Run the appropriate schema upgrade script(s); they are in /usr/lib/hive/scripts/metastore/upgrade/. Start with the script for your database and Hive version, and run all subsequent scripts.

For example, if you are currently running Hive 0.10 with MySQL, and upgrading to Hive 0.13.1, start with the script for Hive 0.10 to 0.11 for MySQL, then run the script for Hive 0.11 to 0.12 for MySQL, then run the script for Hive 0.12 to 0.13.1.

For more information about upgrading the schema, see the README in /usr/lib/hive/scripts/metastore/upgrade/.

Step 5: Start the Metastore, HiveServer2, and Beeline

See the instructions for starting the metastore, starting HiveServer2, and using Beeline.

The upgrade is now complete.

Troubleshooting: if you failed to upgrade the metastore

If you failed to upgrade the metastore as instructed above, proceed as follows.

  1. Identify the problem.
    The symptoms are as follows:
    • Hive stops accepting queries.
    • In a cluster managed by Cloudera Manager, the Hive Metastore canary fails.
    • An error such as the following appears in the Hive Metastore Server logs:
      Hive Schema version 0.13.0 does not match metastore's schema version 0.12.0 Metastore is not upgraded or corrupt.
  2. Resolve the problem.
    If the problem you are having matches the symptoms just described, do the following:
    1. Stop all Hive services; for example:
      $ sudo service hive-server2 stop
      $ sudo service hive-metastore stop
    2. Run the Hive schematool, as instructed here.
      Make sure the value you use for the -upgradeSchemaFrom option matches the version you are currently running (not the new version). For example, if the error message in the log is
      Hive Schema version 0.13.0 does not match metastore's schema version 0.12.0 Metastore is not upgraded or corrupt.
      then the value of -upgradeSchemaFrom must be 0.12.0.
    3. Restart the Hive services you stopped.
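
The rule for choosing the -upgradeSchemaFrom value can be sketched as a filter that extracts the metastore's schema version from the error message. The function name metastore_schema_version is hypothetical, and the pattern assumes the log line has the wording shown above.

```shell
# Hypothetical helper: read the mismatch error on stdin and print the
# metastore's current schema version (the value for -upgradeSchemaFrom).
metastore_schema_version() {
  sed -n "s/.*metastore's schema version \([0-9][0-9.]*[0-9]\).*/\1/p"
}
```

Piping the relevant log line through metastore_schema_version yields the older version reported by the metastore, not the new Hive version, which matches the guidance in step 2.
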
Page generated July 8, 2016.