This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Using the Hive Schema Tool

Schema Version Verification

Hive now records the schema version in the metastore database and verifies that the metastore schema version is compatible with the Hive binaries that are going to access the metastore. The Hive properties to implicitly create or alter the existing schema are disabled by default. Hence, Hive will not attempt to change the metastore schema implicitly. When you execute a Hive query against an old schema, it will fail to access the metastore displaying an error message as follows:

$ build/dist/bin/hive -e "show tables"
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

The error log will contain an entry similar to the following:

...
Caused by: MetaException(message:Version information not found in metastore. )
        at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:5638)
...

To suppress the schema check and allow the metastore to implicitly modify the schema, you need to set the hive.metastore.schema.verification configuration property to false in hive-site.xml.

Using schematool

The Hive distribution now includes an offline tool for Hive metastore schema manipulation called schematool. This tool can be used to initialize the metastore schema for the current Hive version. It can also handle upgrading schema from an older version to the current one. The tool will try to find the current schema from the metastore if available. However, this will be applicable only to any future upgrades. In case you are upgrading from existing CDH releases like CDH 4 or CDH 3, you should specify the schema version of the existing metastore as a command line option to the tool.

The schematool figures out the SQL scripts required to initialize or upgrade the schema and then executes those scripts against the backend database. The metastore database connection information such as JDBC URL, JDBC driver and database credentials are extracted from the Hive configuration. You can provide alternate database credentials if needed.

The following options are available as part of the schematool package.
$ schematool -help
usage: schemaTool
 -dbType <databaseType>             Metastore database type                 
 -dryRun                            List SQL scripts (no execute)                                       
 -help                              Print this message
 -info                              Show config and schema details
 -initSchema                        Schema initialization
 -initSchemaTo <initTo>             Schema initialization to a version
 -passWord <password>               Override config file password
 -upgradeSchema                     Schema upgrade
 -upgradeSchemaFrom <upgradeFrom>   Schema upgrade from a version
 -userName <user>                   Override config file user name
 -verbose                           Only print SQL statements
The dbType option should always be specified and can be one of the following:
 derby|mysql|postgres|oracle 

Usage Examples

  • Initialize your metastore to the current schema for a new Hive setup using the initSchema option.
    $ schematool -dbType derby -initSchema
    Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:       APP
    Starting metastore schema initialization to <new_version>
    Initialization script hive-schema-<new_version>.derby.sql
    Initialization script completed
    schemaTool completed
    
  • Get schema information using the info option.
    $ schematool -dbType derby -info
    Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:       APP
    Hive distribution version:       <new_version>
    Required schema version:         <new_version>
    Metastore schema version:        <new_version>
    schemaTool completed
    
  • If you attempt to get schema information from older metastores that did not store version information, the tool will report an error as follows.
    $ schematool -dbType derby -info
    Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:       APP
    Hive distribution version:       <new_version>
    Required schema version:         <new_version>
    org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
    *** schemaTool failed ***
    
  • You can upgrade schema from a CDH 4 release by specifying the upgradeSchemaFrom option.
    $ schematool -dbType derby -upgradeSchemaFrom 0.10.0
    Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:       APP
    Starting upgrade metastore schema from version 0.10.0 to <new_version>
    Upgrade script upgrade-0.10.0-to-<new_version>.derby.sql
    Completed upgrade-0.10.0-to-<new_version>.derby.sql
    Upgrade script upgrade-0.11.0-to-<new_version>.derby.sql
    Completed upgrade-0.11.0-to-<new_version>.derby.sql
    schemaTool completed
    
    The Hive versions of the older CDH releases are:
    CDH Releases Hive Version

    CDH 3

    0.7.0

    CDH 4.0

    0.8.0

    CDH 4.1

    0.9.0

    CDH 4.2 and higher

    0.10.0

  • If you want to find out all the required scripts for a schema upgrade, use the dryRun option.
    $ build/dist/bin/schematool -dbType derby -upgradeSchemaFrom 0.7.0 -dryRun
    13/09/27 17:06:31 WARN conf.Configuration: hive.server2.enable.impersonation is deprecated. Instead, use hive.server2.enable.doAs
    Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:       APP
    Starting upgrade metastore schema from version 0.7.0 to <new_version>
    Upgrade script upgrade-0.7.0-to-0.8.0.derby.sql
    Upgrade script upgrade-0.8.0-to-0.9.0.derby.sql
    Upgrade script upgrade-0.9.0-to-0.10.0.derby.sql
    Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
    Upgrade script upgrade-0.11.0-to-<new_version>.derby.sql
    schemaTool completed
Page generated July 8, 2016.