Configuring Oozie
This page explains how to configure Oozie, for new installs and upgrades, in an unmanaged deployment, without Cloudera Manager.
- If you use Cloudera Manager, do not use these command-line instructions.
- This information applies specifically to CDH 5.8.x. If you use an earlier version of CDH, see the documentation for that version located at Cloudera Documentation.
Configuring which Hadoop Version to Use
The Oozie server works with either MRv1 or YARN, but not both simultaneously. The Oozie client does not interact directly with Hadoop MapReduce and does not require any MapReduce configuration.
To configure the Oozie server to work with YARN or MRv1, and with or without TLS/SSL, use the alternatives command (or update-alternatives, depending on your operating system).
- To use YARN (without TLS/SSL):
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.http
- To use YARN (with TLS/SSL):
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.https
- To use MRv1 (without TLS/SSL):
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.http.mr1
- To use MRv1 (with TLS/SSL):
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.https.mr1
Check /etc/oozie/conf/oozie-env.sh and make sure it contains the following line:
export CATALINA_BASE=/var/lib/oozie/tomcat-deployment
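The four choices above follow a simple naming pattern: https versus http for TLS/SSL, plus a .mr1 suffix for MRv1. The following is a minimal sketch, assuming only the directory names listed above, that maps the two choices to the directory to pass to alternatives. The function name oozie_tomcat_conf_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Tomcat configuration directory that
# oozie-tomcat-conf should point to, using the directory names on this page.
#   $1: yarn | mr1
#   $2: tls  | no-tls
oozie_tomcat_conf_dir() {
  local mr="$1" tls="$2" dir
  case "$tls" in
    tls)    dir=/etc/oozie/tomcat-conf.https ;;
    no-tls) dir=/etc/oozie/tomcat-conf.http ;;
    *) echo "usage: oozie_tomcat_conf_dir yarn|mr1 tls|no-tls" >&2; return 1 ;;
  esac
  case "$mr" in
    yarn) ;;                 # YARN uses the base directory name
    mr1)  dir="$dir.mr1" ;;  # MRv1 variants carry a .mr1 suffix
    *) echo "usage: oozie_tomcat_conf_dir yarn|mr1 tls|no-tls" >&2; return 1 ;;
  esac
  printf '%s\n' "$dir"
}
```

For example, sudo alternatives --set oozie-tomcat-conf "$(oozie_tomcat_conf_dir yarn tls)" selects YARN with TLS/SSL; on systems that use update-alternatives, substitute that command.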
Configuring Oozie after Upgrading from CDH 4
Step 1: Update Configuration Files
- Edit the new Oozie CDH 5 oozie-site.xml and set all customizable properties to the values you set in the CDH 4 oozie-site.xml.
Important: Do not copy the CDH 4 configuration files into the CDH 5 configuration directory.
- If necessary, do the same for the oozie-log4j.properties, oozie-env.sh, and adminusers.txt files.
Step 2: Upgrade the Database
- Do not proceed before you have edited the configuration files as instructed in Step 1.
- Before running the database upgrade tool, copy or symbolically link the JDBC driver JAR for the database you are using into the /var/lib/oozie/ directory.
Oozie CDH 5 provides a command-line tool to perform the database schema and data upgrade that is required when you upgrade Oozie from CDH 4 to CDH 5. The tool uses Oozie configuration files to connect to the database and perform the upgrade.
- To run the Oozie database upgrade tool against the database:
Important: This step must be done as the oozie Unix user, otherwise Oozie may fail to start or work properly because of incorrect file permissions.
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -run
You will see output such as this (the output of the script may differ slightly depending on the database vendor):
Validate DB Connection DONE
Check DB schema exists DONE
Verify there are not active Workflow Jobs DONE
Check OOZIE_SYS table does not exist DONE
Get Oozie DB version DONE
Upgrade SQL schema DONE
Upgrading to db schema for Oozie 4.0
Update db.version in OOZIE_SYS table to 2 DONE
Post-upgrade COORD_JOBS new columns default values DONE
Post-upgrade COORD_JOBS & COORD_ACTIONS status values DONE
Post-upgrade MISSING_DEPENDENCIES column in Derby DONE
Table 'WF_ACTIONS' column 'execution_path', length changed to 1024
Table 'WF_ACTIONS, column 'error_message', changed to varchar/varchar2
Table 'COORD_JOB' column 'frequency' changed to varchar/varchar2
DONE
Post-upgrade BUNDLE_JOBS, COORD_JOBS, WF_JOBS to drop AUTH_TOKEN column DONE
Upgrading to db schema for Oozie 4.0.0-cdh5.0.0
Update db.version in OOZIE_SYS table to 3 DONE
Dropping discriminator column DONE
Oozie DB has been upgraded to Oozie version '4.0.0-cdh5.0.0'
The SQL commands have been written to: /tmp/ooziedb-3809837539907706.sql
- To create the upgrade script:
Important: This step must be done as the oozie Unix user, otherwise Oozie may fail to start or work properly because of incorrect file permissions.
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile SCRIPT
For example:
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile oozie-upgrade.sql
You should see output such as the following (the output of the script may differ slightly depending on the database vendor):
Validate DB Connection DONE
Check DB schema exists DONE
Verify there are not active Workflow Jobs DONE
Check OOZIE_SYS table does not exist DONE
Get Oozie DB version DONE
Upgrade SQL schema DONE
Upgrading to db schema for Oozie 4.0
Update db.version in OOZIE_SYS table to 2 DONE
Post-upgrade COORD_JOBS new columns default values DONE
Post-upgrade COORD_JOBS & COORD_ACTIONS status values DONE
Post-upgrade MISSING_DEPENDENCIES column in Derby DONE
Table 'WF_ACTIONS' column 'execution_path', length changed to 1024
Table 'WF_ACTIONS, column 'error_message', changed to varchar/varchar2
Table 'COORD_JOB' column 'frequency' changed to varchar/varchar2
DONE
Post-upgrade BUNDLE_JOBS, COORD_JOBS, WF_JOBS to drop AUTH_TOKEN column DONE
Upgrading to db schema for Oozie 4.0.0-cdh5.0.0
Update db.version in OOZIE_SYS table to 3 DONE
Dropping discriminator column DONE
The SQL commands have been written to: oozie-upgrade.sql
WARN: The SQL commands have NOT been executed, you must use the '-run' option
Important: If you used the -sqlfile option instead of -run, the Oozie database schema has not been upgraded. You must run the generated SQL script (oozie-upgrade.sql in the example above) against your database.
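The prerequisites and the two modes of the upgrade tool can be combined into a small wrapper. The following is a sketch under stated assumptions: a package-based install (paths from this page), a driver JAR whose file name contains postgresql, mysql, mariadb, or ojdbc, and the hypothetical names OOZIE_LIBDIR, DRY_RUN, have_jdbc_driver, and oozie_db_upgrade. It does not check that Step 1 was completed.

```shell
#!/usr/bin/env bash
# Sketch of Step 2 for a package-based install. OOZIE_LIBDIR and DRY_RUN are
# hypothetical knobs for this example, not Oozie settings.
OOZIE_LIBDIR="${OOZIE_LIBDIR:-/var/lib/oozie}"
OOZIEDB=/usr/lib/oozie/bin/ooziedb.sh

# Return 0 if a JDBC driver JAR appears to be present in $OOZIE_LIBDIR.
# Matching on file names is a heuristic; adjust the patterns for your driver.
have_jdbc_driver() {
  local jar
  for jar in "$OOZIE_LIBDIR"/*.jar; do
    [ -e "$jar" ] || continue          # the glob matched nothing
    case "${jar##*/}" in
      *postgresql*|*mysql*|*mariadb*|*ojdbc*) return 0 ;;
    esac
  done
  return 1
}

# oozie_db_upgrade run            -> upgrade the schema in place
# oozie_db_upgrade sqlfile FILE   -> only write the SQL to FILE
# The tool always runs as the oozie Unix user, as this page requires.
oozie_db_upgrade() {
  have_jdbc_driver || { echo "no JDBC driver JAR in $OOZIE_LIBDIR" >&2; return 1; }
  local -a cmd=(sudo -u oozie "$OOZIEDB" upgrade)
  case "$1" in
    run)     cmd+=(-run) ;;
    sqlfile) [ -n "${2:-}" ] || { echo "SQL file name required" >&2; return 1; }
             cmd+=(-sqlfile "$2") ;;
    *) echo "usage: oozie_db_upgrade run | sqlfile FILE" >&2; return 1 ;;
  esac
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"                   # print instead of executing
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 oozie_db_upgrade run previews the command, and oozie_db_upgrade sqlfile oozie-upgrade.sql produces a script for a database administrator to review and run.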
Step 3: Upgrade the Oozie Shared Library
CDH 5 Oozie has a new shared library that bundles CDH 5 JAR files for streaming, DistCp and for Pig, Hive, HiveServer 2, Sqoop, and HCatalog.
- The shared library file for YARN is oozie-sharelib-yarn.
- The shared library file for MRv1 is oozie-sharelib-mr1.
- Install the Oozie CDH 5 shared libraries. For example:
$ sudo oozie-setup sharelib create -fs FS_URI -locallib /usr/lib/oozie/oozie-sharelib-yarn
where FS_URI is the HDFS URI of the filesystem that the shared library should be installed on (for example, hdfs://HOST:PORT).
Important: If you are installing Oozie to work with MRv1, make sure you use oozie-sharelib-mr1 instead.
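Because the YARN and MRv1 variants differ only in the -locallib directory, a small helper can reduce the chance of installing the wrong one. This sketch only prints the command so that you can review it; the function name is hypothetical, and the check that FS_URI starts with hdfs:// is an assumption based on the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the oozie-setup command for the sharelib that
# matches the MapReduce version. $1 = FS_URI, $2 = yarn | mr1 (default yarn).
oozie_sharelib_cmd() {
  local fs_uri="$1" mr="${2:-yarn}"
  case "$fs_uri" in
    hdfs://?*) ;;                      # expect an hdfs:// URI, e.g. hdfs://HOST:PORT
    *) echo "FS_URI must be an hdfs:// URI" >&2; return 1 ;;
  esac
  case "$mr" in
    yarn|mr1) ;;
    *) echo "MapReduce version must be yarn or mr1" >&2; return 1 ;;
  esac
  echo "sudo oozie-setup sharelib create -fs $fs_uri -locallib /usr/lib/oozie/oozie-sharelib-$mr"
}
```

For example, oozie_sharelib_cmd hdfs://nn.example.com:8020 mr1 prints the MRv1 variant; nn.example.com:8020 is a placeholder NameNode address.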
Step 4: Start the Oozie Server
Now you can start Oozie:
$ sudo service oozie start
Check Oozie's oozie.log to verify that Oozie has started successfully.
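Besides reading oozie.log, you can ask the server for its status with the Oozie command-line client, which prints System mode: NORMAL when the server is running normally. The sketch below wraps both checks. It assumes the default Oozie URL http://localhost:11000/oozie and the log directory from the file-location table on this page; the function names and the 200-line window are hypothetical, and matching the word ERROR is only a heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if `oozie admin -status` output (read on stdin)
# reports that the server is in NORMAL mode.
oozie_status_ok() {
  grep -q 'System mode: NORMAL'
}

# Hypothetical check: print lines containing the word ERROR from the end of
# the server log. $1 = log file, $2 = number of trailing lines to scan.
oozie_log_errors() {
  local log="${1:-/var/log/oozie/oozie.log}"
  tail -n "${2:-200}" "$log" | grep -w 'ERROR' || true
}
```

For example: oozie admin -oozie http://localhost:11000/oozie -status | oozie_status_ok && echo "Oozie is up". If the check fails, oozie_log_errors shows recent error lines to start from.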
Step 5: Upgrade the Oozie Client
Although older Oozie clients work with the new Oozie server, you need to install the new version of the Oozie client to use all the functionality of the Oozie server.
To upgrade the Oozie client, if you have not already done so, follow the steps under Installing Oozie.
Configuring Oozie after Upgrading from an Earlier CDH 5 Release
Step 1: Update Configuration Files
- Edit the new Oozie CDH 5 oozie-site.xml, and set all customizable properties to the values you set in the previous oozie-site.xml.
- If necessary, do the same for the oozie-log4j.properties, oozie-env.sh, and adminusers.txt files.
Step 2: Upgrade the Database
- Do not proceed before you have edited the configuration files as instructed in Step 1.
- Before running the database upgrade tool, copy or symbolically link the JDBC driver JAR for the database you are using into the /var/lib/oozie/ directory.
Oozie CDH 5 provides a command-line tool to perform the database schema and data upgrade. The tool uses Oozie configuration files to connect to the database and perform the upgrade.
The database upgrade tool works in two modes: it can do the upgrade in the database or it can produce an SQL script that a database administrator can run manually. If you use the tool to perform the upgrade, you must do it as a database user who has permissions to run DDL operations in the Oozie database.
- To run the Oozie database upgrade tool against the database:
Important: This step must be done as the oozie Unix user, otherwise Oozie may fail to start or work properly because of incorrect file permissions.
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -run
You will see output such as this (the output of the script may differ slightly depending on the database vendor):
Validate DB Connection DONE
Check DB schema exists DONE
Verify there are not active Workflow Jobs DONE
Check OOZIE_SYS table does not exist DONE
Get Oozie DB version DONE
Upgrade SQL schema DONE
Upgrading to db schema for Oozie 4.0.0-cdh5.0.0
Update db.version in OOZIE_SYS table to 3 DONE
Converting text columns to bytea for all tables DONE
Get Oozie DB version DONE
Oozie DB has been upgraded to Oozie version '4.0.0-cdh5.0.0'
The SQL commands have been written to: /tmp/ooziedb-8676029205446760413.sql
- To create the upgrade script:
Important: This step must be done as the oozie Unix user, otherwise Oozie may fail to start or work properly because of incorrect file permissions.
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile SCRIPT
For example:
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh upgrade -sqlfile oozie-upgrade.sql
You should see output such as the following (the output of the script may differ slightly depending on the database vendor):
Validate DB Connection DONE
Check DB schema exists DONE
Verify there are not active Workflow Jobs DONE
Check OOZIE_SYS table does not exist DONE
Get Oozie DB version DONE
Upgrade SQL schema DONE
Upgrading to db schema for Oozie 4.0.0-cdh5.0.0
Update db.version in OOZIE_SYS table to 3 DONE
Converting text columns to bytea for all tables DONE
Get Oozie DB version DONE
The SQL commands have been written to: oozie-upgrade.sql
WARN: The SQL commands have NOT been executed, you must use the '-run' option
Important: If you used the -sqlfile option instead of -run, the Oozie database schema has not been upgraded. You must run the generated SQL script (oozie-upgrade.sql in the example above) against your database.
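How you apply a script generated with -sqlfile depends on the database. The sketch below prints one plausible client invocation per vendor, reusing the oozie user, the oozie database, and the localhost connect string from the database examples on this page. These values are assumptions; a database administrator may prefer a different account, because the schema change requires DDL permissions. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a client command that applies a generated SQL file.
#   $1: postgresql | mysql | mariadb | oracle   $2: SQL file
# User, database, and connect-string values are placeholders taken from the
# examples on this page; replace them with your own.
apply_sql_cmd() {
  local vendor="$1" file="$2"
  [ -n "$file" ] || { echo "SQL file required" >&2; return 1; }
  case "$vendor" in
    postgresql)    echo "psql -U oozie -d oozie -f $file" ;;
    mysql|mariadb) echo "mysql -u oozie -p oozie < $file" ;;   # -p prompts for the password
    oracle)        echo "sqlplus oozie@localhost @$file" ;;
    *) echo "unknown vendor: $vendor" >&2; return 1 ;;
  esac
}
```

For example, apply_sql_cmd postgresql oozie-upgrade.sql prints psql -U oozie -d oozie -f oozie-upgrade.sql, which the administrator can review before running.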
Step 3: Upgrade the Oozie Shared Library
The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:
- The shared library file for YARN is oozie-sharelib-yarn.
- The shared library file for MRv1 is oozie-sharelib-mr1.
To upgrade the shared library, proceed as follows.
- Delete the Oozie shared libraries from HDFS. For example:
$ sudo -u oozie hadoop fs -rmr /user/oozie/share
Note: If Kerberos is enabled, do not use commands in the form sudo -u <user> <command>; they will fail with a security error. Instead, first obtain a ticket as that user with $ kinit <user> (if you are using a password) or $ kinit -kt <keytab> <principal> (if you are using a keytab), and then run each of that user's commands directly as $ <command>.
- If the current shared libraries are in another location, make sure you use this other location when you run the above command(s).
- Install the Oozie CDH 5 shared libraries. For example:
$ sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-yarn
where FS_URI is the HDFS URI of the filesystem that the shared library should be installed on (for example, hdfs://<HOST>:<PORT>).
Important: If you are installing Oozie to work with MRv1, make sure you use oozie-sharelib-mr1 instead.
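The delete-and-recreate sequence above, including the Kerberos variant from the note, can be sketched as a helper that prints the commands in order. It uses hadoop fs -rm -r, the non-deprecated Hadoop 2 form of -rmr, and keeps sudo oozie-setup as shown on this page. KEYTAB, PRINCIPAL, and the function name are hypothetical; the keytab path and principal in the usage example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commands that replace the sharelib in HDFS.
#   $1: FS_URI   $2: yarn | mr1 (default yarn)   $3: sharelib path (default /user/oozie/share)
# If KEYTAB and PRINCIPAL are set, follow the Kerberos note: kinit first and
# drop the "sudo -u oozie" prefix.
sharelib_refresh_cmds() {
  local fs_uri="$1" mr="${2:-yarn}" share="${3:-/user/oozie/share}" as_oozie="sudo -u oozie "
  if [ -n "${KEYTAB:-}" ]; then
    [ -n "${PRINCIPAL:-}" ] || { echo "PRINCIPAL must be set with KEYTAB" >&2; return 1; }
    echo "kinit -kt $KEYTAB $PRINCIPAL"
    as_oozie=""
  fi
  # -rm -r replaces the deprecated -rmr; both delete recursively
  echo "${as_oozie}hadoop fs -rm -r $share"
  echo "sudo oozie-setup sharelib create -fs $fs_uri -locallib /usr/lib/oozie/oozie-sharelib-$mr"
}
```

For example, KEYTAB=/etc/oozie/oozie.keytab PRINCIPAL=oozie/host@EXAMPLE.COM sharelib_refresh_cmds hdfs://nn.example.com:8020 prints the Kerberos variant. Pass the third argument if your shared libraries are in another location.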
Step 4: Start the Oozie Server
$ sudo service oozie start
Check Oozie's oozie.log to verify that Oozie has started successfully.
Step 5: Upgrade the Oozie Client
Although older Oozie clients work with the new Oozie server, you need to install the new version of the Oozie client to use all the functionality of the Oozie server.
To upgrade the Oozie client, if you have not already done so, follow the steps under Installing Oozie.
Configuring Oozie after a New Installation
When you install Oozie from an RPM or Debian package, the package installs all configuration, documentation, and runtime files in the standard Linux directories, as follows.
Type of File | Where Installed |
---|---|
binaries | /usr/lib/oozie/ |
configuration | /etc/oozie/conf/ |
documentation | for SLES: /usr/share/doc/packages/oozie/; for other platforms: /usr/share/doc/oozie/ |
examples TAR.GZ | for SLES: /usr/share/doc/packages/oozie/; for other platforms: /usr/share/doc/oozie/ |
sharelib TAR.GZ | /usr/lib/oozie/ |
data | /var/lib/oozie/ |
logs | /var/log/oozie/ |
temp | /var/tmp/oozie/ |
PID file | /var/run/oozie/ |
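Deciding Which Database to Use
By default, Oozie uses an embedded Apache Derby database. Cloudera recommends that you use PostgreSQL, MariaDB, MySQL, or Oracle instead, for the following reasons: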
- Derby runs in embedded mode and it is not possible to monitor its health.
- Though it might be possible, Cloudera currently has no live backup strategy for the embedded Derby database.
- Under load, Cloudera has observed locks and rollbacks with the embedded Derby database that do not happen with server-based databases.
Configuring Oozie to Use PostgreSQL
Use the procedure that follows to configure Oozie to use PostgreSQL instead of Apache Derby.
- Install PostgreSQL 8.4.x or 9.0.x.
- Create the Oozie User and Oozie Database
- Configure PostgreSQL to Accept Network Connections for the Oozie User
- Reload the PostgreSQL Configuration
- Configure Oozie to Use PostgreSQL
Install PostgreSQL 8.4.x or 9.0.x.
Create the Oozie User and Oozie Database
For example, using the PostgreSQL psql command-line tool:
$ psql -U postgres
Password for user postgres: *****
postgres=# CREATE ROLE oozie LOGIN ENCRYPTED PASSWORD 'oozie' NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE ROLE
postgres=# CREATE DATABASE "oozie" WITH OWNER = oozie ENCODING = 'UTF8' TABLESPACE = pg_default LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8' CONNECTION LIMIT = -1;
CREATE DATABASE
postgres=# \q
Configure PostgreSQL to Accept Network Connections for the Oozie User
- Edit the postgresql.conf file and set the listen_addresses property to *, to make sure that the PostgreSQL server starts listening on all your network interfaces. Also make sure that the standard_conforming_strings property is set to off.
- Edit the PostgreSQL data/pg_hba.conf file as follows:
host oozie oozie 0.0.0.0/0 md5
Reload the PostgreSQL Configuration
$ sudo -u postgres pg_ctl reload -s -D /opt/PostgreSQL/8.4/data
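The two file edits above can be scripted so that they are repeatable. The sketch below assumes the data directory used in the reload command above and must run as a user that can write both files (for example, the postgres user); the function name is hypothetical. PostgreSQL matches pg_hba.conf lines from top to bottom and uses the first match, so if an earlier line already matches connections for the oozie user, move the appended rule above it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the two postgresql.conf settings and the
# pg_hba.conf rule described above to a PostgreSQL data directory ($1).
configure_pg_for_oozie() {
  local data="${1:-/opt/PostgreSQL/8.4/data}"
  local conf="$data/postgresql.conf" hba="$data/pg_hba.conf" tmp
  local rule='host oozie oozie 0.0.0.0/0 md5'
  [ -f "$conf" ] && [ -f "$hba" ] || { echo "config files not found in $data" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Drop existing (including commented-out) settings, then append the new values.
  grep -Ev '^[[:space:]]*#?[[:space:]]*(listen_addresses|standard_conforming_strings)[[:space:]]*=' \
    "$conf" > "$tmp" || true
  printf '%s\n' "listen_addresses = '*'" "standard_conforming_strings = off" >> "$tmp"
  cat "$tmp" > "$conf" && rm -f "$tmp"   # rewrite in place to keep ownership and mode
  # Append the Oozie rule only if it is not already present.
  grep -qxF "$rule" "$hba" || printf '%s\n' "$rule" >> "$hba"
}
```

After running configure_pg_for_oozie /opt/PostgreSQL/8.4/data, reload the configuration with the pg_ctl reload command shown above. Running the function twice leaves a single copy of each setting and rule.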
Configure Oozie to Use PostgreSQL
Edit the oozie-site.xml file as follows:
...
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:postgresql://localhost:5432/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
...
Configuring Oozie to Use MariaDB
Use the procedure that follows to configure Oozie to use MariaDB instead of Apache Derby.
- Install and Start MariaDB
- Create the Oozie Database and Oozie MariaDB User
- Configure Oozie to Use MariaDB
- Add the MariaDB JDBC Driver JAR to Oozie
Install and Start MariaDB
For more information, see Installing the MariaDB Server.
Create the Oozie Database and Oozie MariaDB User
For example, using the MariaDB mysql command-line tool:
$ mysql -u root -p
Enter password: ******
mysql> create database oozie;
Query OK, 1 row affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
mysql> exit
Bye
Configure Oozie to Use MariaDB
Edit properties in the oozie-site.xml file as follows:
...
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
...
Add the MariaDB JDBC Driver JAR to Oozie
Cloudera recommends that you use the MySQL JDBC driver for MariaDB. Copy or symbolically link the MySQL JDBC driver JAR to the /var/lib/oozie/ directory.
Configuring Oozie to Use MySQL
Use the procedure that follows to configure Oozie to use MySQL instead of Apache Derby.
- Install and Start MySQL 5.x
- Create the Oozie Database and Oozie MySQL User
- Configure Oozie to Use MySQL
- Add the MySQL JDBC Driver JAR to Oozie
Install and Start MySQL 5.x
Create the Oozie Database and Oozie MySQL User
For example, using the MySQL mysql command-line tool:
$ mysql -u root -p
Enter password: ******
mysql> create database oozie;
Query OK, 1 row affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
mysql> exit
Bye
Configure Oozie to Use MySQL
Edit properties in the oozie-site.xml file as follows:
...
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
...
Add the MySQL JDBC Driver JAR to Oozie
Copy or symbolically link the MySQL JDBC driver JAR into one of the following directories:
- For installations that use packages: /var/lib/oozie/
- For installations that use parcels: /opt/cloudera/parcels/CDH/lib/oozie/lib/
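Which of the two directories applies depends on how CDH was installed. The sketch below treats the presence of /opt/cloudera/parcels/CDH/lib/oozie/lib as a sign of a parcel installation; that heuristic, the PARCEL_ROOT override, and the function names are hypothetical. It requires an absolute JAR path because a relative symbolic link target is resolved from the link's own directory, not from where you ran the command.

```shell
#!/usr/bin/env bash
# Print the directory that should receive the JDBC driver JAR.
# Heuristic: if the parcel's Oozie lib directory exists, assume a parcel install.
oozie_jdbc_libdir() {
  local parcel_lib="${PARCEL_ROOT:-/opt/cloudera/parcels/CDH}/lib/oozie/lib"
  if [ -d "$parcel_lib" ]; then
    echo "$parcel_lib"
  else
    echo /var/lib/oozie
  fi
}

# Symbolically link a driver JAR (given as an absolute path) into that directory.
link_jdbc_driver() {
  local jar="$1" dest
  case "$jar" in
    /*) ;;
    *) echo "use an absolute path to the JAR" >&2; return 1 ;;
  esac
  [ -f "$jar" ] || { echo "no such file: $jar" >&2; return 1; }
  dest=$(oozie_jdbc_libdir)
  ln -sf "$jar" "$dest/"
}
```

For example, link_jdbc_driver /usr/share/java/mysql-connector-java.jar, run as a user that can write the target directory; the JAR path is only an example and depends on how you installed the driver.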
Configuring Oozie to Use Oracle
Use the procedure that follows to configure Oozie to use Oracle 11g instead of Apache Derby.
- Install and Start Oracle 11g
- Create the Oozie Oracle User and Grant Privileges
- Configure Oozie to Use Oracle
- Add the Oracle JDBC Driver JAR to Oozie
Install and Start Oracle 11g
Create the Oozie Oracle User and Grant Privileges
The following example uses the Oracle sqlplus command-line tool, and shows the privileges Cloudera recommends.
$ sqlplus system@localhost
Enter password: ******
SQL> create user oozie identified by oozie default tablespace users temporary tablespace temp;
User created.
SQL> grant alter any index to oozie;
grant alter any table to oozie;
grant alter database link to oozie;
grant create any index to oozie;
grant create any sequence to oozie;
grant create database link to oozie;
grant create session to oozie;
grant create table to oozie;
grant drop any sequence to oozie;
grant select any dictionary to oozie;
grant drop any table to oozie;
grant create procedure to oozie;
grant create trigger to oozie;
SQL> exit
$
Do not make the following grant:
grant select any table;
Configure Oozie to Use Oracle
Edit the oozie-site.xml file as follows.
...
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>oracle.jdbc.OracleDriver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:oracle:thin:@//myhost:1521/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
...
Add the Oracle JDBC Driver JAR to Oozie
Copy or symbolically link the Oracle JDBC driver JAR into the /var/lib/oozie/ directory.
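The four oozie-site.xml examples on this page (PostgreSQL, MariaDB, MySQL, and Oracle) differ only in the driver class and the JDBC URL. The sketch below generates the four JPAService properties from a database type, using the driver classes from this page (the MySQL driver class for both MySQL and MariaDB, since Cloudera recommends the MySQL JDBC driver for MariaDB). The function names are hypothetical; paste the output inside the <configuration> element of oozie-site.xml.

```shell
#!/usr/bin/env bash
# Escape the characters that are special in XML text content.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print one oozie-site.xml <property> element: $1 = name suffix, $2 = value.
jpa_property() {
  cat <<EOF
<property>
  <name>oozie.service.JPAService.jdbc.$1</name>
  <value>$(xml_escape "$2")</value>
</property>
EOF
}

# Hypothetical helper: print the four JPAService properties for a database.
#   $1: postgresql | mysql | mariadb | oracle   $2: JDBC URL   $3: user   $4: password
oozie_jpa_properties() {
  local driver
  case "$1" in
    postgresql)    driver=org.postgresql.Driver ;;
    mysql|mariadb) driver=com.mysql.jdbc.Driver ;;  # MySQL driver, also used for MariaDB
    oracle)        driver=oracle.jdbc.OracleDriver ;;
    *) echo "unknown database type: $1" >&2; return 1 ;;
  esac
  jpa_property driver "$driver"
  jpa_property url "$2"
  jpa_property username "$3"
  jpa_property password "$4"
}
```

For example, oozie_jpa_properties postgresql jdbc:postgresql://localhost:5432/oozie oozie oozie reproduces the PostgreSQL example. The escaping matters for values such as a JDBC URL with several parameters, whose & separators must appear as &amp; in XML.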
Creating the Oozie Database Schema
The Oozie database tool works in two modes: it can create the database schema, or it can produce an SQL script that a database administrator can run to create the schema manually. If you use the tool to create the database schema, you must have the permissions needed to execute DDL operations.
To run the Oozie database tool against the database:
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run
You should see output such as the following (the output of the script may differ slightly depending on the database vendor):
Validate DB Connection. DONE
Check DB schema does not exist DONE
Check OOZIE_SYS table does not exist DONE
Create SQL schema DONE
DONE
Create OOZIE_SYS table DONE
Oozie DB has been created for Oozie version '4.0.0-cdh5.0.0'
The SQL commands have been written to: /tmp/ooziedb-5737263881793872034.sql
To create the SQL script:
Run /usr/lib/oozie/bin/ooziedb.sh create -sqlfile SCRIPT. For example:
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie-create.sql
You should see output such as the following (the output of the script may differ slightly depending on the database vendor):
Validate DB Connection. DONE
Check DB schema does not exist DONE
Check OOZIE_SYS table does not exist DONE
Create SQL schema DONE
DONE
Create OOZIE_SYS table DONE
Oozie DB has been created for Oozie version '4.0.0-cdh5.0.0'
The SQL commands have been written to: oozie-create.sql
WARN: The SQL commands have NOT been executed, you must use the '-run' option
Enabling the Oozie Web Console
To enable the Oozie web console, download and add the ExtJS library to the Oozie server.
Step 1: Download the Library
Download the ExtJS version 2.2 library from https://archive.cloudera.com/gplextras/misc/ext-2.2.zip and place it in a convenient location.
Step 2: Install the Library
Extract the ext-2.2.zip file into /var/lib/oozie.
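Steps 1 and 2 can be scripted as follows. This is a minimal sketch, assuming that unzip is installed and that the archive unpacks to a top-level ext-2.2 directory; OOZIE_DATA and the function names are hypothetical, and the default directory comes from the file-location table on this page.

```shell
#!/usr/bin/env bash
# OOZIE_DATA defaults to the Oozie data directory from the file-location table.
OOZIE_DATA="${OOZIE_DATA:-/var/lib/oozie}"

# Extract the ExtJS archive into the Oozie data directory (requires unzip).
install_extjs() {
  local zip="${1:-ext-2.2.zip}"
  [ -f "$zip" ] || { echo "archive not found: $zip" >&2; return 1; }
  unzip -q -o "$zip" -d "$OOZIE_DATA"
}

# Succeed if the extracted ext-2.2 directory is present.
extjs_installed() {
  [ -d "$OOZIE_DATA/ext-2.2" ]
}
```

For example, download the archive with curl -fLO https://archive.cloudera.com/gplextras/misc/ext-2.2.zip, then run install_extjs ext-2.2.zip as a user that can write /var/lib/oozie, and confirm with extjs_installed.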
Step 3: Configure SPNEGO authentication (in Kerberos clusters only)
The web console shares a port with the Oozie REST API, and the API allows modification of Oozie jobs (submission, kill, and inspection), so in a Kerberos cluster the console must be protected by SPNEGO authentication. SPNEGO requires that the Kerberos realm trusts the client browser's credentials and that the client web browser is configured to pass these credentials. If this configuration is not possible, use the Hue Oozie Dashboard instead of the Oozie web console.
See Using a Web Browser to Access an URL Protected by Kerberos HTTP SPNEGO and Configuring a Cluster-dedicated MIT KDC with Cross-Realm Trust.
Configuring Oozie with Kerberos Security
To configure Oozie with Kerberos security, see Oozie Authentication.
Installing the Oozie Shared Library in Hadoop HDFS
The Oozie installation bundles the Oozie shared library, which contains all of the necessary JARs to enable workflow jobs to run streaming, DistCp, Pig, Hive, and Sqoop actions.
The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:
- The shared library file for MRv1 is oozie-sharelib-mr1.
- The shared library file for YARN is oozie-sharelib-yarn.
To install the Oozie shared library in Hadoop HDFS in the oozie user home directory:
$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-yarn
where FS_URI is the HDFS URI of the filesystem that the shared library should be installed on (for example, hdfs://<HOST>:<PORT>).
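The three commands above can be wrapped so that they stop at the first failure and can be previewed first. This sketch differs from the commands above in one respect: it uses hadoop fs -mkdir -p, which does not fail if /user/oozie already exists. DRY_RUN and the function names are hypothetical. On a Kerberos cluster, replace the sudo -u forms as described in the Kerberos note earlier on this page.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN is set.
run_step() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: first-time sharelib install.
#   $1: FS_URI   $2: yarn | mr1 (default yarn)
install_sharelib() {
  local fs_uri="$1" mr="${2:-yarn}"
  [ -n "$fs_uri" ] || { echo "usage: install_sharelib FS_URI [yarn|mr1]" >&2; return 1; }
  # Each step runs only if the previous one succeeded.
  run_step sudo -u hdfs hadoop fs -mkdir -p /user/oozie &&
  run_step sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie &&
  run_step sudo oozie-setup sharelib create -fs "$fs_uri" -locallib "/usr/lib/oozie/oozie-sharelib-$mr"
}
```

For example, DRY_RUN=1 install_sharelib hdfs://nn.example.com:8020 prints the three commands without running them; nn.example.com:8020 is a placeholder.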
Configuring Support for Oozie Uber JARs
An uber JAR is a JAR that contains other JARs with dependencies in a lib/ folder inside the JAR. You can configure the cluster to handle uber JARs properly for the MapReduce action (as long as it does not include any streaming or pipes) by setting the following property in the oozie-site.xml file:
...
<property>
  <name>oozie.action.mapreduce.uber.jar.enable</name>
  <value>true</value>
</property>
...
When this property is set, users can use the oozie.mapreduce.uber.jar configuration property in their MapReduce workflows to notify Oozie that the specified JAR file is an uber JAR.
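Before marking a JAR as an uber JAR, it can help to confirm that it actually bundles JARs under lib/. The sketch below checks a JAR listing (from jar tf or unzip -Z1) and prints the workflow configuration property named above. The function names are hypothetical, and the assumption that the property value is the uber JAR's full HDFS path is not stated on this page.

```shell
#!/usr/bin/env bash
# Hypothetical check: read a JAR's entry listing on stdin and succeed if it
# contains at least one JAR inside the top-level lib/ folder.
is_uber_jar_listing() {
  grep -q '^lib/.*\.jar$'
}

# Print the workflow <configuration> property that marks $1 as an uber JAR.
# Assumption: $1 is the uber JAR's location in HDFS.
uber_jar_property() {
  cat <<EOF
<property>
  <name>oozie.mapreduce.uber.jar</name>
  <value>$1</value>
</property>
EOF
}
```

For example, jar tf my-app.jar | is_uber_jar_listing && uber_jar_property hdfs://nn.example.com:8020/user/alice/my-app.jar, where the JAR name and HDFS path are placeholders.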
Configuring Oozie to Run against a Federated Cluster
To run Oozie against a federated HDFS cluster using ViewFS, configure the oozie.service.HadoopAccessorService.supported.filesystems property in oozie-site.xml as follows:
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,viewfs</value>
</property>
©2016 Cloudera, Inc. All rights reserved.