Managing the Spark History Server
The Spark History Server displays information about the history of completed Spark applications. For further information, see Monitoring Spark Applications.
For instructions on configuring the Spark History Server to use Kerberos, see Spark Authentication.
Adding the Spark History Server Using Cloudera Manager
By default, the Spark (Standalone) service does not include a History Server. To configure applications to store history, set spark.eventLog.enabled to true on Spark clients before starting the application.
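For example, a minimal sketch of enabling event logging for a single application run from a client host; the application class and JAR path here are hypothetical, and the event log directory must be one that the History Server reads (such as the /user/spark/applicationHistory directory described later on this page):
$ spark-submit --class com.example.MyApp \
    --master yarn \
    --conf spark.eventLog.enabled=true \
    --conf spark.eventLog.dir=hdfs:///user/spark/applicationHistory \
    /path/to/myapp.jar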
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
- Go to the Spark service.
- Click the Instances tab.
- Click the Add Role Instances button.
- Select a host in the column under History Server, and then click OK.
- Click Continue.
- Check the checkbox next to the History Server role.
- Select Actions for Selected > Start, and then click Start.
- Click Close when the action completes.
Configuring and Running the Spark History Server Using the Command Line
- If you use Cloudera Manager, do not use these command-line instructions.
- This information applies specifically to CDH 5.8.x. If you use an earlier version of CDH, see the documentation for that version located at Cloudera Documentation.
- Create the /user/spark/applicationHistory/ directory in HDFS and set ownership and permissions as follows:
$ sudo -u hdfs hadoop fs -mkdir /user/spark
$ sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory
$ sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
$ sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
- On hosts from which you will launch Spark jobs, do the following:
- Create /etc/spark/conf/spark-defaults.conf:
cp /etc/spark/conf/spark-defaults.conf.template /etc/spark/conf/spark-defaults.conf
- Add the following to /etc/spark/conf/spark-defaults.conf:
spark.eventLog.dir=hdfs://namenode_host:namenode_port/user/spark/applicationHistory
spark.eventLog.enabled=true
or, if you use an HDFS name service ID:
spark.eventLog.dir=hdfs://name_service_id/user/spark/applicationHistory
spark.eventLog.enabled=true
This causes Spark applications to write their history to the directory that the History Server reads. (A complete example configuration appears after this procedure.)
- On one host, start the History Server:
$ sudo service spark-history-server start
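For example, a sketch of the resulting /etc/spark/conf/spark-defaults.conf, assuming a hypothetical NameNode at nn01.example.com listening on port 8020:
spark.eventLog.dir=hdfs://nn01.example.com:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
To confirm that the HDFS directory exists and that the History Server is responding on its default port (18088), you can run commands such as:
$ sudo -u hdfs hadoop fs -ls -d /user/spark/applicationHistory
$ curl -s -o /dev/null -w "%{http_code}\n" http://spark_history_server_host:18088
Replace spark_history_server_host with the host on which you started the History Server.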
To link the YARN ResourceManager directly to the Spark History Server, set the spark.yarn.historyServer.address property in /etc/spark/conf/spark-defaults.conf:
spark.yarn.historyServer.address=http://spark_history_server:history_port
By default, history_port is 18088. Setting this property allows the YARN ResourceManager web UI to link completed Spark applications directly to the Spark History Server.
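For example, a sketch assuming the History Server runs on the hypothetical host shs01.example.com with the default port:
spark.yarn.historyServer.address=http://shs01.example.com:18088
With this property set on the hosts from which you submit applications, the ResourceManager web UI links each completed Spark application to its page in the Spark History Server UI.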