This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Configuring Spark Applications

You can specify Spark application configuration properties as follows:
  • Pass properties using the --conf command-line switch; for example:
    spark-submit \
    --class com.cloudera.example.YarnExample \
    --master yarn \
    --deploy-mode cluster \
    --conf "spark.eventLog.dir=hdfs:///user/spark/eventlog" \
    lib/yarn-example.jar \
    10
  • Specify properties in spark-defaults.conf. See Configuring Spark Application Properties in spark-defaults.conf.
  • Pass properties directly to the SparkConf used to create the SparkContext in your Spark application; for example:
    • Scala
      val conf = new SparkConf().set("spark.dynamicAllocation.initialExecutors", "5")
      val sc = new SparkContext(conf)
    • Python
      from pyspark import SparkConf, SparkContext
      from pyspark.sql import SQLContext
      conf = SparkConf().setAppName('Application name')
      conf.set('spark.hadoop.avro.mapred.ignore.inputs.without.extension', 'false')
      sc = SparkContext(conf=conf)
      sqlContext = SQLContext(sc)
The order of precedence for configuration properties, from highest to lowest, is:
  1. Properties passed to SparkConf.
  2. Arguments passed to spark-submit, spark-shell, or pyspark.
  3. Properties set in spark-defaults.conf.
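The precedence rules above can be sketched as a simple merge, applied lowest precedence first so that higher-precedence sources win. The property names and values below are illustrative, not taken from any particular cluster:

```python
# Illustrative model of Spark configuration precedence.
# Each dict stands in for one configuration source.
defaults = {  # spark-defaults.conf (lowest precedence)
    "spark.eventLog.enabled": "true",
    "spark.executor.memory": "2g",
}
submit_args = {  # --conf flags passed to spark-submit
    "spark.executor.memory": "4g",
}
sparkconf = {  # SparkConf set in application code (highest precedence)
    "spark.executor.memory": "8g",
}

# Merge lowest-precedence first; later dicts override earlier ones.
effective = {**defaults, **submit_args, **sparkconf}

print(effective["spark.executor.memory"])  # the SparkConf value, 8g, wins
print(effective["spark.eventLog.enabled"])  # true: unset elsewhere, so the default survives
```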

For more information, see Spark Configuration.

Configuring Spark Application Properties in spark-defaults.conf

Specify properties in the spark-defaults.conf file in the form property value: the property name, whitespace, then the value.

Create a comment by adding a hash mark (#) at the beginning of a line. Comments are not supported in the middle or at the end of a line.

This example shows a spark-defaults.conf file:
spark.master     spark://mysparkmaster.acme.com:7077
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs:///user/spark/eventlog
# Set spark executor memory
spark.executor.memory     2g
spark.logConf             true
Cloudera recommends placing configuration properties that you want to use for every application in spark-defaults.conf. See Application Properties for more information.
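The property-value format above is simple enough to sketch a parser for. The helper below is illustrative only (it is not part of Spark, which also accepts other separators such as = and tabs); it handles the whitespace-separated pairs, full-line comments, and blank lines shown in the example file:

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style text: one 'property value' pair
    per line, full-line comments starting with '#', blank lines ignored.
    Illustrative helper, not part of Spark."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        # Split on the first run of whitespace: name, then value.
        key, _, value = line.partition(" ")
        props[key] = value.strip()
    return props

conf_text = """\
spark.master     spark://mysparkmaster.acme.com:7077
# Set spark executor memory
spark.executor.memory     2g
"""
print(parse_spark_defaults(conf_text)["spark.executor.memory"])  # 2g
```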

Configuring Properties in spark-defaults.conf Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Configure properties for all Spark applications in spark-defaults.conf as follows:

  1. Go to the Spark service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select Category > Advanced.
  5. Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property.
  6. Specify properties described in Application Properties.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  7. Click Save Changes to commit the changes.
  8. Deploy the client configuration.

Configuring Properties in spark-defaults.conf Using the Command Line

  Important:
  • If you use Cloudera Manager, do not use these command-line instructions.
  • This information applies specifically to CDH 5.8.x. If you use a lower version of CDH, see the documentation for that version located at Cloudera Documentation.

To configure properties for all Spark applications using the command line, edit the file SPARK_HOME/conf/spark-defaults.conf.

Configuring Spark Application Logging Properties

You configure Spark application logging properties in a log4j.properties file.

Configuring Logging Properties Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

To configure only the logging threshold level, follow the procedure in Configuring Logging Thresholds. To configure any other logging property, do the following:

  1. Go to the Spark service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select Category > Advanced.
  5. Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/log4j.properties property.
  6. Specify log4j properties.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  7. Click Save Changes to commit the changes.
  8. Deploy the client configuration.

Configuring Logging Properties Using the Command Line

  Important:
  • If you use Cloudera Manager, do not use these command-line instructions.
  • This information applies specifically to CDH 5.8.x. If you use a lower version of CDH, see the documentation for that version located at Cloudera Documentation.

To specify logging properties for all users on a machine by using the command line, edit the file SPARK_HOME/conf/log4j.properties. To set logging properties just for yourself or for a specific application, copy SPARK_HOME/conf/log4j.properties.template to log4j.properties in your working directory or any directory in your application's classpath.
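For example, a log4j.properties file along the lines of the template shipped with Spark (shown here as a sketch; the exact template contents vary by version) might raise the console logging threshold to WARN:

```properties
# Send log output to the console, suppressing INFO and below
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```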

Page generated July 8, 2016.