This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Spark Authentication

Minimum Required Role: Security Administrator (also provided by Full Administrator)

Spark currently supports two methods of authentication: Kerberos and a shared secret. When using Spark on YARN, Cloudera recommends Kerberos authentication because it is the stronger security measure.

Configuring Kerberos Authentication for Spark Using the Command Line

  Important:
  • If you want to enable Spark event logging on a Kerberos-enabled cluster, you will need to enable Kerberos authentication for Spark as well, since Spark's event logs are written to HDFS.
  • You can use Spark on a Kerberos-enabled cluster only in the YARN mode, not in the Standalone mode.
  • The spark-submit script's --principal and --keytab arguments do not work with Spark-on-YARN's client mode. Use the cluster mode instead.

The following steps describe how to set up Kerberos authentication for Spark using the command line.

    Create the Spark Principal and Keytab File

    1. Create the spark principal and spark.keytab file:
      kadmin: addprinc -randkey spark/fully.qualified.domain.name@YOUR-REALM.COM
      kadmin: xst -k spark.keytab spark/fully.qualified.domain.name
    2. Move the file into the Spark configuration directory and restrict its access exclusively to the spark user:
      $ mv spark.keytab /etc/spark/conf/
      $ chown spark /etc/spark/conf/spark.keytab
      $ chmod 400 /etc/spark/conf/spark.keytab
      For more details on creating Kerberos principals and keytabs, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files.
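As a quick sanity check after step 2, the sketch below confirms that the keytab is readable only by its owner and lists the principals it contains. The function names keytab_mode_ok and show_keytab_entries are hypothetical helpers, not part of any Cloudera or Kerberos tooling. The sketch assumes GNU stat (Linux) and the MIT Kerberos klist client.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file exists and its mode is 400.
# Assumes GNU stat (Linux); BSD/macOS stat uses different flags.
keytab_mode_ok() {
  [ -f "$1" ] && [ "$(stat -c '%a' "$1")" = "400" ]
}

# Hypothetical helper: list the principals stored in the keytab.
# klist -k reads a keytab; -t adds the timestamp of each entry.
show_keytab_entries() {
  klist -kt "$1"
}
```

For example, running keytab_mode_ok /etc/spark/conf/spark.keytab followed by show_keytab_entries /etc/spark/conf/spark.keytab should succeed and show the spark/fully.qualified.domain.name entry if the earlier steps completed as described.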

    Configure the Spark History Server to Use Kerberos

    Using Cloudera Manager

    If you are using Cloudera Manager, use the following steps to edit the spark-env.sh file.
    1. Open the Cloudera Manager Administration Console and navigate to the Spark service.
    2. Click the Configuration tab.
    3. Select Scope > History Server.
    4. Select Category > Advanced.
    5. Edit the History Server Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh property to add the following properties:
      SPARK_HISTORY_OPTS="-Dspark.history.kerberos.enabled=true \
      -Dspark.history.kerberos.principal=spark/fully.qualified.domain.name@YOUR-REALM.COM \
      -Dspark.history.kerberos.keytab=/etc/spark/conf/spark.keytab"
    6. Click Save Changes to commit the changes.

    Using the Command Line

    If you are using the command line, open the Spark configuration file /etc/spark/conf/spark-env.sh and add the following setting. The value is quoted so that the shell treats all three options as a single assignment:
    SPARK_HISTORY_OPTS="-Dspark.history.kerberos.enabled=true \
    -Dspark.history.kerberos.principal=spark/fully.qualified.domain.name@YOUR-REALM.COM \
    -Dspark.history.kerberos.keytab=/etc/spark/conf/spark.keytab"
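Because spark-env.sh is read as a shell script, it can be worth confirming that the shell sees all three Kerberos options in SPARK_HISTORY_OPTS. The following is a minimal sketch of such a check; show_history_kerberos_opts is a hypothetical helper name, and the check only inspects the variable, not whether the History Server actually authenticates.

```shell
#!/bin/sh
# Hypothetical helper: print each spark.history.kerberos.* option that the
# shell sees in SPARK_HISTORY_OPTS after sourcing the given spark-env.sh.
show_history_kerberos_opts() {
  (
    # Work in a subshell so the caller's environment is not modified,
    # and clear any inherited value before sourcing the file.
    unset SPARK_HISTORY_OPTS
    . "$1" >/dev/null 2>&1
    # Intentionally unquoted: split the value into individual options.
    for opt in $SPARK_HISTORY_OPTS; do
      case "$opt" in
        -Dspark.history.kerberos.*) printf '%s\n' "$opt" ;;
      esac
    done
  )
}
```

Calling show_history_kerberos_opts /etc/spark/conf/spark-env.sh should print one line per option. If it prints nothing, the assignment was probably not parsed as a single value.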

    Running Spark Applications on a Secure Cluster

    You can submit compiled Spark applications with the spark-submit script. Specify the following additional command-line options when running the spark-submit script on a secure cluster using the form: --option value.

    Option         Description
    --keytab       The full path to the file that contains the keytab for the principal. This keytab is copied to the node running the ApplicationMaster using the Secure Distributed Cache, for periodically renewing the login tickets and the delegation tokens. For information on setting up the principal and keytab, see Configuring a Cluster with Custom Kerberos Principals and Spark Authentication.
    --principal    The principal used to log in to the KDC when running on secure HDFS.
    --proxy-user   Allows the spark-submit script to impersonate client users when submitting jobs.
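To make the options above concrete, here is a hedged sketch of a cluster-mode submission that passes --principal and --keytab. The wrapper name submit_secure, the DRY_RUN switch, and the example principal, keytab path, class name, and JAR path are all placeholders invented for illustration. Only the spark-submit options themselves come from Spark.

```shell
#!/bin/sh
# Hypothetical wrapper around spark-submit for a Kerberos-enabled cluster.
# Arguments: principal, keytab path, main class, application JAR.
# With DRY_RUN set to a non-empty value the command is printed, not executed.
submit_secure() {
  ${DRY_RUN:+echo} spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --principal "$1" \
    --keytab "$2" \
    --class "$3" \
    "$4"
}

# Example values below are placeholders, not real accounts or paths.
DRY_RUN=1
submit_secure alice@YOUR-REALM.COM /home/alice/alice.keytab \
  com.example.MyApp /home/alice/my-app.jar
```

Cluster mode is used here because, as noted earlier, --principal and --keytab do not work with Spark-on-YARN's client mode. Unset DRY_RUN to run the command instead of printing it.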

    Configuring Spark Authentication With a Shared Secret Using Cloudera Manager

    Minimum Required Role: Security Administrator (also provided by Full Administrator)

    Authentication using a shared secret can be configured using the spark.authenticate configuration property. The authentication process checks to make sure Spark has the same shared secret as the applications. If the shared secret does not match, authentication will fail.

    If you are using Spark on YARN, set the spark.authenticate parameter to true to generate a secret. This secret will automatically be distributed to all applications communicating with Spark. For Cloudera Manager deployments, use the following instructions:
    1. Go to the Spark Service > Configuration tab.
    2. In the Search field, type spark authenticate to find the Spark Authentication settings.
    3. Check the checkbox for the Spark Authentication property.
    4. Click Save Changes.
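After saving the change, one way to check whether the setting has reached a host's client configuration is to look up spark.authenticate in the deployed properties file. The sketch below assumes a spark-defaults.conf-style file at /etc/spark/conf/spark-defaults.conf. Both the path and the helper name spark_conf_value are assumptions, and the file may only reflect the change after the client configuration has been redeployed.

```shell
#!/bin/sh
# Hypothetical helper: print the last value set for a property in a
# spark-defaults.conf-style file. Accepts "key value" and "key=value" lines.
spark_conf_value() {
  awk -F '[ \t=]+' -v key="$1" '
    $1 == key { value = $2 }
    END { if (value != "") print value }
  ' "$2"
}
```

For example, spark_conf_value spark.authenticate /etc/spark/conf/spark-defaults.conf prints the configured value, or nothing if the property is not present in that file.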
    Page generated July 8, 2016.