Spark and Hadoop Integration
This section describes how to access various Hadoop ecosystem components from Spark.
Accessing HBase from Spark
You can use Spark to process data that is destined for HBase. See Importing Data Into HBase Using Spark.
You can also use Spark in conjunction with Apache Kafka to stream data from Spark to HBase. See Importing Data Into HBase Using Spark and Kafka.
The host from which the Spark application is submitted or on which spark-shell or pyspark runs must have an HBase gateway role defined in Cloudera Manager and client configurations deployed.
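With the gateway role in place, a Spark application can read an HBase table through the standard TableInputFormat API. The following is a minimal sketch, assuming it is pasted into spark-shell (where sc is predefined) launched with the HBase client JARs on the classpath; the table name my_table is a hypothetical example:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Picks up hbase-site.xml from the client configuration deployed on the gateway host.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table") // hypothetical table name

// Each record is a (row key, Result) pair.
val hbaseRDD = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

println(s"Row count: ${hbaseRDD.count()}")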
Limitations in Kerberized Environments
The following limitations apply to Spark applications that access HBase in a Kerberized cluster:
- The application must be restarted every seven days.
- If the cluster also has HA enabled, you must specify the keytab and principal parameters on the command line (as opposed to using kinit). For example:
spark-shell --jars MySparkHbaseApp.jar --principal ME@DOMAIN.COM --keytab /path/to/local/keytab ...
spark-submit --class com.example.SparkHbaseApp --principal ME@DOMAIN.COM --keytab /path/to/local/keytab SparkHBaseApp.jar [application parameters ...]
For further information, see Spark Authentication.
Accessing Hive from Spark
The host from which the Spark application is submitted or on which spark-shell or pyspark runs must have a Hive gateway role defined in Cloudera Manager and client configurations deployed.
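With the Hive gateway role and client configurations in place, a Spark 1.x application on CDH can query Hive tables through a HiveContext, which reads the deployed hive-site.xml automatically. This is a minimal sketch for spark-shell (where sc is predefined); the table name default.sample_table is a hypothetical example:

import org.apache.spark.sql.hive.HiveContext

// Uses hive-site.xml from the deployed client configuration to locate the metastore.
val hiveContext = new HiveContext(sc)

// Query an existing Hive table; "default.sample_table" is a placeholder name.
val df = hiveContext.sql("SELECT * FROM default.sample_table LIMIT 10")
df.show()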
Running Spark Jobs from Oozie
For CDH 5.4 and higher, you can invoke Spark jobs from Oozie using the Spark action. For information on the Spark action, see Oozie Spark Action Extension.
In CDH 5.4, to enable dynamic allocation when running the action, specify the following in the Oozie workflow:

<spark-opts>--conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.minExecutors=1</spark-opts>

If you have enabled the shuffle service in Cloudera Manager, you do not need to specify spark.shuffle.service.enabled.
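For context, a Spark action carrying these options sits inside a workflow definition along the following lines. This is a minimal sketch, not taken from this document: the workflow name, class, JAR path, and the ${jobTracker} and ${nameNode} properties are placeholders you would supply in job.properties:

<workflow-app name="spark-example-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>SparkExample</name>
            <class>com.example.SparkApp</class>
            <jar>${nameNode}/user/apps/SparkApp.jar</jar>
            <spark-opts>--conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=1</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>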