Building and Running a Crunch Application with Spark
Developing and Running a Spark WordCount Application provides a tutorial on writing, compiling, and running a Spark
application. Using the tutorial as a starting point, do the following to build and run a Crunch application with Spark:
- Along with the other dependencies shown in the tutorial, add the appropriate version of the
crunch-core and crunch-spark dependencies to the Maven project.
<dependency>
  <groupId>org.apache.crunch</groupId>
  <artifactId>crunch-core</artifactId>
  <version>${crunch.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.crunch</groupId>
  <artifactId>crunch-spark</artifactId>
  <version>${crunch.version}</version>
</dependency>
- Use SparkPipeline where you would have used MRPipeline in the declaration of your Crunch pipeline. SparkPipeline takes either a String that contains the connection string for the Spark master (local for local mode, yarn for YARN) or a JavaSparkContext instance.
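The swap described above can be sketched as follows. This is a minimal, illustrative WordCount pipeline, not code from the tutorial: the class name, app name, and the DoFn are assumptions, and the example requires the crunch-core and crunch-spark dependencies listed earlier.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.crunch.types.writable.Writables;

public class WordCount {
  public static void main(String[] args) {
    String inputPath = args[0];
    String outputPath = args[1];

    // Where an MRPipeline would be constructed, create a SparkPipeline instead.
    // The first argument is the Spark master connection string:
    // "local" for local mode, "yarn" to run on YARN.
    Pipeline pipeline = new SparkPipeline("yarn", "crunch-wordcount");

    PCollection<String> lines = pipeline.readTextFile(inputPath);

    // Split each line into words (illustrative DoFn, not from the tutorial).
    PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
      @Override
      public void process(String line, Emitter<String> emitter) {
        for (String word : line.split("\\s+")) {
          emitter.emit(word);
        }
      }
    }, Writables.strings());

    // Count occurrences and write the results.
    PTable<String, Long> counts = words.count();
    pipeline.writeTextFile(counts, outputPath);

    // Execute the pipeline on Spark.
    pipeline.done();
  }
}
```

Only the pipeline construction changes relative to a MapReduce-based Crunch application; the rest of the pipeline code is unaffected by the switch to Spark.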
- As you would for a Spark application, use spark-submit to start the pipeline with your Crunch application's jar-with-dependencies.jar file.
spark-submit --class com.example.WordCount \
  crunch-demo-1.0-SNAPSHOT-jar-with-dependencies.jar \
  hdfs://namenode_host:8020/user/hdfs/input \
  hdfs://namenode_host:8020/user/hdfs/output
Page generated July 8, 2016.