List of Pages in Category Spark (48 pages)
Spark
Apache Spark is a general framework for distributed computing that offers high performance for both batch and interactive processing. It exposes APIs for Java, Python, and Scala and consists of Spark core and several related projects:
- Spark SQL - Module for working with structured data. Allows you to seamlessly mix SQL queries with Spark programs.
- Spark Streaming - API that allows you to build scalable fault-tolerant streaming applications.
- MLlib - API that implements common machine learning algorithms.
- GraphX - API for graphs and graph-parallel computation.
Cloudera supports Spark core, Spark SQL (including DataFrames), Spark Streaming, and MLlib. Cloudera does not currently offer commercial support for GraphX or SparkR.
A
- Accessing Avro Data Files From Spark SQL Applications
- Accessing Data Stored in Amazon S3
- Accessing External Storage
- Accessing Parquet Files From Spark SQL Applications
- Apache Spark Overview
- Avro Data Files
B
C
D
H
I
J
M
- Managing Spark
- Managing Spark Standalone Using the Command Line
- Managing Spark Using Cloudera Manager
- Managing the Spark History Server
- Monitoring Spark Applications
P
R
- Running Hive on Spark
- Running Spark Applications
- Running Spark Applications on YARN
- Running Spark Applications Using IPython and Jupyter Notebooks
- Running Spark Python Applications
- Running Your First Spark Application
S
- Spark (Standalone) Metrics
- Spark 2 Metrics
- Spark and Hadoop Integration
- Spark Application Overview
- Spark Authentication
- Spark Encryption
- Spark Guide
- Spark Indexing Reference (CDH 5.2 and higher only)
- Spark Installation
- Spark Metrics
- Spark Packages
- Spark Prerequisites
T
U
© 2016 Cloudera, Inc. All rights reserved.