This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

List of Pages in Category Spark (48 pages)

Spark

Apache Spark is a general-purpose framework for distributed computing that offers high performance for both batch and interactive processing. It exposes APIs for Java, Python, and Scala and consists of Spark core and several related projects:

  • Spark SQL - Module for working with structured data. Allows you to seamlessly mix SQL queries with Spark programs.
  • Spark Streaming - API that allows you to build scalable, fault-tolerant streaming applications.
  • MLlib - API that implements common machine learning algorithms.
  • GraphX - API for graphs and graph-parallel computation.

Cloudera supports Spark core, Spark SQL (including DataFrames), Spark Streaming, and MLlib. Cloudera does not currently offer commercial support for GraphX or SparkR.

