List of Pages in Category MapReduce (46 pages)
MapReduce
A distributed framework for processing and generating large data sets, together with an implementation that runs on large clusters of industry-standard machines.
The processing model defines two types of functions: a map function that processes a key-value pair to generate a set of intermediate key-value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
A MapReduce job partitions the input data set into independent chunks that the map functions process in parallel. The framework sorts the outputs of the maps, which then become the input to the reduce functions. Typically both the input and the output of the job are stored in a distributed filesystem.
The implementation provides an API for configuring and submitting jobs; job scheduling and management services; a library of search, sort, index, inverted index, and word co-occurrence algorithms; and the runtime. The runtime system partitions the input data, schedules the program's execution across a set of machines, handles machine failures, and manages the required inter-machine communication.
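The processing model described above can be illustrated with a minimal single-process sketch of a word count. It is not the Hadoop API: the names `map_fn`, `reduce_fn`, and `run_job` and the `num_chunks` parameter are hypothetical, and the chunks are processed one after another where a real framework would run the map tasks in parallel across machines.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_job(lines, num_chunks=2):
    # Partition the input into independent chunks; a real framework would
    # hand each chunk to a separate map task (assumption: round-robin split).
    chunks = [lines[i::num_chunks] for i in range(num_chunks)]

    # Run the map function over every record of every chunk.
    intermediate = []
    for chunk in chunks:
        for offset, line in enumerate(chunk):
            intermediate.extend(map_fn(offset, line))

    # Sort the map output by key so that equal keys are adjacent,
    # then pass each key and its values to the reduce function.
    intermediate.sort(key=itemgetter(0))
    return dict(
        reduce_fn(key, (value for _, value in group))
        for key, group in groupby(intermediate, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog", "the fox"]))
```

The sort step stands in for the framework's shuffle and sort phase: it is what guarantees that each call to the reduce function sees all values for one intermediate key.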
*
A
C
- CDH 5 and MapReduce
- Cloudera Manager and CDH QuickStart Guide
- Cloudera Search Tasks and Processes
- Configuring MRv1 Security
- Configuring Oozie
- Configuring Sqoop 2
- Configuring TLS/SSL for HDFS, YARN and MapReduce
D
H
I
- Installing CDH 5 with MRv1 on a Single Linux Host in Pseudo-distributed mode
- Installing CDH 5 with YARN on a Single Linux Host in Pseudo-distributed mode
- Installing MapReduce Tools for use with Cloudera Search
J
M
- Managing a Cluster with Whirr
- Managing MapReduce
- Managing YARN
- Managing YARN (MRv2) and MapReduce (MRv1)
- MapReduce (MRv1) and YARN (MRv2) High Availability
- MapReduce (MRv1) JobTracker High Availability
- MapReduce 2.0 (YARN)
- MapReduce Batch Indexing Reference
- MapReduce Metrics
- MapReduceIndexerTool
- Migrating from MapReduce (MRv1) to MapReduce (MRv2)
- Monitoring MapReduce Jobs
- MRv1 ONLY: Task-controller Error Codes
O
P
S
- Setting HADOOP_MAPRED_HOME
- Snappy Compression
- Spark Application Overview
- Spark Indexing Reference (CDH 5.2 and higher only)
- Step 16: Configure Either MRv1 Security or YARN Security
- Step 2: Verify User Accounts and Groups in CDH 5 Due to Security
T
U
V
Y
© 2016 Cloudera, Inc. All rights reserved.