Installation with the EMC DSSD D5
This topic provides an overview of the installation of Cloudera Manager and CDH for deployments that use the EMC® DSSD™ D5™ Storage appliance as the storage for Hadoop DataNodes. For deployments that do not use the DSSD D5, see Installation Overview.
Documentation for the EMC DSSD D5 is available from EMC.
Continue reading:
- Overview of EMC DSSD D5 Integration
- System Requirements and Limitations for the DSSD D5 Storage Appliance
- Resources
- Installing CDH with DSSD D5 DataNodes
- Configuring Multiple DSSD D5 Appliances in a Cluster
- Deployment Options
- Cloudera Manager Installation Phases
- Cloudera Manager Installation Software
- Upgrading a DSSD CDH Maintenance Release using Parcels
Overview of EMC DSSD D5 Integration
The EMC DSSD D5 provides a high-speed, low-latency storage solution based on flash media. It has been optimized for use as storage for DataNodes in the Cloudera CDH distribution. The DataNode hosts connect directly to the DSSD D5 using a PCIe card interface. In a CDH cluster, only the DataNodes use the DSSD D5 for storage; all other hosts use standard disks.
To manage clusters that use DSSD D5 storage, enable DSSD Mode in Cloudera Manager. All other Hadoop components operate normally. When DSSD Mode is enabled, Cloudera Manager can manage only clusters with DSSD D5 DataNodes: a single Cloudera Manager instance cannot manage both a cluster that uses DSSD D5 DataNodes and a cluster that uses regular DataNodes. Within a cluster, all DataNodes must connect to the DSSD D5; you cannot mix DataNode types.
You can connect multiple instances of a DSSD D5 appliance to a single cluster by defining each DSSD D5 as a "rack." See Configuring Multiple DSSD D5 Appliances in a Cluster.
System Requirements and Limitations for the DSSD D5 Storage Appliance
- Only the RHEL 6.6, RHEL 7.1, and RHEL 7.2 operating systems are supported for DataNode hosts.
- CDH 4 is not supported.
- The HDFS/Sentry sync feature does not work with HDFS on DSSD D5.
For more information about system requirements, see Product Compatibility Matrix for EMC DSSD D5 and Cloudera Manager 5 Requirements and Supported Versions. For information about hardware requirements, contact EMC DSSD Support.
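The operating system requirement above can be checked before installation. The following is a minimal sketch, assuming the host exposes its release in the usual `/etc/redhat-release` format; the function names are hypothetical and are not part of Cloudera Manager or the DSSD software.

```python
import re

# Supported DataNode host releases, taken from the requirements list above.
SUPPORTED_RHEL = {"6.6", "7.1", "7.2"}


def rhel_version(release_text):
    """Return 'major.minor' from an /etc/redhat-release style string.

    Returns None if the text does not describe Red Hat Enterprise Linux
    or does not contain a recognizable release number.
    """
    if "Red Hat Enterprise Linux" not in release_text:
        return None
    match = re.search(r"release (\d+)\.(\d+)", release_text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def is_supported_datanode_os(release_text):
    """True if the release string names one of the supported RHEL versions."""
    return rhel_version(release_text) in SUPPORTED_RHEL
```

On a DataNode host, you might pass the contents of `/etc/redhat-release` to `is_supported_datanode_os`. The compatibility matrix remains the authoritative source for supported versions.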
Resources
- Documentation for the EMC DSSD D5 is available from EMC.
- Cloudera Manager Configuration Properties
- Cloudera Reference Architecture Documentation
- Tuning:
Installing CDH with DSSD D5 DataNodes
- You cannot install a DSSD D5 cluster using a Cloudera Manager instance that is already managing a cluster.
- You set a single property to enable DSSD Mode.
- You set several DSSD D5-specific properties.
- When installing CDH and other services from Cloudera Manager, only parcel installations are supported. Package installations are not supported. See Managing Software Installation Using Cloudera Manager.
- Installing and racking the DSSD D5 Storage Appliance.
- Installing the DSSD D5 PCI cards in the DataNode hosts.
- Connecting the DataNode hosts to the DSSD D5.
- Installing and configuring the DSSD D5 drivers.
- Installing and configuring the DSSD D5 client software.
- Creating a volume on the DSSD D5 for the DataNodes.
- Identifying CPUs and NUMA nodes. See the EMC document DSSD Hadoop Plugin Installation Guide for more information. You use the information from this task in a later step to configure the Libflood CPU ID parameter during the initial configuration of Cloudera Manager.
- Host names of all the hosts in your cluster.
- The DSSD D5 volume name for the DataNodes.
- If you are not using the entire capacity of the DSSD D5 for this cluster, the DSSD Amount of Usable Capacity as assigned in the DSSD D5. For most deployments, the default value (100 TB) is correct. See the DSSD Hadoop Plugin Installation Guide for more information on setting this property.
- The value for the Libflood CPU ID. See “Identify CPUs and NUMA Nodes” in the DSSD Hadoop Plugin Installation Guide for more information.
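To help with the "Identifying CPUs and NUMA nodes" task, the sketch below groups CPU IDs by NUMA node. It assumes the input is the parsable output of `lscpu -p=CPU,NODE` (comment lines starting with `#`, then one `cpu,node` pair per line); the function name and sample data are hypothetical. It only lists the CPUs on each node. Which CPU ID to use for the Libflood CPU ID parameter is described in the DSSD Hadoop Plugin Installation Guide.

```python
from collections import defaultdict


def cpus_by_numa_node(lscpu_output):
    """Group CPU IDs by NUMA node from `lscpu -p=CPU,NODE` style text."""
    nodes = defaultdict(list)
    for line in lscpu_output.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header that lscpu emits.
        if not line or line.startswith("#"):
            continue
        cpu, node = line.split(",")[:2]
        # Hosts without NUMA information report an empty node field.
        if node == "":
            continue
        nodes[int(node)].append(int(cpu))
    return dict(nodes)


# Hypothetical sample for a host with two NUMA nodes and four CPUs.
SAMPLE = """# CPU,Node
0,0
1,0
2,1
3,1
"""

layout = cpus_by_numa_node(SAMPLE)
```

Record the resulting layout along with the host names and volume name so that it is available during the initial configuration of Cloudera Manager.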
Configuring Multiple DSSD D5 Appliances in a Cluster
In Cloudera Manager 5.8 and higher, you can configure multiple DSSD D5 appliances in a single cluster managed by Cloudera Manager by assigning the hosts connected to each DSSD D5 to a single rack. You can configure this during installation (see Deployment Options), or you can add an additional DSSD D5 to a cluster already configured with one or more DSSD D5 appliances (see Adding an Additional DSSD D5 to a Cluster).
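The one-rack-per-appliance layout can be planned before you enter rack assignments in Cloudera Manager. The following is a minimal sketch: the appliance and host names are hypothetical, and the path-style rack IDs (`/dssd1`) are an assumption about naming, not a required format. It does not call Cloudera Manager; it only builds and validates the host-to-rack mapping.

```python
def rack_assignments(appliance_hosts):
    """Map each host to a rack named after the DSSD D5 it connects to.

    appliance_hosts: dict of appliance name -> list of DataNode host names.
    Raises ValueError if a host appears under more than one appliance,
    because each host must belong to exactly one rack.
    """
    assignments = {}
    for appliance, hosts in appliance_hosts.items():
        rack = "/" + appliance
        for host in hosts:
            if host in assignments:
                raise ValueError(f"{host} is listed under more than one appliance")
            assignments[host] = rack
    return assignments


# Hypothetical cluster with two appliances.
plan = rack_assignments({
    "dssd1": ["dn1.example.com", "dn2.example.com"],
    "dssd2": ["dn3.example.com"],
})
```

Keeping the plan as data makes it easy to review which hosts share an appliance before the racks are assigned.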
Deployment Options
- Oracle JDK
- Cloudera Manager Server and Agent packages
- Supporting database software
- CDH and managed service software
- Demonstration and proof of concept deployments have two installation options:
  - DSSD D5 Installation Path A - Automated Installation by Cloudera Manager Installer (Non-Production) - Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Server, embedded PostgreSQL database, Cloudera Manager Agent, CDH, and managed service software on cluster hosts. Cloudera Manager also configures databases for the Cloudera Manager Server and Hive Metastore and optionally for Cloudera Management Service roles. This path is recommended for demonstration and proof-of-concept deployments, but is not recommended for production deployments because it is not intended to scale and may require database migration as your cluster grows. To use this method, server and cluster hosts must satisfy the following requirements:
    - Provide login to the Cloudera Manager Server host using a root account or an account that has passwordless sudo permission.
    - Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for more information.
    - All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the required installation files.
  - DSSD D5 Installation Path B - Installation Using Cloudera Manager Parcels - You install the Oracle JDK, Cloudera Manager Server, and embedded PostgreSQL database packages on the Cloudera Manager Server host. Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Agents, CDH, and managed service software on cluster hosts. For Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
    - Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for more information.
    - All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the required installation files.
- Production deployments require you to first manually install and configure a production database for the Cloudera Manager Server and Hive Metastore. There are two installation options:
  - DSSD D5 Installation Path B - Installation Using Cloudera Manager Parcels - You install the Oracle JDK and Cloudera Manager Server packages on the Cloudera Manager Server host. Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Agents, CDH, and managed service software on cluster hosts. For Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
    - Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for more information.
    - All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the required installation files.
  - Installation Path C - Manual Installation Using Cloudera Manager Tarballs - You install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software using tarballs and use Cloudera Manager to automate installation of CDH and managed service software using parcels.
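Paths A and B require uniform SSH access on the same port to all hosts. The sketch below is one way to pre-check that requirement from the Cloudera Manager Server host. It assumes port 22 as the default and uses hypothetical function names. It confirms only that a TCP connection to the port succeeds; it does not verify credentials or sudo permissions.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_hosts(hosts, port=22, timeout=3.0):
    """Return the hosts that do not accept connections on the given port."""
    return [host for host in hosts if not port_open(host, port, timeout)]
```

For example, `unreachable_hosts(["dn1.example.com", "dn2.example.com"])` returns the subset of hosts to investigate before you start the installation wizard; an empty list means every host accepted a connection on the port.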
Cloudera Manager Installation Phases
The following table describes the phases of installing Cloudera Manager and a Cloudera Manager deployment of CDH and managed services. Every phase is required, but you can accomplish each phase in multiple ways, depending on your organization's policies and requirements. The six phases are grouped into three installation paths based on how the Cloudera Manager Server and database software are installed on the Cloudera Manager Server and cluster hosts. The criteria for choosing an installation path are discussed in Cloudera Manager Deployment.
Phase | Description | All installation paths
---|---|---
Phase 1: Install JDK | Install the JDK required by Cloudera Manager Server, Management Service, and CDH. | There are two options:
Phase 2: Set up Databases | Install, configure, and start the databases that are required by the Cloudera Manager Server, Cloudera Management Service, and that are optional for some CDH services. | There are two options:

Phase | Description | Path A | Path B | Path C
---|---|---|---|---
Phase 3: Install Cloudera Manager Server | Install and start Cloudera Manager Server on one host. | Use the Cloudera Manager Installer to install its packages and the server. Requires Internet access and sudo privileges on the host. | Use Linux package install commands (like yum) to install Cloudera Manager Server. Update database properties. Use service commands to start Cloudera Manager Server. | Use Linux commands to unpack tarballs and service commands to start the server.
Phase 4: Install Cloudera Manager Agents | Install and start the Cloudera Manager Agent on all hosts. | Use the Cloudera Manager Installation wizard to install the Agents on all hosts. | There are two options: | Use Linux commands to unpack tarballs and service commands to start the agents on all hosts.
Phase 5: Install CDH and Managed Service software | Install, configure, and start CDH and managed services on all hosts. | Use the Cloudera Manager Installation wizard to install CDH and other managed services. | There are two options: | Use Linux commands to unpack tarballs and service commands to start CDH and managed services on all hosts.
Phase 6: Create, Configure and Start CDH and Managed Services | Configure and start CDH and managed services. | Use the Cloudera Manager Installation wizard to install CDH and other managed services, assign roles to hosts, and configure the cluster. Many configurations are automated. | Use the Cloudera Manager Installation wizard to install CDH and other managed services, assign roles to hosts, and configure the cluster. Many configurations are automated. | Use the Cloudera Manager Installation wizard to install CDH and other managed services, assign roles to hosts, and configure the cluster. Many configurations are automated. You can also use the Cloudera Manager API to manage a cluster, which can be useful for scripting preconfigured deployments.
Cloudera Manager Installation Software
- Installation path A (non-production) - A small self-executing Cloudera Manager installation program to install the Cloudera Manager Server and other packages. The Cloudera Manager installer, which you install on the host where you want the Cloudera Manager Server to run, performs the following:
  - Installs the package repositories for Cloudera Manager and the Oracle Java Development Kit (JDK).
  - Installs the Cloudera Manager packages.
  - Installs and configures an embedded PostgreSQL database for use by the Cloudera Manager Server, some Cloudera Management Service roles, some managed services, and Cloudera Navigator roles.

  Important: Path A installation is intended for demonstrations and proof-of-concept deployments only. Do not use this method of installation for production environments.
- Installation paths B and C - Cloudera Manager repositories for manually installing the Cloudera Manager Server, Agent, JDK, and embedded database packages.
- Installation path B - The Cloudera Manager Installation wizard for automating installation of the Cloudera Manager Agent package.
- All installation paths - The Cloudera Manager Installation wizard for automating CDH and managed service installation and configuration on the cluster hosts using parcels. Parcels simplify the installation process and allow you to download, distribute, and activate new versions of CDH and managed services from within Cloudera Manager. After you install Cloudera Manager and you connect to the Cloudera Manager Admin Console for the first time, use the Cloudera Manager Installation wizard to:
  - Discover cluster hosts
  - Optionally install the Oracle JDK
  - Optionally install CDH, managed service, and Cloudera Manager Agent software on cluster hosts
  - Select services
  - Map service roles to hosts
  - Edit service configurations
  - Start services
Upgrading a DSSD CDH Maintenance Release using Parcels
- DSSD 1.2.x
- DSSD_SCR 1.2.x (Upgrade this parcel, even if you have disabled Short Circuit Reads.)