
DSSD D5 Installation Path C - Manual Installation Using Cloudera Manager Tarballs

This topic describes how to install Cloudera Manager and CDH on a cluster that uses the EMC® DSSD™ D5™ storage appliance as the storage for Hadoop DataNodes. To install clusters that do not use the DSSD D5, see Installation Overview.

In this procedure, you install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software using tarballs and then you use Cloudera Manager to automate installation of CDH and managed service software using parcels. For a full discussion of deployment options, see Installation Overview.

The general steps in the procedure for Installation Path C follow.

  1. DSSD D5 Pre-Installation Tasks
  2. Before You Begin
    1. Install the Oracle JDK
    2. Perform Configuration Required by Single User Mode
    3. Install and Configure External Databases
  3. Install the Cloudera Manager Server and Agents
    1. Perform Configuration Required by Single User Mode
    2. Create Users
    3. Create the Cloudera Manager Server Local Data Storage Directory
    4. Configure Cloudera Manager Agents
    5. Configuring for a Custom Cloudera Manager User and Custom Directories
  4. Create Parcel Directories
  5. Start the Cloudera Manager Server
  6. Start the Cloudera Manager Agents
  7. Install Dependencies
  8. Start and Log into the Cloudera Manager Admin Console
    1. Enable DSSD Mode and Configure Cloudera Manager for the DSSD D5
    2. Choose Cloudera Manager Edition
    3. Choose Cloudera Manager Hosts
    4. Install CDH and Managed Service Software
    5. Add Services
    6. Configure Database Settings
    7. Review and Finish the DSSD D5 Configuration
    8. (Optional) Disable Short Circuit Reads for HBase and Impala
  9. (Optional) Change the Cloudera Manager User
  10. Change the Default Administrator Password
  11. Configure Oozie Data Purge Settings
  12. (Optional) Install Multiple DSSD D5 Appliances in a Cluster
  13. Test the Installation

      DSSD D5 Pre-Installation Tasks

      Complete the following tasks on the DSSD D5 appliance and hosts before installing Cloudera software:
      • Install and rack the DSSD D5 Storage Appliance.
      • Install the DSSD D5 PCI cards in the DataNode hosts.
      • Connect the DataNode hosts to the DSSD D5.
      • Install and configure the DSSD D5 drivers.
      • Install and configure the DSSD D5 client software.
      • Create a volume on the DSSD D5 for the DataNodes.
      • Identify CPUs and NUMA nodes. See the EMC document DSSD Hadoop Plugin Installation Guide for more information. You use the information from this task in a later step to configure the Libflood CPU ID parameter during the initial configuration of Cloudera Manager.

      See the EMC DSSD D5 document DSSD D5 Installation and Service Guide for more information about these tasks.

      After completing the above tasks, install Cloudera Manager. You need the following information before proceeding:
      • Host names of all the hosts in your cluster.
      • The DSSD D5 volume name for the DataNodes.
      • If you are not using the entire capacity of the DSSD D5 for this cluster, the amount of usable capacity assigned in the DSSD D5 (you use this to set the Usable Capacity property). For most deployments, the default value (100 TB) is correct. See the DSSD Hadoop Plugin Installation Guide for more information on setting this property.
      • The value for the Libflood CPU ID. See “Identify CPUs and NUMA Nodes” in the DSSD Hadoop Plugin Installation Guide for more information.

      Before You Begin

      Install the Oracle JDK

      See Java Development Kit Installation.

      Perform Configuration Required by Single User Mode

      If you are creating a Cloudera Manager deployment that employs single user mode, perform the configuration steps described in Single User Mode Requirements.

      Install and Configure External Databases

      Read Cloudera Manager and Managed Service Datastores. Install and configure an external database for services or Cloudera Management Service roles using the instructions in External Databases for Oozie Server, Sqoop Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.

      Cloudera Manager also requires a database. Prepare the Cloudera Manager Server database as described in Preparing a Cloudera Manager Server External Database.

      Install the Cloudera Manager Server and Agents

      Tarballs contain both the Cloudera Manager Server and Cloudera Manager Agent in a single file. Download tarballs from the locations listed in Cloudera Manager Version and Download Information. Copy the tarballs and unpack them on all hosts on which you intend to install Cloudera Manager Server and Cloudera Manager Agents, in a directory of your choosing. If necessary, create a new directory to accommodate the files you extract from the tarball. For instance, if /opt/cloudera-manager does not exist, create it using a command similar to:
      $ sudo mkdir /opt/cloudera-manager
      Extract the contents of the tarball to this directory. For example, to extract the contents of the tarball to /opt/cloudera-manager, use a command similar to the following:
      $ sudo tar xzf cloudera-manager*.tar.gz -C /opt/cloudera-manager

      The files are extracted to a subdirectory named according to the Cloudera Manager version being extracted. For example, files could be extracted to /opt/cloudera-manager/cm-5.0/. This full path is needed later and is referred to as the tarball_root directory.
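      If you want to avoid retyping this path, you can record it in a shell variable on each host. A minimal convenience sketch (the path shown is an example; the commands in this topic spell out tarball_root as a placeholder):
      $ export TARBALL_ROOT=/opt/cloudera-manager/cm-5.0
      $ ls $TARBALL_ROOT/etc/init.d/   # should list cloudera-scm-server and cloudera-scm-agent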

      Perform Configuration Required by Single User Mode

      If you are creating a Cloudera Manager deployment that employs single user mode, perform the configuration steps described in Single User Mode Requirements.

      Create Users

      The Cloudera Manager Server and managed services require a user account to complete tasks. When installing Cloudera Manager from tarballs, you must create this user account on all hosts manually. Because Cloudera Manager Server and managed services are configured to use the user account cloudera-scm by default, creating a user with this name is the simplest approach. This user account is used automatically after installation is complete.

      To create user cloudera-scm, use a command such as the following:
      $ sudo useradd --system --home=/opt/cloudera-manager/cm-5.6.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
      Ensure the --home argument path matches your environment. This argument varies according to where you place the tarball, and the version number varies among releases. For example, the --home location could be /opt/cm-5.6.0/run/cloudera-scm-server.
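      You can confirm that the account was created (and note its UID and GID) on each host:
      $ id cloudera-scm   # prints the uid, gid, and groups of the new system account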

      Create the Cloudera Manager Server Local Data Storage Directory

      1. Create the following directory: /var/lib/cloudera-scm-server.
      2. Change the owner of the directory so that the cloudera-scm user and group have ownership of the directory. For example:
        $ sudo mkdir /var/lib/cloudera-scm-server
        $ sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server

      Configure Cloudera Manager Agents

      • On every Cloudera Manager Agent host, configure the Cloudera Manager Agent to point to the Cloudera Manager Server by setting the following properties in the tarball_root/etc/cloudera-scm-agent/config.ini configuration file:
        Property     Description
        server_host  Name of the host where Cloudera Manager Server is running.
        server_port  Port on the host where Cloudera Manager Server is running.
      • By default, a tarball installation has a var subdirectory where state is stored. In a non-tarball installation, state is stored in /var. Cloudera recommends that you reconfigure the tarball installation to use an external directory as the /var equivalent (/var or any other directory outside the tarball) so that when you upgrade Cloudera Manager, the new tarball installation can access this state. Configure the installation to use an external directory for storing state by editing tarball_root/etc/default/cloudera-scm-agent and setting the CMF_VAR variable to the location of the /var equivalent. If you do not reuse the state directory between different tarball installations, duplicate Cloudera Manager Agent entries can occur in the Cloudera Manager database.
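      For example, a minimal config.ini pointing the Agent at a hypothetical Server host (7182 is the default port on which the Server listens for Agents), followed by a CMF_VAR setting in the Agent defaults file:
        # tarball_root/etc/cloudera-scm-agent/config.ini (hostname is illustrative)
        server_host=cm01.example.com
        server_port=7182

        # tarball_root/etc/default/cloudera-scm-agent
        CMF_VAR=/var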

      Configuring for a Custom Cloudera Manager User and Custom Directories

      You can change the default username and directories used by Cloudera Manager. If you do not change the default, skip to Cloudera Manager and Managed Service Datastores. By default, Cloudera Manager creates the following directories in /var/log and /var/lib:
      • /var/log/cloudera-scm-headlamp
      • /var/log/cloudera-scm-firehose
      • /var/log/cloudera-scm-alertpublisher
      • /var/log/cloudera-scm-eventserver
      • /var/lib/cloudera-scm-headlamp
      • /var/lib/cloudera-scm-firehose
      • /var/lib/cloudera-scm-alertpublisher
      • /var/lib/cloudera-scm-eventserver
      • /var/lib/cloudera-scm-server
      If you are using a custom username and custom directories for Cloudera Manager, you must create these directories on the Cloudera Manager Server host and assign ownership of these directories to the custom username. The Cloudera Manager installer makes no changes to any directories that already exist. Cloudera Manager cannot write to existing directories for which it does not have the proper permissions, and if you do not change ownership, Cloudera Management Service roles may not perform as expected. To resolve these issues, do one of the following:
      • Change ownership of existing directories:
        1. Use the chown command to change ownership of all existing directories to the Cloudera Manager user. If the Cloudera Manager username and group are cloudera-scm, to change the ownership of the headlamp log directory, you issue a command similar to the following:
          $ sudo chown -R cloudera-scm:cloudera-scm /var/log/cloudera-scm-headlamp
      • Use alternate directories:
        1. If the directories you plan to use do not exist, create them. For example, to create /var/cm_logs/cloudera-scm-headlamp for use by the cloudera-scm user, you can use the following commands:
          $ sudo mkdir -p /var/cm_logs/cloudera-scm-headlamp
          $ sudo chown cloudera-scm /var/cm_logs/cloudera-scm-headlamp
        2. Connect to the Cloudera Manager Admin Console.
        3. Select Clusters > Cloudera Management Service
        4. Select Scope > role name.
        5. Click the Configuration tab.
        6. Enter a term in the Search field to find the settings to be changed. For example, you might enter /var or directory.
        7. Update each value with the new locations for Cloudera Manager to use.
            Note: The configuration property for the Cloudera Manager Server Local Data Storage Directory (default value: /var/lib/cloudera-scm-server) is located on a different page:
          1. Select Administration > Settings.
          2. Type directory in the Search box.
          3. Enter the directory path in the Cloudera Manager Server Local Data Storage Directory property.
        8. Click Save Changes to commit the changes.

      Create Parcel Directories

      1. On the Cloudera Manager Server host, create a parcel repository directory:
        $ sudo mkdir -p /opt/cloudera/parcel-repo
      2. Change the directory ownership to be the username you are using to run Cloudera Manager:
        $ sudo chown username:groupname /opt/cloudera/parcel-repo
        where username and groupname are the user and group names (respectively) you are using to run Cloudera Manager. For example, if you use the default username cloudera-scm, you would run the command:
        $ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
      3. On each cluster host, create a parcels directory:
        $ sudo mkdir -p /opt/cloudera/parcels
      4. Change the directory ownership to be the username you are using to run Cloudera Manager:
        $ sudo chown username:groupname /opt/cloudera/parcels
        where username and groupname are the user and group names (respectively) you are using to run Cloudera Manager. For example, if you use the default username cloudera-scm, you would run the command:
        $ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
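      Steps 3 and 4 must be performed on every cluster host. A hedged convenience sketch, assuming passwordless SSH as a user with sudo privileges and a hosts.txt file listing one hostname per line (both assumptions):
      $ for h in $(cat hosts.txt); do ssh -t $h "sudo mkdir -p /opt/cloudera/parcels && sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcels"; done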

      Start the Cloudera Manager Server

        Important: When you start the Cloudera Manager Server and Agents, Cloudera Manager assumes you are not already running HDFS and MapReduce. If these services are running:
      1. Shut down HDFS and MapReduce. See Stopping Services (CDH 4) or Stopping CDH Services Using the Command Line (CDH 5) for the commands to stop these services.
      2. Configure the init scripts to not start on boot. Use commands similar to those shown in Configuring init to Start Core Hadoop System Services (CDH 4) or Configuring init to Start Hadoop System Services (CDH 5), but disable the start on boot (for example, $ sudo chkconfig hadoop-hdfs-namenode off).
      Contact Cloudera Support for help converting your existing Hadoop configurations for use with Cloudera Manager.
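      For example, on a RHEL-compatible host, an illustrative loop that stops core HDFS init services and disables them at boot (adjust the service list to match the packages actually installed on each host):
      $ for svc in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-hdfs-datanode; do sudo service $svc stop; sudo chkconfig $svc off; done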
      The way in which you start the Cloudera Manager Server varies according to what account you want the Server to run under:
      • As root:
        $ sudo tarball_root/etc/init.d/cloudera-scm-server start 
      • As another user. If you run as another user, ensure the user you created for Cloudera Manager owns the location to which you extracted the tarball, including the newly created database files. If you followed the earlier examples and created the directory /opt/cloudera-manager and the user cloudera-scm, you could use the following command to change ownership of the directory:
        $ sudo chown -R cloudera-scm:cloudera-scm /opt/cloudera-manager

        Once you have established ownership of directory locations, you can start Cloudera Manager Server using the user account you chose. For example, you might run the Cloudera Manager Server as cloudera-service. In this case, you have the following options:

        • Run the following command:
          $ sudo -u cloudera-service tarball_root/etc/init.d/cloudera-scm-server start 
        • Edit the configuration files so the script internally changes the user. Then run the script as root:
          1. Remove the following line from tarball_root/etc/default/cloudera-scm-server:
            export CMF_SUDO_CMD=" "
          2. Change the user and group in tarball_root/etc/init.d/cloudera-scm-server to the user you want the server to run as. For example, to run as cloudera-service, change the user and group as follows:
            USER=cloudera-service
            GROUP=cloudera-service
          3. Run the server script as root:
            $ sudo tarball_root/etc/init.d/cloudera-scm-server start 
      • To start the Cloudera Manager Server automatically after a reboot:
        1. Run the following commands on the Cloudera Manager Server host:
          • RHEL-compatible and SLES (only RHEL is supported for DSSD D5 DataNodes)
            $ sudo cp tarball_root/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
            $ sudo chkconfig cloudera-scm-server on
          • Debian/Ubuntu (not supported for DSSD D5 DataNodes)
            $ sudo cp tarball_root/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
            $ sudo update-rc.d cloudera-scm-server defaults
        2. On the Cloudera Manager Server host, open the /etc/init.d/cloudera-scm-server file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to tarball_root/etc/default.
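        For example, a hedged sed one-liner for this edit, assuming tarball_root is /opt/cloudera-manager/cm-5.0 (an assumption; substitute your actual path):
          $ sudo sed -i 's|\${CMF_DEFAULTS:-/etc/default}|/opt/cloudera-manager/cm-5.0/etc/default|' /etc/init.d/cloudera-scm-server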

      If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.

      Start the Cloudera Manager Agents

      Start the Cloudera Manager Agent according to the account you want the Agent to run under:
      • To start the Cloudera Manager Agent, run this command on each Agent host:
        $ sudo tarball_root/etc/init.d/cloudera-scm-agent start
        When the Agent starts, it contacts the Cloudera Manager Server.
      • If you are running single user mode, start Cloudera Manager Agent using the user account you chose. For example, to run the Cloudera Manager Agent as cloudera-scm, you have the following options:
        • Run the following command:
          $ sudo -u cloudera-scm tarball_root/etc/init.d/cloudera-scm-agent start 
        • Edit the configuration files so the script internally changes the user, and then run the script as root:
          1. Remove the following line from tarball_root/etc/default/cloudera-scm-agent:
            export CMF_SUDO_CMD=" "
          2. Change the user and group in tarball_root/etc/init.d/cloudera-scm-agent to the user you want the Agent to run as. For example, to run as cloudera-scm, change the user and group as follows:
            USER=cloudera-scm
            GROUP=cloudera-scm
          3. Run the Agent script as root:
            $ sudo tarball_root/etc/init.d/cloudera-scm-agent start 
      • To start the Cloudera Manager Agents automatically after a reboot:
        1. Run the following commands on each Agent host:
          • RHEL-compatible and SLES (only RHEL is supported for DSSD D5 DataNodes)
            $ sudo cp tarball_root/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
            $ sudo chkconfig cloudera-scm-agent on
          • Debian/Ubuntu (not supported for DSSD D5 DataNodes)
            $ sudo cp tarball_root/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
            $ sudo update-rc.d cloudera-scm-agent defaults
        2. On each Agent host, open the /etc/init.d/cloudera-scm-agent file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to tarball_root/etc/default.
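        As with the Server script, a hedged sed one-liner (same tarball_root assumption as before):
          $ sudo sed -i 's|\${CMF_DEFAULTS:-/etc/default}|/opt/cloudera-manager/cm-5.0/etc/default|' /etc/init.d/cloudera-scm-agent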

      Install Dependencies

      When you install with tarballs and parcels, some services may require additional dependencies that are not provided by Cloudera. On each host, install the required packages:
      • chkconfig
      • python (2.6 required for CDH 5)
      • bind-utils
      • psmisc
      • libxslt
      • zlib
      • sqlite
      • cyrus-sasl-plain
      • cyrus-sasl-gssapi
      • fuse
      • portmap
      • fuse-libs
      • redhat-lsb
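      On a RHEL-compatible host, you can install all of these in a single command, for example:
      $ sudo yum install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb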

      Start and Log into the Cloudera Manager Admin Console

      The Cloudera Manager Server URL takes the following form: http://Server host:port, where Server host is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is installed, and port is the port configured for the Cloudera Manager Server. The default port is 7180.
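      Before opening a browser, you can confirm that the Server is listening (the hostname is hypothetical; as noted in step 1 below, the Server may take several minutes to start):
      $ curl -s -o /dev/null -w "%{http_code}\n" http://cm01.example.com:7180   # 200 or a 3xx redirect code means the web server is up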
      1. Wait several minutes for the Cloudera Manager Server to start. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
      2. In a web browser, enter http://Server host:7180, where Server host is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is running.

        The login screen for Cloudera Manager Admin Console displays.

      3. Log into Cloudera Manager Admin Console. The default credentials are Username: admin, Password: admin. Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the installation wizard. Although you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.
      4. After logging in, the Cloudera Manager End User License Terms and Conditions page displays. Read the terms and conditions and then select Yes to accept them.
      5. Click Continue.

        The Welcome to Cloudera Manager page displays.

      Enable DSSD Mode and Configure Cloudera Manager for the DSSD D5

      1. Click the Cloudera Manager logo to open the Home page.
      2. Click Administration > Settings.
      3. Type DSSD in the Search box.
      4. Select the DSSD Mode property.
      5. Click Save Changes to commit the changes.

        Cloudera Manager reconfigures the system for DSSD mode, which may take several minutes.

      6. Click the Cloudera Manager logo to open the Home page.
      7. Click Add Cluster to continue with the installation.
      8. The Cloudera Manager End User License Terms and Conditions page displays. Read the terms and conditions and then select Yes to accept them.
      9. Click Continue.
      10. The EMC Software License Agreement page displays. Read the terms and conditions and then select Yes to accept them.
      11. Click Continue.

        The Welcome to Cloudera Manager page displays.

      Choose Cloudera Manager Edition

      From the Welcome to Cloudera Manager page, you can select the edition of Cloudera Manager to install and, optionally, install a license:

      1. Choose which edition to install:
        • Cloudera Express, which does not require a license, but provides a limited set of features.
        • Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed.
        • Cloudera Enterprise with one of the following license types:
          • Basic Edition
          • Flex Edition
          • Data Hub Edition
        If you choose Cloudera Express or Cloudera Enterprise Data Hub Edition Trial, you can upgrade the license at a later time. See Managing Licenses.
      2. If you elect Cloudera Enterprise, install a license:
        1. Click Upload License.
        2. Click the document icon to the left of the Select a License File text field.
        3. Go to the location of your license file, click the file, and click Open.
        4. Click Upload.
      3. Information is displayed indicating what the CDH installation includes. At this point, you can click the Support drop-down menu to access online Help or the Support Portal.
      4. Click Continue to proceed with the installation.

      Choose Cloudera Manager Hosts

      1. Click the Currently Managed Hosts tab.
      2. Choose the hosts to add to the cluster.
      3. Click Continue.

        The Cluster Installation Select Repository screen displays.

      Install CDH and Managed Service Software

      1. Choose the CDH and managed service version:
        1. Choose the parcels to install. The choices depend on the repositories you have chosen; a repository can contain multiple parcels. Only the parcels for the latest supported service versions are configured by default. Select the following parcels:
          • CDH 5
          • DSSD version 1.2
          • DSSD_SCR version 1.2 - This parcel enables short-circuit reads for HBase and Impala. Select this parcel even if you intend to disable short-circuit reads. (See DSSD D5 and Short-Circuit Reads.)
          • Any additional parcels required for your deployment (for example, Accumulo, Spark, or Keytrustee).
          You can add additional parcels for previous versions by specifying custom repositories. For example, you can find the locations of the previous CDH 5 parcels at https://archive.cloudera.com/cdh5/parcels/.
          1. To specify the parcel directory or local parcel repository, add a parcel repository, or configure a proxy server through which parcels are downloaded, click the More Options button and do one or more of the following:
            • Parcel Directory and Local Parcel Repository Path - Specify the location of parcels on cluster hosts and the Cloudera Manager Server host. If you change the default value for Parcel Directory and have already installed and started Cloudera Manager Agents, restart the Agents:
              $ sudo service cloudera-scm-agent restart
            • Parcel Repository - In the Remote Parcel Repository URLs field, click the button and enter the URL of the repository. The URL you specify is added to the list of repositories listed in the Configuring Cloudera Manager Server Parcel Settings page and a parcel is added to the list of parcels on the Select Repository page. If you have multiple repositories configured, you see all the unique parcels contained in all your repositories.
            • Proxy Server - Specify the properties of a proxy server.
          2. Click OK.
        2. If you are using Cloudera Manager to install software, select the release of Cloudera Manager Agent. You can choose either the version that matches the Cloudera Manager Server you are currently using or specify a version in a custom repository. If you opted to use custom repositories for installation files, you can provide a GPG key URL that applies for all repositories.
      2. Click Continue. Cloudera Manager installs the CDH and managed service parcels. During parcel installation, progress is indicated for the phases of the parcel installation process in separate progress bars. If you are installing multiple parcels, you see progress bars for each parcel. When the Continue button at the bottom of the screen turns blue, the installation process is complete.
      3. Click Continue.

        The Host Inspector runs to validate the installation and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Finish.

      Add Services

      1. In the first page of the Add Services wizard, choose the combination of services to install and whether to install Cloudera Navigator:
        • Select the combination of services to install:
          • Core Hadoop - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, and Hue
          • Core with HBase
          • Core with Impala
          • Core with Search
          • Core with Spark
          • All Services - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, HBase, Impala, Solr, Spark, and Key-Value Store Indexer
          • Custom Services - Any combination of services.
          As you select services, keep the following in mind:
          • Some services depend on other services; for example, HBase requires HDFS and ZooKeeper. Cloudera Manager tracks dependencies and installs the correct combination of services.
          • In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. Choose Custom Services to install YARN, or use the Add Service functionality to add YARN after installation completes.
              Note: You can create a YARN service in a CDH 4 cluster, but it is not considered production ready.
          • In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. Choose Custom Services to install MapReduce, or use the Add Service functionality to add MapReduce after installation completes.
              Note: In CDH 5, the MapReduce service has been deprecated. However, the MapReduce service is fully supported for backward compatibility through the CDH 5 lifecycle.
          • The Flume service can be added only after your cluster has been set up.
        • If you have chosen Data Hub Edition Trial or Cloudera Enterprise, optionally select the Include Cloudera Navigator checkbox to enable Cloudera Navigator. See Cloudera Navigator 2 Overview.
      2. Click Continue.
      3. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The DataNode role is only assigned to hosts that are connected to the DSSD D5. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances if necessary.

        Click a field below a role to display a dialog box containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the pageable hosts dialog box.

        The following shortcuts for specifying hostname patterns are supported:
        • Range of hostnames (without the domain portion)
          Range Definition          Matching Hosts
          10.1.1.[1-4]              10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
          host[1-3].company.com     host1.company.com, host2.company.com, host3.company.com
          host[07-10].company.com   host07.company.com, host08.company.com, host09.company.com, host10.company.com
        • IP addresses
        • Rack name

        Click the View By Host button for an overview of the role assignment by hostname ranges.

      4. When you are satisfied with the assignments, click Continue.

      Configure Database Settings

      On the Database Setup page, configure settings for required databases:
      1. Enter the database host, database type, database name, username, and password for the database that you created when you set up the database.
      2. Click Test Connection to confirm that Cloudera Manager can communicate with the database using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise, check and correct the information you have provided for the database and then try the test again. (For some servers, if you are using the embedded database, you will see a message saying the database will be created at a later step in the installation process.)

        The Review Changes screen displays.

      Review and Finish the DSSD D5 Configuration

      From the Cluster Setup Review Changes page:

      1. Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. If you chose to add the Sqoop service, indicate whether to use the default Derby database or the embedded PostgreSQL database. If the latter, type the database name, host, and user credentials that you specified when you created the database.

        The configuration properties that display on this page are somewhat different from those that display when configuring non-DSSD D5 DataNodes. Some properties, such as the DataNode directory, have been removed because they do not apply to a cluster that uses DSSD D5 DataNodes. Other properties, such as the Flood Volume Name, are specific to the DSSD D5 DataNode role.

      2. (Required) In the Flood Volume Name field, enter the name of the Flood Volume as configured in the DSSD D5 appliance. If you are deploying multiple DSSD D5 appliances, note that you must specify this property for each appliance using a Role Group.
      3. (Optional) If you are not using the entire capacity of the DSSD D5 for this cluster, set the Usable Capacity property. For most deployments, the default value (100 TB) is correct. See the EMC document DSSD Hadoop Plugin Installation Guide for more information on setting this property.
      4. (Optional) Set the value of the HDFS Block Size parameter. The default value for this parameter is 512 MB when in DSSD Mode. You may want to change this for some types of workloads. See Tuning the HDFS Block Size for DSSD Mode.
      5. Click Continue.

        The wizard starts the services.

      6. When all of the services are started, click Continue.

        You see a success message indicating that your cluster has been successfully started.

      7. Click Finish to proceed to the Cloudera Manager Admin Console Home Page.
      8. If you see a message indicating that you need to restart Cloudera Management Services, restart the Cloudera Management Service:
        1. Do one of the following:
          • Select Clusters > Cloudera Management Service > Cloudera Management Service, and then select Actions > Restart.
          • On the Home > Status tab, click the dropdown menu to the right of Cloudera Management Service and select Restart.
        2. Click Restart to confirm. The Command Details window shows the progress of stopping and then starting the roles.
        3. When Command completed with n/n successful subcommands appears, the task is complete. Click Close.
      9. Choose Clusters > HDFS > Configuration and then, in the filter section, select Scope > DSSD DataNode to view the DSSD D5 DataNode-specific properties.

        See the HDFS Properties in CDH 5.6.0 configuration reference for descriptions of these properties.

        See the EMC document DSSD Hadoop Plugin Installation Guide for information about setting these properties.

      10. (Recommended for best performance) Set the Libflood CPU ID property.

        The value to use for this parameter should have been determined during the setup of the DSSD D5 appliance. See “Identify CPUs and NUMA Nodes” in the EMC document DSSD Hadoop Plugin Installation Guide. The value you set for this parameter can affect the performance of your cluster.

      11. (Optional) Set the following properties to tune the performance of your cluster:
        • Libflood Command Queues
        • Libflood Command Queue Depth
      12. (Optional) Set the Java heap size for the NameNode.
        1. Choose Clusters > HDFS > Configuration.
        2. Type Java heap in the search box.
        3. Set the Java Heap Size of NameNode in Bytes parameter:

          Cloudera Manager automatically sets the value of this parameter to 4 GB. (If there are not adequate resources in the cluster, Cloudera Manager may set a smaller value.) Cloudera recommends that you set the value manually by calculating the number of HDFS blocks in the cluster and allowing 1 GB of Java heap for each 1 million HDFS blocks; see the worked example after these steps. For more information on HDFS block size and the DSSD D5, see Tuning the HDFS Block Size for DSSD Mode.

        4. Set the Java Heap Size of Secondary NameNode in Bytes parameter to the same value as the Java Heap Size of NameNode in Bytes parameter.
        5. Restart the NameNode:
          1. Choose Clusters > HDFS > Instances.
          2. In the table of roles, select the NameNode (Active) and SecondaryNameNode role types.
          3. Click Actions for Selected > Restart.
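        As a worked illustration of the heap sizing rule above (the block count is hypothetical):
          # Report the block count; look for "Total blocks" in the output:
          $ sudo -u hdfs hdfs fsck / | grep -i 'Total blocks'
          # Suppose the cluster holds about 12,000,000 blocks:
          # 12,000,000 blocks x (1 GB per 1,000,000 blocks) = 12 GB
          # Set Java Heap Size of NameNode in Bytes (and the Secondary NameNode value) to at least 12 GiB.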

      (Optional) Disable Short Circuit Reads for HBase and Impala

        Important:

      Enabling short-circuit reads for HBase or Impala on an HDFS cluster that uses DSSD D5 DataNodes requires that the processes associated with these applications be granted hdfs group membership. When short-circuit reads are enabled for Impala (for example), Impala processes that act as short-circuit read clients (such as impalad) can read and write all data stored in the DSSD D5. Cloudera Manager applies the hdfs group membership on a per-service basis, and applications that do not require short-circuit reads, or for which short-circuit reads have not been enabled, have the same granularity of access control as on a traditional HDFS cluster. Whether or not short-circuit reads are enabled, access control that is enforced by the application rather than at the file system level is identical for DSSD D5 DataNode HDFS clusters and traditional HDFS clusters.

      When enabled, short-circuit reads improve application performance, but they are not required and can be disabled if the coarser file system access control they imply is problematic.

      Short-circuit reads are enabled for HBase and Impala by default.

      To disable short-circuit reads for HBase:
      1. In the Cloudera Manager Admin Console, select Clusters > HBase > Configuration.
      2. Type “short” in the Search box.

        A set of short-circuit read parameters for HBase displays.

      3. Clear the Enable DSSD Short-Circuit Read property.
      4. Click Save Changes to commit the changes.

        The Admin console indicates that there is a stale configuration.

      5. Restart the stale services as indicated. See Stale Configurations.
      To disable short-circuit reads for Impala:
      1. In the Cloudera Manager Admin Console, select Clusters > Impala > Configuration.
      2. Type “short” in the Search box.

        A set of short-circuit read parameters for Impala displays.

      3. Clear the Enable DSSD Short-Circuit Read property.
      4. Click Save Changes to commit the changes.

        The Admin console now indicates that there is a stale configuration.

      5. Restart the stale services as indicated. See Stale Configurations.

      (Optional) Change the Cloudera Manager User

      After you configure your services, the installation wizard automatically starts the Cloudera Management Service, assuming that the service runs as the user cloudera-scm. If you configured this service to run using a user other than cloudera-scm, the Cloudera Management Service roles do not start automatically. To change the service configuration to use the user account that you selected:
      1. Connect to the Cloudera Manager Admin Console.
      2. Do one of the following:
        • Select Clusters > Cloudera Management Service > Cloudera Management Service.
        • On the Home > Status tab, in Cloudera Management Service table, click the Cloudera Management Service link.
      3. Click the Configuration tab.
      4. Use the search box to find the property to change. For example, you might enter "system" to find the System User and System Group properties.
      5. Make any changes required to the System User and System Group to ensure Cloudera Manager uses the proper user accounts.
      6. Click Save Changes.
      7. Start the Cloudera Management Service roles.

      Change the Default Administrator Password

      As soon as possible, change the default administrator password:
      1. Click the logged-in username at the far right of the top navigation bar and select Change Password.
      2. Enter the current password and a new password twice, and then click OK.

      Configure Oozie Data Purge Settings

      If you added an Oozie service, you can change your Oozie configuration to control when data is purged, which can improve performance, reduce database disk usage, or keep history for a longer period of time. Limiting the size of the Oozie database can also improve performance during upgrades. See Configuring Oozie Data Purge Settings Using Cloudera Manager.

      (Optional) Install Multiple DSSD D5 Appliances in a Cluster

        Note: The steps in this section allow you to map DataNode hosts to racks based on how the hosts are connected to the DSSD D5 and assume that all the previous installation steps on this page have been completed.

      To increase capacity and performance, you can configure a cluster that uses multiple DSSD D5 storage appliances. You configure the cluster by assigning all hosts connected to a DSSD D5 appliance to a single "rack" and selecting one of three modes that provide the policies the NameNode uses to satisfy the configured replication factor. If you are configuring only a single DSSD D5 appliance, skip this section.

      You can also move hosts between appliances. See Moving Existing Hosts to a New DSSD D5.

      To configure a cluster to use multiple DSSD D5 appliances:
      1. Stop the HDFS service. Go to the HDFS service and select Actions > Stop.
      2. Assign the hosts attached to each DSSD D5 to a single rack ID. All hosts attached to a D5 should have the same rack assignment and each DSSD D5 should have a unique rack ID. See Specifying Racks for Hosts.
          Important: Each D5 can connect to up to 48 hosts, which is more than most server racks can accommodate. Even though the hosts are physically located in different racks, you must still assign all hosts connected to a D5 to the same rack ID in Cloudera Manager.
      3. Go to the HDFS service, select the Configuration tab, and search for the Block Replica Placement Policy property.
      4. Set the value of the Block Replica Placement Policy property to one of the following values:
        HDFS Default
        Places the first replica on the node where the client process writing the block resides, the second replica on a randomly chosen remote rack, and the third on a randomly chosen host in the same remote rack (assuming a replication factor of 3). This ordering is fixed.
        Maximize Capacity
        Places all replicas on the same rack and uses all the capacity of the DSSD D5 for HDFS. If there are fewer DataNode hosts than the configured replication factor, blocks are under-replicated. To avoid under-replication, make sure that there are more DataNodes than the replication factor.
        Maximize Availability
        Places replicas in as many racks as needed to meet the configured replication factor. After replicas have been placed on all available racks, additional replicas are placed randomly across the available racks. If there are fewer DataNode hosts than the configured replication factor, blocks are under-replicated. To avoid under-replication, make sure that there are more DataNodes than the replication factor.
      5. Perform a Rolling Restart on the cluster. Select Clusters > Cluster Name > Actions > Rolling Restart.
        Note: The replication factor is a configuration you set on the HDFS service. To set or view this configuration, go to the HDFS service and select Configuration > Replication Factor.

      Test the Installation

      You can test the installation following the instructions in Testing the Installation.
