This is the documentation for Cloudera Enterprise 5.8.x. Documentation for other versions is available at Cloudera Documentation.

Configuring the Flume Solr Sink

This topic describes how to configure the Flume Solr Sink for both parcel-based and package-based installations:
  • For parcel-based installations, use Cloudera Manager to edit the configuration files, following a process similar to the one described in Configuring the Flume Agents.
  • For package-based installations, use command-line tools to edit files.
  1. Modify the Flume configuration to specify the Flume source details and set up the flow. You must set the relative or absolute path to the morphline configuration file.
    • Parcel-based Installation: In the Cloudera Manager Admin Console, select Flume > Configuration and modify Configuration File to include:
      agent.sinks.solrSink.morphlineFile = /opt/cloudera/parcels/CDH/etc/flume-ng/conf/morphline.conf
    • Package-based Installation: Edit /etc/flume-ng/conf/flume.conf to include:
      agent.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
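    As a minimal sketch of a complete flow for the agent named agent, the following Bash script writes a flume.conf that routes events from a spooling-directory source through a memory channel to the Solr sink. The sink class org.apache.flume.sink.solr.morphline.MorphlineSolrSink and the morphlineFile and morphlineId properties are standard Flume sink settings; morphlineId is set to morphline1 to match the id in the morphline configuration. The source and channel names (spoolSrc, memChannel), the spool directory, the channel capacity, and the FLUME_CONF and MORPHLINE_FILE variables are illustrative assumptions, not values from this guide.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal Flume flow (spooling-directory source -> memory
# channel -> MorphlineSolrSink) for the agent named "agent".
# Assumptions: spoolSrc, memChannel, /var/spool/flume-solr, and the capacity
# are illustrative; FLUME_CONF and MORPHLINE_FILE are hypothetical variables.

# Default to ./flume.conf so the result can be reviewed before it is merged
# into the real configuration.
FLUME_CONF="${FLUME_CONF:-./flume.conf}"
MORPHLINE_FILE="${MORPHLINE_FILE:-/etc/flume-ng/conf/morphline.conf}"

printf '%s\n' \
  'agent.sources = spoolSrc' \
  'agent.channels = memChannel' \
  'agent.sinks = solrSink' \
  '' \
  'agent.sources.spoolSrc.type = spooldir' \
  'agent.sources.spoolSrc.spoolDir = /var/spool/flume-solr' \
  'agent.sources.spoolSrc.channels = memChannel' \
  '' \
  'agent.channels.memChannel.type = memory' \
  'agent.channels.memChannel.capacity = 10000' \
  '' \
  'agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink' \
  'agent.sinks.solrSink.channel = memChannel' \
  "agent.sinks.solrSink.morphlineFile = ${MORPHLINE_FILE}" \
  'agent.sinks.solrSink.morphlineId = morphline1' \
  > "$FLUME_CONF"
```

    Note that a source lists its channels with the plural channels property, while a sink names exactly one channel with channel.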
  2. Modify the morphline configuration to specify the Solr location details using a SOLR_LOCATOR.
    • Parcel-based Installation: In the Cloudera Manager Admin Console, select Flume > Configuration and modify Morphline File.
    • Package-based Installation: Edit /etc/flume-ng/conf/morphline.conf.
    The snippet that includes the SOLR_LOCATOR might appear as follows:
      SOLR_LOCATOR : {
        # Name of solr collection
        collection : collection
        # ZooKeeper ensemble
        zkHost : "$ZK_HOST"
      }

      morphlines : [
        {
          id : morphline1
          importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
          commands : [
            { generateUUID { field : id } }
            { # Remove record fields that are unknown to Solr schema.xml.
              # Recall that Solr throws an exception on any attempt to load a document that
              # contains a field that isn't specified in schema.xml.
              sanitizeUnknownSolrFields {
                solrLocator : ${SOLR_LOCATOR} # Location from which to fetch Solr schema
              }
            }
            { logDebug { format : "output record: {}", args : ["@{}"] } }
            {
              loadSolr {
                solrLocator : ${SOLR_LOCATOR}
              }
            }
          ]
        }
      ]
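    The zkHost value "$ZK_HOST" in the snippet is a placeholder. Assuming nothing substitutes it for you (for example, in a package-based installation where you edit morphline.conf by hand), the following Bash function is one way to replace it with your ZooKeeper ensemble; the collection value can be edited the same way to name your Solr collection. The function name replace_zk_host and the host names in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: substitute a real ZooKeeper ensemble for the literal "$ZK_HOST"
# placeholder in a morphline file. replace_zk_host is a hypothetical helper.
replace_zk_host() {
    local file="$1" ensemble="$2"
    # sed receives s|\$ZK_HOST|...|g, where \$ matches a literal dollar sign.
    # '|' is the delimiter because ensembles often contain '/' (a ZooKeeper
    # chroot such as /solr). -i.bak edits in place and keeps "$file.bak".
    sed -i.bak "s|\\\$ZK_HOST|${ensemble}|g" "$file"
}

# Example (hypothetical hosts):
# replace_zk_host /etc/flume-ng/conf/morphline.conf \
#     "zk01.example.com:2181,zk02.example.com:2181/solr"
```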
  3. Copy to
    • Parcel-based Installation:
      $ sudo cp /opt/cloudera/parcels/CDH/etc/flume-ng/conf/ \
    • Package-based Installation:
      $ sudo cp /etc/flume-ng/conf/ \
  4. Update the Java heap size.
    • Parcel-based Installation: In the Cloudera Manager Admin Console, select Flume > Configuration. In the Search box, enter Java Heap Size. Set Java Heap Size of Agent in Bytes to 500 and choose MiB as the unit.
    • Package-based Installation: Edit /etc/flume-ng/conf/ or /opt/cloudera/parcels/CDH/etc/flume-ng/conf/, inserting or replacing JAVA_OPTS as follows:
  5. (Optional) Modify Flume logging settings to facilitate monitoring and debugging:
    • Parcel-based Installation: In the Cloudera Manager Admin Console, select Flume > Configuration and modify Agent Logging Advanced Configuration Snippet (Safety Valve) to include:
    • Package-based Installation: Use the following commands:
      $ sudo bash -c 'echo "" >> \
      $ sudo bash -c 'echo "" >> \
  6. (Optional) In a package-based installation, you can use SEARCH_HOME to configure where Flume finds the Cloudera Search dependencies for the Flume Solr Sink. For example, if you installed Flume from a tarball package, you can point it to the required files by setting SEARCH_HOME. To set SEARCH_HOME, use a command similar to the following:
    $ export SEARCH_HOME=/usr/lib/search

    Alternatively, you can add the same setting to

    In a Cloudera Manager-managed environment, Cloudera Manager automatically updates the SOLR_HOME location with any additional required dependencies.
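    For a package-based installation, a minimal Bash sketch like the following can confirm that SEARCH_HOME is set and names an existing directory before you start the agent. It only checks that the directory exists, not that it contains the Cloudera Search dependencies; the function name check_search_home is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm SEARCH_HOME points to an existing directory before starting
# a package-based Flume agent. check_search_home is a hypothetical helper.
check_search_home() {
    if [ -z "${SEARCH_HOME:-}" ]; then
        echo "SEARCH_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$SEARCH_HOME" ]; then
        echo "SEARCH_HOME=$SEARCH_HOME is not a directory" >&2
        return 1
    fi
    echo "Using SEARCH_HOME=$SEARCH_HOME"
}

# Example, using the path from step 6:
# export SEARCH_HOME=/usr/lib/search
# check_search_home
```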

Page generated July 8, 2016.