Preparing to Index Data with Cloudera Search
To prepare for indexing example data with MapReduce or Flume, complete the following steps:
- Start a SolrCloud cluster containing two servers (this example uses two shards) as described in Deploying Cloudera Search. Stop after you verify the Runtime Solr Configuration, and then continue with the next step here.
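As an optional sanity check before continuing, you can confirm that Solr is responding. The host and port below assume a default Solr server on the local host; adjust them for your deployment:
$ curl "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"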
- Generate the configuration files for the collection, including the tweet-specific schema.xml:
- Parcel-based Installation:
$ solrctl instancedir --generate $HOME/solr_configs2
$ cp /opt/cloudera/parcels/CDH/share/doc/search*/examples/solr-nrt/collection1/conf/schema.xml \
  $HOME/solr_configs2/conf
- Package-based Installation:
$ solrctl instancedir --generate $HOME/solr_configs2
$ cp /usr/share/doc/search*/examples/solr-nrt/collection1/conf/schema.xml \
  $HOME/solr_configs2/conf
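For either installation type, you can optionally confirm that the instance directory was generated and the tweet schema copied; this check is not part of the original procedure:
$ ls $HOME/solr_configs2/conf
The listing should include schema.xml among the generated configuration files.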
- Upload the instance directory to ZooKeeper:
$ solrctl instancedir --create collection1 $HOME/solr_configs2/
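To confirm the upload, you can optionally list the instance directories registered in ZooKeeper:
$ solrctl instancedir --list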
- Create the new collection:
$ solrctl collection --create collection1 -s 2 -c collection1
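Optionally, confirm that the collection was created:
$ solrctl collection --list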
- Verify that the collection is live. For example, for a Solr server on the local host, visit http://localhost:8983/solr/#/~cloud.
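You can also issue a test query against the new, empty collection (the host and port here are assumptions; adjust as needed). A well-formed response reporting numFound 0 indicates the collection is serving requests:
$ curl "http://localhost:8983/solr/collection1/select?q=*:*&wt=json"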
- Prepare the configuration for use with MapReduce:
$ cp -r $HOME/solr_configs2 $HOME/collection1
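The MapReduce indexing steps later rely on this local copy of the configuration; as an optional check, verify that the copy succeeded:
$ ls $HOME/collection1/conf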
- Locate input files suitable for indexing, and check that the directory exists. This example assumes you are running the following commands as $USER with access to HDFS.
- Parcel-based Installation:
$ sudo -u hdfs hadoop fs -mkdir -p /user/$USER
$ sudo -u hdfs hadoop fs -chown $USER:$USER /user/$USER
$ hadoop fs -mkdir -p /user/$USER/indir
$ hadoop fs -copyFromLocal \
  /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro \
  /user/$USER/indir/
$ hadoop fs -ls /user/$USER/indir
- Package-based Installation:
$ sudo -u hdfs hadoop fs -mkdir -p /user/$USER
$ sudo -u hdfs hadoop fs -chown $USER:$USER /user/$USER
$ hadoop fs -mkdir -p /user/$USER/indir
$ hadoop fs -copyFromLocal \
  /usr/share/doc/search*/examples/test-documents/sample-statuses-*.avro \
  /user/$USER/indir/
$ hadoop fs -ls /user/$USER/indir
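As an additional optional check beyond the ls above, you can report the size of the copied sample files:
$ hadoop fs -du -h /user/$USER/indir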
- Ensure that outdir is empty and exists in HDFS:
$ hadoop fs -rm -r -skipTrash /user/$USER/outdir
$ hadoop fs -mkdir /user/$USER/outdir
$ hadoop fs -ls /user/$USER/outdir
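On a first run, the rm command reports that outdir does not exist; that message is harmless. As a variation on the mkdir step above, the -p flag succeeds whether or not the directory already exists:
$ hadoop fs -mkdir -p /user/$USER/outdir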
- Depending on how your Hadoop cluster was installed, collect the HDFS/MapReduce configuration details either by downloading the client configuration from Cloudera Manager or by using the files under /etc/hadoop. This example uses the configuration in /etc/hadoop/conf.cloudera.mapreduce1. Substitute the correct Hadoop configuration path for your cluster.
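For example, with the configuration path assumed above (the exact directory name varies by cluster), you can confirm the configuration files are present:
$ ls /etc/hadoop/conf.cloudera.mapreduce1
The listing should include files such as core-site.xml, hdfs-site.xml, and mapred-site.xml.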