Preparing to Index Data with Cloudera Search
To prepare for indexing example data with MapReduce or Flume, complete the following steps:
- Start a SolrCloud cluster containing two servers (this example uses two shards) as described in Deploying Cloudera Search. Stop after you verify the Runtime Solr Configuration, and then continue with the next step here.
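As an optional sanity check before continuing, you can confirm that Solr is responding. The host and port below assume a default Solr server on the local host; adjust them for your deployment:
$ curl "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"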
- Generate the configuration files for the collection, including the tweet-specific schema.xml:
- Parcel-based Installation:
$ solrctl instancedir --generate $HOME/solr_configs2
$ cp /opt/cloudera/parcels/CDH/share/doc/search*/examples/solr-nrt/collection1/conf/schema.xml \
  $HOME/solr_configs2/conf
- Package-based Installation:
$ solrctl instancedir --generate $HOME/solr_configs2
$ cp /usr/share/doc/search*/examples/solr-nrt/collection1/conf/schema.xml \
  $HOME/solr_configs2/conf
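For either installation type, you can optionally confirm that the instance directory was generated and the tweet schema copied; this check is not part of the original procedure:
$ ls $HOME/solr_configs2/conf
The listing should include schema.xml among the generated configuration files.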
- Upload the instance directory to ZooKeeper:
$ solrctl instancedir --create collection1 $HOME/solr_configs2/
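To confirm the upload, you can optionally list the instance directories registered in ZooKeeper:
$ solrctl instancedir --list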
- Create the new collection:
$ solrctl collection --create collection1 -s 2 -c collection1
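Optionally, confirm that the collection was created:
$ solrctl collection --list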
- Verify that the collection is live. For example, for a Solr server on the local host, visit http://localhost:8983/solr/#/~cloud.
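You can also issue a test query against the new, empty collection (the host and port here are assumptions; adjust as needed). A well-formed response reporting numFound 0 indicates the collection is serving requests:
$ curl "http://localhost:8983/solr/collection1/select?q=*:*&wt=json"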
- Prepare the configuration for use with MapReduce:
$ cp -r $HOME/solr_configs2 $HOME/collection1
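The MapReduce indexing steps later rely on this local copy of the configuration; as an optional check, verify that the copy succeeded:
$ ls $HOME/collection1/conf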
- Locate input files suitable for indexing, and check that the directory exists. This example assumes you are running the following commands as $USER with access to HDFS.
- Parcel-based Installation:
$ sudo -u hdfs hadoop fs -mkdir -p /user/$USER
$ sudo -u hdfs hadoop fs -chown $USER:$USER /user/$USER
$ hadoop fs -mkdir -p /user/$USER/indir
$ hadoop fs -copyFromLocal \
  /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro \
  /user/$USER/indir/
$ hadoop fs -ls /user/$USER/indir
- Package-based Installation:
$ sudo -u hdfs hadoop fs -mkdir -p /user/$USER
$ sudo -u hdfs hadoop fs -chown $USER:$USER /user/$USER
$ hadoop fs -mkdir -p /user/$USER/indir
$ hadoop fs -copyFromLocal \
  /usr/share/doc/search*/examples/test-documents/sample-statuses-*.avro \
  /user/$USER/indir/
$ hadoop fs -ls /user/$USER/indir
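As an additional optional check beyond the ls above, you can report the size of the copied sample files:
$ hadoop fs -du -h /user/$USER/indir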
- Ensure that outdir is empty and exists in HDFS:
$ hadoop fs -rm -r -skipTrash /user/$USER/outdir
$ hadoop fs -mkdir /user/$USER/outdir
$ hadoop fs -ls /user/$USER/outdir
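On a first run, the rm command reports that outdir does not exist; that message is harmless. As a variation on the mkdir step above, the -p flag succeeds whether or not the directory already exists:
$ hadoop fs -mkdir -p /user/$USER/outdir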
- Depending on how your Hadoop cluster was installed, collect the HDFS/MapReduce configuration details either by downloading the client configuration from Cloudera Manager or by using the files under /etc/hadoop. This example uses the configuration in /etc/hadoop/conf.cloudera.mapreduce1. Substitute the correct Hadoop configuration path for your cluster.
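For example, with the configuration path assumed above (the exact directory name varies by cluster), you can confirm the configuration files are present:
$ ls /etc/hadoop/conf.cloudera.mapreduce1
The listing should include files such as core-site.xml, hdfs-site.xml, and mapred-site.xml.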