Accessing External Storage
Spark can access all storage sources supported by Hadoop, including a local file system, HDFS, HBase, and Amazon S3.
Spark supports many file formats, including text files, RCFile, SequenceFile, Hadoop InputFormat, Avro, and Parquet, as well as compressed versions of all of these formats.
For more information, see External Storage.
Accessing Compressed Files
You can read compressed files using one of the following methods:
- textFile(path)
- hadoopFile(path, inputFormatClass)
You can save compressed files using one of the following methods:
- saveAsTextFile(path, compressionCodecClass="codec_class")
- saveAsHadoopFile(path, outputFormatClass, compressionCodecClass="codec_class")
For examples of accessing Avro and Parquet files, see Spark with Avro and Parquet.
For details on how to access specific types of external storage and files, see the related topics in this guide.
Page generated July 8, 2016.