Schema
A table schema contains information about the names and types of the columns of a table.
The schema of a Kite dataset ingested into HDFS contains information about the names and types of the fields in the HDFS Avro or Parquet file used to create the dataset.
Displaying Hive, Impala, and Sqoop Table Schema
- Perform a metadata search for entities of source type Hive and type Table.
- In the list of results, click a result entry. The table schema displays in the Details tab.
Displaying Pig Table Schema
- Perform a metadata search for entities of source type Pig.
- In the list of results, click a result entry of type Table. The table schema displays in the Details tab.
Displaying HDFS Dataset Schema
If you ingest a Kite dataset into HDFS, you can view the schema of the dataset. The schema is represented as an entity of type Dataset and is implemented as an HDFS directory.
For Avro datasets, primitive types such as null, string, int, and so on are not separate entities. For example, if a record type has a field A that is a record type and a field B that is a string, the subfields of A become entities themselves, but B has no children. As another example, a union of null, string, map, array, and record types has three children: the map, array, and record subtypes.
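The rule above can be sketched in a few lines of Python. This is only an illustration of the behavior described here, not Navigator's implementation; the function names `type_children` and `field_children` and the sample schemas `example_union` and `example_record` are hypothetical.

```python
# Illustrative sketch only: mirrors the rule described above,
# not Cloudera Navigator's actual code.
AVRO_PRIMITIVES = {"null", "boolean", "int", "long", "float",
                   "double", "bytes", "string"}


def is_primitive(avro_type):
    """Return True for primitive type names such as "string"."""
    return isinstance(avro_type, str) and avro_type in AVRO_PRIMITIVES


def type_children(avro_type):
    """Return the sub-schemas that would become child entities of a type."""
    if isinstance(avro_type, list):
        # Union: only the non-primitive branches count as children.
        return [branch for branch in avro_type if not is_primitive(branch)]
    if isinstance(avro_type, dict) and avro_type.get("type") == "record":
        # Record: each subfield is a child.
        return list(avro_type["fields"])
    # Primitives (and anything else in this sketch) have no children.
    return []


def field_children(field):
    """Children of a record field are the children of the field's type."""
    return type_children(field["type"])


# Hypothetical union of null, string, map, array, and record types.
example_union = [
    "null",
    "string",
    {"type": "map", "values": "string"},
    {"type": "array", "items": "int"},
    {"type": "record", "name": "R", "fields": []},
]

# Hypothetical record with a record-typed field A and a string field B.
example_record = {
    "type": "record",
    "name": "Outer",
    "fields": [
        {"name": "A", "type": {"type": "record", "name": "Inner",
                               "fields": [{"name": "x", "type": "int"}]}},
        {"name": "B", "type": "string"},
    ],
}

field_a, field_b = type_children(example_record)
```

In this sketch, `type_children(example_union)` keeps only the map, array, and record branches, `field_children(field_a)` returns the subfields of A, and `field_children(field_b)` is empty.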
To display an HDFS dataset schema:
- Perform a metadata search for entities of type Dataset.
- Click a result entry. The dataset schema displays in the Details tab.
Stocks Schema
- Use the Stocks Avro schema file:
{
  "type" : "record",
  "name" : "Stocks",
  "namespace" : "com.example.stocks",
  "doc" : "Schema generated by Kite",
  "fields" : [
    { "name" : "Symbol", "type" : [ "null", "string" ], "doc" : "Type inferred from 'AAIT'" },
    { "name" : "Date", "type" : [ "null", "string" ], "doc" : "Type inferred from '28-Oct-2014'" },
    { "name" : "Open", "type" : [ "null", "double" ], "doc" : "Type inferred from '33.1'" },
    { "name" : "High", "type" : [ "null", "double" ], "doc" : "Type inferred from '33.13'" },
    { "name" : "Low", "type" : [ "null", "double" ], "doc" : "Type inferred from '33.1'" },
    { "name" : "Close", "type" : [ "null", "double" ], "doc" : "Type inferred from '33.13'" },
    { "name" : "Volume", "type" : [ "null", "long" ], "doc" : "Type inferred from '400'" }
  ]
}
and the kite-dataset command to create a Stocks dataset:
kite-dataset create dataset:hdfs:/user/hdfs/Stocks -s Stocks.avsc
The following directory is created in HDFS:
- In search results, the Stocks dataset appears as follows:
- Click the Stocks link. The schema displays at the right of the Details tab.
Each subfield of the Stocks record is an entity of type Field.
- Then use the kite-dataset command to import structured data:
kite-dataset csv-import ./Stocks.csv dataset:hdfs:/user/hdfs/Stocks --no-header
where Stocks.csv is:
AAPL,20150206,120.02,120.25,118.45,118.93,43372000
AAPL,20150205,120.02,120.23,119.25,119.94,42246200
GOOG,20150304,571.87,577.11,568.01,573.37,1713800
GOOG,20150303,570.45,575.39,566.52,573.64,1694300
GOOG,20150302,560.53,572.15,558.75,571.34,2118400
GOOG,20150209,528,532,526.02,527.83,1264300
GOOG,20150206,527.64,537.2,526.41,531,1744600
GOOG,20150205,523.79,528.5,522.09,527.58,1844700
FB,20150304,79.3,81.15,78.85,80.9,28014500
FB,20150303,79.61,79.7,78.52,79.6,18567300
FB,20150302,79,79.86,78.52,79.75,21604400
FB,20150227,80.68,81.23,78.62,78.97,30635700
FB,20150226,79.88,81.37,79.72,80.41,31111900
TWTR,20150211,46.27,47.78,46.11,47.5,24747000
TWTR,20150210,47.35,47.39,45.57,46.26,32287800
TWTR,20150209,46.73,47.69,46.5,47.32,36177900
TWTR,20150206,46.12,48.5,45.8,48.01,102669800
TWTR,20150205,42.04,42.47,40.91,41.26,61997300
MSFT,20150304,43.01,43.21,42.88,43.06,25705800
MSFT,20150303,43.56,43.83,43.09,43.28,31748600
MSFT,20150302,43.67,44.19,43.55,43.88,31924000
MSFT,20150227,44.13,44.2,43.66,43.85,33807700
MSFT,20150226,43.99,44.23,43.89,44.06,28957300
ORCL,20150304,43.2,43.66,42.82,43.61,14663900
ORCL,20150303,43.83,43.88,43.17,43.38,10058700
ORCL,20150302,43.81,44.04,43.48,44.03,11091000
ORCL,20150227,43.77,44.11,43.68,43.82,9549500
ORCL,20150226,43.8,44.15,43.71,43.89,8519300
ORCL,20150225,43.83,44.09,43.38,43.73,11785400
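To see how the headerless rows line up with the Stocks schema, the following Python sketch pairs each CSV column with a schema field and casts the value to the field's non-null type. It assumes columns map to fields by position, which is an assumption of this sketch and not a description of how kite-dataset csv-import works internally. The helper names (`non_null_type`, `rows_to_records`) and the two-row `SAMPLE_ROWS` excerpt are hypothetical, and the schema copy omits the "doc" attributes.

```python
import csv
import io
import json

# Trimmed copy of the Stocks schema ("doc" attributes omitted).
STOCKS_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Stocks",
  "namespace": "com.example.stocks",
  "fields": [
    {"name": "Symbol", "type": ["null", "string"]},
    {"name": "Date",   "type": ["null", "string"]},
    {"name": "Open",   "type": ["null", "double"]},
    {"name": "High",   "type": ["null", "double"]},
    {"name": "Low",    "type": ["null", "double"]},
    {"name": "Close",  "type": ["null", "double"]},
    {"name": "Volume", "type": ["null", "long"]}
  ]
}
""")

# Python casts for the Avro primitive types used by this schema.
CASTS = {"string": str, "double": float, "long": int}


def non_null_type(union):
    """Pick the single non-null branch of a ["null", T] union."""
    return next(branch for branch in union if branch != "null")


def rows_to_records(csv_text, schema):
    """Map headerless CSV rows to dicts keyed by schema field name.

    Assumption of this sketch: columns correspond to fields by position.
    """
    fields = schema["fields"]
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != len(fields):
            raise ValueError(
                f"expected {len(fields)} columns, got {len(row)}: {row}")
        records.append({
            field["name"]: CASTS[non_null_type(field["type"])](value)
            for field, value in zip(fields, row)
        })
    return records


# Two rows copied from Stocks.csv.
SAMPLE_ROWS = (
    "AAPL,20150206,120.02,120.25,118.45,118.93,43372000\n"
    "GOOG,20150209,528,532,526.02,527.83,1264300\n"
)

records = rows_to_records(SAMPLE_ROWS, STOCKS_SCHEMA)
```

Each resulting record has one key per schema field, so a row with a missing or extra column is rejected before any values are cast.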
©2016 Cloudera, Inc. All rights reserved