makeRDDForTable(
hiveTable: HiveTable): RDD[InternalRow]
HadoopTableReader
HadoopTableReader is a TableReader that creates a HadoopRDD for scanning partitioned or unpartitioned tables stored in Hadoop.
HadoopTableReader is used by the HiveTableScanExec physical operator when it is requested to execute.
Creating HadoopTableReader Instance
HadoopTableReader takes the following to be created:
- Hive TableDesc
- Hadoop Configuration
HadoopTableReader initializes the internal properties.
makeRDDForTable Method
Note: makeRDDForTable is part of the TableReader contract to…FIXME.
makeRDDForTable simply calls the private makeRDDForTable with…FIXME
makeRDDForPartitionedTable Method
makeRDDForPartitionedTable(
partitions: Seq[HivePartition]): RDD[InternalRow]
Note: makeRDDForPartitionedTable is part of the TableReader contract to…FIXME.
makeRDDForPartitionedTable simply calls the private makeRDDForPartitionedTable with…FIXME
Creating HadoopRDD — createHadoopRdd Internal Method
createHadoopRdd(
tableDesc: TableDesc,
path: String,
inputFormatClass: Class[InputFormat[Writable, Writable]]): RDD[Writable]
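createHadoopRdd partially applies initializeLocalJobConfFunc to the input path and tableDesc, which gives a function that initializes a JobConf.
createHadoopRdd creates a HadoopRDD (with the broadcast Hadoop Configuration, the JobConf-initializing function, the input inputFormatClass, and the minimum number of partitions) and maps over its key-value records to keep only the values.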
Note: createHadoopRdd adds a HadoopRDD and a MapPartitionsRDD to an RDD lineage.
Note: createHadoopRdd is used when HadoopTableReader is requested to makeRDDForTable and makeRDDForPartitionedTable.
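The two-step lineage described above (a HadoopRDD followed by a MapPartitionsRDD) can be observed with public Spark APIs. The following is a minimal sketch, not the code of createHadoopRdd itself: it assumes Spark and Hadoop are on the classpath, uses SparkContext.hadoopFile with TextInputFormat in place of the internal HadoopRDD construction, and the names HadoopRddLineageSketch and valuesOnly are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; it only mirrors the shape of createHadoopRdd.
object HadoopRddLineageSketch {

  // Reads key-value records with the old "mapred" input format API
  // and keeps only the values, dropping the keys.
  def valuesOnly(sc: SparkContext, path: String, minPartitions: Int): RDD[Text] = {
    val keyValues: RDD[(LongWritable, Text)] = sc.hadoopFile(
      path,
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      minPartitions)
    keyValues.map(_._2) // adds a MapPartitionsRDD on top of the HadoopRDD
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("hadoop-rdd-lineage-sketch")
      .getOrCreate()
    try {
      // A throwaway local file stands in for table data stored in Hadoop.
      val file = Files.createTempFile("hadoop-rdd-sketch", ".txt")
      Files.write(file, "first\nsecond\n".getBytes(StandardCharsets.UTF_8))

      val values = valuesOnly(spark.sparkContext, file.toString, minPartitions = 1)

      // The lineage should list a MapPartitionsRDD that depends on a HadoopRDD.
      println(values.toDebugString)

      // Hadoop record readers reuse Writable instances, so convert to String
      // before collecting to the driver.
      values.map(_.toString).collect().foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Keeping only the values matches the description above: the keys produced by the input format are dropped, and only the Writable values are passed downstream.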
initializeLocalJobConfFunc Utility
initializeLocalJobConfFunc(
path: String,
tableDesc: TableDesc)(
jobConf: JobConf): Unit
initializeLocalJobConfFunc…FIXME
Note: initializeLocalJobConfFunc is used when HadoopTableReader is requested to create a HadoopRDD.
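The signature above has two parameter lists, so supplying only path and tableDesc leaves a function that still expects a JobConf. The sketch below illustrates just that shape in plain Scala. It makes no claim about what initializeLocalJobConfFunc actually sets: DemoTableDesc, DemoJobConf, initializeDemoJobConf, and the demo.* keys are all hypothetical stand-ins for Hive's TableDesc, Hadoop's JobConf, and the real method.

```scala
import scala.collection.mutable

object CurriedInitializerSketch {
  // Hypothetical stand-ins for Hive's TableDesc and Hadoop's JobConf.
  final case class DemoTableDesc(properties: Map[String, String])
  type DemoJobConf = mutable.Map[String, String]

  // Same shape as the signature above: (path, tableDesc)(jobConf): Unit.
  // The body is illustrative only; the keys are made up.
  def initializeDemoJobConf(path: String, tableDesc: DemoTableDesc)(jobConf: DemoJobConf): Unit = {
    jobConf("demo.input.path") = path
    jobConf ++= tableDesc.properties
  }

  def main(args: Array[String]): Unit = {
    val tableDesc = DemoTableDesc(Map("demo.table.name" -> "t1"))

    // Supplying only the first parameter list yields a DemoJobConf => Unit
    // function that can be handed over and invoked later.
    val initialize: DemoJobConf => Unit = initializeDemoJobConf("/tmp/t1", tableDesc)

    val jobConf: DemoJobConf = mutable.Map.empty
    initialize(jobConf) // mutates jobConf in place, returns Unit
    println(jobConf.toSeq.sorted.mkString(", "))
  }
}
```

A function of this shape is convenient when the configuration object is created elsewhere (for example, per task) and the path and table description are known up front.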
Internal Properties
| Name | Description |
|---|---|
| | Hadoop Configuration broadcast to executors |
| | Minimum number of partitions for a HadoopRDD: |