HadoopTableReader

HadoopTableReader is a TableReader that creates a HadoopRDD for scanning partitioned or unpartitioned tables stored in Hadoop.

HadoopTableReader is used by the HiveTableScanExec physical operator when requested to execute.
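A minimal sketch of that dispatch (an assumption about the operator's internals; `relation`, `hadoopReader`, `hiveQlTable` and `prunedPartitions` are illustrative names):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow

// Sketch (assumption): HiveTableScanExec picks the TableReader method
// based on whether the Hive relation is partitioned.
val rdd: RDD[InternalRow] =
  if (!relation.isPartitioned) {
    hadoopReader.makeRDDForTable(hiveQlTable)
  } else {
    hadoopReader.makeRDDForPartitionedTable(prunedPartitions)
  }
```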

Creating HadoopTableReader Instance

HadoopTableReader takes the following to be created:

  • Output attributes (Seq[Attribute])

  • Partition key attributes (Seq[Attribute])

  • Hive TableDesc

  • SparkSession

  • Hadoop Configuration

HadoopTableReader initializes the internal properties.

makeRDDForTable Method

makeRDDForTable(
  hiveTable: HiveTable): RDD[InternalRow]
Note
makeRDDForTable is part of the TableReader contract to create an RDD of InternalRows for an unpartitioned Hive table.

makeRDDForTable simply calls the private makeRDDForTable with the Hive table, the table's deserializer class (per the Hive TableDesc) and no path filter (None).
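The delegation can be sketched as follows (a non-authoritative reconstruction; `tableDesc` is the Hive TableDesc the reader was created with):

```scala
import org.apache.hadoop.hive.serde2.Deserializer
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow

// Sketch (assumption): the public overload resolves the table's
// deserializer class and delegates with no path filter.
override def makeRDDForTable(hiveTable: HiveTable): RDD[InternalRow] =
  makeRDDForTable(
    hiveTable,
    Class.forName(tableDesc.getSerdeClassName).asInstanceOf[Class[Deserializer]],
    filterOpt = None)
```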

makeRDDForTable Method

makeRDDForTable(
  hiveTable: HiveTable,
  deserializerClass: Class[_ <: Deserializer],
  filterOpt: Option[PathFilter]): RDD[InternalRow]

makeRDDForTable creates a HadoopRDD for the table's data directory (with the optional PathFilter applied) and deserializes the raw Writable records into InternalRows using the given deserializer class.

Note
makeRDDForTable is used when HadoopTableReader is requested to create an RDD for an unpartitioned Hive table (via the public makeRDDForTable).
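The overall flow can be sketched as follows (assumptions throughout; `deserializeRows` is a hypothetical stand-in for the deserialization step):

```scala
// Sketch (assumption): the table's data directory feeds a HadoopRDD,
// and each partition of raw Writables is deserialized into
// InternalRows with the given deserializer class on the executors.
val inputPathStr = hiveTable.getPath.toString // filterOpt handling elided
val hadoopRDD = createHadoopRdd(tableDesc, inputPathStr, inputFormatClass)
hadoopRDD.mapPartitions { iter =>
  val deserializer = deserializerClass.getConstructor().newInstance()
  deserializeRows(iter, deserializer) // hypothetical helper
}
```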

makeRDDForPartitionedTable Method

makeRDDForPartitionedTable(
  partitions: Seq[HivePartition]): RDD[InternalRow]
Note
makeRDDForPartitionedTable is part of the TableReader contract to create an RDD of InternalRows for a partitioned Hive table.

makeRDDForPartitionedTable simply calls the private makeRDDForPartitionedTable with every partition mapped to its deserializer class and no path filter (None).
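That delegation can be sketched as follows (a non-authoritative reconstruction):

```scala
import org.apache.hadoop.hive.serde2.Deserializer
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow

// Sketch (assumption): every partition is paired with its own
// deserializer class before delegating; no path filter is applied.
override def makeRDDForPartitionedTable(
    partitions: Seq[HivePartition]): RDD[InternalRow] = {
  val partitionToDeserializer = partitions.map { part =>
    part -> part.getDeserializer.getClass.asInstanceOf[Class[Deserializer]]
  }.toMap
  makeRDDForPartitionedTable(partitionToDeserializer, filterOpt = None)
}
```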

makeRDDForPartitionedTable Method

makeRDDForPartitionedTable(
  partitionToDeserializer: Map[HivePartition, Class[_ <: Deserializer]],
  filterOpt: Option[PathFilter]): RDD[InternalRow]

makeRDDForPartitionedTable creates a HadoopRDD for every partition directory, deserializes the records with the partition's deserializer (filling in the partition-key values) and unions the per-partition RDDs (or gives an empty RDD when there are no partitions).

Note
makeRDDForPartitionedTable is used when HadoopTableReader is requested to create an RDD for a partitioned Hive table (via the public makeRDDForPartitionedTable).
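The per-partition fan-out can be sketched as follows (assumptions throughout; `deserializePartition` is a hypothetical stand-in for the deserialization step):

```scala
import org.apache.spark.rdd.{EmptyRDD, UnionRDD}

// Sketch (assumption): one HadoopRDD per partition directory, each
// deserialized with that partition's deserializer, then unioned.
val hivePartitionRDDs = partitionToDeserializer.toSeq.map {
  case (partition, deserializerClass) =>
    val inputPathStr = partition.getDataLocation.toString // filterOpt handling elided
    createHadoopRdd(tableDesc, inputPathStr, inputFormatClass)
      .mapPartitions { iter =>
        deserializePartition(iter, deserializerClass) // hypothetical helper
      }
}
if (hivePartitionRDDs.isEmpty) new EmptyRDD[InternalRow](sparkSession.sparkContext)
else new UnionRDD(sparkSession.sparkContext, hivePartitionRDDs)
```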

Creating HadoopRDD — createHadoopRdd Internal Method

createHadoopRdd(
  tableDesc: TableDesc,
  path: String,
  inputFormatClass: Class[InputFormat[Writable, Writable]]): RDD[Writable]

createHadoopRdd creates a JobConf initialization function using initializeLocalJobConfFunc with the input path and tableDesc.

createHadoopRdd creates a HadoopRDD (with the broadcast Hadoop Configuration, the JobConf initialization function, the given inputFormatClass, and the minimum number of partitions) and maps over it to keep only the values (the keys are dropped).

Note
createHadoopRdd adds a HadoopRDD and a MapPartitionsRDD to the RDD lineage.
Note
createHadoopRdd is used when HadoopTableReader is requested to makeRDDForTable and makeRDDForPartitionedTable.
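The steps above can be sketched as follows (a non-authoritative reconstruction; `_broadcastedHadoopConf` and `_minSplitsPerRDD` are the internal properties described below):

```scala
import org.apache.hadoop.io.Writable
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.HadoopRDD
import org.apache.spark.util.SerializableConfiguration

// Sketch (assumption): a JobConf-initialization closure plus a
// HadoopRDD, followed by a map that keeps only the values.
val initializeJobConfFunc =
  HadoopTableReader.initializeLocalJobConfFunc(path, tableDesc) _

val rdd = new HadoopRDD(
  sparkSession.sparkContext,
  _broadcastedHadoopConf.asInstanceOf[Broadcast[SerializableConfiguration]],
  Some(initializeJobConfFunc),
  inputFormatClass,
  classOf[Writable],
  classOf[Writable],
  _minSplitsPerRDD)

// Only the value is kept; the HadoopRDD's (key, value) keys are dropped.
rdd.map(_._2)
```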

initializeLocalJobConfFunc Utility

initializeLocalJobConfFunc(
  path: String,
  tableDesc: TableDesc)(
    jobConf: JobConf): Unit

initializeLocalJobConfFunc sets the given path as the input path on the JobConf and copies the table properties of the tableDesc onto it.

Note
initializeLocalJobConfFunc is used when HadoopTableReader is requested to create an HadoopRDD.
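A sketch of the function (assumptions throughout; the property-copying call stands in for whatever Hive utility the implementation actually uses):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hive.ql.exec.Utilities
import org.apache.hadoop.hive.ql.plan.TableDesc
import org.apache.hadoop.mapred.{FileInputFormat, JobConf}

// Sketch (assumption): point the JobConf at the input path and copy
// the table's SerDe/IO job properties onto it.
def initializeLocalJobConfFunc(path: String, tableDesc: TableDesc)(jobConf: JobConf): Unit = {
  FileInputFormat.setInputPaths(jobConf, Seq[Path](new Path(path)): _*)
  if (tableDesc != null) {
    Utilities.copyTableJobPropertiesToConf(tableDesc, jobConf)
  }
  jobConf.set("io.file.buffer.size", System.getProperty("spark.buffer.size", "65536"))
}
```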

Internal Properties

Name Description

_broadcastedHadoopConf

Hadoop Configuration broadcast to executors

_minSplitsPerRDD

Minimum number of partitions for a HadoopRDD:

  • 0 in local mode

  • Otherwise, the greatest of Hadoop’s mapreduce.job.maps property (default: 1) and Spark Core’s default minimum number of partitions for Hadoop RDDs (which is never higher than 2)
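The rule above can be expressed as a small pure function (a sketch; the parameter names are illustrative, not Spark's internals):

```scala
// Sketch: the _minSplitsPerRDD rule as a standalone function.
// `isLocal`, `mapreduceJobMaps` and `defaultMinPartitions` are
// illustrative names for the inputs described above.
def minSplitsPerRDD(isLocal: Boolean, mapreduceJobMaps: Int, defaultMinPartitions: Int): Int =
  if (isLocal) 0
  else math.max(mapreduceJobMaps, defaultMinPartitions)
```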
