HiveTableScanExec Leaf Physical Operator

HiveTableScanExec is a leaf physical operator that represents a HiveTableRelation logical operator at execution time.

HiveTableScanExec is created exclusively when the HiveTableScans execution planning strategy plans a HiveTableRelation logical operator (i.e. when the strategy is executed on a logical query plan that contains a HiveTableRelation logical operator).

HiveTableScanExec uses the fully-qualified name of the Hive table (of the HiveTableRelation) for the node name:

Scan hive [table]
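
As a rough illustration of the naming scheme above, the following standalone sketch builds such a node name. TableName and NodeNameSketch are hypothetical stand-ins introduced only for this example (they are not Spark classes), and the database.table form of the fully-qualified name is an assumption.

```scala
object NodeNameSketch {
  // Hypothetical, simplified stand-in for a Hive table identifier (not a Spark class).
  final case class TableName(database: String, table: String) {
    // Assumed form of the fully-qualified name: database.table
    def qualifiedName: String = s"$database.$table"
  }

  // Builds a node name in the "Scan hive [table]" form.
  def nodeName(t: TableName): String = s"Scan hive ${t.qualifiedName}"
}
```

For a hypothetical table sales in the default database, NodeNameSketch.nodeName(NodeNameSketch.TableName("default", "sales")) gives Scan hive default.sales.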

Creating HiveTableScanExec Instance

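HiveTableScanExec takes the following when created: requested attributes, a HiveTableRelation, partition pruning predicates (partitionPruningPred) and a SparkSession.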

HiveTableScanExec initializes the internal registries and counters.

Partition Pruning Predicates

HiveTableScanExec physical operator supports partition pruning for Hive tables that are partitioned.

HiveTableScanExec requires that either the partitionPruningPred has no expressions or the HiveTableRelation is partitioned. Otherwise, HiveTableScanExec throws an IllegalArgumentException.
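
The requirement above can be sketched with Scala's built-in require, which throws an IllegalArgumentException when its condition is false. This is a minimal sketch, not the operator's actual code: PruningRequirementSketch and validate are hypothetical names, predicates and partition columns are modelled as plain strings, and the error message is illustrative.

```scala
object PruningRequirementSketch {
  // Hypothetical, simplified model: predicates and partition columns are plain strings.
  // A table counts as partitioned when it has at least one partition column.
  def validate(partitionPruningPred: Seq[String], partitionCols: Seq[String]): Unit = {
    // require throws IllegalArgumentException when the condition does not hold.
    require(
      partitionPruningPred.isEmpty || partitionCols.nonEmpty,
      "Partition pruning predicates only supported for partitioned tables.") // illustrative message
  }
}
```

With this sketch, validate(Seq("year = 2020"), Seq.empty) fails, while an empty predicate list passes for any table.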

HiveTableScans execution planning strategy creates a HiveTableScanExec physical operator for every HiveTableRelation operator in a query plan. When created, HiveTableScanExec is given the partition pruning predicates, i.e. the predicate expressions that reference at least one column and whose references are all among the partition columns of the HiveTableRelation.
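
How the pruning predicates are separated from the remaining predicates can be sketched as follows. The sketch assumes that a predicate qualifies for partition pruning when it references at least one column and every column it references is a partition column. Predicate, split and PruningPredicateSplitSketch are hypothetical names used for illustration only; column references are modelled as plain strings rather than Catalyst attributes.

```scala
object PruningPredicateSplitSketch {
  // Hypothetical stand-in for a Catalyst predicate: its SQL text and the columns it references.
  final case class Predicate(sql: String, references: Set[String])

  // Returns (pruning predicates, other predicates).
  // Assumption: a pruning predicate has a non-empty set of references,
  // all of which are partition columns.
  def split(
      predicates: Seq[Predicate],
      partitionCols: Set[String]): (Seq[Predicate], Seq[Predicate]) =
    predicates.partition { p =>
      p.references.nonEmpty && p.references.subsetOf(partitionCols)
    }
}
```

Under that assumption, for a table partitioned by year, a predicate on year alone is kept for pruning, whereas a predicate on a data column, a predicate mixing partition and data columns, or a predicate with no column references stays with the other predicates.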

Performance Metrics — metrics Method

Table 1. HiveTableScanExec’s Performance Metrics

Key | Name (in web UI) | Description
numOutputRows | number of output rows |

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

doExecute(): RDD[InternalRow]
Note: doExecute is part of the SparkPlan contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute…​FIXME

Internal Properties

Name | Description
boundPruningPred | Catalyst expression for the partitionPruningPred bound to (the partitionCols of) the HiveTableRelation
hiveQlTable | Hive Table metadata (converted from the CatalogTable of the HiveTableRelation). Used when HiveTableScanExec is requested for the tableDesc or the rawPartitions, and when it is executed
hadoopReader |
rawPartitions | Hive partitions (Seq[Partition]). Used when HiveTableScanExec physical operator is executed with a partitioned table
tableDesc | Hive TableDesc