doExecute(): RDD[InternalRow]
ProjectExec Unary Physical Operator
ProjectExec is a unary physical operator (i.e. with one child physical operator) that…FIXME
ProjectExec supports Java code generation (aka codegen).
ProjectExec is created when:
-
InMemoryScans and HiveTableScans execution planning strategies are executed (and request
SparkPlannerto pruneFilterProject) -
BasicOperatorsexecution planning strategy is requested to resolve a Project logical operator -
DataSourceStrategyexecution planning strategy is requested to creates a RowDataSourceScanExec -
FileSourceStrategyexecution planning strategy is requested to plan a LogicalRelation with a HadoopFsRelation -
ExtractPythonUDFsphysical optimization is requested to optimize a physical query plan (and extracts Python UDFs)
|
Note
|
The following is the order of applying the above execution planning strategies to logical query plans when |
Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method
|
Note
|
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
|
doExecute requests the input child physical plan to produce an RDD of internal rows and applies a calculation over indexed partitions (using RDD.mapPartitionsWithIndexInternal).
mapPartitionsWithIndexInternal[U](
f: (Int, Iterator[T]) => Iterator[U],
preservesPartitioning: Boolean = false)
Inside doExecute (RDD.mapPartitionsWithIndexInternal)
Inside the function (that is part of RDD.mapPartitionsWithIndexInternal), doExecute creates an UnsafeProjection with the following:
doExecute requests the UnsafeProjection to initialize and maps over the internal rows (of a partition) using the projection.
Creating ProjectExec Instance
ProjectExec takes the following when created:
-
NamedExpressions for the projection
-
Child physical operator
Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method
doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String
|
Note
|
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.
|
doConsume…FIXME