ProjectExec Unary Physical Operator

ProjectExec is a unary physical operator (i.e. with one child physical operator) that…​FIXME

ProjectExec supports Java code generation (aka codegen).

ProjectExec is created when:


The following is the order of applying the above execution planning strategies to logical query plans when SparkPlanner or Hive-specific SparkPlanner are requested to plan a logical query plan into one or more physical query plans:

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

doExecute(): RDD[InternalRow]
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute requests the input child physical plan to produce an RDD of internal rows and applies a calculation over indexed partitions (using RDD.mapPartitionsWithIndexInternal).

mapPartitionsWithIndexInternal[U: ClassTag](
  f: (Int, Iterator[T]) => Iterator[U],
  preservesPartitioning: Boolean = false): RDD[U]

Inside doExecute (RDD.mapPartitionsWithIndexInternal)

Inside the function (that is part of RDD.mapPartitionsWithIndexInternal), doExecute creates an UnsafeProjection with the following:

  1. Named expressions

  2. Output of the child physical operator as the input schema

  3. subexpressionEliminationEnabled flag

doExecute requests the UnsafeProjection to initialize (with the partition index) and then maps every internal row of the partition through the projection.
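The steps above can be modelled without Spark on the classpath. The following is only a minimal sketch in plain Scala collections, not the actual ProjectExec code: Row, Projection, initializedFor and projectPartitions are hypothetical stand-ins for InternalRow, UnsafeProjection and RDD.mapPartitionsWithIndexInternal, and subexpression elimination is left out.

```scala
// Plain-Scala model of the per-partition projection described above.
// Every name in this object is a hypothetical stand-in, not a Spark class.
object ProjectionSketch {
  // Stand-in for InternalRow: a row is just a vector of column values.
  type Row = Vector[Any]

  // Stand-in for UnsafeProjection: one function per output column.
  final class Projection(exprs: Seq[Row => Any]) extends (Row => Row) {
    private var partitionIndex: Int = -1

    // Models the "initialize" step, which receives the partition index.
    def initialize(index: Int): Unit = partitionIndex = index

    // Exposed only so the sketch can show that initialize was called.
    def initializedFor: Int = partitionIndex

    // Evaluates every expression against the input row.
    def apply(row: Row): Row = exprs.map(e => e(row)).toVector
  }

  // Stand-in for mapPartitionsWithIndexInternal over the child's partitions.
  def projectPartitions(
      partitions: Seq[Seq[Row]],
      exprs: Seq[Row => Any]): Seq[Vector[Row]] =
    partitions.zipWithIndex.map { case (rows, index) =>
      val project = new Projection(exprs) // one projection per partition
      project.initialize(index)           // initialize before any row is read
      rows.iterator.map(project).toVector // map the rows through the projection
    }
}
```

In this model a new projection is built for every partition rather than shared, which mirrors the description of creating the UnsafeProjection inside the function passed to RDD.mapPartitionsWithIndexInternal.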

Creating ProjectExec Instance

ProjectExec takes the following when created:

  1. Named expressions (the project list)

  2. Child physical operator

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String
doConsume is part of the CodegenSupport Contract to generate the Java source code for the consume path in Whole-Stage Code Generation.

