WholeStageCodegenExec Unary Operator with Java Code Generation

WholeStageCodegenExec is a unary physical operator (i.e. with one child physical operator) that represents a codegened pipeline of a physical query plan, i.e. a physical operator together with its children.

WholeStageCodegenExec is created when the CollapseCodegenStages physical preparation rule transforms a physical plan and spark.sql.codegen.wholeStage is enabled.

Note
The spark.sql.codegen.wholeStage property is enabled by default.
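
To compare plans with and without whole-stage codegen, the property can be toggled at runtime. The following is a minimal sketch, assuming a spark-shell session (where spark is predefined). The helper usesWholeStageCodegen is a hypothetical name introduced for illustration only.

```scala
import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}

// Hypothetical helper: true when the plan contains a WholeStageCodegenExec node
def usesWholeStageCodegen(plan: SparkPlan): Boolean =
  plan.find(_.isInstanceOf[WholeStageCodegenExec]).isDefined

// Property enabled (the default): the executed plan is planned with codegen
spark.conf.set("spark.sql.codegen.wholeStage", true)
val enabledPlan = spark.range(9).queryExecution.executedPlan

// Property disabled: the executed plan is planned without codegen
spark.conf.set("spark.sql.codegen.wholeStage", false)
val disabledPlan = spark.range(9).queryExecution.executedPlan

// Restore the default for the rest of the session
spark.conf.set("spark.sql.codegen.wholeStage", true)

println(s"enabled: ${usesWholeStageCodegen(enabledPlan)}")
println(s"disabled: ${usesWholeStageCodegen(disabledPlan)}")
```

Each executedPlan is read right after the property is set, so each plan reflects the setting in effect at that moment.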

WholeStageCodegenExec supports Java code generation (aka codegen).

WholeStageCodegenExec is marked with the * prefix in the tree output of a physical plan.

Note
Use the executedPlan phase of a query execution to see WholeStageCodegenExec in the plan.
val q = spark.range(9)
val plan = q.queryExecution.executedPlan
scala> println(plan.numberedTreeString)
00 *Range (0, 9, step=1, splits=8)
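
Since the star is only a rendering of the operator, the executed plan can also be inspected programmatically. This is a sketch for spark-shell (where spark is predefined); it assumes whole-stage codegen is enabled, in which case the root of this plan is expected to be the WholeStageCodegenExec wrapper.

```scala
import org.apache.spark.sql.execution.WholeStageCodegenExec

val plan = spark.range(9).queryExecution.executedPlan

// WholeStageCodegenExec is unary: child is the operator it wraps
plan match {
  case w: WholeStageCodegenExec =>
    println(s"codegen stage wraps: ${w.child.nodeName}")
  case other =>
    println(s"no whole-stage codegen at the root: ${other.nodeName}")
}
```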
Table 1. WholeStageCodegenExec’s Performance Metrics

Key          | Name (in web UI) | Description
pipelineTime | duration         |
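
The metric can be looked up by its key through the operator's metrics map. The sketch below assumes spark-shell (where spark is predefined) and that the root of the executed plan is a WholeStageCodegenExec; the exact display name and value depend on the Spark version and the run, so none are shown.

```scala
val plan = spark.range(9).queryExecution.executedPlan

// metrics is keyed by the internal key, i.e. pipelineTime
val pipelineTime = plan.metrics("pipelineTime")

// Execute the operator so that the metric gets a chance to be updated
plan.execute().count()

val displayName = pipelineTime.name.getOrElse("(unnamed)")
println(s"$displayName = ${pipelineTime.value}")
```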

Figure 1. WholeStageCodegenExec in web UI (Details for Query)
Tip
Use the explain operator to see the physical plan of a query and find out whether or not WholeStageCodegen is in use.
val q = spark.range(10).where('id === 4)
// Note the stars in the output that are for codegened operators
scala> q.explain
== Physical Plan ==
*Filter (id#0L = 4)
+- *Range (0, 10, step=1, splits=8)
Tip
Consider using the Debugging Query Execution facility to dive deep into whole-stage codegen.
scala> q.queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#5L = 4)
+- *Range (0, 10, step=1, splits=8)
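
The same report can be captured as a string, which is handy for searching the generated code. A minimal sketch for spark-shell (where spark is predefined), using codegenString and debugCodegen from the org.apache.spark.sql.execution.debug package; the filter is written as a SQL string so that no implicits are required.

```scala
import org.apache.spark.sql.execution.debug._

val q = spark.range(10).where("id = 4")

// codegenString returns the codegen report as a String instead of printing it
val report: String = codegenString(q.queryExecution.executedPlan)
println(report.split("\n").take(3).mkString("\n"))

// debugCodegen is added to a Dataset by the implicits imported above
q.debugCodegen()
```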
Note
Physical plans that support code generation extend CodegenSupport.
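
To find out which operators of a plan support code generation, the plan tree can be filtered by the CodegenSupport trait. This is a sketch for spark-shell (where spark is predefined); the reported operator names vary slightly between Spark versions.

```scala
import org.apache.spark.sql.execution.CodegenSupport

val q = spark.range(10).where("id = 4")
val plan = q.queryExecution.executedPlan

// Collect the names of all operators in the plan that mix in CodegenSupport
val codegenOps: Seq[String] = plan.collect { case op: CodegenSupport => op.nodeName }
codegenOps.foreach(println)
```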
Tip

Enable the DEBUG logging level for the org.apache.spark.sql.execution.WholeStageCodegenExec logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.WholeStageCodegenExec=DEBUG

Refer to Logging.

doConsume Method

doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String
Note
doConsume is a part of CodegenSupport Contract to generate Java source code for…​FIXME.

doConsume…​FIXME

Executing WholeStageCodegenExec — doExecute Method

doExecute(): RDD[InternalRow]
Note
doExecute is a part of SparkPlan Contract to produce the result of a structured query as an RDD of internal binary rows.

doExecute generates the Java code that is compiled right afterwards.

If compilation fails and spark.sql.codegen.fallback is enabled, doExecute prints out the following WARN message to the logs and returns the result of executing the child physical operator.

WARN WholeStageCodegenExec: Whole-stage codegen disabled for this plan:
[tree]
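
Whether the fallback applies in a given session can be checked by reading the properties. The snippet is a sketch for spark-shell (where spark is predefined) and assumes a Spark version in which spark.sql.codegen.fallback is a registered SQL property, so that reading it without an explicit default works.

```scala
// Read the current values of the two codegen-related properties
val wholeStage = spark.conf.get("spark.sql.codegen.wholeStage")
val fallback = spark.conf.get("spark.sql.codegen.fallback")

println(s"spark.sql.codegen.wholeStage=$wholeStage")
println(s"spark.sql.codegen.fallback=$fallback")
```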

If, however, code generation and compilation succeed, doExecute branches off per the number of input RDDs.

Note
doExecute only supports up to two input RDDs.
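
The number of input RDDs of a codegened pipeline can be inspected through inputRDDs of the wrapped operator. This is a sketch for spark-shell (where spark is predefined); inputRddCount is a name introduced for illustration, and the cast assumes that the child of WholeStageCodegenExec supports code generation.

```scala
import org.apache.spark.sql.execution.{CodegenSupport, WholeStageCodegenExec}

val plan = spark.range(9).queryExecution.executedPlan

// Ask the wrapped operator (not WholeStageCodegenExec itself) for its input RDDs
val inputRddCount: Option[Int] = plan match {
  case w: WholeStageCodegenExec =>
    Some(w.child.asInstanceOf[CodegenSupport].inputRDDs().size)
  case _ =>
    None // whole-stage codegen is not in use for this plan
}
println(s"number of input RDDs: $inputRddCount")
```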
Caution
FIXME

Generating Java Code for Child Subtree — doCodeGen Method

doCodeGen(): (CodegenContext, CodeAndComment)
Caution
FIXME

You should see the following DEBUG message in the logs:

DEBUG WholeStageCodegenExec:
[cleanedSource]
Note
doCodeGen is used when WholeStageCodegenExec is executed (i.e. in doExecute) and for debugCodegen.
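
doCodeGen can also be called directly to look at the generated source without changing the logging level. A minimal sketch for spark-shell (where spark is predefined), assuming the root of the executed plan is a WholeStageCodegenExec; source is a name introduced for illustration.

```scala
import org.apache.spark.sql.execution.WholeStageCodegenExec

val plan = spark.range(9).queryExecution.executedPlan

val source: Option[String] = plan match {
  case w: WholeStageCodegenExec =>
    // doCodeGen returns the codegen context and the generated code;
    // body holds the Java source text
    val (_, code) = w.doCodeGen()
    Some(code.body)
  case _ =>
    None
}

// Print the first lines of the generated Java source, if any
source.foreach(s => println(s.split("\n").take(10).mkString("\n")))
```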
