SparkPlan — Physical Execution Plan

SparkPlan is the base QueryPlan for physical operators to build physical execution plan of a structured query (which is also modelled as…​a Dataset!).

The SparkPlan contract assumes that concrete physical operators define doExecute method which is executed when the final execute is called.

Note
The final execute is triggered when the QueryExecution (of a Dataset) is requested for a RDD[InternalRow].

When executed, a SparkPlan produces RDDs of InternalRow (i.e. RDD[InternalRow]s).

Caution
FIXME SparkPlan is Serializable. Why?
Note
The naming convention for physical operators in Spark’s source code is to have their names end with the Exec prefix, e.g. DebugExec or LocalTableScanExec.
Tip
Read InternalRow about the internal binary row format.
Table 1. SparkPlan’s Atributes
Name Description

metadata

metrics

outputPartitioning

outputOrdering

SparkPlan has the following final methods that prepare environment and pass calls on to corresponding methods that constitute SparkPlan Contract:

Table 2. Physical Query Operators / Specialized SparkPlans
Name Description

UnaryExecNode

LeafExecNode

BinaryExecNode

waitForSubqueries Method

Caution
FIXME

executeCollect Method

Caution
FIXME

SparkPlan Contract

SparkPlan contract requires that concrete physical operators (aka physical plans) define their own custom doExecute.

doExecute(): RDD[InternalRow]
Table 3. Optional Methods
Name Description

doPrepare

Prepares execution

doExecuteBroadcast

Caution
FIXME Why are there two executes?

Executing Query in Scope (after Preparations) — executeQuery Final Method

executeQuery[T](query: => T): T

executeQuery executes query in a scope (i.e. so that all RDDs created will have the same scope).

Internally, executeQuery calls prepare and waitForSubqueries before executing query.

Note
executeQuery is executed as part of execute, executeBroadcast and when CodegenSupport produces a Java source code.

Computing Query Result As Broadcast Variable — executeBroadcast Final Method

executeBroadcast[T](): broadcast.Broadcast[T]

executeBroadcast returns the results of the query as a broadcast variable.

Internally, executeBroadcast executes doExecuteBroadcast inside executeQuery.

Note
executeBroadcast is executed in BroadcastHashJoinExec, BroadcastNestedLoopJoinExec and ReusedExchangeExec.

SQLMetric

SQLMetric is an accumulator that accumulates and produces long metrics values.

There are three known SQLMetrics:

  • sum

  • size

  • timing

metrics Lookup Table

metrics: Map[String, SQLMetric] = Map.empty

metrics is a private[sql] lookup table of supported SQLMetrics by their names.

results matching ""

    No results matching ""