QueryPlanner — From Logical to Physical Plans

QueryPlanner transforms a logical query plan into a physical execution plan by running it through a chain of GenericStrategy objects. The physical plan is a SparkPlan in the case of SparkPlanner and the Hive-specific SparkPlanner.

QueryPlanner Contract

QueryPlanner contract defines the following operations:

  1. strategies

  2. plan

  3. collectPlaceholders

  4. prunePlans

collectPlaceholders and prunePlans are protected and have to be defined by subclasses. Both are used in the concrete plan method.

strategies Method

strategies: Seq[GenericStrategy[PhysicalPlan]]

strategies is an abstract method that returns the collection of GenericStrategy objects that the plan method applies.
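To make the shape of strategies concrete, here is a minimal sketch that does not depend on Spark. Every type in it (Relation, Limit, ScanExec, LimitedScanExec, and the two strategy objects) is a hypothetical stand-in, and GenericStrategy is re-declared locally in simplified form rather than taken from Catalyst.

```scala
object StrategiesSketch {
  // Toy logical operators (hypothetical, not Catalyst classes).
  sealed trait LogicalPlan
  case class Relation(name: String) extends LogicalPlan
  case class Limit(n: Int, child: LogicalPlan) extends LogicalPlan

  // Toy physical operators (hypothetical, not Spark's SparkPlan).
  sealed trait PhysicalPlan
  case class ScanExec(name: String) extends PhysicalPlan
  case class LimitedScanExec(n: Int, name: String) extends PhysicalPlan

  // Simplified local copy of the idea behind GenericStrategy:
  // a strategy returns zero or more physical candidates for a logical plan.
  abstract class GenericStrategy[P] {
    def apply(plan: LogicalPlan): Seq[P]
  }

  // Matches only a Limit directly over a Relation.
  object LimitOverRelation extends GenericStrategy[PhysicalPlan] {
    def apply(plan: LogicalPlan): Seq[PhysicalPlan] = plan match {
      case Limit(n, Relation(name)) => Seq(LimitedScanExec(n, name))
      case _                        => Nil // not applicable: no candidates
    }
  }

  // Matches only a bare Relation.
  object PlainScan extends GenericStrategy[PhysicalPlan] {
    def apply(plan: LogicalPlan): Seq[PhysicalPlan] = plan match {
      case Relation(name) => Seq(ScanExec(name))
      case _              => Nil
    }
  }

  // The order of this collection is the order in which a planner would try the strategies.
  def strategies: Seq[GenericStrategy[PhysicalPlan]] = Seq(LimitOverRelation, PlainScan)
}
```

A strategy that does not recognize a logical plan simply returns an empty collection, which is what lets a planner chain many of them.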

plan Method

plan(plan: LogicalPlan): Iterator[PhysicalPlan]

plan returns an Iterator[PhysicalPlan] whose elements are the physical plans produced by applying each GenericStrategy object from the strategies collection to the input logical plan.
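The core of that description can be sketched as a flatMap over an iterator of strategies. The following is a toy model, not Spark's implementation: plans are plain strings, CountingStrategy and the three strategy values are hypothetical, and the placeholder and pruning steps are left out.

```scala
object PlanSketch {
  // Toy stand-ins: plans are plain strings here, not Catalyst trees.
  type LogicalPlan = String
  type PhysicalPlan = String

  // Hypothetical strategy wrapper that counts how often it is applied.
  final class CountingStrategy(f: LogicalPlan => Seq[PhysicalPlan]) {
    var calls = 0
    def apply(plan: LogicalPlan): Seq[PhysicalPlan] = { calls += 1; f(plan) }
  }

  val noMatch  = new CountingStrategy(_ => Nil)
  val hashJoin = new CountingStrategy(p => Seq(s"HashJoin($p)"))
  val loopJoin = new CountingStrategy(p => Seq(s"NestedLoopJoin($p)"))

  val strategies: Seq[CountingStrategy] = Seq(noMatch, hashJoin, loopJoin)

  // Applies every strategy to the logical plan and concatenates the candidates.
  // Because the source is an Iterator, strategies run only when candidates are requested.
  def plan(plan: LogicalPlan): Iterator[PhysicalPlan] =
    strategies.iterator.flatMap(s => s(plan))
}
```

Since the result is an Iterator, a caller that takes only the first candidate does not trigger the strategies that come after the one that produced it.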

collectPlaceholders Method

collectPlaceholders(plan: PhysicalPlan): Seq[(PhysicalPlan, LogicalPlan)]

collectPlaceholders returns a collection of pairs, each made of a physical plan (a placeholder inside the given plan) and its corresponding logical plan.
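One way to picture collectPlaceholders is a tree walk that gathers placeholder nodes together with the logical plan each one stands for. The sketch below is an illustration under assumptions: the tree types are hypothetical, and the PlanLater node is only modeled loosely on the placeholder idea, not copied from Spark.

```scala
object PlaceholdersSketch {
  // Toy logical plan (hypothetical).
  sealed trait LogicalPlan
  case class Relation(name: String) extends LogicalPlan

  // Toy physical plan tree (hypothetical).
  sealed trait PhysicalPlan { def children: Seq[PhysicalPlan] }
  case class ScanExec(name: String) extends PhysicalPlan {
    def children: Seq[PhysicalPlan] = Nil
  }
  case class JoinExec(left: PhysicalPlan, right: PhysicalPlan) extends PhysicalPlan {
    def children: Seq[PhysicalPlan] = Seq(left, right)
  }
  // Placeholder: a physical node that only remembers the logical plan still to be planned.
  case class PlanLater(logical: LogicalPlan) extends PhysicalPlan {
    def children: Seq[PhysicalPlan] = Nil
  }

  // Depth-first walk returning (placeholder, logical plan) pairs in left-to-right order.
  def collectPlaceholders(plan: PhysicalPlan): Seq[(PhysicalPlan, LogicalPlan)] = plan match {
    case p @ PlanLater(logical) => Seq(p -> logical)
    case other                  => other.children.flatMap(collectPlaceholders)
  }
}
```

A plan without placeholders yields an empty collection, which a planner can treat as "nothing left to plan".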

prunePlans Method

prunePlans(plans: Iterator[PhysicalPlan]): Iterator[PhysicalPlan]

prunePlans filters bad physical plans out of the candidate plans it is given.
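What counts as a "bad" plan is up to the subclass. As a purely illustrative sketch, the code below assumes a hypothetical PhysicalPlan that carries an estimatedCost and shows two possible shapes for the method: a pass-through and a cost-bounded filter. Neither the cost field nor the budget parameter comes from Spark.

```scala
object PruneSketch {
  // Hypothetical physical plan with an estimated cost; not a Spark class.
  case class PhysicalPlan(name: String, estimatedCost: Double)

  // Pass-through variant: keeps every candidate plan.
  def keepAll(plans: Iterator[PhysicalPlan]): Iterator[PhysicalPlan] = plans

  // Cost-bounded variant: lazily drops candidates whose cost exceeds a made-up budget.
  def pruneByCost(budget: Double)(plans: Iterator[PhysicalPlan]): Iterator[PhysicalPlan] =
    plans.filter(_.estimatedCost <= budget)
}
```

Both variants keep the Iterator-in, Iterator-out signature, so pruning stays lazy and composes with the candidates produced by plan.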

SparkStrategies — Container of SparkStrategy Strategies

SparkStrategies is an abstract base QueryPlanner (of SparkPlan) that serves as a "container" (or a namespace) of the concrete SparkStrategy objects:

  1. SpecialLimits

  2. JoinSelection

  3. StatefulAggregationStrategy

  4. Aggregation

  5. InMemoryScans

  6. StreamingRelationStrategy

  7. BasicOperators

  8. DDLStrategy
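The "container" arrangement can be sketched as an abstract planner that declares strategy objects as members while leaving strategies itself to a concrete subclass. This is a simplified model under assumptions: ToyQueryPlanner, ToyStrategies and ToyPlanner are hypothetical names, plans are plain strings, and the two nested objects only borrow the names SpecialLimits and BasicOperators from the list above without reproducing their behavior.

```scala
object ContainerSketch {
  // Toy stand-ins: plans are plain strings, not Catalyst trees.
  type LogicalPlan = String
  type PhysicalPlan = String

  abstract class Strategy {
    def apply(plan: LogicalPlan): Seq[PhysicalPlan]
  }

  // Hypothetical planner base: knows how to plan, not which strategies exist.
  abstract class ToyQueryPlanner {
    def strategies: Seq[Strategy]
    def plan(plan: LogicalPlan): Iterator[PhysicalPlan] =
      strategies.iterator.flatMap(s => s(plan))
  }

  // The container: still abstract, it only hosts strategy objects as a namespace.
  abstract class ToyStrategies extends ToyQueryPlanner {
    object SpecialLimits extends Strategy {
      def apply(plan: LogicalPlan): Seq[PhysicalPlan] =
        if (plan.startsWith("LIMIT")) Seq(s"TakeOrdered[$plan]") else Nil
    }
    object BasicOperators extends Strategy {
      def apply(plan: LogicalPlan): Seq[PhysicalPlan] = Seq(s"Basic[$plan]")
    }
  }

  // The concrete planner picks which hosted strategies to use and in what order.
  class ToyPlanner extends ToyStrategies {
    def strategies: Seq[Strategy] = Seq(SpecialLimits, BasicOperators)
  }
}
```

Keeping the strategy objects in the abstract class and the ordering in the concrete class separates "which strategies exist" from "which ones a given planner applies".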

Strategy is a type alias of SparkStrategy that is defined in the org.apache.spark.sql package object.

SparkPlanner is the one and only concrete implementation of SparkStrategies.

FIXME What is singleRowRdd for?

