RuleExecutor Contract — Tree Transformation Rule Executor

RuleExecutor is the base of rule executors that execute a collection of rule batches to transform a TreeNode.

package org.apache.spark.sql.catalyst.rules

abstract class RuleExecutor[TreeType <: TreeNode[_]] {
  // only required properties (vals and methods) that have no implementation
  // the others follow
  protected def batches: Seq[Batch]
}
Table 1. RuleExecutor Contract
Property Description

batches

batches: Seq[Batch]

Sequence of rule batches that RuleExecutor applies when executed. Each batch is a named collection of rules with an execution strategy.

Note
TreeType is the type of the TreeNode implementation that a RuleExecutor can be executed on, e.g. LogicalPlan, SparkPlan, Expression or a combination thereof.
Table 2. RuleExecutors (Direct Implementations)
RuleExecutor Description

Analyzer

Logical query plan analyzer

ExpressionCanonicalizer

Optimizer

Generic logical query plan optimizer

Applying Rule Batches to TreeNode — execute Method

execute(plan: TreeType): TreeType

execute iterates over rule batches and applies rules sequentially to the input plan.

execute tracks the number of iterations and the time of executing each rule (with a plan).

When a rule changes a plan, you should see the following TRACE message in the logs:

TRACE HiveSessionStateBuilder$$anon$1:
=== Applying Rule [ruleName] ===
[currentAndModifiedPlansSideBySide]

When the number of iterations reaches the maximum number of iterations (maxIterations) of the batch's Strategy, execute stops iterating over the batch and prints out the following WARN message to the logs:

WARN HiveSessionStateBuilder$$anon$1: Max iterations ([iteration]) reached for batch [batchName]

When the plan has not changed after applying the rules of a batch (a moment called the fixed point, i.e. when the execution converges), you should see the following TRACE message in the logs, and execute moves on to applying the rules of the next batch.

TRACE HiveSessionStateBuilder$$anon$1: Fixed point reached for batch [batchName] after [iteration] iterations.

After the batch finishes, if the plan has been changed by the rules, you should see the following DEBUG message in the logs:

DEBUG HiveSessionStateBuilder$$anon$1:
=== Result of Batch [batchName] ===
[currentAndModifiedPlansSideBySide]

Otherwise, when the rules made no changes to the plan, you should see the following TRACE message in the logs:

TRACE HiveSessionStateBuilder$$anon$1: Batch [batchName] has no effect.
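
The flow described in this section can be sketched with a small, self-contained model. This is not Spark's implementation: plans are plain strings, rules are named functions, and Rule, SimpleRuleExecutor, ExecuteDemo and the in-memory log buffer are all hypothetical names introduced for illustration. The sketch also assumes that the WARN message is skipped for batches that are meant to run only once.

```scala
import scala.collection.mutable

// Simplified stand-ins (not Spark's classes): a plan is a String,
// a rule is a named String => String function.
final case class Rule(name: String, transform: String => String)

abstract class Strategy { def maxIterations: Int }
case object Once extends Strategy { val maxIterations = 1 }
final case class FixedPoint(maxIterations: Int) extends Strategy

final case class Batch(name: String, strategy: Strategy, rules: Rule*)

class SimpleRuleExecutor(batches: Seq[Batch]) {
  // Messages are collected here instead of going to a real logger.
  val log = mutable.ArrayBuffer.empty[String]

  def execute(plan: String): String = {
    var current = plan
    for (batch <- batches) {
      val batchStart = current
      var iteration = 1
      var keepGoing = true
      while (keepGoing) {
        val last = current
        // Apply the rules of the batch in order; each rule sees the previous result.
        current = batch.rules.foldLeft(current) { (p, rule) =>
          val result = rule.transform(p)
          if (result != p) log += s"TRACE Applying Rule ${rule.name}"
          result
        }
        iteration += 1
        if (iteration > batch.strategy.maxIterations) {
          // Assumption: warn only for batches that may run more than once.
          if (batch.strategy.maxIterations > 1)
            log += s"WARN Max iterations (${iteration - 1}) reached for batch ${batch.name}"
          keepGoing = false
        }
        if (current == last) {
          log += s"TRACE Fixed point reached for batch ${batch.name} after ${iteration - 1} iterations."
          keepGoing = false
        }
      }
      if (current != batchStart) log += s"DEBUG Result of Batch ${batch.name}"
      else log += s"TRACE Batch ${batch.name} has no effect."
    }
    current
  }
}

object ExecuteDemo {
  val collapseSpaces = Rule("CollapseSpaces", _.replace("  ", " "))
  val trim = Rule("Trim", _.trim)
  val executor = new SimpleRuleExecutor(Seq(
    Batch("Collapse", FixedPoint(10), collapseSpaces),
    Batch("Trim", Once, trim)))
  // The run of four spaces needs two passes to collapse; a third pass detects no change.
  val result = executor.execute(" select" + " " * 4 + "1 ")
}
```

In this model the Collapse batch keeps iterating until a pass leaves the plan untouched, while the Trim batch runs exactly once regardless of whether the plan changed.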

Rule Batch — Collection of Rules

Batch is a named collection of rules with a strategy.

Batch takes the following when created: a name, an execution strategy, and a collection of rules.

Batch Execution Strategy

Strategy is the base of the batch execution strategies that indicate the maximum number of executions (aka maxIterations).

abstract class Strategy {
  def maxIterations: Int
}
Table 3. Strategies
Strategy Description

Once

A strategy that runs only once (with maxIterations as 1)

FixedPoint

A strategy that runs until a fixed point is reached (i.e. the plan converges) or maxIterations times, whichever comes first
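
To make the difference between the two strategies concrete, here is a minimal sketch that is independent of Spark. Strategy, Once and FixedPoint follow the names in the table above, whereas StrategyDemo and its run helper are hypothetical and exist only for this illustration. The helper applies a step function until the value stops changing or the strategy's maxIterations is exhausted.

```scala
// Simplified model of the strategy hierarchy described above.
abstract class Strategy { def maxIterations: Int }
case object Once extends Strategy { val maxIterations = 1 }
final case class FixedPoint(maxIterations: Int) extends Strategy

object StrategyDemo {
  // Hypothetical helper: applies `step` until the value converges or the
  // strategy's iteration budget is used up. Returns (value, iterations run).
  def run[A](strategy: Strategy, start: A)(step: A => A): (A, Int) = {
    var current = start
    var iterations = 0
    var converged = false
    while (!converged && iterations < strategy.maxIterations) {
      val next = step(current)
      iterations += 1
      converged = next == current
      current = next
    }
    (current, iterations)
  }

  // Integer halving converges at 0, which makes the iteration counts easy to follow.
  val once = run(Once, 40)(_ / 2)                  // a single application
  val converged = run(FixedPoint(100), 40)(_ / 2)  // stops when the value no longer changes
  val capped = run(FixedPoint(3), 40)(_ / 2)       // stops at maxIterations before converging
}
```

A small maxIterations trades completeness for bounded running time, which is why a batch that hits the limit is worth a warning.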

isPlanIntegral Method

isPlanIntegral(plan: TreeType): Boolean

isPlanIntegral simply returns true.

Note
isPlanIntegral is used exclusively when RuleExecutor is requested to execute.
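
The following sketch shows one way such a hook can be used. It is a simplified model, not Spark's code, and rests on an assumption: the check runs on the result of each rule and a failed check aborts execution with an exception. CheckedExecutor, Rule, IntegrityDemo and the list-based "plan" are hypothetical names introduced for this example.

```scala
import scala.util.Try

// Hypothetical rule type: a named transformation of a plan of type T.
final case class Rule[T](name: String, transform: T => T)

abstract class CheckedExecutor[T] {
  protected def rules: Seq[Rule[T]]

  // Default mirrors the text above: every plan is considered valid.
  protected def isPlanIntegral(plan: T): Boolean = true

  def execute(plan: T): T =
    rules.foldLeft(plan) { (current, rule) =>
      val result = rule.transform(current)
      // Assumption: the check runs after each rule and fails fast.
      if (!isPlanIntegral(result))
        throw new IllegalStateException(s"Plan integrity broken after rule ${rule.name}")
      result
    }
}

object IntegrityDemo {
  private val demoRules: Seq[Rule[List[Int]]] =
    Seq(Rule[List[Int]]("DropNegatives", _.filter(_ >= 0)))

  // Keeps the default check, so any result is accepted.
  object Lenient extends CheckedExecutor[List[Int]] {
    protected def rules: Seq[Rule[List[Int]]] = demoRules
  }

  // Overrides the check: an empty plan counts as broken.
  object Strict extends CheckedExecutor[List[Int]] {
    protected def rules: Seq[Rule[List[Int]]] = demoRules
    override protected def isPlanIntegral(plan: List[Int]): Boolean = plan.nonEmpty
  }

  val lenientResult = Lenient.execute(List(-1, -2))
  val strictResult = Try(Strict.execute(List(-1, -2)))
}
```

With the default implementation the hook never rejects a plan, so an executor has to override it to get any validation.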
