// FIXME Examples of RewritePredicateSubquery
// 1. Filters with Exists and In (with ListQuery) expressions
// 2. NOTs
// Based on RewriteSubquerySuite
// FIXME Contribute back to RewriteSubquerySuite
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.RuleExecutor
object Optimize extends RuleExecutor[LogicalPlan] {
import org.apache.spark.sql.catalyst.optimizer._
val batches = Seq(
Batch("Column Pruning", FixedPoint(100), ColumnPruning),
Batch("Rewrite Subquery", Once,
RewritePredicateSubquery,
ColumnPruning,
CollapseProject,
RemoveRedundantProject))
}
val q = ...
val optimized = Optimize.execute(q.analyze)
RewritePredicateSubquery Logical Optimization
RewritePredicateSubquery
is a base logical optimization that transforms Filter operators with Exists and In (with ListQuery) expressions to Join operators as follows:
-
Filter
operators withExists
andIn
withListQuery
expressions give left-semi joins -
Filter
operators withNot
withExists
andIn
withListQuery
expressions give left-anti joins
Note
|
Prefer EXISTS (over Not with In with ListQuery subquery expression) if performance matters since they say "that will almost certainly be planned as a Broadcast Nested Loop join".
|
RewritePredicateSubquery
is part of the RewriteSubquery once-executed batch in the standard batches of the Catalyst Optimizer.
RewritePredicateSubquery
is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan]
.
RewritePredicateSubquery
is part of the RewriteSubquery once-executed batch in the standard batches of the Catalyst Optimizer.
rewriteExistentialExpr
Internal Method
rewriteExistentialExpr(
exprs: Seq[Expression],
plan: LogicalPlan): (Option[Expression], LogicalPlan)
rewriteExistentialExpr
…FIXME
Note
|
rewriteExistentialExpr is used when…FIXME
|
dedupJoin
Internal Method
dedupJoin(joinPlan: LogicalPlan): LogicalPlan
dedupJoin
…FIXME
Note
|
dedupJoin is used when…FIXME
|
getValueExpression
Internal Method
getValueExpression(e: Expression): Seq[Expression]
getValueExpression
…FIXME
Note
|
getValueExpression is used when…FIXME
|
Executing Rule — apply
Method
apply(plan: LogicalPlan): LogicalPlan
Note
|
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).
|
apply
transforms Filter unary operators in the input logical plan.
apply
splits conjunctive predicates in the condition expression (i.e. expressions separated by And
expression) and then partitions them into two collections of expressions with and without In or Exists subquery expressions.
apply
creates a Filter operator for condition (sub)expressions without subqueries (combined with And
expression) if available or takes the child operator (of the input Filter
unary operator).
In the end, apply
creates a new logical plan with Join operators for Exists and In expressions (and their negations) as follows:
-
For Exists predicate expressions,
apply
rewriteExistentialExpr and creates a Join operator with LeftSemi join type. In the end,apply
dedupJoin -
For
Not
expressions with a Exists predicate expression,apply
rewriteExistentialExpr and creates a Join operator with LeftAnti join type. In the end,apply
dedupJoin -
For In predicate expressions with a ListQuery subquery expression,
apply
getValueExpression followed by rewriteExistentialExpr and creates a Join operator with LeftSemi join type. In the end,apply
dedupJoin -
For
Not
expressions with a In predicate expression with a ListQuery subquery expression,apply
getValueExpression, rewriteExistentialExpr followed by splitting conjunctive predicates and creates a Join operator with LeftAnti join type. In the end,apply
dedupJoin -
For other predicate expressions,
apply
rewriteExistentialExpr and creates a Project unary operator with a Filter operator