// FIXME Examples of RewritePredicateSubquery
// 1. Filters with Exists and In (with ListQuery) expressions
// 2. NOTs
// Based on RewriteSubquerySuite
// FIXME Contribute back to RewriteSubquerySuite
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.RuleExecutor
object Optimize extends RuleExecutor[LogicalPlan] {
import org.apache.spark.sql.catalyst.optimizer._
val batches = Seq(
Batch("Column Pruning", FixedPoint(100), ColumnPruning),
Batch("Rewrite Subquery", Once,
RewritePredicateSubquery,
ColumnPruning,
CollapseProject,
RemoveRedundantProject))
}
val q = ...
val optimized = Optimize.execute(q.analyze)
RewritePredicateSubquery Logical Optimization
RewritePredicateSubquery is a base logical optimization that transforms Filter operators with Exists and In (with ListQuery) expressions to Join operators as follows:
-
Filteroperators withExistsandInwithListQueryexpressions give left-semi joins -
Filteroperators withNotwithExistsandInwithListQueryexpressions give left-anti joins
|
Note
|
Prefer EXISTS (over Not with In with ListQuery subquery expression) if performance matters since they say "that will almost certainly be planned as a Broadcast Nested Loop join".
|
RewritePredicateSubquery is part of the RewriteSubquery once-executed batch in the standard batches of the Catalyst Optimizer.
RewritePredicateSubquery is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].
RewritePredicateSubquery is part of the RewriteSubquery once-executed batch in the standard batches of the Catalyst Optimizer.
rewriteExistentialExpr Internal Method
rewriteExistentialExpr(
exprs: Seq[Expression],
plan: LogicalPlan): (Option[Expression], LogicalPlan)
rewriteExistentialExpr…FIXME
|
Note
|
rewriteExistentialExpr is used when…FIXME
|
dedupJoin Internal Method
dedupJoin(joinPlan: LogicalPlan): LogicalPlan
dedupJoin…FIXME
|
Note
|
dedupJoin is used when…FIXME
|
getValueExpression Internal Method
getValueExpression(e: Expression): Seq[Expression]
getValueExpression…FIXME
|
Note
|
getValueExpression is used when…FIXME
|
Executing Rule — apply Method
apply(plan: LogicalPlan): LogicalPlan
|
Note
|
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).
|
apply transforms Filter unary operators in the input logical plan.
apply splits conjunctive predicates in the condition expression (i.e. expressions separated by And expression) and then partitions them into two collections of expressions with and without In or Exists subquery expressions.
apply creates a Filter operator for condition (sub)expressions without subqueries (combined with And expression) if available or takes the child operator (of the input Filter unary operator).
In the end, apply creates a new logical plan with Join operators for Exists and In expressions (and their negations) as follows:
-
For Exists predicate expressions,
applyrewriteExistentialExpr and creates a Join operator with LeftSemi join type. In the end,applydedupJoin -
For
Notexpressions with a Exists predicate expression,applyrewriteExistentialExpr and creates a Join operator with LeftAnti join type. In the end,applydedupJoin -
For In predicate expressions with a ListQuery subquery expression,
applygetValueExpression followed by rewriteExistentialExpr and creates a Join operator with LeftSemi join type. In the end,applydedupJoin -
For
Notexpressions with a In predicate expression with a ListQuery subquery expression,applygetValueExpression, rewriteExistentialExpr followed by splitting conjunctive predicates and creates a Join operator with LeftAnti join type. In the end,applydedupJoin -
For other predicate expressions,
applyrewriteExistentialExpr and creates a Project unary operator with a Filter operator