RewritePredicateSubquery Logical Optimization

  • Filter operators with Exists and In with ListQuery expressions give left-semi joins

  • Filter operators with Not with Exists and In with ListQuery expressions give left-anti joins

Note
Prefer EXISTS (over Not with In with ListQuery subquery expression) if performance matters since they say "that will almost certainly be planned as a Broadcast Nested Loop join".

RewritePredicateSubquery is part of the RewriteSubquery once-executed batch in the standard batches of the Catalyst Optimizer.

RewritePredicateSubquery is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

// FIXME Examples of RewritePredicateSubquery
// 1. Filters with Exists and In (with ListQuery) expressions
// 2. NOTs

// Based on RewriteSubquerySuite
// FIXME Contribute back to RewriteSubquerySuite
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.RuleExecutor
object Optimize extends RuleExecutor[LogicalPlan] {
  import org.apache.spark.sql.catalyst.optimizer._
  val batches = Seq(
    Batch("Column Pruning", FixedPoint(100), ColumnPruning),
    Batch("Rewrite Subquery", Once,
      RewritePredicateSubquery,
      ColumnPruning,
      CollapseProject,
      RemoveRedundantProject))
}

val q = ...
val optimized = Optimize.execute(q.analyze)

RewritePredicateSubquery is part of the RewriteSubquery once-executed batch in the standard batches of the Catalyst Optimizer.

rewriteExistentialExpr Internal Method

rewriteExistentialExpr(
  exprs: Seq[Expression],
  plan: LogicalPlan): (Option[Expression], LogicalPlan)

rewriteExistentialExpr…​FIXME

Note
rewriteExistentialExpr is used when…​FIXME

dedupJoin Internal Method

dedupJoin(joinPlan: LogicalPlan): LogicalPlan

dedupJoin…​FIXME

Note
dedupJoin is used when…​FIXME

getValueExpression Internal Method

getValueExpression(e: Expression): Seq[Expression]

getValueExpression…​FIXME

Note
getValueExpression is used when…​FIXME

Executing Rule — apply Method

apply(plan: LogicalPlan): LogicalPlan
Note
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).

apply transforms Filter unary operators in the input logical plan.

apply splits conjunctive predicates in the condition expression (i.e. expressions separated by And expression) and then partitions them into two collections of expressions with and without In or Exists subquery expressions.

apply creates a Filter operator for condition (sub)expressions without subqueries (combined with And expression) if available or takes the child operator (of the input Filter unary operator).

In the end, apply creates a new logical plan with Join operators for Exists and In expressions (and their negations) as follows:

results matching ""

    No results matching ""