OptimizeIn Logical Optimization

  1. Replaces an In expression that has an empty list and the value expression not nullable to false

  2. Eliminates duplicates of Literal expressions in an In predicate expression that is inSetConvertible

  3. Replaces an In predicate expression that is inSetConvertible with InSet expressions when the number of literal expressions in the list expression is greater than spark.sql.optimizer.inSetConversionThreshold internal configuration property (default: 10)

OptimizeIn is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.

OptimizeIn is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

// Use Catalyst DSL to define a logical plan

// HACK: Disable symbolToColumn implicit conversion
// It is imported automatically in spark-shell (and makes demos impossible)
// implicit def symbolToColumn(s: Symbol): org.apache.spark.sql.ColumnName
trait ThatWasABadIdea
implicit def symbolToColumn(ack: ThatWasABadIdea) = ack

import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
val rel = LocalRelation('a.int, 'b.int, 'c.int)

import org.apache.spark.sql.catalyst.expressions.{In, Literal}
val plan = rel
  .where(In('a, Seq[Literal](1, 2, 3)))
scala> println(plan.numberedTreeString)
00 Filter a#6 IN (1,2,3)
01 +- LocalRelation <empty>, [a#6, b#7, c#8]

// In --> InSet
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", 0)

import org.apache.spark.sql.catalyst.optimizer.OptimizeIn
val optimizedPlan = OptimizeIn(plan)
scala> println(optimizedPlan.numberedTreeString)
00 Filter a#6 INSET (1,2,3)
01 +- LocalRelation <empty>, [a#6, b#7, c#8]

Executing Rule — apply Method

apply(plan: LogicalPlan): LogicalPlan
apply is part of the Rule Contract to execute (apply) a rule on a TreeNode (e.g. LogicalPlan).


results matching ""

    No results matching ""