FilterExec Unary Physical Operator

FilterExec is a unary physical operator (i.e. with one child physical operator) that represents Filter and TypedFilter unary logical operators at execution.

FilterExec supports Java code generation (aka codegen) as follows:

FilterExec is created when:

Table 1. FilterExec’s Performance Metrics
Key Name (in web UI) Description


number of output rows

spark sql FilterExec webui details for query.png
Figure 1. FilterExec in web UI (Details for Query)

FilterExec uses whatever the child physical operator uses for the input RDDs, the outputOrdering and the outputPartitioning.

FilterExec uses the PredicateHelper for…​FIXME

Table 2. FilterExec’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description



Used when…​FIXME



Used when…​FIXME



Used when…​FIXME

Creating FilterExec Instance

FilterExec takes the following when created:

FilterExec initializes the internal registries and counters.

isNullIntolerant Internal Method

isNullIntolerant(expr: Expression): Boolean


isNullIntolerant is used when…​FIXME

usedInputs Method

usedInputs: AttributeSet
usedInputs is part of CodegenSupport Contract to…​FIXME.


output Method

output: Seq[Attribute]
output is part of QueryPlan Contract to…​FIXME.


Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

doProduce(ctx: CodegenContext): String
doProduce is part of CodegenSupport Contract to generate the Java source code for produce path in Whole-Stage Code Generation.


Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.

doConsume creates a new metric term for the numOutputRows metric.


In the end, doConsume uses consume and FIXME to generate a Java source code (as a plain text) inside a do {…​} while(false); code block.

// DEMO Write one

genPredicate Internal Method

genPredicate(c: Expression, in: Seq[ExprCode], attrs: Seq[Attribute]): String
genPredicate is an internal method of doConsume.


Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

doExecute(): RDD[InternalRow]
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute executes the child physical operator and creates a new MapPartitionsRDD that does the filtering.

// DEMO Show the RDD lineage with the new MapPartitionsRDD after FilterExec

Internally, doExecute takes the numOutputRows metric.

In the end, doExecute requests the child physical operator to execute (that triggers physical query planning and generates an RDD[InternalRow]) and transforms it by executing the following function on internal rows per partition with index (using RDD.mapPartitionsWithIndexInternal that creates another RDD):

  1. Creates a partition filter as a new GenPredicate (for the filter condition expression and the output schema of the child physical operator)

  2. Requests the generated partition filter Predicate to initialize (with 0 partition index)

  3. Filters out elements from the partition iterator (Iterator[InternalRow]) by requesting the generated partition filter Predicate to evaluate for every InternalRow

    1. Increments the numOutputRows metric for positive evaluations (i.e. that returned true)

doExecute (by RDD.mapPartitionsWithIndexInternal) adds a new MapPartitionsRDD to the RDD lineage. Use RDD.toDebugString to see the additional MapPartitionsRDD.

results matching ""

    No results matching ""