FilterExec Unary Physical Operator

FilterExec is a unary physical operator (i.e. with one child physical operator) that represents Filter and TypedFilter unary logical operators at execution.

FilterExec supports Java code generation (aka codegen) as follows:

FilterExec is created when:

Table 1. FilterExec’s Performance Metrics
Key Name (in web UI) Description

numOutputRows

number of output rows

spark sql FilterExec webui details for query.png
Figure 1. FilterExec in web UI (Details for Query)

FilterExec uses whatever the child physical operator uses for the input RDDs, the outputOrdering and the outputPartitioning.

FilterExec uses the PredicateHelper for…​FIXME

Table 2. FilterExec’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

notNullAttributes

FIXME

Used when…​FIXME

notNullPreds

FIXME

Used when…​FIXME

otherPreds

FIXME

Used when…​FIXME

Creating FilterExec Instance

FilterExec takes the following when created:

FilterExec initializes the internal registries and counters.

isNullIntolerant Internal Method

isNullIntolerant(expr: Expression): Boolean

isNullIntolerant…​FIXME

Note
isNullIntolerant is used when…​FIXME

usedInputs Method

usedInputs: AttributeSet
Note
usedInputs is part of CodegenSupport Contract to…​FIXME.

usedInputs…​FIXME

output Method

output: Seq[Attribute]
Note
output is part of QueryPlan Contract to…​FIXME.

output…​FIXME

Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

doProduce(ctx: CodegenContext): String
Note
doProduce is part of CodegenSupport Contract to generate the Java source code for produce path in Whole-Stage Code Generation.

doProduce…​FIXME

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String
Note
doConsume is part of CodegenSupport Contract to generate the Java source code for consume path in Whole-Stage Code Generation.

doConsume creates a new metric term for the numOutputRows metric.

doConsume…​FIXME

In the end, doConsume uses consume and FIXME to generate a Java source code (as a plain text) inside a do {…​} while(false); code block.

// DEMO Write one

genPredicate Internal Method

genPredicate(c: Expression, in: Seq[ExprCode], attrs: Seq[Attribute]): String
Note
genPredicate is an internal method of doConsume.

genPredicate…​FIXME

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

doExecute(): RDD[InternalRow]
Note
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute executes the child physical operator and creates a new MapPartitionsRDD that does the filtering.

// DEMO Show the RDD lineage with the new MapPartitionsRDD after FilterExec

Internally, doExecute takes the numOutputRows metric.

In the end, doExecute requests the child physical operator to execute (that triggers physical query planning and generates an RDD[InternalRow]) and transforms it by executing the following function on internal rows per partition with index (using RDD.mapPartitionsWithIndexInternal that creates another RDD):

  1. Creates a partition filter as a new GenPredicate (for the filter condition expression and the output schema of the child physical operator)

  2. Requests the generated partition filter Predicate to initialize (with 0 partition index)

  3. Filters out elements from the partition iterator (Iterator[InternalRow]) by requesting the generated partition filter Predicate to evaluate for every InternalRow

    1. Increments the numOutputRows metric for positive evaluations (i.e. that returned true)

Note
doExecute (by RDD.mapPartitionsWithIndexInternal) adds a new MapPartitionsRDD to the RDD lineage. Use RDD.toDebugString to see the additional MapPartitionsRDD.

results matching ""

    No results matching ""