ScalarSubquery (ExecSubqueryExpression) Expression

ScalarSubquery is an ExecSubqueryExpression that can give exactly one value (i.e. the value of executing SubqueryExec subquery that can result in a single row and a single column or null if no row were computed).

Spark SQL uses the name of ScalarSubquery twice to represent an ExecSubqueryExpression (this page) and a SubqueryExpression. It is confusing and you should not be anymore.

ScalarSubquery is created exclusively when PlanSubqueries physical optimization is executed (and plans a ScalarSubquery expression).

import org.apache.spark.sql.execution.PlanSubqueries
val spark = ...
val planSubqueries = PlanSubqueries(spark)
val plan = ...
val executedPlan = planSubqueries(plan)

ScalarSubquery expression cannot be evaluated, i.e. produce a value given an internal row.

ScalarSubquery uses…​FIXME…​for the data type.

Table 1. ScalarSubquery’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description


The value of the single column in a single row after collecting the rows from executing the subquery plan or null if no rows were collected.


Flag that says whether ScalarSubquery was updated with collected result of executing the subquery plan.

Creating ScalarSubquery Instance

ScalarSubquery takes the following when created:

Updating ScalarSubquery With Collected Result — updateResult Method

updateResult(): Unit
updateResult is part of ExecSubqueryExpression Contract to fill an Catalyst expression with a collected result from executing a subquery plan.

updateResult requests SubqueryExec physical plan to execute and collect internal rows.

updateResult sets result to the value of the only column of the single row or null if no row were collected.

In the end, updateResult marks the ScalarSubquery instance as updated.

updateResult reports a RuntimeException when there are more than 1 rows in the result.

more than one row returned by a subquery used as an expression:

updateResult reports an AssertionError when the number of fields is not exactly 1.

Expects 1 field, but got [numFields] something went wrong in analysis

Evaluating Expression — eval Method

eval(input: InternalRow): Any
eval is part of Expression Contract for the interpreted (non-code-generated) expression evaluation, i.e. evaluating a Catalyst expression to a JVM object for a given internal binary row.

eval simply returns result value.

eval reports an IllegalArgumentException if the ScalarSubquery expression has not been updated yet.

Generating Java Source Code (ExprCode) For Code-Generated Expression Evaluation — doGenCode Method

doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
doGenCode is part of Expression Contract to generate a Java source code (ExprCode) for code-generated expression evaluation.

doGenCode first makes sure that the updated flag is on (true). If not, doGenCode throws an IllegalArgumentException exception with the following message:

requirement failed: [this] has not finished

doGenCode then creates a Literal (for the result and the dataType) and simply requests it to generate a Java source code.

results matching ""

    No results matching ""