import spark.sessionState.analyzer.ResolveFunctions
// Example: UnresolvedAttribute with VirtualColumn.hiveGroupingIdName (grouping__id) => Alias
import org.apache.spark.sql.catalyst.expressions.VirtualColumn
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
val groupingIdAttr = UnresolvedAttribute(VirtualColumn.hiveGroupingIdName)
scala> println(groupingIdAttr.sql)
`grouping__id`
// Using Catalyst DSL to create a logical plan with grouping__id
import org.apache.spark.sql.catalyst.dsl.plans._
val t1 = table("t1")
val plan = t1.select(groupingIdAttr)
scala> println(plan.numberedTreeString)
00 'Project ['grouping__id]
01 +- 'UnresolvedRelation `t1`
val resolvedPlan = ResolveFunctions(plan)
scala> println(resolvedPlan.numberedTreeString)
00 'Project [grouping_id() AS grouping__id#0]
01 +- 'UnresolvedRelation `t1`
import org.apache.spark.sql.catalyst.expressions.Alias
val alias = resolvedPlan.expressions.head.asInstanceOf[Alias]
scala> println(alias.sql)
grouping_id() AS `grouping__id`
// Example: UnresolvedGenerator => a) Generator or b) analysis failure
// Register a function so that function resolution works
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogFunction
val f1 = CatalogFunction(FunctionIdentifier(funcName = "f1"), "java.lang.String", resources = Nil)
import org.apache.spark.sql.catalyst.expressions.{Expression, Stack}
// FIXME What happens when looking up a function with the functionBuilder None in registerFunction?
// Using Stack as ResolveFunctions requires that the function to be resolved is a Generator
// You could roll your own, but that's a demo, isn't it? (don't get too carried away)
spark.sessionState.catalog.registerFunction(
  funcDefinition = f1,
  overrideIfExists = true,
  functionBuilder = Some((children: Seq[Expression]) => Stack(children = Nil)))
import org.apache.spark.sql.catalyst.analysis.UnresolvedGenerator
val ungen = UnresolvedGenerator(name = FunctionIdentifier("f1"), children = Seq.empty)
val plan = t1.select(ungen)
scala> println(plan.numberedTreeString)
00 'Project [unresolvedalias('f1(), None)]
01 +- 'UnresolvedRelation `t1`
val resolvedPlan = ResolveFunctions(plan)
scala> println(resolvedPlan.numberedTreeString)
00 'Project [unresolvedalias(stack(), None)]
01 +- 'UnresolvedRelation `t1`
CAUTION: FIXME
// Example: UnresolvedFunction => a) AggregateWindowFunction with(out) isDistinct, b) AggregateFunction, c) other with(out) isDistinct
val plan = ???
val resolvedPlan = ResolveFunctions(plan)
ResolveFunctions Logical Resolution Rule — Resolving grouping__id UnresolvedAttribute, UnresolvedGenerator And UnresolvedFunction Expressions
ResolveFunctions is a logical resolution rule that the logical query plan analyzer uses to resolve grouping__id UnresolvedAttribute, UnresolvedGenerator and UnresolvedFunction expressions in an entire logical query plan.
Technically, ResolveFunctions is just a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].
ResolveFunctions is part of the Resolution fixed-point batch of rules.
Note: ResolveFunctions is a Scala object inside the Analyzer class.
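To illustrate what being part of a fixed-point batch means, here is a minimal, self-contained sketch. It is a toy model, not Spark's code: Plan, Rule, ResolveOneFunction and runToFixedPoint are hypothetical names made up for this example, and a "plan" is just a list of strings in which a leading ' marks an unresolved expression. The idea it shows is that the rules of a batch are applied over and over until the plan stops changing (or an iteration limit is reached).

```scala
// Toy model only: these are NOT Spark's Rule or RuleExecutor classes.
object FixedPointSketch {
  // A "plan" is simplified to a list of expression strings.
  type Plan = List[String]

  // A rule rewrites a plan into a (possibly identical) plan.
  trait Rule { def apply(plan: Plan): Plan }

  // Hypothetical rule: resolves at most one unresolved expression per pass
  // by dropping its leading ' marker.
  object ResolveOneFunction extends Rule {
    def apply(plan: Plan): Plan = {
      val i = plan.indexWhere(_.startsWith("'"))
      if (i < 0) plan else plan.updated(i, plan(i).drop(1))
    }
  }

  // Applies all rules repeatedly until the plan no longer changes
  // or maxIterations passes have been made.
  def runToFixedPoint(rules: Seq[Rule], plan: Plan, maxIterations: Int = 100): Plan = {
    var current = plan
    var iteration = 0
    var changed = true
    while (changed && iteration < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      iteration += 1
    }
    current
  }
}
```

With the input List("'f1()", "'stack()", "id"), the single toy rule needs two passes to resolve both marked entries and a third pass to detect that nothing changes any more.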
Resolving grouping__id UnresolvedAttribute, UnresolvedGenerator and UnresolvedFunction Expressions In Entire Query Plan (Applying ResolveFunctions to Logical Plan) — apply Method
apply(plan: LogicalPlan): LogicalPlan
Note: apply is part of the Rule Contract to apply a rule to a logical plan.
apply takes a logical plan and transforms each expression (for every logical operator found in the query plan) as follows:
- For UnresolvedAttributes named grouping__id, apply creates an Alias (with a GroupingID child expression and the grouping__id name). That case seems to exist mostly for compatibility with Hive, as the grouping__id attribute name is used by Hive.
- For UnresolvedGenerators, apply requests the SessionCatalog to find a Generator function by name. If some other non-generator function is found for the name, apply fails the analysis phase by reporting an AnalysisException:
  [name] is expected to be a generator. However, its class is [className], which is not a generator.
- For UnresolvedFunctions, apply requests the SessionCatalog to find a function by name.
  - AggregateWindowFunctions are returned directly, or apply fails the analysis phase by reporting an AnalysisException when the UnresolvedFunction has the isDistinct flag enabled:
    [name] does not support the modifier DISTINCT
  - AggregateFunctions are wrapped in an AggregateExpression (with Complete aggregate mode)
  - All other functions are returned directly, or apply fails the analysis phase by reporting an AnalysisException when the UnresolvedFunction has the isDistinct flag enabled:
    [name] does not support the modifier DISTINCT
apply skips expressions whose children are not resolved yet.
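The UnresolvedFunction cases above can be sketched as a single pattern match. This is a simplified, self-contained model and not Spark's implementation: the case classes below only borrow the names of the Catalyst expressions, the catalog is a plain Map rather than a SessionCatalog, AnalysisError stands in for AnalysisException, and the "Undefined function" message is made up for the example. Passing isDistinct on to the AggregateExpression is an assumption of this sketch.

```scala
// Toy model only: simplified stand-ins, NOT Spark's Catalyst classes.
object ResolveFunctionsSketch {
  sealed trait Expr
  case class UnresolvedFunction(name: String, isDistinct: Boolean) extends Expr
  case class AggregateWindowFunction(name: String) extends Expr
  case class AggregateFunction(name: String) extends Expr
  case class OtherFunction(name: String) extends Expr
  // Assumption: the wrapper carries the DISTINCT flag along with the mode.
  case class AggregateExpression(fn: AggregateFunction, mode: String, isDistinct: Boolean) extends Expr

  // Stand-in for AnalysisException.
  final case class AnalysisError(message: String) extends Exception(message)

  // Hypothetical catalog with illustrative entries only.
  val catalog: Map[String, Expr] = Map(
    "row_number" -> AggregateWindowFunction("row_number"),
    "sum" -> AggregateFunction("sum"),
    "upper" -> OtherFunction("upper"))

  def resolve(u: UnresolvedFunction): Expr = {
    def rejectDistinct(): Nothing =
      throw AnalysisError(s"${u.name} does not support the modifier DISTINCT")

    catalog.get(u.name) match {
      // Window aggregates: returned as-is, DISTINCT is rejected.
      case Some(wf: AggregateWindowFunction) =>
        if (u.isDistinct) rejectDistinct() else wf
      // Regular aggregates: wrapped with the Complete mode.
      case Some(agg: AggregateFunction) =>
        AggregateExpression(agg, "Complete", u.isDistinct)
      // Everything else: returned as-is, DISTINCT is rejected.
      case Some(other) =>
        if (u.isDistinct) rejectDistinct() else other
      // Hypothetical error for names missing from the toy catalog.
      case None =>
        throw AnalysisError(s"Undefined function: ${u.name}")
    }
  }
}
```

In this model, resolving UnresolvedFunction("sum", isDistinct = true) gives an AggregateExpression, while UnresolvedFunction("row_number", isDistinct = true) and UnresolvedFunction("upper", isDistinct = true) both throw.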