val q = spark.range(5).select("*")
val plan = q.queryExecution.logical
scala> println(plan.numberedTreeString)
00 'Project [*]
01 +- AnalysisBarrier
02       +- Range (0, 5, step=1, splits=Some(8))
import org.apache.spark.sql.catalyst.analysis.UnresolvedStar
val starExpr = plan.expressions.head.asInstanceOf[UnresolvedStar]
val namedExprs = starExpr.expand(input = q.queryExecution.analyzed, spark.sessionState.analyzer.resolver)
scala> println(namedExprs.head.numberedTreeString)
00 id#0: bigint
UnresolvedStar Expression
UnresolvedStar is a Star expression that represents a star (i.e. all) expression in a logical query plan.
UnresolvedStar is created when:
UnresolvedStar can never be resolved; it is expanded at analysis time (when the ResolveReferences logical resolution rule is executed).
Note: UnresolvedStar can only be used in Project, Aggregate or ScriptTransformation logical operators.
Since UnresolvedStar can never be resolved, it cannot be evaluated either (i.e. produce a value given an internal row). When requested to evaluate, UnresolvedStar simply reports an UnsupportedOperationException.
Cannot evaluate expression: [this]
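The following is a minimal sketch of both properties, using Catalyst classes only (no SparkSession is required). It assumes a Spark version in which the reported exception is an UnsupportedOperationException or a subclass of it; the value names are arbitrary and not part of Spark.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.UnresolvedStar
import scala.util.Try

// A star with no target, i.e. a plain *
val unresolvedStar = UnresolvedStar(None)

// A star expression is never resolved on its own
val isResolved = unresolvedStar.resolved

// Evaluating the star against a row is expected to fail
val outcome = Try(unresolvedStar.eval(InternalRow.empty))
val failedAsExpected =
  outcome.failed.toOption.exists(_.isInstanceOf[UnsupportedOperationException])
```

Wrapping the call in Try keeps the REPL session alive and lets the exception type be inspected instead of aborting the example.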
When created, UnresolvedStar takes name parts that, once concatenated, are the target of the star expansion.
import org.apache.spark.sql.catalyst.analysis.UnresolvedStar
scala> val us = UnresolvedStar(None)
us: org.apache.spark.sql.catalyst.analysis.UnresolvedStar = *
scala> val ab = UnresolvedStar(Some("a" :: "b" :: Nil))
ab: org.apache.spark.sql.catalyst.analysis.UnresolvedStar = List(a, b).*
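Tip: Use the star operator from Catalyst DSL's expressions (org.apache.spark.sql.catalyst.dsl.expressions) to create an UnresolvedStar.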
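As a small sketch of the Catalyst DSL route, star with no arguments is expected to give the untargeted star and star with name parts the targeted one. The comparison below relies on case-class equality of UnresolvedStar and assumes the DSL's star(names: String*) operator behaves this way in the Spark version at hand; allColumns and qualified are illustrative names.

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedStar
import org.apache.spark.sql.catalyst.dsl.expressions._

// star() with no name parts corresponds to a plain *
val allColumns = star()

// star("a", "b") corresponds to a.b.*
val qualified = star("a", "b")

val sameAsNoTarget = allColumns == UnresolvedStar(None)
val sameAsTarget = qualified == UnresolvedStar(Some(Seq("a", "b")))
```

Note that star returns an Expression, so a cast (or a pattern match) is needed before calling UnresolvedStar-specific methods such as expand.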
Star Expansion — expand Method
expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression]
Note: expand is part of Star Contract to…FIXME.
expand first expands to named expressions per target:
- For unspecified target, expand gives the output schema of the input logical query plan (that assumes that the star refers to a relation / table)
- For target with one element, expand gives the table (attribute) in the output schema of the input logical query plan (using qualifiers) if available
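The two cases above can be sketched with a hand-built logical plan. The relation, its columns (id, name) and the alias t are made up for illustration, and caseInsensitiveResolution stands in for the session's resolver; this is a sketch under those assumptions, not how the analyzer itself invokes expand.

```scala
import org.apache.spark.sql.catalyst.analysis.{UnresolvedStar, caseInsensitiveResolution}
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, SubqueryAlias}
import org.apache.spark.sql.types.{LongType, StringType}

// A hypothetical two-column relation, aliased as table t
val relation = LocalRelation(
  AttributeReference("id", LongType)(),
  AttributeReference("name", StringType)())
val input = SubqueryAlias("t", relation)

// Unspecified target: the whole output of the input plan
val everything = UnresolvedStar(None).expand(input, caseInsensitiveResolution)

// One-element target: the attributes qualified with t
val ofTableT = UnresolvedStar(Some(Seq("t"))).expand(input, caseInsensitiveResolution)

val everythingNames = everything.map(_.name)
val ofTableTNames = ofTableT.map(_.name)
```

Since the aliased relation is the only source of attributes here, both expansions should list the same two columns; with a join of two aliased relations, the one-element target would select only the attributes of the matching side.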
If neither case gives a result, expand requests the input logical query plan to resolve the target name parts to a named expression.
For a named expression of StructType data type, expand creates an Alias expression with a GetStructField unary expression (with the resolved named expression and the field index).
val q = Seq((0, "zero")).toDF("id", "name").select(struct("id", "name") as "s")
val analyzedPlan = q.queryExecution.analyzed
import org.apache.spark.sql.catalyst.analysis.UnresolvedStar
import org.apache.spark.sql.catalyst.dsl.expressions._
val s = star("s").asInstanceOf[UnresolvedStar]
val exprs = s.expand(input = analyzedPlan, spark.sessionState.analyzer.resolver)
// star("s") should expand to two Alias(GetStructField) expressions
// s is a struct of id and name in the query
import org.apache.spark.sql.catalyst.expressions.{Alias, GetStructField}
val getStructFields = exprs.collect { case Alias(g: GetStructField, _) => g }.map(_.sql)
scala> getStructFields.foreach(println)
`s`.`id`
`s`.`name`
expand reports an AnalysisException when:
- The data type of the named expression (when the input logical plan was requested to resolve the target) is not a StructType.
  Can only star expand struct data types. Attribute: `[target]`
- Earlier attempts gave no results
  cannot resolve '[target].*' given input columns '[from]'
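Both failure modes can be reproduced with a hand-built plan, as in the sketch below. The single-column relation and the target names (id, nope) are hypothetical, and only the exception type is checked since the exact message text may differ between Spark versions.

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.analysis.{UnresolvedStar, caseInsensitiveResolution}
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
import org.apache.spark.sql.types.LongType
import scala.util.Try

// A hypothetical relation with a single non-struct column
val input = LocalRelation(AttributeReference("id", LongType)())

// id resolves to a column, but the column is not a struct
val notAStruct =
  Try(UnresolvedStar(Some(Seq("id"))).expand(input, caseInsensitiveResolution))

// nope matches neither a table qualifier nor a column
val unknownTarget =
  Try(UnresolvedStar(Some(Seq("nope"))).expand(input, caseInsensitiveResolution))

val bothAnalysisErrors = Seq(notAStruct, unknownTarget).forall { attempt =>
  attempt.failed.toOption.exists(_.isInstanceOf[AnalysisException])
}
```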