val q = sql("""
SELECT customer, year, SUM(sales)
FROM VALUES ("abc", 2017, 30) AS t1 (customer, year, sales)
GROUP BY customer, year
GROUPING SETS ((customer), (year))
""")
scala> println(q.queryExecution.logical.numberedTreeString)
00 'GroupingSets [ArrayBuffer('customer), ArrayBuffer('year)], ['customer, 'year], ['customer, 'year, unresolvedalias('SUM('sales), None)]
01 +- 'SubqueryAlias t1
02 +- 'UnresolvedInlineTable [customer, year, sales], [List(abc, 2017, 30)]
GroupingSets Unary Logical Operator
GroupingSets is a unary logical operator that represents SQL’s GROUPING SETS variant of the GROUP BY clause.
GroupingSets operator is resolved to an Aggregate logical operator at analysis phase.
scala> println(q.queryExecution.analyzed.numberedTreeString)
00 Aggregate [customer#8, year#9, spark_grouping_id#5], [customer#8, year#9, sum(cast(sales#2 as bigint)) AS sum(sales)#4L]
01 +- Expand [List(customer#0, year#1, sales#2, customer#6, null, 1), List(customer#0, year#1, sales#2, null, year#7, 2)], [customer#0, year#1, sales#2, customer#8, year#9, spark_grouping_id#5]
02 +- Project [customer#0, year#1, sales#2, customer#0 AS customer#6, year#1 AS year#7]
03 +- SubqueryAlias t1
04 +- LocalRelation [customer#0, year#1, sales#2]
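The analyzed plan above pairs an Expand operator with an Aggregate. The effect of that pair can be modeled in plain Scala collections, as in the minimal sketch below. It is not Spark code: the `Sale` and `Expanded` types and the `expand`, `aggregate` and `groupingSets` functions are hypothetical names introduced for illustration, and nulls are modeled with `None`. The grouping IDs 1 and 2 mirror the values shown in the Expand projections.

```scala
object GroupingSetsSketch {
  // One input row of the inline table t1 (customer, year, sales).
  final case class Sale(customer: String, year: Int, sales: Long)

  // A row produced by the Expand step: grouping columns that are not part
  // of the current grouping set are nulled out (modeled here with None).
  final case class Expanded(
      customer: Option[String],
      year: Option[Int],
      groupingId: Int,
      sales: Long)

  // Expand: emit one copy of every input row per grouping set.
  def expand(rows: Seq[Sale]): Seq[Expanded] =
    rows.flatMap { r =>
      Seq(
        Expanded(Some(r.customer), None, 1, r.sales), // grouping set (customer)
        Expanded(None, Some(r.year), 2, r.sales)      // grouping set (year)
      )
    }

  // Aggregate: group by (customer, year, groupingId) and sum the sales.
  def aggregate(
      rows: Seq[Expanded]): Map[(Option[String], Option[Int], Int), Long] =
    rows
      .groupBy(e => (e.customer, e.year, e.groupingId))
      .map { case (key, group) => key -> group.map(_.sales).sum }

  // Expand followed by Aggregate, as in the analyzed plan.
  def groupingSets(
      rows: Seq[Sale]): Map[(Option[String], Option[Int], Int), Long] =
    aggregate(expand(rows))
}
```

Calling `GroupingSetsSketch.groupingSets` on the rows of `t1` gives one total per customer and one total per year. Unlike the Aggregate in the plan, which groups by the grouping ID but does not output it, this sketch keeps the ID in the result key so that the two grouping sets stay distinguishable.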
GroupingSets can only be created using SQL.
GroupingSets is not supported on Structured Streaming’s streaming Datasets.
GroupingSets is never resolved (as it can only be converted to an Aggregate logical operator).
The output schema of a GroupingSets is exactly the attributes of aggregate named expressions.
Analysis Phase
GroupingSets operator is resolved at analysis phase in the following logical evaluation rules:
ResolveAliases for unresolved aliases in aggregate named expressions
val spark: SparkSession = ...
// using q from the example above
val plan = q.queryExecution.logical
scala> println(plan.numberedTreeString)
00 'GroupingSets [ArrayBuffer('customer), ArrayBuffer('year)], ['customer, 'year], ['customer, 'year, unresolvedalias('SUM('sales), None)]
01 +- 'SubqueryAlias t1
02 +- 'UnresolvedInlineTable [customer, year, sales], [List(abc, 2017, 30)]
// Note unresolvedalias for SUM expression
// Note UnresolvedInlineTable and SubqueryAlias
// FIXME Show the evaluation rules to get rid of the unresolvable parts
Creating GroupingSets Instance
GroupingSets takes the following when created:
Expressions from GROUPING SETS clause
Grouping expressions from GROUP BY clause
Child logical plan
Aggregate named expressions
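To relate the four arguments to the tree string printed earlier, the sketch below models them with plain Scala case classes. It is an illustrative stand-in only, not Spark's actual class: `Plan`, `GroupingSetsModel` and the field names are assumptions made for this example, and expressions are simplified to strings.

```scala
object GroupingSetsShape {
  // Hypothetical stand-in for a logical plan node; NOT Spark's LogicalPlan.
  final case class Plan(description: String)

  // Hypothetical stand-in mirroring the four arguments listed above,
  // with expressions simplified to plain strings.
  final case class GroupingSetsModel(
      selectedGroupByExprs: Seq[Seq[String]], // from the GROUPING SETS clause
      groupByExprs: Seq[String],              // from the GROUP BY clause
      child: Plan,                            // child logical plan
      aggregations: Seq[String])              // aggregate named expressions

  // The example query from the top of the page, expressed in this model.
  val example: GroupingSetsModel = GroupingSetsModel(
    selectedGroupByExprs = Seq(Seq("customer"), Seq("year")),
    groupByExprs = Seq("customer", "year"),
    child = Plan("SubqueryAlias t1"),
    aggregations = Seq("customer", "year", "SUM(sales)"))
}
```

The values in `example` follow the order of the bracketed lists in the first line of the logical plan's tree string: the grouping sets, then the grouping expressions, then the aggregate named expressions.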