import org.apache.spark.sql.functions.collect_list
scala> val fn = collect_list("gid")
fn: org.apache.spark.sql.Column = collect_list(gid)
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
scala> val aggFn = fn.expr.asInstanceOf[AggregateExpression].aggregateFunction
aggFn: org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction = collect_list('gid, 0, 0)
scala> println(aggFn.numberedTreeString)
00 collect_list('gid, 0, 0)
01 +- 'gid
AggregateFunction Contract — Aggregate Function Expressions
AggregateFunction
is the contract for Catalyst expressions that represent aggregate functions.
AggregateFunction
is used wrapped inside a AggregateExpression (using toAggregateExpression method) when:
-
Analyzer
resolves functions (for SQL mode) -
…FIXME: Anywhere else?
Note
|
Aggregate functions are not foldable, i.e. FIXME |
Name | Behaviour | Examples |
---|---|---|
AggregateFunction Contract
abstract class AggregateFunction extends Expression {
def aggBufferSchema: StructType
def aggBufferAttributes: Seq[AttributeReference]
def inputAggBufferAttributes: Seq[AttributeReference]
def defaultResult: Option[Literal] = None
}
Method | Description |
---|---|
Schema of an aggregation buffer to hold partial aggregate results. Used mostly in ScalaUDAF and AggregationIterator |
|
AttributeReferences of an aggregation buffer to hold partial aggregate results. Used in:
|
|
Defaults to |
Creating AggregateExpression for AggregateFunction — toAggregateExpression
Method
toAggregateExpression(): AggregateExpression (1)
toAggregateExpression(isDistinct: Boolean): AggregateExpression
-
Calls the other
toAggregateExpression
withisDistinct
disabled (i.e.false
)
toAggregateExpression
creates a AggregateExpression for the current AggregateFunction
with Complete
aggregate mode.
Note
|
|