ScalaUDAF — Catalyst Expression Adapter for UserDefinedAggregateFunction

ScalaUDAF is a Catalyst expression adapter to manage the lifecycle of UserDefinedAggregateFunction and hook it in Spark SQL’s Catalyst execution path.

ScalaUDAF is created when:

  1. UserDefinedAggregateFunction creates a Column for a user-defined aggregate function using all and distinct values (to use the UDAF in Dataset operators)

  2. UDFRegistration is requested to register a user-defined aggregate function (to use the UDAF in SQL mode)

ScalaUDAF is a ImperativeAggregate.

Table 1. ScalaUDAF’s ImperativeAggregate Methods
Method Name Behaviour

initialize

Requests UserDefinedAggregateFunction to initialize

merge

Requests UserDefinedAggregateFunction to merge

update

Requests UserDefinedAggregateFunction to update

When evaluated, ScalaUDAF…​FIXME

ScalaUDAF has no representation in SQL.

Table 2. ScalaUDAF’s Properties
Name Description

aggBufferAttributes

AttributeReferences of aggBufferSchema

aggBufferSchema

bufferSchema of UserDefinedAggregateFunction

dataType

DataType of UserDefinedAggregateFunction

deterministic

deterministic of UserDefinedAggregateFunction

inputAggBufferAttributes

Copy of aggBufferAttributes

inputTypes

Data types from inputSchema of UserDefinedAggregateFunction

nullable

Always enabled (i.e. true)

Table 3. ScalaUDAF’s Internal Registries and Counters
Name Description

inputAggregateBuffer

Used when…​FIXME

inputProjection

Used when…​FIXME

inputToScalaConverters

Used when…​FIXME

mutableAggregateBuffer

Used when…​FIXME

Creating ScalaUDAF Instance

ScalaUDAF takes the following when created:

ScalaUDAF initializes the internal registries and counters.

initialize Method

initialize(buffer: InternalRow): Unit

initialize sets the input buffer internal binary row as underlyingBuffer of MutableAggregationBufferImpl and requests the UserDefinedAggregateFunction to initialize (with the MutableAggregationBufferImpl).

spark sql ScalaUDAF initialize.png
Figure 1. ScalaUDAF initializes UserDefinedAggregateFunction
Note
initialize is a part of ImperativeAggregate Contract.

update Method

update(mutableAggBuffer: InternalRow, inputRow: InternalRow): Unit

update sets the input buffer internal binary row as underlyingBuffer of MutableAggregationBufferImpl and requests the UserDefinedAggregateFunction to update.

Note
update uses inputProjection on the input input and converts it using inputToScalaConverters.
spark sql ScalaUDAF update.png
Figure 2. ScalaUDAF updates UserDefinedAggregateFunction
Note
update is a part of ImperativeAggregate Contract.

merge Method

merge(buffer1: InternalRow, buffer2: InternalRow): Unit

merge first sets:

spark sql ScalaUDAF merge.png
Figure 3. ScalaUDAF requests UserDefinedAggregateFunction to merge
Note
merge is a part of ImperativeAggregate Contract.

results matching ""

    No results matching ""