CumeDist Declarative Window Aggregate Function Expression

CumeDist is a SizeBasedWindowFunction and a RowNumberLike expression that is used for the following:

CumeDist takes no input parameters when created.

import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()

CumeDist uses cume_dist for the user-facing name.

import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
scala> println(cume_dist)
cume_dist()

As an WindowFunction expression (indirectly), CumeDist requires the SpecifiedWindowFrame (with the RangeFrame frame type, the UnboundedPreceding lower and the CurrentRow upper frame boundaries) as the frame.

Note
The frame for CumeDist expression is range-based instead of row-based, because it has to return the same value for tie values in a window (equal values per ORDER BY specification).

As a DeclarativeAggregate expression (indirectly), CumeDist defines the evaluateExpression expression which returns the final value when CumeDist is evaluated. The value uses the formula rowNumber / n where rowNumber is the row number in a window frame (the number of values before and including the current row) divided by the number of rows in the window frame.

import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
scala> println(cume_dist.evaluateExpression.numberedTreeString)
00 (cast(rowNumber#0 as double) / cast(window__partition__size#1 as double))
01 :- cast(rowNumber#0 as double)
02 :  +- rowNumber#0: int
03 +- cast(window__partition__size#1 as double)
04    +- window__partition__size#1: int

results matching ""

    No results matching ""