
KeyValueGroupedDataset — Typed Grouping

KeyValueGroupedDataset is an experimental interface to calculate aggregates over groups of objects in a typed Dataset.

Note	RelationalGroupedDataset is used for untyped `Row`-based aggregates.

A KeyValueGroupedDataset is created using the Dataset.groupByKey operator.

val dataset: Dataset[Token] = ...
scala> val tokensByName = dataset.groupByKey(_.name)
tokensByName: org.apache.spark.sql.KeyValueGroupedDataset[String,Token] = org.apache.spark.sql.KeyValueGroupedDataset@1e3aad46
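
For readers who want to reproduce the session above outside the REPL, the following is a minimal self-contained sketch. The `Token` fields other than `name`, the sample rows, and the local master setting are assumptions made for illustration, since the original elides how `dataset` is built.

```scala
import org.apache.spark.sql.{Dataset, KeyValueGroupedDataset, SparkSession}

// Hypothetical record type: only the `name` field is implied by the example above
case class Token(name: String, productId: Int, score: Double)

object GroupByKeySketch {
  // Counts the objects in every group and returns the result to the driver
  def countsByName(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._ // encoders for case classes and primitives

    // Sample data (assumed)
    val dataset: Dataset[Token] = Seq(
      Token("aaa", 100, 0.12),
      Token("aaa", 200, 0.29),
      Token("bbb", 200, 0.53),
      Token("bbb", 300, 0.42)).toDS()

    // groupByKey takes a function that computes the grouping key of every object
    val tokensByName: KeyValueGroupedDataset[String, Token] = dataset.groupByKey(_.name)

    // count gives a Dataset[(String, Long)] with one row per key
    tokensByName.count().collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("groupByKey-sketch").getOrCreate()
    println(countsByName(spark))
    spark.stop()
  }
}
```

The key type (`String`) and the value type (`Token`) are both kept in the type of the grouped dataset, which is what separates this typed grouping from the `Row`-based RelationalGroupedDataset.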

Table 1. KeyValueGroupedDataset’s Aggregate Operators (KeyValueGroupedDataset API)
Operator	Description
`agg`
`cogroup`
`count`
`flatMapGroups`
`flatMapGroupsWithState`
`keys`
`keyAs`
`mapGroups`
`mapGroupsWithState`
`mapValues`
`reduceGroups`
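
A few of the operators in the table can be tried as in the sketch below. It only exercises `mapValues`, `reduceGroups` and `mapGroups`; the `Token` fields other than `name` and the sample rows are assumptions made for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type: only the `name` field is implied by the document
case class Token(name: String, productId: Int, score: Double)

object GroupOperatorsSketch {
  // Sample data (assumed)
  private def tokens(spark: SparkSession): Dataset[Token] = {
    import spark.implicits._
    Seq(
      Token("aaa", 100, 0.12),
      Token("aaa", 200, 0.29),
      Token("bbb", 200, 0.53),
      Token("bbb", 300, 0.42)).toDS()
  }

  // mapValues narrows every grouped object to its score,
  // reduceGroups then folds the scores of a group into a single value
  def maxScoreByName(spark: SparkSession): Map[String, Double] = {
    import spark.implicits._
    tokens(spark)
      .groupByKey(_.name)
      .mapValues(_.score)
      .reduceGroups((a: Double, b: Double) => math.max(a, b))
      .collect()
      .toMap
  }

  // mapGroups receives the key and an iterator over all objects of the group
  def productIdsByName(spark: SparkSession): Map[String, String] = {
    import spark.implicits._
    tokens(spark)
      .groupByKey(_.name)
      .mapGroups { (name, group) => (name, group.map(_.productId).toList.sorted.mkString(",")) }
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("operators-sketch").getOrCreate()
    println(maxScoreByName(spark))
    println(productIdsByName(spark))
    spark.stop()
  }
}
```

The lambda passed to `reduceGroups` has explicit parameter types so that the Scala overload is selected unambiguously over the Java-friendly `ReduceFunction` one.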

KeyValueGroupedDataset exposes the keys that the objects were grouped by through the `keys` operator, which gives a Dataset of the unique keys.

scala> tokensByName.keys.show
+-----+
|value|
+-----+
|  aaa|
|  bbb|
+-----+
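
The `keys` operator gives the same set of values as mapping every object to its key and running `distinct` on the result. The sketch below compares the two; the `Token` fields other than `name` and the sample rows are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type: only the `name` field is implied by the document
case class Token(name: String, productId: Int, score: Double)

object KeysSketch {
  // Returns the unique keys computed in two ways: (via keys, via map + distinct)
  def keySets(spark: SparkSession): (Set[String], Set[String]) = {
    import spark.implicits._

    // Sample data (assumed)
    val dataset = Seq(
      Token("aaa", 100, 0.12),
      Token("aaa", 200, 0.29),
      Token("bbb", 200, 0.53),
      Token("bbb", 300, 0.42)).toDS()

    // Unique grouping keys of the KeyValueGroupedDataset
    val viaKeys = dataset.groupByKey(_.name).keys.collect().toSet

    // The same keys extracted by hand
    val viaDistinct = dataset.map(_.name).distinct().collect().toSet

    (viaKeys, viaDistinct)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("keys-sketch").getOrCreate()
    println(keySets(spark))
    spark.stop()
  }
}
```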

`aggUntyped` Internal Method

aggUntyped(columns: TypedColumn[_, _]*): Dataset[_]

aggUntyped…FIXME

Note	`aggUntyped` is used exclusively when the KeyValueGroupedDataset.agg typed operator is used.
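
The sketch below shows the public `agg` typed operator that, according to the note above, is the only caller of `aggUntyped`. It does not describe what `aggUntyped` does internally. The `Token` fields other than `name`, the sample rows, and the choice of `count` and `sum` as the typed columns are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, sum}

// Hypothetical record type: only the `name` field is implied by the document
case class Token(name: String, productId: Int, score: Double)

object TypedAggSketch {
  // Returns (number of tokens, sum of product ids) per name
  def statsByName(spark: SparkSession): Map[String, (Long, Long)] = {
    import spark.implicits._

    // Sample data (assumed)
    val dataset = Seq(
      Token("aaa", 100, 0.12),
      Token("aaa", 200, 0.29),
      Token("bbb", 200, 0.53),
      Token("bbb", 300, 0.42)).toDS()

    dataset
      .groupByKey(_.name)
      // agg accepts TypedColumns: count("*") is already typed,
      // while sum(...) becomes typed through as[Long]
      .agg(count("*"), sum("productId").as[Long])
      .collect()
      .map { case (name, n, total) => name -> ((n, total)) }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("agg-sketch").getOrCreate()
    println(statsByName(spark))
    spark.stop()
  }
}
```

Because every argument is a `TypedColumn`, the result is a typed `Dataset[(String, Long, Long)]` rather than a `DataFrame`.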

`logicalPlan` Internal Method

logicalPlan: AnalysisBarrier

logicalPlan…FIXME

Note	`logicalPlan` is used when…FIXME
