Dataset API — Actions

Actions are part of the Dataset API for…​FIXME

Note
Actions are the methods of the Dataset Scala class that belong to the action group in the Scaladoc, i.e. they are annotated with @group action.

Table 1. Dataset API’s Actions
Action Description

collect

collect(): Array[T]

count

count(): Long

describe

describe(cols: String*): DataFrame

first

first(): T

foreach

foreach(f: T => Unit): Unit

foreachPartition

foreachPartition(f: Iterator[T] => Unit): Unit

head

head(): T
head(n: Int): Array[T]

reduce

reduce(func: (T, T) => T): T

show

show(): Unit
show(truncate: Boolean): Unit
show(numRows: Int): Unit
show(numRows: Int, truncate: Boolean): Unit
show(numRows: Int, truncate: Int): Unit
show(numRows: Int, truncate: Int, vertical: Boolean): Unit

summary

Computes specified statistics for numeric and string columns. The default statistics are: count, mean, stddev, min, max and 25%, 50%, 75% percentiles.

summary(statistics: String*): DataFrame
Note
summary is an extended version of the describe action, which calculates only the count, mean, stddev, min and max statistics.

take

take(n: Int): Array[T]

toLocalIterator

toLocalIterator(): java.util.Iterator[T]

collect Action

collect(): Array[T]

collect…​FIXME

count Action

count(): Long

count…​FIXME

Calculating Basic Statistics — describe Action

describe(cols: String*): DataFrame

describe…​FIXME

first Action

first(): T

first…​FIXME

foreach Action

foreach(f: T => Unit): Unit

foreach…​FIXME

foreachPartition Action

foreachPartition(f: Iterator[T] => Unit): Unit

foreachPartition…​FIXME

head Action

head(): T (1)
head(n: Int): Array[T]
  1. Calls head(n: Int) with n set to 1 and returns the first element of the resulting array

head…​FIXME
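
The relationship between the two head variants can be sketched as follows. This is only an illustration: the local SparkSession, the object name HeadActionSketch and the sample values are assumptions, not part of the original text.

```scala
import org.apache.spark.sql.SparkSession

object HeadActionSketch {
  // Assumption: a throwaway local session, used only for this illustration
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("head-action-sketch")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data held in a local, in-memory Dataset
  val numbers = Seq(10, 20, 30).toDS()

  // head(n) returns the first n records as an array
  val firstTwo: Array[Int] = numbers.head(2)

  // head() yields the same record as head(1).head
  val first: Int = numbers.head()
  val firstViaHeadN: Int = numbers.head(1).head

  def main(args: Array[String]): Unit = {
    println(s"head() = $first, head(1).head = $firstViaHeadN")
    spark.stop()
  }
}
```

On an empty Dataset, head(n) simply returns an empty array, whereas head() fails with a NoSuchElementException because there is no first element to take.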

reduce Action

reduce(func: (T, T) => T): T

reduce…​FIXME

show Action

show(): Unit
show(truncate: Boolean): Unit
show(numRows: Int): Unit
show(numRows: Int, truncate: Boolean): Unit
show(numRows: Int, truncate: Int): Unit
show(numRows: Int, truncate: Int, vertical: Boolean): Unit

show…​FIXME

Calculating Statistics — summary Action

summary(statistics: String*): DataFrame

summary calculates specified statistics for numeric and string columns.

The default statistics are: count, mean, stddev, min, max and 25%, 50%, 75% percentiles.

Note
summary accepts arbitrary approximate percentiles specified as a percentage (e.g. 10%).

Internally, summary uses StatFunctions to calculate the requested statistics for the Dataset.
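
A minimal sketch of summary with the default statistics and with an explicit selection that includes an approximate percentile. The local SparkSession, the object name SummaryActionSketch, and the sample rows and column names (id, name) are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SummaryActionSketch {
  // Assumption: a throwaway local session, used only for this illustration
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("summary-action-sketch")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data with one numeric and one string column
  val records: DataFrame =
    Seq((1, "a"), (2, "b"), (3, "c"), (4, "d")).toDF("id", "name")

  // No arguments: the default statistics are calculated
  val defaults: DataFrame = records.summary()

  // Explicit selection, including an arbitrary approximate percentile (10%)
  val selected: DataFrame = records.summary("count", "min", "10%", "max")

  // The first column (named "summary") holds the name of each statistic
  val defaultStatNames: Seq[String] =
    defaults.select("summary").as[String].collect().toSeq

  def main(args: Array[String]): Unit = {
    defaults.show()
    selected.show()
    spark.stop()
  }
}
```

The result is itself a DataFrame with one row per requested statistic, so it can be filtered or joined like any other Dataset.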

Taking First Records — take Action

take(n: Int): Array[T]

take is an action on a Dataset that returns an array with the first n records.

Warning
take loads all n records into the memory of the Spark application’s driver process, so a large n can result in an OutOfMemoryError.

Internally, take creates a new Dataset whose logical plan is a Limit over a Literal expression (for n) and the current LogicalPlan. It then executes the corresponding SparkPlan, which produces an Array[InternalRow] that is in turn decoded to Array[T] using a bounded encoder.
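
The description above suggests that take(n) behaves much like limit(n) followed by collect. The following sketch compares the two under that reading; the local SparkSession, the object name TakeActionSketch and the use of spark.range for sample data are assumptions made for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object TakeActionSketch {
  // Assumption: a throwaway local session, used only for this illustration
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("take-action-sketch")
    .getOrCreate()

  // Hypothetical sample data: the numbers 0 to 999
  val ids: Dataset[java.lang.Long] = spark.range(1000)

  // Only n records are brought to the driver, not the whole Dataset
  val viaTake: Array[java.lang.Long] = ids.take(3)

  // Roughly the same effect spelled out: a Limit on top of the current plan,
  // followed by collecting the limited result
  val viaLimit: Array[java.lang.Long] = ids.limit(3).collect()

  def main(args: Array[String]): Unit = {
    // Prints the plans of the limited Dataset, including the limit operator
    ids.limit(3).explain(true)
    println(viaTake.mkString(", "))
    spark.stop()
  }
}
```

Keeping n small is what makes take safe on the driver; for very large results, an action that does not materialize everything at once, such as toLocalIterator, may be a better fit.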

toLocalIterator Action

toLocalIterator(): java.util.Iterator[T]

toLocalIterator…​FIXME
