Streaming Operators — High-Level Declarative Streaming Dataset API
The Dataset API comes with a set of operators that are of particular use in Spark Structured Streaming and that together constitute the so-called High-Level Declarative Streaming Dataset API.

Operator | Description
---|---
`crossJoin` | Performs an explicit cartesian join with another Dataset: `crossJoin(right: Dataset[_]): DataFrame`
`dropDuplicates` | Drops duplicate records (given a subset of columns)
`explain` | Explains query plans
`groupBy` | Aggregates rows by zero, one or more columns
`groupByKey` | Aggregates rows by a typed grouping function (and gives a `KeyValueGroupedDataset`)
`withWatermark` | Defines a streaming watermark (on the given `eventTime` column)
`writeStream` | Creates a `DataStreamWriter` for persisting the result of a streaming query to an external data system
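
For example, `groupByKey` applies a typed grouping function to a streaming Dataset. The sketch below is not from the original text: it assumes a `SparkSession` in scope as `spark` (e.g. in `spark-shell`) and groups the rate source's `value` column by parity.

```scala
import spark.implicits._

// typed view over the rate source (schema: timestamp, value)
val values = spark
  .readStream
  .format("rate")
  .load
  .as[(java.sql.Timestamp, Long)]

// groupByKey applies the typed grouping function and gives a KeyValueGroupedDataset
val byParity = values.groupByKey(_._2 % 2)

// a typed streaming aggregation: Dataset[(Long, Long)] of (key, count) pairs
val parityCounts = byParity.count()
```

The resulting `KeyValueGroupedDataset` also accepts typed stateful operators such as `mapGroupsWithState`.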

The operators can be exercised against the built-in rate source using the following template:

```scala
// input stream
val rates = spark
  .readStream
  .format("rate")
  .option("rowsPerSecond", 1)
  .load

// stream processing
// replace [operator] with the operator of your choice
rates.[operator]
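
// for example, one possible substitution (a sketch; `value` is a column of the rate source):
// val deduped = rates.dropDuplicates("value")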

// output stream
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._

val sq = rates
  .writeStream
  .format("console")
  .option("truncate", false)
  .trigger(Trigger.ProcessingTime(10.seconds))
  .outputMode(OutputMode.Complete) // Complete requires a streaming aggregation (e.g. groupBy); use Append or Update otherwise
  .queryName("rate-console")
  .start

// eventually...
sq.stop
```
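
As one concrete instantiation of the template (again only a sketch; the query name `rate-windowed-console` is an invented example), `withWatermark` followed by a windowed `groupBy` yields a streaming aggregation, which is what `OutputMode.Complete` requires:

```scala
import org.apache.spark.sql.functions.window
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._
import spark.implicits._

// event-time windowed count over the rate source
val windowed = spark
  .readStream
  .format("rate")
  .option("rowsPerSecond", 1)
  .load
  .withWatermark("timestamp", "10 seconds")   // tolerate 10 seconds of late data
  .groupBy(window($"timestamp", "5 seconds")) // 5-second event-time windows
  .count

val windowedQuery = windowed
  .writeStream
  .format("console")
  .option("truncate", false)
  .trigger(Trigger.ProcessingTime(10.seconds))
  .outputMode(OutputMode.Complete)            // supported here: the query aggregates
  .queryName("rate-windowed-console")
  .start

// eventually...
windowedQuery.stop
```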