val people: Dataset[Person] = ...

import org.apache.spark.sql.streaming.ProcessingTime
import scala.concurrent.duration._
import org.apache.spark.sql.streaming.OutputMode.Complete

people.writeStream
  .queryName("textStream")
  .outputMode(Complete)
  .trigger(ProcessingTime(10.seconds))
  .format("console")
  .start
DataStreamWriter is a part of the Structured Streaming API (as of Spark 2.0) that is responsible for writing the output of streaming queries to sinks and thereby starting their execution.
outputMode(outputMode: OutputMode): DataStreamWriter[T]
Currently, the following output modes are supported:
OutputMode.Append — only the new rows in the streaming dataset are written to a sink.
OutputMode.Complete — the entire streaming dataset (with all the rows) is written to a sink every time there are updates. It is supported only for streaming queries with aggregations.
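The difference between the modes is easiest to see with an aggregating query. Below is a minimal sketch (the socket source on localhost:9999 and the word-count aggregation are hypothetical choices, not from the original text) of a query where Complete mode applies because the query aggregates:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical setup: a SparkSession and a socket source on localhost:9999.
val spark = SparkSession.builder.appName("output-modes").getOrCreate()
import spark.implicits._

val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// An aggregation: count the occurrences of each word.
val counts = lines.as[String]
  .flatMap(_.split("\\s+"))
  .groupBy("value")
  .count()

// Complete mode re-emits the whole (updated) result table on every trigger.
// Append mode would be rejected here because aggregated rows keep changing.
counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
```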
queryName(queryName: String): DataStreamWriter[T]
queryName sets the name of a streaming query.
Internally, it is just an additional option with the key queryName.
trigger(trigger: Trigger): DataStreamWriter[T]
trigger method sets the time interval of the trigger (batch) for a streaming query.
The default trigger is ProcessingTime(0L), which runs a streaming query as often as possible.
Consult Trigger to learn about the available trigger implementations.
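As a sketch (assuming `df` is a streaming Dataset; the variable name is not from the original text), the same 10-second interval can be expressed either as a Duration or as an interval string:

```scala
import org.apache.spark.sql.streaming.ProcessingTime
import scala.concurrent.duration._

// Equivalent 10-second triggers; `df` is an assumed streaming Dataset.
df.writeStream
  .trigger(ProcessingTime(10.seconds))   // scala.concurrent.duration.Duration
  .format("console")
  .start()

df.writeStream
  .trigger(ProcessingTime("10 seconds")) // interval string
  .format("console")
  .start()
```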
start(): StreamingQuery
start(path: String): StreamingQuery
start methods start a streaming query and return a StreamingQuery object to continually write data.
Whether or not you have to specify a path depends on the data sink in use.
queryName is the name of the active streaming query.
checkpointLocation is the directory for checkpointing.
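Putting the two options together, here is a hedged sketch (the query name, checkpoint path, and `df` are hypothetical example values):

```scala
import org.apache.spark.sql.streaming.StreamingQuery

// `df` is an assumed streaming Dataset; the name and path are examples only.
val query: StreamingQuery = df.writeStream
  .queryName("wordCounts")
  .option("checkpointLocation", "/tmp/checkpoints/wordCounts")
  .format("console")
  .start()

// The query name makes the query addressable via the StreamingQueryManager:
spark.streams.active.foreach(q => println(q.name))
```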