OutputMode

Output mode (OutputMode) of a streaming query describes what data is written to a streaming sink.

There are three available output modes:

Append
Complete
Update

The output mode is specified on the writing side of a streaming query using DataStreamWriter.outputMode method (by alias or a value of org.apache.spark.sql.streaming.OutputMode object).

import org.apache.spark.sql.streaming.OutputMode.Update
val inputStream = spark
  .readStream
  .format("rate")
  .load
  .writeStream
  .format("console")
  .outputMode(Update) // <-- update output mode
  .start

Append Output Mode

Append (alias: append) is the default output mode that writes "new" rows only.

In streaming aggregations, a "new" row is when the intermediate state becomes final, i.e. when new events for the grouping key can only be considered late which is when watermark moves past the event time of the key.

Append output mode requires that a streaming query defines event-time watermark (using withWatermark operator) on the event time column that is used in aggregation (directly or using window function).

Required for datasets with FileFormat format (to create FileStreamSink)

Append is mandatory when multiple flatMapGroupsWithState operators are used in a structured query.

Complete Output Mode

Complete (alias: complete) writes all the rows of a Result Table (and corresponds to a traditional batch structured query).

Complete mode does not drop old aggregation state and preserves all data in the Result Table.

Supported only for streaming aggregations (as asserted by UnsupportedOperationChecker).

Update Output Mode

Update (alias: update) writes only the rows that were updated (every time there are updates).

For queries that are not streaming aggregations, Update is equivalent to the Append output mode.

OutputMode

OutputMode

Append Output Mode

Complete Output Mode

Update Output Mode

results matching ""

No results matching ""