WatermarkTracker

WatermarkTracker tracks the event-time watermark of a streaming query (across EventTimeWatermarkExec operators in a physical query plan) based on a given MultipleWatermarkPolicy.

WatermarkTracker is used exclusively in MicroBatchExecution.

WatermarkTracker is created (using the factory method) when MicroBatchExecution is requested to populate start offsets (when requested to run an activated streaming query).

WatermarkTracker takes a single MultipleWatermarkPolicy to be created.

MultipleWatermarkPolicy can be one of the following:

  • MaxWatermark (alias: min)

  • MinWatermark (alias: max)

Tip

Enable ALL logging level for org.apache.spark.sql.execution.streaming.WatermarkTracker to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.streaming.WatermarkTracker=ALL

Refer to Logging.

Creating WatermarkTracker — apply Factory Method

apply(conf: RuntimeConfig): WatermarkTracker

apply uses the spark.sql.streaming.multipleWatermarkPolicy configuration property for the global watermark policy (default: min) and creates a WatermarkTracker.

Note
apply is used exclusively when MicroBatchExecution is requested to populate start offsets (when requested to run an activated streaming query).

setWatermark Method

setWatermark(newWatermarkMs: Long): Unit

setWatermark simply updates the global event-time watermark to the given newWatermarkMs.

Note
setWatermark is used exclusively when MicroBatchExecution is requested to populate start offsets (when requested to run an activated streaming query).

Updating Event-Time Watermark — updateWatermark Method

updateWatermark(executedPlan: SparkPlan): Unit

updateWatermark requests the given physical operator (SparkPlan) to collect all EventTimeWatermarkExec unary physical operators.

updateWatermark simply exits when no EventTimeWatermarkExec was found.

updateWatermark…​FIXME

Note
updateWatermark is used exclusively when MicroBatchExecution is requested to run a single streaming batch (when requested to run an activated streaming query).

Internal Properties

Name Description

globalWatermarkMs

Current global event-time watermark per MultipleWatermarkPolicy (across all EventTimeWatermarkExec operators in a physical query plan)

Default: 0

Used when…​FIXME

operatorToWatermarkMap

Event-time watermark per EventTimeWatermarkExec physical operator (mutable.HashMap[Int, Long])

Used when…​FIXME

results matching ""

    No results matching ""