Streaming Join

In Spark Structured Streaming, a streaming join is a streaming query that was described (built) using the high-level join operators, such as Dataset.join.

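Streaming joins can be stateless or stateful. A join of a streaming query with a batch (static) query requires no state management, while a join of two streaming queries (a stream-stream join) is stateful and requires streaming state.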

Stream-Stream Joins

Spark Structured Streaming supports stream-stream joins with the following:

Stream-stream equi-joins are planned as StreamingSymmetricHashJoinExec physical operators with two ShuffleExchangeExec physical operators as children (per Required Partition Requirements).
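
The idea behind a symmetric hash join can be illustrated with a minimal sketch in plain Python. This is not Spark's implementation: the names `SideState` and `symmetric_hash_join` are hypothetical, the state is an in-memory dictionary rather than a state store, and only inner-join semantics are shown. Each arriving row is probed against the rows buffered for the opposite side and then buffered on its own side.

```python
from collections import defaultdict


class SideState:
    """Buffers the rows seen so far on one side, grouped by join key (hypothetical helper)."""

    def __init__(self):
        self.state = defaultdict(list)  # join key -> buffered rows

    def store(self, key, row):
        self.state[key].append(row)

    def matches(self, key):
        # .get avoids creating empty entries for keys that were only probed
        return self.state.get(key, [])


def symmetric_hash_join(events):
    """Inner-join a merged sequence of (side, key, row) events.

    side is "left" or "right". Each arriving row is first probed against
    the opposite side's state and then stored in its own side's state,
    so every matching pair is emitted exactly once.
    """
    sides = {"left": SideState(), "right": SideState()}
    for side, key, row in events:
        other = "right" if side == "left" else "left"
        for match in sides[other].matches(key):
            # Always emit pairs in (left row, right row) order.
            yield (row, match) if side == "left" else (match, row)
        sides[side].store(key, row)


events = [
    ("left", 1, "click-a"),
    ("right", 1, "impression-x"),
    ("right", 2, "impression-y"),
    ("left", 2, "click-b"),
    ("left", 1, "click-c"),
]
joined = list(symmetric_hash_join(events))
```

Because both sides keep every row they have seen, the buffered state in this sketch grows without bound, which is the problem that state removal addresses.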

Join State Watermark for State Removal

Stream-stream joins may optionally define a join state watermark for state removal (cf. Watermark Predicates for State Removal).

A join state watermark can be specified on the following:

A join state watermark can be specified on key state, value state or both.
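
The difference between removing state by key and by value can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Spark's code: `JoinSideState` and its methods are hypothetical names, event times are plain integers, and the key-based variant assumes the join key itself is an event-time value (for example a window start).

```python
from collections import defaultdict


class JoinSideState:
    """Toy state for one join side: join key -> list of (event_time, row)."""

    def __init__(self):
        self.rows = defaultdict(list)

    def add(self, key, event_time, row):
        self.rows[key].append((event_time, row))

    def remove_by_key_watermark(self, watermark):
        """Key state: the join key is an event time, so whole keys
        older than the watermark can be dropped at once."""
        expired = [key for key in self.rows if key < watermark]
        for key in expired:
            del self.rows[key]

    def remove_by_value_watermark(self, watermark):
        """Value state: the event time lives in the buffered row, so each
        key keeps only the rows that are not older than the watermark."""
        for key in list(self.rows):
            kept = [(t, r) for (t, r) in self.rows[key] if t >= watermark]
            if kept:
                self.rows[key] = kept
            else:
                del self.rows[key]


# Key-based removal: keys 10 and 20 stand for window start times.
by_key = JoinSideState()
by_key.add(10, 12, "a")
by_key.add(20, 21, "b")
by_key.remove_by_key_watermark(15)
remaining_keys = sorted(by_key.rows)

# Value-based removal: keys are ordinary identifiers, time is in the row.
by_value = JoinSideState()
by_value.add("ad-1", 5, "old")
by_value.add("ad-1", 30, "new")
by_value.add("ad-2", 7, "stale")
by_value.remove_by_value_watermark(10)
remaining_rows = dict(by_value.rows)
```

In the sketch, key-based removal discards a key without inspecting its rows, whereas value-based removal has to scan the rows buffered under every key.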

IncrementalExecution — QueryExecution of Streaming Queries

Under the covers, the high-level operators create a logical query plan with one or more Join logical operators.

Tip
Read up on Join Logical Operator in The Internals of Spark SQL online book.

In Spark Structured Streaming, IncrementalExecution is responsible for planning streaming queries for execution.

At query planning, IncrementalExecution uses the StreamingJoinStrategy execution planning strategy for planning stream-stream joins as StreamingSymmetricHashJoinExec physical operators.
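
As a rough mental model of what such a planning strategy decides, the following plain-Python sketch picks a physical operator name for a join node. It is a simplification and not Spark's code: `Relation`, `Join`, and `plan_join` are hypothetical stand-ins for logical plan nodes and the strategy, and returning `None` for every other case glosses over how Spark actually handles joins the strategy does not plan.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Relation:
    """Hypothetical stand-in for a logical plan leaf."""
    name: str
    is_streaming: bool


@dataclass
class Join:
    """Hypothetical stand-in for a Join logical operator."""
    left: Relation
    right: Relation
    equi_keys: Tuple[str, ...]  # column names compared for equality


def plan_join(join: Join) -> Optional[str]:
    """Return the name of the physical operator this toy strategy picks,
    or None to leave the join to other strategies."""
    both_streaming = join.left.is_streaming and join.right.is_streaming
    if both_streaming and join.equi_keys:
        return "StreamingSymmetricHashJoinExec"
    return None


clicks = Relation("clicks", is_streaming=True)
impressions = Relation("impressions", is_streaming=True)
campaigns = Relation("campaigns", is_streaming=False)

stream_stream = plan_join(Join(clicks, impressions, ("ad_id",)))
stream_static = plan_join(Join(clicks, campaigns, ("campaign_id",)))
```

Only the join with two streaming sides and an equality key is mapped to the streaming operator; the stream-static join is left for other planning rules.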

Demos

Use the following demo application to learn more:

Further Reading Or Watching
