In Spark Structured Streaming, a streaming join is a streaming query that was built using the high-level join operators: Dataset.crossJoin, Dataset.join, Dataset.joinWith, or SQL's JOIN clause.
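Streaming joins can be stateless or stateful. Joins of a streaming query with a batch (static) query, i.e. stream-static joins, are stateless and require no state management. Joins of two streaming queries, i.e. stream-stream joins, are stateful and require streaming state.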
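To make the stateless case concrete, here is a toy model in plain Python rather than Spark code; the function `stream_static_join` and the sample data are hypothetical. Because each micro-batch is joined only against a fixed lookup table, the output for a batch depends on that batch alone and nothing has to be remembered between batches.

```python
# Toy model (not Spark's API): a stream-static inner join. Each micro-batch is
# joined against a fixed lookup table and nothing is kept between batches.


def stream_static_join(batch, static_table):
    """Hypothetical helper: inner-join one micro-batch of (key, value) rows
    with a static dict. The result depends only on the current batch and the
    static table, so the join is stateless."""
    return [(key, value, static_table[key]) for key, value in batch if key in static_table]


static_table = {"a": "alpha", "b": "beta"}

# Each micro-batch is joined independently of every earlier one.
batch1 = stream_static_join([("a", 1), ("c", 2)], static_table)
batch2 = stream_static_join([("b", 3)], static_table)
```

Re-running the join on the same batch always gives the same rows, which is what makes state management unnecessary here.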
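Spark Structured Streaming supports stream-stream joins only with an equality join predicate (equi-joins) and only for the inner, left outer and right outer join types (Spark 3.1 added full outer and left semi joins). Outer stream-stream joins additionally require watermarks and event-time constraints.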
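A stream-stream join, in contrast to a stream-static one, has to buffer rows from both inputs, since a matching row may only arrive in a later micro-batch. The following toy sketch in plain Python (the class `SymmetricHashJoin` is hypothetical and not part of Spark) models an inner equi-join as a symmetric hash join: each side keeps a hash map from join key to buffered rows, and every new row is stored on its own side and probed against the other side. It is only loosely inspired by Spark's StreamingSymmetricHashJoinExec physical operator and ignores partitioning, state stores, outer joins and watermarks.

```python
# Toy model (not Spark's API): a stream-stream inner equi-join implemented as
# a symmetric hash join. Partitioning, state stores, outer joins and
# watermarks are deliberately left out.
from collections import defaultdict


class SymmetricHashJoin:
    """Hypothetical class: buffers rows of both inputs, keyed by join key."""

    def __init__(self):
        self.left_state = defaultdict(list)   # join key -> buffered left values
        self.right_state = defaultdict(list)  # join key -> buffered right values

    def process_batch(self, left_rows, right_rows):
        """Join one micro-batch of (key, value) rows from each input."""
        output = []
        # Store each new left row, then probe buffered right rows with the same key.
        for key, left_value in left_rows:
            self.left_state[key].append(left_value)
            output.extend((key, left_value, r) for r in self.right_state.get(key, []))
        # Store each new right row, then probe the left state. The left state
        # already holds this batch's left rows, so every pair is emitted once.
        for key, right_value in right_rows:
            self.right_state[key].append(right_value)
            output.extend((key, l, right_value) for l in self.left_state.get(key, []))
        return output


join = SymmetricHashJoin()
out1 = join.process_batch([("k1", "L1")], [])
out2 = join.process_batch([], [("k1", "R1"), ("k2", "R2")])
out3 = join.process_batch([("k2", "L2")], [("k2", "R3")])
```

The row `("k1", "L1")` produces no output in the first batch but still matches `("k1", "R1")` one batch later, which is exactly why the buffered state is needed. Processing left rows before right rows within a batch ensures that a pair arriving in the same micro-batch is emitted exactly once.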
Stream-stream joins may optionally define a join state watermark for state removal (cf. Watermark Predicates for State Removal).
A join state watermark can be specified on the join keys (key state), on the columns of the left and right sides (value state), or on both.
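Without state removal, the buffered rows of a stream-stream join would grow without bound. The following plain-Python sketch (hypothetical helper names, not Spark's API) models one side's state as a dict from join key to buffered `(event_time, value)` rows and contrasts the two kinds of removal. On key state, whole keys are dropped when the key itself is an event time (e.g. a window start) older than the watermark. On value state, individual rows are dropped by their event-time column. As simplifying assumptions, the watermark is simply passed in (how Spark derives the join state watermark is not modelled) and rows strictly older than it are dropped.

```python
# Toy model (not Spark's API) of watermark-based state removal for one side of a
# stream-stream join. State: join key -> list of buffered (event_time, value) rows.
# Assumptions: the watermark is passed in directly, and rows strictly older than
# it are removed; Spark's own derivation and boundary handling are not modelled.


def remove_by_key_watermark(state, watermark):
    """Key state (hypothetical helper): the join key is itself an event time
    (e.g. a window start), so whole keys older than the watermark are dropped."""
    return {key: rows for key, rows in state.items() if key >= watermark}


def remove_by_value_watermark(state, watermark):
    """Value state (hypothetical helper): individual rows are dropped by their
    event-time column; keys left without rows are dropped too."""
    kept = {}
    for key, rows in state.items():
        fresh = [(event_time, value) for event_time, value in rows if event_time >= watermark]
        if fresh:
            kept[key] = fresh
    return kept


key_state = {10: [(12, "a")], 20: [(21, "b")], 30: [(33, "c")]}
value_state = {"k1": [(5, "x"), (25, "y")], "k2": [(8, "z")]}

after_key_removal = remove_by_key_watermark(key_state, 20)
after_value_removal = remove_by_value_watermark(value_state, 20)
```

In this toy model, key-state removal drops whole keys at once but only applies when the join key carries the watermarked event time, while value-state removal works with any key but has to inspect individual rows.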
Under the covers, the high-level operators create a logical query plan with one or more Join logical operators.
In Spark Structured Streaming, IncrementalExecution is responsible for planning streaming queries for execution.
Use the following demo application to learn more:
Stream-stream Joins in the official documentation of Apache Spark for Structured Streaming
Introducing Stream-Stream Joins in Apache Spark 2.3 by Databricks
(video) Deep Dive into Stateful Stream Processing in Structured Streaming by Tathagata Das