StateStoreRestoreExec Unary Physical Operator — Restoring Streaming State From State Store

StateStoreRestoreExec is a unary physical operator that restores (reads) a streaming state from a state store (for the keys from the child physical operator).

Note: A unary physical operator (`UnaryExecNode`) is a physical operator with a single child physical operator. Read up on `UnaryExecNode` (and physical operators in general) in The Internals of Spark SQL book.

StateStoreRestoreExec is created exclusively when the StatefulAggregationStrategy execution planning strategy is requested to plan a streaming aggregation for execution (i.e. for Aggregate logical operators in the logical plan of a streaming query).

Figure 1. StateStoreRestoreExec and StatefulAggregationStrategy

The optional StatefulOperatorStateInfo is initially undefined (i.e. when StateStoreRestoreExec is created). StateStoreRestoreExec is updated to hold this streaming batch-specific execution property when IncrementalExecution prepares a streaming physical plan for execution (i.e. when the state preparation rule is executed as StreamExecution plans a streaming query for a streaming batch).

Figure 2. StateStoreRestoreExec and IncrementalExecution
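The "created without state info, filled in per batch" lifecycle can be pictured with a small toy model. This is only a sketch in plain Python, not Spark code: `StateInfo`, `RestoreOperator` and `prepare_for_batch` are hypothetical names, the fields of `StateInfo` are illustrative, and only two of the operator's constructor arguments are modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class StateInfo:
    """Hypothetical stand-in for StatefulOperatorStateInfo (illustrative fields)."""
    checkpoint_location: str
    operator_id: int
    store_version: int


@dataclass(frozen=True)
class RestoreOperator:
    """Toy model of the operator: grouping keys plus optional state info."""
    key_expressions: Tuple[str, ...]
    state_info: Optional[StateInfo] = None  # undefined when the operator is created


def prepare_for_batch(op: RestoreOperator, info: StateInfo) -> RestoreOperator:
    """Return a copy of the operator that holds the batch-specific state info."""
    return replace(op, state_info=info)


# Planning time: the operator exists, but has no state info yet.
planned = RestoreOperator(key_expressions=("group",))

# Batch preparation: a new copy is produced with the state info filled in.
info = StateInfo("/tmp/checkpoint/state", operator_id=0, store_version=5)
prepared = prepare_for_batch(planned, info)
```

The operator is modeled as immutable, so preparation yields a new instance and leaves the planned one untouched. That choice belongs to this sketch, not to anything stated in the text above.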

When executed, StateStoreRestoreExec executes the child physical operator and creates a StateStoreRDD to map over partitions with storeUpdateFunction that restores the state for the keys in the input rows if available.
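The per-partition restore step can be approximated with a plain-Python toy model. It is a sketch under assumptions rather than the actual implementation: the state store is modeled as a dictionary, `restore_partition`, `get_key` and `RestoreMetrics` are hypothetical names, and emitting the saved state just before the input row is an assumption of this sketch.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple

Row = Tuple  # a row is modeled as a plain tuple


class RestoreMetrics:
    """Holds the counter that plays the role of numOutputRows."""

    def __init__(self) -> None:
        self.num_output_rows = 0


def restore_partition(
    rows: Iterable[Row],
    store: Dict[Hashable, Row],
    get_key: Callable[[Row], Hashable],
    metrics: RestoreMetrics,
) -> Iterator[Row]:
    """For every input row, look up its key and emit the saved state if available."""
    for row in rows:
        metrics.num_output_rows += 1   # counts input rows, whether or not state is found
        saved = store.get(get_key(row))
        if saved is not None:
            yield saved                # restored state (assumed to come first)
        yield row                      # the input row is always passed through


# State saved by an earlier batch: key ("a",) has a running count of 3.
store = {("a",): ("a", 3)}
rows = [("a", 1), ("b", 1)]
metrics = RestoreMetrics()
out = list(restore_partition(rows, store, lambda r: (r[0],), metrics))
```

Here `out` holds the restored row for key `a` followed by both input rows, and the counter ends at 2. This matches the metric description below, which counts the input rows for which a state lookup was attempted.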

The output schema of StateStoreRestoreExec is exactly the child's output schema.

The output partitioning of StateStoreRestoreExec is exactly the child's output partitioning.

Performance Metrics (SQLMetrics)

Key	Name (in UI)	Description
`numOutputRows`	number of output rows	The number of input rows from the child physical operator (for which `StateStoreRestoreExec` tried to find the state)

Figure 3. StateStoreRestoreExec in web UI (Details for Query)

Creating StateStoreRestoreExec Instance

StateStoreRestoreExec takes the following to be created:

Key expressions, i.e. Catalyst attributes for the grouping keys
Optional StatefulOperatorStateInfo (default: None)
Version of the state format (based on the spark.sql.streaming.aggregation.stateFormatVersion configuration property)
Child physical operator (SparkPlan)