StateManager Contract — State Managers for Arbitrary Stateful Streaming Aggregation

StateManager is the abstraction of state managers that act as middlemen between state stores and the FlatMapGroupsWithStateExec physical operator used in Arbitrary Stateful Streaming Aggregation.

Table 1. StateManager Contract
Method Description

getAllState

getAllState(store: StateStore): Iterator[StateData]

Retrieves all state data (for all keys) from the StateStore

Used exclusively when InputProcessor is requested to processTimedOutState

getState

getState(
  store: StateStore,
  keyRow: UnsafeRow): StateData

Gets the state data for the key from the StateStore

Used exclusively when InputProcessor is requested to processNewData

putState

putState(
  store: StateStore,
  keyRow: UnsafeRow,
  state: Any,
  timeoutTimestamp: Long): Unit

Persists (puts) the state value for the key in the StateStore

Used exclusively when InputProcessor is requested to callFunctionAndUpdateState (right after all rows have been processed)

removeState

removeState(
  store: StateStore,
  keyRow: UnsafeRow): Unit

Removes the state for the key from the StateStore

Used exclusively when InputProcessor is requested to callFunctionAndUpdateState (right after all rows have been processed)

stateSchema

stateSchema: StructType

State schema

Note
It looks like (in StateManager of the FlatMapGroupsWithStateExec physical operator) stateSchema is used for the schema of state value objects (not state keys as they are described by the grouping attributes instead).

Used when:

Note
StateManagerImplBase is the one and only known direct implementation of the StateManager Contract in Spark Structured Streaming.
Note
StateManager is a Scala sealed trait which means that all the implementations are in the same compilation unit (a single file).

results matching ""

    No results matching ""