StateStoreProvider Contract

StateStoreProvider is the abstraction of state store providers that manage state data in stateful streaming queries.

Note
StateStoreProvider helper object uses spark.sql.streaming.stateStore.providerClass internal configuration property for the name of the class of the StateStoreProvider implementation.
Note
HDFSBackedStateStoreProvider is the only available implementation of the StateStoreProvider Contract in Spark Structured Streaming.
Table 1. StateStoreProvider Contract
Method Description

close

close(): Unit

Closes the state store provider

Used exclusively when StateStore helper object is requested to unload a state store provider

doMaintenance

doMaintenance(): Unit = {}

Does maintenance if needed

Used exclusively when StateStore helper object is requested to perform maintenance of registered state store providers

getStore

getStore(version: Long): StateStore

Returns the StateStore for the specified version

Used exclusively when StateStore helper object is requested to get the StateStore for the given ID and version

init

init(
  stateStoreId: StateStoreId,
  keySchema: StructType,
  valueSchema: StructType,
  keyIndexOrdinal: Option[Int],
  storeConfs: StateStoreConf,
  hadoopConf: Configuration): Unit

Initializes the state store provider

Used exclusively when StateStoreProvider helper object is requested to create and initialize the StateStoreProvider for a given StateStoreId (when StateStore helper object is requested to retrieve a StateStore by ID and version)

stateStoreId

stateStoreId: StateStoreId

StateStoreId associated with the provider (at initialization)

Used when:

supportedCustomMetrics

supportedCustomMetrics: Seq[StateStoreCustomMetric]

StateStoreCustomMetrics of the state store provider

Used when:

Lifecycle of StateStoreProvider

The lifecycle of a StateStoreProvider starts when StateStore helper object (on a Spark executor) is requested for the StateStore by given provider ID and version.

Note
HDFSBackedStateStoreProvider is the only available implementation of the StateStoreProvider Contract in Spark Structured Streaming.
Note
Since StateStore and StateStoreProvider helper objects are Scala objects that gives that there can only be one instance of StateStore and StateStoreProvider on a JVM. That in turn means that there will be only one instance of each per JVM which is exactly the JVM of a Spark executor.

StateStore helper object requests StateStoreProvider helper object to createAndInit that creates the StateStoreProvider implementation (given spark.sql.streaming.stateStore.providerClass internal configuration property) and requests it to initialize.

The initialized StateStoreProvider is cached in loadedProviders internal lookup table (for a StateStoreId) for later lookups.

StateStoreProvider helper object then requests the StateStoreProvider to getStore for the version.

An instance of StateStoreProvider is requested to do its own maintenance or close (when a corresponding StateStore is inactive) in MaintenanceTask daemon thread that runs periodically every spark.sql.streaming.stateStore.maintenanceInterval configuration property (default: 60s).

Creating and Initializing StateStoreProvider — createAndInit Factory Method

createAndInit(
  stateStoreId: StateStoreId,
  keySchema: StructType,
  valueSchema: StructType,
  indexOrdinal: Option[Int],
  storeConf: StateStoreConf,
  hadoopConf: Configuration): StateStoreProvider

createAndInit creates a new StateStoreProvider (per spark.sql.streaming.stateStore.providerClass internal configuration property).

createAndInit requests the StateStoreProvider to initialize.

Note
createAndInit is used exclusively when StateStore helper object is requested for the StateStore by given provider ID and version.

results matching ""

    No results matching ""