GroupState — Group State in Arbitrary Stateful Streaming Aggregation

GroupState is used with the following KeyValueGroupedDataset operations:

GroupState is created separately for every aggregation key to hold a state as an aggregation state value.

Table 1. GroupState Contract
Method Description

exists

exists: Boolean

Checks whether the state value exists or not

If not exists, get throws a NoSuchElementException. Use getOption instead.

get

get: S

Gets the state value if it exists or throws a NoSuchElementException

getCurrentProcessingTimeMs

getCurrentProcessingTimeMs(): Long

Gets the current processing time (as milliseconds in epoch time)

getCurrentWatermarkMs

getCurrentWatermarkMs(): Long

Gets the current event time watermark (as milliseconds in epoch time)

getOption

getOption: Option[S]

Gets the state value as a Scala Option (regardless whether it exists or not)

Used when:

  • InputProcessor is requested to callFunctionAndUpdateState (when the row iterator is consumed and a state value has been updated, removed or timeout changed)

  • GroupStateImpl is requested for the textual representation

hasTimedOut

hasTimedOut: Boolean

Whether the state (for a given key) has timed out or not.

Can only be true when timeouts are enabled using setTimeoutDuration

remove

remove(): Unit

Removes the state

setTimeoutDuration

setTimeoutDuration(durationMs: Long): Unit
setTimeoutDuration(duration: String): Unit

Specifies the timeout duration for the state key (in millis or as a string, e.g. "10 seconds", "1 hour") for GroupStateTimeout.ProcessingTimeTimeout

setTimeoutTimestamp

setTimeoutTimestamp(timestamp: java.sql.Date): Unit
setTimeoutTimestamp(
  timestamp: java.sql.Date,
  additionalDuration: String): Unit
setTimeoutTimestamp(timestampMs: Long): Unit
setTimeoutTimestamp(
  timestampMs: Long,
  additionalDuration: String): Unit

Specifies the timeout timestamp for the state key for GroupStateTimeout.EventTimeTimeout

update

update(newState: S): Unit

Updates the state (sets the state to a new value)

Note
GroupStateImpl is the default and only known implementation of the GroupState Contract in Spark Structured Streaming.

results matching ""

    No results matching ""