UnsupportedOperationChecker

UnsupportedOperationChecker checks whether the logical plan of a streaming query uses supported operations only.

Note
UnsupportedOperationChecker is used exclusively when the internal spark.sql.streaming.unsupportedOperationCheck Spark property is enabled (which it is by default).
Note

UnsupportedOperationChecker actually comes with two methods, checkForBatch and checkForStreaming, whose names reflect the two flavours of Spark SQL (as of 2.0): batch and streaming, respectively.

The Spark Structured Streaming gitbook focuses solely on the checkForStreaming method.

checkForStreaming Method

checkForStreaming(
  plan: LogicalPlan,
  outputMode: OutputMode): Unit

checkForStreaming asserts that the following requirements hold:

checkForStreaming…​FIXME

checkForStreaming finds all streaming aggregates (i.e. Aggregate logical operators with streaming sources).

Note
Aggregate logical operator represents Dataset.groupBy and Dataset.groupByKey operators (and SQL’s GROUP BY clause) in a logical query plan.

checkForStreaming asserts that there is at most one streaming aggregation in a streaming query.

Otherwise, checkForStreaming reports an AnalysisException:

Multiple streaming aggregations are not supported with streaming DataFrames/Datasets
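
The check for multiple streaming aggregations can be pictured with a minimal, Spark-free sketch. The Plan, Source and Aggregate types below are hypothetical stand-ins for Spark's logical operators (they are not Spark classes), and Left carries the message where Spark would report an AnalysisException.

```scala
// Toy model of a logical plan. These types are hypothetical, NOT Spark's classes.
object StreamingAggregateCheck {
  sealed trait Plan {
    def children: Seq[Plan]
    // A plan is streaming if any of its descendants is a streaming source.
    def isStreaming: Boolean = children.exists(_.isStreaming)
  }
  final case class Source(streaming: Boolean) extends Plan {
    val children: Seq[Plan] = Nil
    override def isStreaming: Boolean = streaming
  }
  final case class Aggregate(child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }

  // Collect every Aggregate node that sits on top of a streaming source.
  def streamingAggregates(plan: Plan): Seq[Aggregate] = {
    val here = plan match {
      case a: Aggregate if a.isStreaming => Seq(a)
      case _                             => Nil
    }
    here ++ plan.children.flatMap(streamingAggregates)
  }

  // Left(message) models the AnalysisException; Right(()) means the plan passes.
  def check(plan: Plan): Either[String, Unit] =
    if (streamingAggregates(plan).size > 1)
      Left("Multiple streaming aggregations are not supported with streaming DataFrames/Datasets")
    else
      Right(())
}
```

In this model, an aggregation over an aggregation of a streaming source is rejected, while the same shape over a batch source passes since neither Aggregate is a streaming aggregate.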

checkForStreaming asserts that a watermark was defined (on at least one of the grouping expressions) for a streaming aggregation with Append output mode.

Otherwise, checkForStreaming reports an AnalysisException:

Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark
Caution
FIXME
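
The watermark requirement for Append output mode can be sketched without Spark as follows. GroupingExpr, StreamingAggregate and the OutputMode objects are hypothetical types introduced for illustration only. The hasWatermark flag is an assumed simplification of a grouping expression that carries an event-time watermark.

```scala
// Toy model of the Append-mode watermark check. Hypothetical types, NOT Spark's classes.
object AppendWatermarkCheck {
  sealed trait OutputMode
  case object Append extends OutputMode
  case object Complete extends OutputMode
  case object Update extends OutputMode

  // hasWatermark stands in for "this grouping expression has a watermark defined".
  final case class GroupingExpr(name: String, hasWatermark: Boolean)
  final case class StreamingAggregate(groupingExpressions: Seq[GroupingExpr])

  // Left(message) models the AnalysisException; Right(()) means the check passes.
  def check(aggregates: Seq[StreamingAggregate], mode: OutputMode): Either[String, Unit] =
    mode match {
      // Only Append mode with a streaming aggregation requires a watermark
      // on at least one of the grouping expressions.
      case Append if aggregates.nonEmpty &&
          !aggregates.head.groupingExpressions.exists(_.hasWatermark) =>
        Left("Append output mode not supported when there are streaming aggregations on " +
          "streaming DataFrames/DataSets without watermark")
      case _ => Right(())
    }
}
```

The sketch inspects only the first aggregate, which assumes the at-most-one-aggregation requirement has already been enforced.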

checkForStreaming counts all FlatMapGroupsWithState logical operators (on streaming Datasets with isMapGroupsWithState flag disabled).

Note
FlatMapGroupsWithState logical operator represents KeyValueGroupedDataset.mapGroupsWithState and KeyValueGroupedDataset.flatMapGroupsWithState operators in a logical query plan.
Note
FlatMapGroupsWithState.isMapGroupsWithState flag is disabled when…​FIXME

checkForStreaming asserts that multiple FlatMapGroupsWithState logical operators are only used when:

  • outputMode is Append output mode

  • outputMode of the FlatMapGroupsWithState logical operators is also Append output mode

Caution
FIXME Reference to an example in flatMapGroupsWithState

Otherwise, checkForStreaming reports an AnalysisException:

Multiple flatMapGroupsWithStates are not supported when they are not all in append mode or the output mode is not append on a streaming DataFrames/Datasets
Caution
FIXME
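
The rule for multiple FlatMapGroupsWithState operators can be sketched with a Spark-free model. The FlatMapGroupsWithState case class below is a hypothetical stand-in that keeps only the two properties the rule depends on. It is not Spark's logical operator.

```scala
// Toy model of the multiple-flatMapGroupsWithState check. Hypothetical types, NOT Spark's classes.
object FlatMapGroupsWithStateCheck {
  sealed trait OutputMode
  case object Append extends OutputMode
  case object Update extends OutputMode

  final case class FlatMapGroupsWithState(
      outputMode: OutputMode,
      isMapGroupsWithState: Boolean)

  // Left(message) models the AnalysisException; Right(()) means the check passes.
  def check(ops: Seq[FlatMapGroupsWithState], queryMode: OutputMode): Either[String, Unit] = {
    // Only operators with the isMapGroupsWithState flag disabled are counted.
    val flatMaps = ops.filterNot(_.isMapGroupsWithState)
    // Both the query and every counted operator have to use Append mode.
    val allAppend = queryMode == Append && flatMaps.forall(_.outputMode == Append)
    if (flatMaps.size >= 2 && !allAppend)
      Left("Multiple flatMapGroupsWithStates are not supported when they are not all in " +
        "append mode or the output mode is not append on a streaming DataFrames/Datasets")
    else
      Right(())
  }
}
```

In this model a single operator passes in any mode, and two operators pass only when the query and both operators use Append.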
Note
checkForStreaming is used exclusively when StreamingQueryManager is requested to create a StreamingQueryWrapper (for starting a streaming query), but only when the internal spark.sql.streaming.unsupportedOperationCheck Spark property is enabled (which it is by default).
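
How the property gates the check can be illustrated with a minimal sketch. The validate helper, the plain Map standing in for the session configuration, and the check callback standing in for checkForStreaming are all assumptions made for illustration. Only the property name comes from the text above.

```scala
// Toy model of the property gate. validate and its parameters are hypothetical.
object UnsupportedOperationGate {
  val CheckProperty = "spark.sql.streaming.unsupportedOperationCheck"

  // conf:  stand-in for the session configuration
  // check: stand-in for a call to checkForStreaming
  // Returns whether the check was run.
  def validate(conf: Map[String, String])(check: () => Unit): Boolean = {
    // An unset property counts as enabled, modelling "enabled by default".
    val enabled = conf.get(CheckProperty).forall(_.toBoolean)
    if (enabled) check()
    enabled
  }
}
```

With an empty configuration the callback runs. With the property set to "false" the callback is skipped, so unsupported operations would go unreported before the query starts.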
