DataSourceV2Strategy Execution Planning Strategy

DataSourceV2Strategy is an execution planning strategy that Spark Planner uses to plan the logical operators of the Data Source API V2 as physical operators.

Table 1. DataSourceV2Strategy’s Execution Planning

Logical Operator                                                        | Physical Operator
DataSourceV2Relation                                                    | DataSourceV2ScanExec
StreamingDataSourceV2Relation                                           | DataSourceV2ScanExec
WriteToDataSourceV2                                                     | WriteToDataSourceV2Exec
AppendData with DataSourceV2Relation                                    | WriteToDataSourceV2Exec
WriteToContinuousDataSource                                             | WriteToContinuousDataSourceExec
Repartition with a StreamingDataSourceV2Relation and a ContinuousReader | ContinuousCoalesceExec

Tip

Enable INFO logging level for org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy=INFO

Refer to Logging.

Applying DataSourceV2Strategy Strategy to Logical Plan (Executing DataSourceV2Strategy) — apply Method

apply(plan: LogicalPlan): Seq[SparkPlan]

Note
apply is part of the GenericStrategy Contract to generate a collection of SparkPlans for a given logical plan.

apply branches off per the given logical operator.
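The branching can be pictured as a single pattern match over the logical operator. The following is only a toy model of Table 1: the case objects and the plan method are hypothetical stand-ins (not Spark's classes), and returning an empty collection for any other operator is an assumption based on how execution planning strategies usually signal "not applicable".

```scala
// Toy model of Table 1. These case objects are illustrative stand-ins,
// NOT Spark's actual logical operators.
object PlanningSketch {
  sealed trait LogicalOp
  case object DataSourceV2Relation extends LogicalOp
  case object StreamingDataSourceV2Relation extends LogicalOp
  case object WriteToDataSourceV2 extends LogicalOp
  case object AppendDataWithV2Relation extends LogicalOp
  case object WriteToContinuousDataSource extends LogicalOp
  case object RepartitionOfContinuousStream extends LogicalOp
  case object Other extends LogicalOp

  // Returns the name of the physical operator listed in Table 1,
  // or an empty collection when the strategy does not handle the operator (assumption).
  def plan(op: LogicalOp): Seq[String] = op match {
    case DataSourceV2Relation | StreamingDataSourceV2Relation =>
      Seq("DataSourceV2ScanExec")
    case WriteToDataSourceV2 | AppendDataWithV2Relation =>
      Seq("WriteToDataSourceV2Exec")
    case WriteToContinuousDataSource =>
      Seq("WriteToContinuousDataSourceExec")
    case RepartitionOfContinuousStream =>
      Seq("ContinuousCoalesceExec")
    case Other =>
      Nil
  }
}
```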

DataSourceV2Relation Logical Operator

For a DataSourceV2Relation logical operator, apply requests the DataSourceV2Relation for the DataSourceReader.

apply then pushes filters down to the reader (pushFilters) and prunes columns (pruneColumns).

apply prints out the following INFO message to the logs:

Pushing operators to [ClassName of DataSourceV2]
Pushed Filters: [pushedFilters]
Post-Scan Filters: [postScanFilters]
Output: [output]

apply uses the DataSourceV2Relation to create a DataSourceV2ScanExec physical operator.

If there are any postScanFilters, apply creates a FilterExec physical operator with the DataSourceV2ScanExec physical operator as the child.

In the end, apply creates a ProjectExec physical operator whose child is either the FilterExec (when there are post-scan filters) or the DataSourceV2ScanExec directly.
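The shape of the resulting physical plan (scan, then an optional filter, then a projection) can be sketched with a few toy case classes. Everything here is hypothetical: ScanExec, FilterExec and ProjectExec are simplified stand-ins that carry strings instead of Catalyst expressions, and combining the post-scan filters with AND is an assumption of this sketch.

```scala
// Simplified stand-ins for the physical operators; NOT Spark's classes.
object ScanPlanSketch {
  sealed trait PhysicalOp
  case class ScanExec(output: Seq[String]) extends PhysicalOp
  case class FilterExec(condition: String, child: PhysicalOp) extends PhysicalOp
  case class ProjectExec(projectList: Seq[String], child: PhysicalOp) extends PhysicalOp

  def planScan(
      project: Seq[String],
      postScanFilters: Seq[String],
      output: Seq[String]): PhysicalOp = {
    val scan = ScanExec(output)
    // Combine all post-scan filters into one condition (assumed: AND); None if there are none
    val condition = postScanFilters.reduceOption((l, r) => s"($l AND $r)")
    // Wrap the scan in a filter only when a condition exists
    val withFilter: PhysicalOp = condition.map(FilterExec(_, scan)).getOrElse(scan)
    // The projection always sits on top
    ProjectExec(project, withFilter)
  }
}
```

With no post-scan filters the projection sits directly on the scan; otherwise a single filter operator is placed in between.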

StreamingDataSourceV2Relation Logical Operator

For a StreamingDataSourceV2Relation logical operator, apply…​FIXME

WriteToDataSourceV2 Logical Operator

For a WriteToDataSourceV2 logical operator, apply simply creates a WriteToDataSourceV2Exec physical operator.

AppendData Logical Operator

For an AppendData logical operator with a DataSourceV2Relation, apply requests the DataSourceV2Relation to create a DataSourceWriter that is then used to create a WriteToDataSourceV2Exec physical operator.

WriteToContinuousDataSource Logical Operator

For a WriteToContinuousDataSource logical operator, apply…​FIXME

Repartition Logical Operator

For a Repartition logical operator, apply…​FIXME

pushFilters Internal Method

pushFilters(
  reader: DataSourceReader,
  filters: Seq[Expression]): (Seq[Expression], Seq[Expression])

Note
pushFilters handles only DataSourceReaders with SupportsPushDownFilters support.

For a given DataSourceReader with SupportsPushDownFilters support, pushFilters uses the DataSourceStrategy object to translate every Catalyst expression in the given filters to a data source filter.

pushFilters requests the SupportsPushDownFilters reader to pushFilters (with the translated filters) and then for the pushedFilters.

In the end, pushFilters returns a pair: the filters that were pushed down to the reader and the filters that were not (the post-scan filters).
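The translate, push, and split steps can be sketched in plain Scala. This is a toy model under stated assumptions, not the Spark code: FilterPushDownReader is a hypothetical stand-in for a reader with SupportsPushDownFilters support, filters are plain strings, the extra translate parameter and the demo helpers (demoTranslate, EqualityOnlyReader) are invented for illustration, and treating untranslatable expressions as post-scan filters is an assumption.

```scala
object PushFiltersSketch {
  // Hypothetical stand-in for a reader with SupportsPushDownFilters support
  trait FilterPushDownReader {
    // Accepts translated filters; returns the ones that must still be evaluated after the scan
    def pushFilters(filters: Seq[String]): Seq[String]
    // The filters the reader actually accepted
    def pushedFilters: Seq[String]
  }

  // translate is a hypothetical parameter: expression => source filter, None if not translatable
  def pushFilters(
      reader: FilterPushDownReader,
      filters: Seq[String],
      translate: String => Option[String]): (Seq[String], Seq[String]) = {
    val translated = filters.map(f => f -> translate(f))
    // Expressions with no source-filter equivalent can never be pushed (assumption)
    val untranslatable = translated.collect { case (expr, None) => expr }
    val translatedFilters = translated.collect { case (_, Some(t)) => t }
    // Remember which expression each translated filter came from
    val toExpr = translated.collect { case (expr, Some(t)) => t -> expr }.toMap
    val postScan = reader.pushFilters(translatedFilters).map(toExpr)
    val pushed = reader.pushedFilters.map(toExpr)
    (pushed, untranslatable ++ postScan)
  }

  // Demo helpers (hypothetical): UDF calls are not translatable
  def demoTranslate(expr: String): Option[String] =
    if (expr.contains("udf(")) None else Some(s"Filter[$expr]")

  // Demo reader (hypothetical) that accepts equality filters only
  class EqualityOnlyReader extends FilterPushDownReader {
    private var accepted = Seq.empty[String]
    def pushFilters(filters: Seq[String]): Seq[String] = {
      val (ok, rest) = filters.partition(_.contains(" = "))
      accepted = ok
      rest
    }
    def pushedFilters: Seq[String] = accepted
  }
}
```

In this sketch, pushing "a = 1", "b > 2" and "udf(c)" through the demo reader leaves "a = 1" as the only pushed filter, while "udf(c)" and "b > 2" end up among the post-scan filters.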

Note
pushFilters is used exclusively when DataSourceV2Strategy execution planning strategy is executed (applied to a DataSourceV2Relation logical operator).

Column Pruning — pruneColumns Internal Method

pruneColumns(
  reader: DataSourceReader,
  relation: DataSourceV2Relation,
  exprs: Seq[Expression]): Seq[AttributeReference]

pruneColumns…​FIXME

Note
pruneColumns is used when…​FIXME
