DataSourceStrategy Execution Planning Strategy

DataSourceStrategy is an execution planning strategy (of SparkPlanner) that converts a LogicalRelation logical operator into a RowDataSourceScanExec physical operator.

Table 1. DataSourceStrategy’s Selection Requirements (in execution order)

| Logical Operator | Selection Requirements |
|---|---|
| LogicalRelation with CatalystScan relation | Uses pruneFilterProjectRaw. CatalystScan does not seem to be used in Spark SQL. |
| LogicalRelation with PrunedFilteredScan relation | Uses pruneFilterProjectRaw. Matches JDBCRelation exclusively (as it is PrunedFilteredScan). |
| LogicalRelation with PrunedScan relation | Uses pruneFilterProjectRaw. PrunedScan does not seem to be used in Spark SQL. |
| LogicalRelation with TableScan relation | Matches KafkaRelation exclusively (as it is TableScan). |
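The execution order in Table 1 matters when a relation mixes in more than one scan trait, because the first matching case wins. The following is a minimal sketch of that ordering only. The traits and case classes are toy stand-ins defined here for illustration (they are not Spark's classes), and `select` is a hypothetical helper that returns the name of the matching row instead of a physical plan.

```scala
object SelectionOrderSketch {
  // Toy stand-ins for the scan traits (assumptions, not Spark's API).
  trait BaseRelation { def name: String }
  trait CatalystScan extends BaseRelation
  trait PrunedFilteredScan extends BaseRelation
  trait PrunedScan extends BaseRelation
  trait TableScan extends BaseRelation

  // Cases are tried top to bottom, mirroring the order of the rows in Table 1.
  def select(relation: BaseRelation): Option[String] = relation match {
    case _: CatalystScan       => Some("CatalystScan")
    case _: PrunedFilteredScan => Some("PrunedFilteredScan")
    case _: PrunedScan         => Some("PrunedScan")
    case _: TableScan          => Some("TableScan")
    case _                     => None // no row applies to this relation
  }

  // Hypothetical relations used only to exercise the match.
  final case class JdbcLike(name: String) extends PrunedFilteredScan
  final case class KafkaLike(name: String) extends TableScan
  final case class Both(name: String) extends PrunedFilteredScan with TableScan
  final case class Other(name: String) extends BaseRelation

  def main(args: Array[String]): Unit = {
    println(select(JdbcLike("jdbc")))
    println(select(KafkaLike("kafka")))
    println(select(Both("both")))   // claimed by the earlier PrunedFilteredScan case
    println(select(Other("other")))
  }
}
```

In this sketch a relation that is both PrunedFilteredScan and TableScan never reaches the TableScan case, which is the practical consequence of listing the requirements in execution order.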

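DataSourceStrategy uses the PhysicalOperation extractor to destructure a logical plan into its projections, its filter predicates, and the underlying leaf operator (here, a LogicalRelation).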
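To make the idea of destructuring a plan concrete, here is a simplified model of an extractor in the style of PhysicalOperation. It is a sketch under stated assumptions: `Plan`, `Relation`, `Filter`, `Project` and `ProjectFilterRelation` are toy types invented for this example, expressions are plain strings, and the real extractor handles details (such as aliases) that are ignored here.

```scala
object PhysicalOperationSketch {
  // A toy logical plan: projections and filters stacked on top of a relation.
  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan

  // Hypothetical extractor: yields (projected columns, filter conditions, leaf plan).
  // An empty column list means the plan has no explicit projection.
  object ProjectFilterRelation {
    def unapply(plan: Plan): Option[(Seq[String], Seq[String], Plan)] =
      Some(collect(plan))

    private def collect(plan: Plan): (Seq[String], Seq[String], Plan) = plan match {
      case Project(columns, child) =>
        // Simplification: the outermost projection wins.
        val (_, filters, leaf) = collect(child)
        (columns, filters, leaf)
      case Filter(condition, child) =>
        val (columns, filters, leaf) = collect(child)
        (columns, condition +: filters, leaf)
      case leaf =>
        (Seq.empty, Seq.empty, leaf)
    }
  }

  def main(args: Array[String]): Unit = {
    val plan: Plan =
      Project(Seq("id", "name"), Filter("id > 10", Relation("people")))

    plan match {
      case ProjectFilterRelation(projects, filters, Relation(name)) =>
        println(s"relation=$name projects=${projects.mkString(",")} filters=${filters.mkString(" AND ")}")
      case _ =>
        println("no relation at the bottom of the plan")
    }
  }
}
```

The benefit of an extractor like this is that a strategy can match on the three parts in a single pattern, regardless of how the projections and filters were stacked in the original plan.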

Creating RowDataSourceScanExec (under FilterExec and ProjectExec) — pruneFilterProjectRaw Internal Method

  pruneFilterProjectRaw(
    relation: LogicalRelation,
    projects: Seq[NamedExpression],
    filterPredicates: Seq[Expression],
    scanBuilder: (Seq[Attribute], Seq[Expression], Seq[Filter]) => RDD[InternalRow]): SparkPlan

pruneFilterProjectRaw creates a RowDataSourceScanExec (possibly as a child of FilterExec that in turn could be a child of ProjectExec).
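The optional nesting can be sketched with a toy model. Everything below is an assumption made for illustration: `Scan`, `FilterExec` and `ProjectExec` are simplified case classes defined in the example (not Spark's operators), `build` is a hypothetical helper, and the `leftoverFilter` and `needsProject` parameters stand in for whatever conditions the real method evaluates.

```scala
object PruneFilterProjectSketch {
  // Toy physical operators (assumptions, not Spark's classes).
  sealed trait PhysicalPlan
  final case class Scan(columns: Seq[String]) extends PhysicalPlan
  final case class FilterExec(condition: String, child: PhysicalPlan) extends PhysicalPlan
  final case class ProjectExec(columns: Seq[String], child: PhysicalPlan) extends PhysicalPlan

  // Builds the scan, then wraps it in a filter and/or a projection only when needed.
  def build(
      columns: Seq[String],
      leftoverFilter: Option[String], // a condition the scan itself does not apply
      needsProject: Boolean           // whether a separate projection step is required
  ): PhysicalPlan = {
    val scan: PhysicalPlan = Scan(columns)
    val filtered = leftoverFilter.map(FilterExec(_, scan)).getOrElse(scan)
    if (needsProject) ProjectExec(columns, filtered) else filtered
  }

  def main(args: Array[String]): Unit = {
    println(build(Seq("id"), None, needsProject = false))
    println(build(Seq("id"), Some("id > 10"), needsProject = false))
    println(build(Seq("id"), Some("id > 10"), needsProject = true))
  }
}
```

The three calls produce the three possible shapes: a bare scan, a scan under a filter, and a scan under a filter under a projection.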

pruneFilterProjectRaw is used when DataSourceStrategy executes (and selects RowDataSourceScanExec per LogicalRelation).
