DataSourceRDD — Input RDD Of DataSourceV2ScanExec Physical Operator

DataSourceRDD acts as a thin adapter between Spark SQL’s Data Source API V2 and Spark Core’s RDD API.

DataSourceRDD is an RDD that is created exclusively when DataSourceV2ScanExec physical operator is requested for the input RDD (when WholeStageCodegenExec physical operator is executed).

DataSourceRDD uses DataSourceRDDPartitions (mere wrappers of the InputPartitions) when requested for the partitions.

DataSourceRDD takes the following to be created:

SparkContext

InputPartitions (Seq[InputPartition[T]])
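A minimal sketch of the class skeleton and the DataSourceRDDPartition wrapper, modeled on the Spark 2.4 sources (the exact modifiers and field names are assumptions):

import scala.reflect.ClassTag

import org.apache.spark.{InterruptibleIterator, Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.sources.v2.reader.InputPartition

// A Spark Core Partition that is a mere wrapper of a single InputPartition
class DataSourceRDDPartition[T: ClassTag](
    val index: Int,
    val inputPartition: InputPartition[T])
  extends Partition with Serializable

class DataSourceRDD[T: ClassTag](
    sc: SparkContext,
    @transient private val inputPartitions: Seq[InputPartition[T]])
  extends RDD[T](sc, Nil) {
  // compute, getPartitions and getPreferredLocations are sketched in the sections below
}

The method sketches in the following sections are assumed to live inside this class.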

DataSourceRDD as a Spark Core RDD overrides the following methods:

compute

getPartitions

getPreferredLocations

Requesting Preferred Locations (For Partition) — getPreferredLocations Method

getPreferredLocations(split: Partition): Seq[String]
Note
getPreferredLocations is part of Spark Core's RDD Contract to specify the placement preferences (preferred locations) of a partition.

getPreferredLocations simply requests the given split DataSourceRDDPartition for the InputPartition that in turn is requested for the preferred locations.
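A sketch of what that boils down to, assuming the class skeleton above:

override def getPreferredLocations(split: Partition): Seq[String] = {
  // Unwrap the InputPartition and delegate to its preferred locations
  split.asInstanceOf[DataSourceRDDPartition[T]].inputPartition.preferredLocations()
}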

RDD Partitions — getPartitions Method

getPartitions: Array[Partition]
Note
getPartitions is part of Spark Core's RDD Contract to specify the partitions of a distributed computation.

getPartitions simply creates a DataSourceRDDPartition for every InputPartition.
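Under the same assumptions, getPartitions is little more than a positional mapping:

override protected def getPartitions: Array[Partition] =
  inputPartitions.zipWithIndex.map { case (inputPartition, index) =>
    // One DataSourceRDDPartition per InputPartition, indexed by position
    new DataSourceRDDPartition(index, inputPartition)
  }.toArray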

Computing Partition (in TaskContext) — compute Method

compute(split: Partition, context: TaskContext): Iterator[T]
Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute requests the input DataSourceRDDPartition (the split partition) for the InputPartition that in turn is requested to create an InputPartitionReader.

compute registers a Spark Core TaskCompletionListener that requests the InputPartitionReader to close when a task completes.

compute returns a Spark Core InterruptibleIterator that requests the InputPartitionReader to proceed to the next record (when requested to hasNext) and to return the current record (when requested to next).
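Put together, a sketch of compute under the same assumptions (the adapter from InputPartitionReader to Scala's Iterator is simplified):

override def compute(split: Partition, context: TaskContext): Iterator[T] = {
  // Unwrap the InputPartition and create a reader for this partition
  val reader = split.asInstanceOf[DataSourceRDDPartition[T]]
    .inputPartition.createPartitionReader()
  // Close the reader when the task completes
  context.addTaskCompletionListener[Unit](_ => reader.close())
  val iter = new Iterator[T] {
    private var valuePrepared = false
    override def hasNext: Boolean = {
      // Advance the reader lazily, at most once per record handed out
      if (!valuePrepared) valuePrepared = reader.next()
      valuePrepared
    }
    override def next(): T = {
      if (!hasNext) throw new NoSuchElementException("End of stream")
      valuePrepared = false
      reader.get()
    }
  }
  new InterruptibleIterator(context, iter)
}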
