DataSourceRDD — Input RDD Of DataSourceV2ScanExec Physical Operator

DataSourceRDD acts as a thin adapter between Spark SQL’s Data Source API V2 and Spark Core’s RDD API.
DataSourceRDD is an RDD that is created exclusively when the DataSourceV2ScanExec physical operator is requested for the input RDD (when the WholeStageCodegenExec physical operator is executed).
DataSourceRDD uses DataSourceRDDPartition (a mere wrapper of an InputPartition) when requested for the partitions.
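A DataSourceRDDPartition can be thought of as follows (a minimal sketch mirroring the Spark 2.4 sources; treat the exact modifiers as approximate):

import org.apache.spark.Partition
import org.apache.spark.sql.sources.v2.reader.InputPartition

class DataSourceRDDPartition[T](
    val index: Int,
    val inputPartition: InputPartition[T])
  extends Partition with Serializable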
DataSourceRDD takes the following to be created:

- InputPartitions (Seq[InputPartition[T]])
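Put together, the declaration is roughly the following (a sketch based on the Spark 2.4 sources; the overrides stubbed out here are covered in the sections below):

import scala.reflect.ClassTag
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.sources.v2.reader.InputPartition

class DataSourceRDD[T: ClassTag](
    sc: SparkContext,
    @transient private val inputPartitions: Seq[InputPartition[T]])
  extends RDD[T](sc, Nil) {  // Nil means no parent RDDs (no lineage dependencies)

  // Covered in the sections below
  override protected def getPartitions: Array[Partition] = ???
  override def compute(split: Partition, context: TaskContext): Iterator[T] = ???
}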
DataSourceRDD as a Spark Core RDD overrides the following methods:
Requesting Preferred Locations (For Partition) — getPreferredLocations Method

getPreferredLocations(split: Partition): Seq[String]
Note: getPreferredLocations is part of Spark Core’s RDD Contract to specify placement preferences of a partition.
getPreferredLocations simply requests the given split DataSourceRDDPartition for the InputPartition that in turn is requested for the preferred locations.
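That delegation is essentially a one-liner (a sketch mirroring the Spark 2.4 sources; InputPartition.preferredLocations() returns an Array[String] that Scala adapts to Seq[String]):

override def getPreferredLocations(split: Partition): Seq[String] =
  // Unwrap the DataSourceRDDPartition and ask its InputPartition
  split.asInstanceOf[DataSourceRDDPartition[T]].inputPartition.preferredLocations()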
RDD Partitions — getPartitions Method
getPartitions: Array[Partition]
Note: getPartitions is part of Spark Core’s RDD Contract to specify the partitions of a distributed dataset.
getPartitions simply creates a DataSourceRDDPartition for every InputPartition.
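In other words, getPartitions zips the InputPartitions with their positional indices (a sketch mirroring the Spark 2.4 sources):

override protected def getPartitions: Array[Partition] =
  inputPartitions.zipWithIndex.map { case (inputPartition, index) =>
    // The partition index is simply the position in the input sequence
    new DataSourceRDDPartition(index, inputPartition)
  }.toArray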
Computing Partition (in TaskContext) — compute Method
compute(split: Partition, context: TaskContext): Iterator[T]
Note: compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).
compute requests the input DataSourceRDDPartition (the split partition) for the InputPartition that in turn is requested to create an InputPartitionReader.
compute registers a Spark Core TaskCompletionListener that requests the InputPartitionReader to close when a task completes.
compute returns a Spark Core InterruptibleIterator that requests the InputPartitionReader to proceed to the next record (when requested to hasNext) and return the current record (when requested to next), as sketched below.
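The whole flow can be sketched as follows (modeled on the Spark 2.4 sources, so treat it as illustrative rather than authoritative). Note how InterruptibleIterator is what allows Spark to cancel the task between records:

override def compute(split: Partition, context: TaskContext): Iterator[T] = {
  // Unwrap the split and create a reader for its InputPartition
  val reader = split.asInstanceOf[DataSourceRDDPartition[T]]
    .inputPartition.createPartitionReader()
  // Close the reader once the task completes (success or failure)
  context.addTaskCompletionListener[Unit](_ => reader.close())
  val iter = new Iterator[T] {
    // Whether the reader has already advanced to a not-yet-consumed record
    private[this] var valuePrepared = false

    override def hasNext: Boolean = {
      if (!valuePrepared) {
        valuePrepared = reader.next()
      }
      valuePrepared
    }

    override def next(): T = {
      if (!hasNext) {
        throw new java.util.NoSuchElementException("End of stream")
      }
      valuePrepared = false
      reader.get()
    }
  }
  // InterruptibleIterator checks for task cancellation between records
  new InterruptibleIterator(context, iter)
}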