DataSourceRDD — Input RDD Of DataSourceV2ScanExec Physical Operator

DataSourceRDD is an RDD that is created exclusively when the DataSourceV2ScanExec physical operator is requested for the input RDD (when the WholeStageCodegenExec physical operator is executed).

DataSourceRDD uses DataSourceRDDPartition partitions.

Requesting Preferred Locations Of DataReaderFactory (For Partition) — getPreferredLocations Method

getPreferredLocations(split: Partition): Seq[String]
Note
getPreferredLocations is part of Spark Core’s RDD Contract to…​FIXME.

getPreferredLocations requests the DataReaderFactory of the input DataSourceRDDPartition partition for its preferred locations.
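The delegation can be sketched as follows. This is a minimal illustration, not the Spark source: Partition, DataReaderFactory and DataSourceRDDPartition are simplified stand-ins defined here so the snippet compiles without Spark on the classpath, and the readerFactory and preferredLocations member names are assumptions.

```scala
object PreferredLocationsSketch {
  // Simplified stand-ins (assumptions), not the real Spark classes.
  trait Partition { def index: Int }

  trait DataReaderFactory[T] {
    // Hosts where reading is expected to be cheapest; empty means no preference.
    def preferredLocations: Array[String] = Array.empty[String]
  }

  class DataSourceRDDPartition[T](
      val index: Int,
      val readerFactory: DataReaderFactory[T]) extends Partition

  // Cast the generic partition and delegate to its reader factory.
  def getPreferredLocations(split: Partition): Seq[String] =
    split.asInstanceOf[DataSourceRDDPartition[_]].readerFactory.preferredLocations.toSeq
}
```

A factory that does not override preferredLocations yields an empty sequence in this sketch, that is, no placement preference.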

getPartitions Method

getPartitions: Array[Partition]
Note
getPartitions is part of Spark Core’s RDD Contract to…​FIXME

getPartitions creates a DataSourceRDDPartition for every DataReaderFactory in readerFactories.
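A minimal sketch of that mapping, assuming each partition's index is the position of its factory in readerFactories. The types are simplified stand-ins defined in the snippet, and readerFactories is passed as a parameter only to keep the example self-contained.

```scala
object GetPartitionsSketch {
  // Simplified stand-ins (assumptions), not the real Spark classes.
  trait Partition { def index: Int }
  trait DataReaderFactory[T]

  class DataSourceRDDPartition[T](
      val index: Int,
      val readerFactory: DataReaderFactory[T]) extends Partition

  // One partition per reader factory, indexed by position (assumed).
  def getPartitions[T](readerFactories: Seq[DataReaderFactory[T]]): Array[Partition] =
    readerFactories.zipWithIndex.map { case (factory, index) =>
      new DataSourceRDDPartition(index, factory): Partition
    }.toArray
}
```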

Creating DataSourceRDD Instance

DataSourceRDD takes the following when created: a SparkContext and the readerFactories (a collection of DataReaderFactory objects).

Computing Partition (in TaskContext) — compute Method

compute(split: Partition, context: TaskContext): Iterator[T]
Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute requests the DataReaderFactory (of the DataSourceRDDPartition partition) to createDataReader.

compute registers a Spark Core TaskCompletionListener that requests the DataReader to close when the task completes.

compute returns a Spark Core InterruptibleIterator that…​FIXME
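The steps of compute can be sketched without Spark as follows. Everything here is a simplified stand-in: the DataReader trait (with next, get and close), the TaskContext class and its markTaskCompleted helper are hypothetical, and adapting the reader into a plain Scala Iterator is an assumption about what the returned iterator wraps. The real method also takes the partition and wraps the result in an InterruptibleIterator, which this sketch leaves out.

```scala
import scala.collection.mutable.ArrayBuffer

object ComputeSketch {
  // Hypothetical stand-ins for the reader API and the task context.
  trait DataReader[T] {
    def next(): Boolean // advance to the next record; false when exhausted
    def get(): T        // the current record
    def close(): Unit
  }
  trait DataReaderFactory[T] { def createDataReader(): DataReader[T] }

  class TaskContext {
    private val listeners = ArrayBuffer.empty[() => Unit]
    def addTaskCompletionListener(f: () => Unit): Unit = listeners += f
    // Hypothetical helper that simulates the end of a task.
    def markTaskCompleted(): Unit = listeners.foreach(_.apply())
  }

  def compute[T](factory: DataReaderFactory[T], context: TaskContext): Iterator[T] = {
    val reader = factory.createDataReader()
    // Close the reader when the task completes, not when iteration ends.
    context.addTaskCompletionListener(() => reader.close())
    new Iterator[T] {
      private var valuePrepared = false
      def hasNext: Boolean = {
        if (!valuePrepared) valuePrepared = reader.next()
        valuePrepared
      }
      def next(): T = {
        if (!hasNext) throw new NoSuchElementException("End of stream")
        valuePrepared = false
        reader.get()
      }
    }
  }
}
```

Registering the close on the task context rather than at the end of iteration means the reader is released even if a downstream operator stops consuming the iterator early.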
