DataSourceRDD — Input RDD Of DataSourceV2ScanExec Physical Operator

DataSourceRDD is an RDD that is created exclusively when DataSourceV2ScanExec physical operator is requested for the input RDD (when WholeStageCodegenExec physical operator is executed).

DataSourceRDD uses DataSourceRDDPartition partitions.

Requesting Preferred Locations Of DataReaderFactory (For Partition) — getPreferredLocations Method

getPreferredLocations(split: Partition): Seq[String]
getPreferredLocations is part of Spark Core’s RDD Contract to specify placement preferences (preferred locations) of a partition.

getPreferredLocations simply requests the preferred locations of the DataReaderFactory of the input DataSourceRDDPartition partition.
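The delegation described above can be sketched without a Spark dependency. The names `SketchReaderFactory`, `SketchPartition` and `PreferredLocationsSketch` are hypothetical stand-ins for `DataReaderFactory`, `DataSourceRDDPartition` and the RDD itself, and the host names are made up; this is an illustration of the pattern under those assumptions, not the Spark source.

```scala
// Hypothetical stand-in for DataReaderFactory: it only knows where its data lives.
trait SketchReaderFactory {
  def preferredLocations: Seq[String]
}

// Hypothetical stand-in for DataSourceRDDPartition: an index plus one factory.
final case class SketchPartition(index: Int, readerFactory: SketchReaderFactory)

object PreferredLocationsSketch {
  // The RDD does no work of its own: it asks the partition's factory.
  def getPreferredLocations(split: SketchPartition): Seq[String] =
    split.readerFactory.preferredLocations

  // Example factory with made-up host names.
  val factory: SketchReaderFactory = new SketchReaderFactory {
    def preferredLocations: Seq[String] = Seq("host-a", "host-b")
  }

  val locations: Seq[String] = getPreferredLocations(SketchPartition(0, factory))
}
```

In this sketch the scheduler-facing answer is whatever the factory reports, so a data source controls data locality purely through its factories.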

getPartitions Method

getPartitions: Array[Partition]
getPartitions is part of Spark Core’s RDD Contract to return the partitions of an RDD.

getPartitions simply creates a DataSourceRDDPartition for every DataReaderFactory in the readerFactories.
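A minimal sketch of the one-partition-per-factory mapping follows. `SketchReaderFactory`, `SketchPartition` and `GetPartitionsSketch` are hypothetical stand-ins rather than Spark classes, and using the factory's position in the collection as the partition index is an assumption made for the illustration.

```scala
// Hypothetical stand-in for DataReaderFactory; the label only identifies it in the example.
final case class SketchReaderFactory(label: String)

// Hypothetical stand-in for DataSourceRDDPartition.
final case class SketchPartition(index: Int, readerFactory: SketchReaderFactory)

object GetPartitionsSketch {
  // One partition per factory; the index is assumed to be the factory's position.
  def getPartitions(readerFactories: Seq[SketchReaderFactory]): Array[SketchPartition] =
    readerFactories.zipWithIndex.map { case (factory, index) =>
      SketchPartition(index, factory)
    }.toArray

  val partitions: Array[SketchPartition] =
    getPartitions(Seq(SketchReaderFactory("a"), SketchReaderFactory("b"), SketchReaderFactory("c")))
}
```

Under these assumptions the number of partitions (and so the number of tasks) equals the number of factories the data source hands over.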

Creating DataSourceRDD Instance

DataSourceRDD takes a SparkContext and the collection of DataReaderFactory objects (readerFactories) when created.

Computing Partition (in TaskContext) — compute Method

compute(split: Partition, context: TaskContext): Iterator[T]
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute requests the DataReaderFactory (of the DataSourceRDDPartition partition) to createDataReader.

compute registers a Spark Core TaskCompletionListener that requests the DataReader to close at a task completion.

compute returns a Spark Core InterruptibleIterator that wraps an iterator over the records of the DataReader.
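The three steps of compute (create the reader, register a close callback, return an iterator) can be sketched in plain Scala. Everything here is a hypothetical stand-in: `SketchReader` imitates a reader with a `next()`/`get()`/`close()` protocol, `onTaskCompletion` imitates registering a task completion listener, and the task-interruption check that InterruptibleIterator adds is left out. Treat it as an illustration of the flow, not the Spark implementation.

```scala
// Hypothetical stand-in for a DataReader-like API.
trait SketchReader[T] {
  def next(): Boolean // advances to the next record; false when exhausted
  def get(): T        // returns the current record
  def close(): Unit
}

object ComputeSketch {
  // Example reader over an in-memory list that remembers whether it was closed.
  final class ListReader[T](records: List[T]) extends SketchReader[T] {
    private var remaining = records
    private var current: Option[T] = None
    var closed = false
    def next(): Boolean = remaining match {
      case head :: tail =>
        current = Some(head)
        remaining = tail
        true
      case Nil =>
        current = None
        false
    }
    def get(): T = current.get
    def close(): Unit = { closed = true }
  }

  // compute-like flow; onTaskCompletion stands in for listener registration.
  def compute[T](createReader: () => SketchReader[T],
                 onTaskCompletion: (() => Unit) => Unit): Iterator[T] = {
    val reader = createReader()
    onTaskCompletion(() => reader.close())
    new Iterator[T] {
      private var valuePrepared = false
      override def hasNext: Boolean = {
        if (!valuePrepared) valuePrepared = reader.next()
        valuePrepared
      }
      override def next(): T = {
        if (!hasNext) throw new NoSuchElementException("End of stream")
        valuePrepared = false
        reader.get()
      }
    }
  }

  // Returns (records read, closed before task completion, closed after).
  def demo(): (List[Int], Boolean, Boolean) = {
    val reader = new ListReader(List(1, 2, 3))
    val listeners = scala.collection.mutable.ArrayBuffer.empty[() => Unit]
    val records = compute[Int](() => reader, l => { listeners += l; () }).toList
    val closedBefore = reader.closed
    listeners.foreach(f => f()) // simulate the end of the task
    (records, closedBefore, reader.closed)
  }
}
```

The point of the callback is that the reader is closed at the end of the task even if the consumer never drains the iterator.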
