compute(
  split: Partition,
  context: TaskContext): Iterator[InternalRow]
ContinuousDataSourceRDD — Input RDD of DataSourceV2ScanExec Physical Operator with ContinuousReader
ContinuousDataSourceRDD is a specialized RDD (RDD[InternalRow]) that serves as the single input RDD (with the input rows) of the DataSourceV2ScanExec leaf physical operator with a ContinuousReader.
ContinuousDataSourceRDD is created exclusively when the DataSourceV2ScanExec leaf physical operator is requested for the input RDDs (of which there is only one).
ContinuousDataSourceRDD uses spark.sql.streaming.continuous.executorQueueSize configuration property for the size of the data queue.
ContinuousDataSourceRDD uses spark.sql.streaming.continuous.executorPollIntervalMs configuration property for the epochPollIntervalMs.
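The two configuration properties above can be set like any other Spark SQL property when the session is built. The following is a minimal sketch, assuming a local session; the object name, application name, and the values 2048 and 50 are illustrative assumptions, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

object ContinuousConfigSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                  // assumption: local run for illustration
      .appName("continuous-config-sketch") // hypothetical application name
      // Size of the data queue used by ContinuousDataSourceRDD (illustrative value)
      .config("spark.sql.streaming.continuous.executorQueueSize", "2048")
      // Feeds epochPollIntervalMs, in milliseconds (illustrative value)
      .config("spark.sql.streaming.continuous.executorPollIntervalMs", "50")
      .getOrCreate()

    // Read the values back to confirm what the session will use
    println(spark.conf.get("spark.sql.streaming.continuous.executorQueueSize"))
    println(spark.conf.get("spark.sql.streaming.continuous.executorPollIntervalMs"))

    spark.stop()
  }
}
```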
ContinuousDataSourceRDD takes the following to be created:
ContinuousDataSourceRDD uses InputPartition (of a ContinuousDataSourceRDDPartition) for preferred host locations (where the input partition reader can run faster).
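The delegation described above can be modeled in plain Scala. This is a sketch under stated assumptions: the traits and classes below are simplified stand-ins that only borrow the names from the text (HostBoundPartition and PreferredLocationsSketch are hypothetical), and they are not Spark's actual definitions.

```scala
// Simplified stand-in for the data source InputPartition (an assumption for
// this sketch): by default a partition has no location preference.
trait InputPartition {
  def preferredLocations: Seq[String] = Seq.empty
}

// Hypothetical input partition that prefers a fixed set of hosts.
final case class HostBoundPartition(hosts: Seq[String]) extends InputPartition {
  override def preferredLocations: Seq[String] = hosts
}

// Simplified stand-in for the RDD partition that wraps an InputPartition.
final case class ContinuousDataSourceRDDPartition(
    index: Int,
    inputPartition: InputPartition)

object PreferredLocationsSketch {
  // The RDD-level answer is whatever the wrapped InputPartition reports.
  def getPreferredLocations(split: ContinuousDataSourceRDDPartition): Seq[String] =
    split.inputPartition.preferredLocations
}
```

With this model, a partition wrapping `HostBoundPartition(Seq("host-a"))` reports `host-a` as its preferred location, while a partition that does not override `preferredLocations` reports none and can be scheduled anywhere.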
Computing Partition — compute Method
Note: compute is part of the RDD Contract to compute a given partition.
compute…FIXME
getPartitions Method
getPartitions: Array[Partition]
Note: getPartitions is part of the RDD Contract to specify the partitions to compute.
getPartitions…FIXME
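For orientation only, the two RDD Contract methods named in this section can be seen in a generic custom RDD. The sketch below is not ContinuousDataSourceRDD's implementation and says nothing about how it computes partitions; RangePartition, SimpleRangeRDD, and SimpleRangeRDDDemo are hypothetical names, and the local master is an assumption for illustration.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: a contiguous slice [start, end) of integers.
final class RangePartition(val index: Int, val start: Int, val end: Int)
  extends Partition

// Hypothetical source RDD with no parent dependencies (Nil).
class SimpleRangeRDD(sc: SparkContext, numSlices: Int, perSlice: Int)
  extends RDD[Int](sc, Nil) {

  // RDD Contract: specify the partitions to compute.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numSlices) { i =>
      new RangePartition(i, i * perSlice, (i + 1) * perSlice)
    }

  // RDD Contract: produce the records of the given partition.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val p = split.asInstanceOf[RangePartition]
    (p.start until p.end).iterator
  }
}

object SimpleRangeRDDDemo {
  // Runs the RDD on a local SparkContext and returns all records in partition order.
  def run(): Seq[Int] = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("rdd-contract-sketch")
    val sc = new SparkContext(conf)
    try new SimpleRangeRDD(sc, numSlices = 2, perSlice = 3).collect().toSeq
    finally sc.stop()
  }
}
```

Spark calls getPartitions once to learn how the data is split, then calls compute once per partition inside a task on an executor.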