compute(
thePart: Partition,
context: TaskContext): Iterator[ConsumerRecord[Array[Byte], Array[Byte]]]
KafkaSourceRDD
KafkaSourceRDD
is an RDD
of Kafka’s ConsumerRecords (with keys and values being collections of bytes, i.e. Array[Byte]
).
KafkaSourceRDD
uses KafkaSourceRDDPartition for the partitions.
KafkaSourceRDD
has a specialized API for the following RDD operators:
KafkaSourceRDD
is created when:
-
KafkaRelation
is requested to build a distributed data scan with column pruning (as a TableScan) -
(Spark Structured Streaming)
KafkaSource
is requested togetBatch
Creating KafkaSourceRDD Instance
KafkaSourceRDD
takes the following when created:
-
Collection of key-value settings for executors reading records from Kafka topics
-
Collection of KafkaSourceRDDOffsetRanges
-
Timeout (in milliseconds) to poll data from Kafka
Used exclusively when
KafkaSourceRDD
is requested to compute a RDD partition (and requests aKafkaDataConsumer
for aConsumerRecord
)
KafkaSourceRDD
initializes the internal registries and counters.
Computing Partition (in TaskContext) — compute
Method
Note
|
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext ).
|
compute
…FIXME
count
Operator
count(): Long
Note
|
count is part of Spark Core’s RDD Contract to…FIXME.
|
count
…FIXME
countApprox
Operator
countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]
Note
|
countApprox is part of Spark Core’s RDD Contract to…FIXME.
|
countApprox
…FIXME
isEmpty
Operator
isEmpty(): Boolean
Note
|
isEmpty is part of Spark Core’s RDD Contract to…FIXME.
|
isEmpty
…FIXME
persist
Operator
persist(newLevel: StorageLevel): this.type
Note
|
persist is part of Spark Core’s RDD Contract to…FIXME.
|
persist
…FIXME
getPartitions
Method
getPartitions: Array[Partition]
Note
|
getPartitions is part of Spark Core’s RDD Contract to…FIXME
|
getPreferredLocations
Method
getPreferredLocations(split: Partition): Seq[String]
Note
|
getPreferredLocations is part of the RDD Contract to…FIXME.
|
getPreferredLocations
…FIXME
resolveRange
Internal Method
resolveRange(
consumer: KafkaDataConsumer,
range: KafkaSourceRDDOffsetRange): KafkaSourceRDDOffsetRange
resolveRange
…FIXME
Note
|
resolveRange is used exclusively when KafkaSourceRDD is requested to compute a partition.
|