getPreferredLocations(
split: Partition): Seq[String]
KafkaSourceRDD
KafkaSourceRDD
is an RDD
of Kafka’s ConsumerRecords (RDD[ConsumerRecord[Array[Byte], Array[Byte]]]
) and no parent RDDs.
KafkaSourceRDD
is created when:
-
KafkaRelation
is requested to build a distributed data scan with column pruning -
KafkaSource
is requested to generate a streaming DataFrame with records from Kafka for a streaming micro-batch
Creating KafkaSourceRDD Instance
KafkaSourceRDD
takes the following when created:
-
Collection of key-value settings for executors reading records from Kafka topics
-
Timeout (in milliseconds) to poll data from Kafka
Used when
KafkaSourceRDD
is requested for records (for given offsets) and in turn requestsCachedKafkaConsumer
to poll for Kafka’sConsumerRecords
.
Placement Preferences of Partition (Preferred Locations) — getPreferredLocations
Method
Note
|
getPreferredLocations is part of the RDD contract to specify placement preferences.
|
getPreferredLocations
converts the given Partition
to a KafkaSourceRDDPartition
and…FIXME
Computing Partition — compute
Method
compute(
thePart: Partition,
context: TaskContext
): Iterator[ConsumerRecord[Array[Byte], Array[Byte]]]
Note
|
compute is part of the RDD contract to compute a given partition.
|
compute
uses KafkaDataConsumer
utility to acquire a cached KafkaDataConsumer (for a partition).
compute
resolves the range (based on the offsetRange
of the given partition that is assumed a KafkaSourceRDDPartition
).
compute
returns a NextIterator
so that getNext
uses the KafkaDataConsumer
to get a record.
When the beginning and ending offsets (of the offset range) are equal, compute
prints out the following INFO message to the logs, requests the KafkaDataConsumer
to release and returns an empty iterator.
Beginning offset [fromOffset] is the same as ending offset skipping [topic] [partition]
compute
throws an AssertionError
when the beginning offset (fromOffset
) is after the ending offset (untilOffset
):
Beginning offset [fromOffset] is after the ending offset [untilOffset] for topic [topic] partition [partition]. You either provided an invalid fromOffset, or the Kafka topic has been damaged
getPartitions
Method
getPartitions: Array[Partition]
Note
|
getPartitions is part of the RDD contract to…FIXME.
|
getPartitions
…FIXME