KafkaSourceRDD

KafkaSourceRDD is an RDD of Kafka’s ConsumerRecords (RDD[ConsumerRecord[Array[Byte], Array[Byte]]]) and no parent RDDs.

KafkaSourceRDD is created when:

Creating KafkaSourceRDD Instance

KafkaSourceRDD takes the following when created:

Placement Preferences of Partition (Preferred Locations) — getPreferredLocations Method

getPreferredLocations(
  split: Partition): Seq[String]
Note
getPreferredLocations is part of the RDD contract to specify placement preferences.

getPreferredLocations converts the given Partition to a KafkaSourceRDDPartition and…​FIXME

Computing Partition — compute Method

compute(
  thePart: Partition,
  context: TaskContext
): Iterator[ConsumerRecord[Array[Byte], Array[Byte]]]
Note
compute is part of the RDD contract to compute a given partition.

compute uses KafkaDataConsumer utility to acquire a cached KafkaDataConsumer (for a partition).

compute resolves the range (based on the offsetRange of the given partition that is assumed a KafkaSourceRDDPartition).

compute returns a NextIterator so that getNext uses the KafkaDataConsumer to get a record.

When the beginning and ending offsets (of the offset range) are equal, compute prints out the following INFO message to the logs, requests the KafkaDataConsumer to release and returns an empty iterator.

Beginning offset [fromOffset] is the same as ending offset skipping [topic] [partition]

compute throws an AssertionError when the beginning offset (fromOffset) is after the ending offset (untilOffset):

Beginning offset [fromOffset] is after the ending offset [untilOffset] for topic [topic] partition [partition]. You either provided an invalid fromOffset, or the Kafka topic has been damaged

getPartitions Method

getPartitions: Array[Partition]
Note
getPartitions is part of the RDD contract to…​FIXME.

getPartitions…​FIXME

Persisting RDD — persist Method

persist: Array[Partition]
Note
persist is part of the RDD contract to persist an RDD.

persist…​FIXME

resolveRange Internal Method

resolveRange(
  consumer: KafkaDataConsumer,
  range: KafkaSourceRDDOffsetRange
): KafkaSourceRDDOffsetRange

resolveRange…​FIXME

Note
resolveRange is used when…​FIXME

results matching ""

    No results matching ""