KafkaSourceRDD

KafkaSourceRDD is an RDD of Kafka’s ConsumerRecords (with keys and values being collections of bytes, i.e. Array[Byte]).

KafkaSourceRDD uses KafkaSourceRDDPartition for the partitions.

KafkaSourceRDD has a specialized API for the following RDD operators:

KafkaSourceRDD is created when:

Creating KafkaSourceRDD Instance

KafkaSourceRDD takes the following when created:

  • SparkContext

  • Collection of key-value settings for executors reading records from Kafka topics

  • Collection of KafkaSourceRDDOffsetRanges

  • Timeout (in milliseconds) to poll data from Kafka

    Used exclusively when KafkaSourceRDD is requested to compute a RDD partition (and requests a KafkaDataConsumer for a ConsumerRecord)

  • failOnDataLoss flag to control…​FIXME

  • reuseKafkaConsumer flag to control…​FIXME

KafkaSourceRDD initializes the internal registries and counters.

Computing Partition (in TaskContext) — compute Method

compute(
  thePart: Partition,
  context: TaskContext): Iterator[ConsumerRecord[Array[Byte], Array[Byte]]]
Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute…​FIXME

count Operator

count(): Long
Note
count is part of Spark Core’s RDD Contract to…​FIXME.

count…​FIXME

countApprox Operator

countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]
Note
countApprox is part of Spark Core’s RDD Contract to…​FIXME.

countApprox…​FIXME

isEmpty Operator

isEmpty(): Boolean
Note
isEmpty is part of Spark Core’s RDD Contract to…​FIXME.

isEmpty…​FIXME

persist Operator

persist(newLevel: StorageLevel): this.type
Note
persist is part of Spark Core’s RDD Contract to…​FIXME.

persist…​FIXME

getPartitions Method

getPartitions: Array[Partition]
Note
getPartitions is part of Spark Core’s RDD Contract to…​FIXME

getPreferredLocations Method

getPreferredLocations(split: Partition): Seq[String]
Note
getPreferredLocations is part of the RDD Contract to…​FIXME.

getPreferredLocations…​FIXME

resolveRange Internal Method

resolveRange(
  consumer: KafkaDataConsumer,
  range: KafkaSourceRDDOffsetRange): KafkaSourceRDDOffsetRange

resolveRange…​FIXME

Note
resolveRange is used exclusively when KafkaSourceRDD is requested to compute a partition.

results matching ""

    No results matching ""