KafkaOffsetReader

KafkaOffsetReader is used to query a Kafka cluster for partition offsets.

KafkaOffsetReader is created when:

KafkaRelation is requested to build a distributed data scan with column pruning (as a TableScan) (to get the initial partition offsets)
(Spark Structured Streaming) KafkaSourceProvider is requested to createSource and createContinuousReader

When requested for the human-readable text representation (aka toString), KafkaOffsetReader simply requests the ConsumerStrategy for one.

Table 1. KafkaOffsetReader’s Options
Name	Default Value	Description
`fetchOffset.numRetries`	`3`
`fetchOffset.retryIntervalMs`	`1000`	How long to wait before retries.

Table 2. KafkaOffsetReader’s Internal Registries and Counters
Name	Description
`consumer`	Kafka’s Consumer (with keys and values of `Array[Byte]` type) Initialized when `KafkaOffsetReader` is created. Used when `KafkaOffsetReader`: fetchTopicPartitions fetchEarliestOffsets fetchLatestOffsets resetConsumer is closed
`execContext`
`groupId`
`kafkaReaderThread`
`maxOffsetFetchAttempts`
`nextId`
`offsetFetchAttemptIntervalMs`

Tip	Enable `INFO` or `DEBUG` logging levels for `org.apache.spark.sql.kafka010.KafkaOffsetReader` to see what happens inside. Add the following line to `conf/log4j.properties`: `log4j.logger.org.apache.spark.sql.kafka010.KafkaOffsetReader=DEBUG` Refer to Logging.

Creating Kafka Consumer — `createConsumer` Internal Method

createConsumer(): Consumer[Array[Byte], Array[Byte]]

createConsumer requests the ConsumerStrategy to create a Kafka Consumer with driverKafkaParams and new generated group.id Kafka property.

Note	`createConsumer` is used when `KafkaOffsetReader` is created (and initializes consumer) and resetConsumer

Creating KafkaOffsetReader Instance

KafkaOffsetReader takes the following when created:

ConsumerStrategy
Kafka parameters (as Map[String, Object])
Reader options (as Map[String, String])
Prefix for the group id

KafkaOffsetReader initializes the internal registries and counters.

`close` Method

close(): Unit

close…FIXME

Note	`close` is used when…FIXME

`fetchEarliestOffsets` Method

fetchEarliestOffsets(): Map[TopicPartition, Long]

fetchEarliestOffsets…FIXME

Note	`fetchEarliestOffsets` is used when…FIXME

`fetchEarliestOffsets` Method

fetchEarliestOffsets(newPartitions: Seq[TopicPartition]): Map[TopicPartition, Long]

fetchEarliestOffsets…FIXME

Note	`fetchEarliestOffsets` is used when…FIXME

`fetchLatestOffsets` Method

fetchLatestOffsets(): Map[TopicPartition, Long]

fetchLatestOffsets…FIXME

Note	`fetchLatestOffsets` is used when…FIXME

Fetching (and Pausing) Assigned Kafka TopicPartitions — `fetchTopicPartitions` Method

fetchTopicPartitions(): Set[TopicPartition]

fetchTopicPartitions uses an UninterruptibleThread thread to do the following:

Requests the Kafka Consumer to poll (fetch data) for the topics and partitions (with 0 timeout)
Requests the Kafka Consumer to get the set of partitions currently assigned
Requests the Kafka Consumer to suspend fetching from the partitions assigned

In the end, fetchTopicPartitions returns the TopicPartitions assigned (and paused).

Note	`fetchTopicPartitions` is used exclusively when `KafkaRelation` is requested to build a distributed data scan with column pruning (as a TableScan) through getPartitionOffsets.

`nextGroupId` Internal Method

nextGroupId(): String

nextGroupId…FIXME

Note	`nextGroupId` is used when…FIXME

`resetConsumer` Internal Method

resetConsumer(): Unit

resetConsumer…FIXME

Note	`resetConsumer` is used when…FIXME

`runUninterruptibly` Internal Method

runUninterruptibly[T](body: => T): T

runUninterruptibly…FIXME

Note	`runUninterruptibly` is used when…FIXME

`withRetriesWithoutInterrupt` Internal Method

withRetriesWithoutInterrupt(body: => Map[TopicPartition, Long]): Map[TopicPartition, Long]

withRetriesWithoutInterrupt…FIXME

Note	`withRetriesWithoutInterrupt` is used when…FIXME

KafkaOffsetReader