LocationStrategy — Preferred Hosts per Topic Partitions

LocationStrategy allows a DirectKafkaInputDStream to request Spark executors to execute Kafka consumers as close topic leaders of topic partitions as possible.

LocationStrategy is used when DirectKafkaInputDStream computes a KafkaRDD for a given batch interval and is a means of distributing processing Kafka records across Spark executors.

Table 1. Location Strategies in Spark Streaming
Location Strategy Description

PreferBrokers

Use when executors are on the same nodes as your Kafka brokers.

PreferConsistent

Use in most cases as it consistently distributes partitions across all executors.

PreferFixed

Use to place particular TopicPartitions on particular hosts if your load is uneven.

Accepts a collection of topic partition and host pairs. Any topic partition not specified uses a consistent location.

Note
A topic partition is described using Kafka’s TopicPartition.

You can create a LocationStrategy using LocationStrategies factory object.

import org.apache.spark.streaming.kafka010.LocationStrategies
val preferredHosts = LocationStrategies.PreferConsistent

LocationStrategies Factory Object

LocationStrategies holds the factory methods to access LocationStrategy objects.

PreferBrokers: LocationStrategy
PreferConsistent: LocationStrategy
PreferFixed(hostMap: collection.Map[TopicPartition, String]): LocationStrategy

results matching ""

    No results matching ""