Kafka Data Source

Note: Apache Kafka is a storage of records in a format-independent and fault-tolerant durable way. Read up on Apache Kafka in the official documentation or in my other gitbook Mastering Apache Kafka.

Kafka Data Source supports options to get better performance of structured queries that use it.

Reading Data from Kafka Topics

As a Spark developer, you use the DataFrameReader.format method to specify Apache Kafka as the external data source to load data from.

You use kafka (or org.apache.spark.sql.kafka010.KafkaSourceProvider) as the input data source format.

val kafka = spark.read.format("kafka").load

// Alternatively
val kafka = spark.read.format("org.apache.spark.sql.kafka010.KafkaSourceProvider").load

These one-liners create a DataFrame that represents the distributed process of loading data from one or many Kafka topics (with additional properties).
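In practice, the Kafka data source requires at least the kafka.bootstrap.servers and subscribe (or subscribePattern / assign) options before load can succeed. A minimal sketch, assuming a Kafka broker at localhost:9092 and a topic named input (both placeholders for your own setup):

val records = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // required: broker address (placeholder)
  .option("subscribe", "input")                        // required: topic(s) to load from (placeholder)
  .load

// The resulting DataFrame has a fixed schema: key, value, topic,
// partition, offset, timestamp, timestampType.
records.printSchema

Note that key and value are binary columns, so a query typically casts them, e.g. records.select($"value" cast "string").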