StreamSourceProvider Contract — Streaming Source Providers for Micro-Batch Stream Processing (Data Source API V1)

StreamSourceProvider is the contract of data source providers that can create a streaming source for a format (e.g. text file) or system (e.g. Apache Kafka).

StreamSourceProvider is part of Data Source API V1 and is used in Micro-Batch Stream Processing only.

Table 1. StreamSourceProvider Contract
Method Description

createSource

createSource(
  sqlContext: SQLContext,
  metadataPath: String,
  schema: Option[StructType],
  providerName: String,
  parameters: Map[String, String]): Source

Creates a streaming source

Note
metadataPath is the value of the optional user-specified checkpointLocation option or, when that option is not specified, a location resolved by StreamingQueryManager.

Used exclusively when DataSource is requested to create a streaming source (when MicroBatchExecution is requested to initialize the analyzed logical plan)

sourceSchema

sourceSchema(
  sqlContext: SQLContext,
  schema: Option[StructType],
  providerName: String,
  parameters: Map[String, String]): (String, StructType)

The name and schema of the streaming source

Used exclusively when DataSource is requested for metadata of a streaming source (when MicroBatchExecution is requested to initialize the analyzed logical plan)
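
The two methods of the contract can be sketched as a minimal custom provider. This is an illustration only, not how any built-in provider is implemented. DemoSourceProvider, DemoSource and the single-column default schema are hypothetical names chosen for this sketch, and it assumes a Spark 2.x or 3.x build in which Source and Offset live in org.apache.spark.sql.execution.streaming. The source never reports new data, so getBatch is deliberately left unimplemented; producing real batches takes more than this sketch shows.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{Offset, Source}
import org.apache.spark.sql.sources.StreamSourceProvider
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical provider, not part of Spark.
class DemoSourceProvider extends StreamSourceProvider {

  // Assumption of this sketch: fall back to a fixed schema
  // when the user does not supply one.
  private val defaultSchema =
    StructType(StructField("value", LongType, nullable = false) :: Nil)

  // Returns the name and schema of the streaming source.
  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (providerName, schema.getOrElse(defaultSchema))

  // Creates the streaming source. metadataPath is ignored here;
  // a real source could persist its own metadata under that path.
  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    new DemoSource(schema.getOrElse(defaultSchema))
}

// Hypothetical source that never has data available.
class DemoSource(override val schema: StructType) extends Source {

  // None tells the engine that there is nothing to read.
  override def getOffset: Option[Offset] = None

  // Not expected to be called, since getOffset never returns an offset.
  override def getBatch(start: Option[Offset], end: Offset): DataFrame =
    throw new UnsupportedOperationException("DemoSource never has data to return")

  override def stop(): Unit = ()
}
```

A provider like this could then be referenced by its fully qualified class name, e.g. spark.readStream.format(classOf[DemoSourceProvider].getName).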

Note
KafkaSourceProvider is the only known StreamSourceProvider in Spark Structured Streaming.
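
As a rough usage sketch, a streaming query selects a provider by format name. The example below assumes the spark-sql-kafka-0-10 module is on the classpath, so that the short name kafka resolves to KafkaSourceProvider. The broker address and topic name are placeholders. Whether the stream is then read through createSource or through a newer API may depend on the Spark version and configuration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("stream-source-provider-demo")
  .getOrCreate()

// "kafka" is the short name registered by KafkaSourceProvider.
// The option values below are placeholders for this sketch.
val records = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// The schema comes from the provider, not from the user.
records.printSchema()
```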
