DataSource — Pluggable Data Source

DataSource is…​FIXME

DataSource is created when…​FIXME

Table 1. DataSource’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

providingClass

java.lang.Class that corresponds to the className (which can be a fully-qualified class name or an alias of the data source)

sourceInfo

SourceInfo with the name, the schema, and optional partitioning columns of a source.

Used when:

Describing Name and Schema of Streaming Source — sourceSchema Internal Method

sourceSchema(): SourceInfo


sourceSchema is used exclusively when DataSource is requested for the SourceInfo.
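
As a rough illustration of the value that sourceSchema describes, the following minimal sketch models a SourceInfo as a plain case class with the three pieces named earlier (name, schema, partitioning columns). StructTypeSketch and SourceInfoSketch are hypothetical stand-ins so that the snippet runs without Spark on the classpath; they are not Spark's own definitions, and the example values are purely illustrative.

```scala
// Hypothetical stand-in for a schema: a list of (column name, type name) pairs.
final case class StructTypeSketch(fields: Seq[(String, String)])

// Hypothetical model of SourceInfo: the source name, its schema,
// and the (possibly empty) partitioning columns.
final case class SourceInfoSketch(
    name: String,
    schema: StructTypeSketch,
    partitionColumns: Seq[String])

// Illustrative value only: a source called "rate" with two columns
// and no partitioning columns.
val rateInfo = SourceInfoSketch(
  name = "rate",
  schema = StructTypeSketch(Seq("timestamp" -> "timestamp", "value" -> "long")),
  partitionColumns = Nil)
```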

Creating DataSource Instance

DataSource takes the following when created:

  • SparkSession

  • className, i.e. the fully-qualified class name or an alias of the data source

  • Paths (default: Nil, i.e. an empty collection)

  • Optional user-defined schema (default: None)

  • Names of the partition columns (default: empty)

  • Optional BucketSpec (default: None)

  • Configuration options (default: empty)

  • Optional CatalogTable (default: None)

DataSource initializes the internal registries and counters.
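
The constructor arguments and their defaults listed above can be sketched as a case class. This is a minimal model and not Spark's actual class: SessionStub, SchemaStub, BucketSpecStub, CatalogTableStub and DataSourceSketch are hypothetical stand-ins, and every parameter name other than className is an assumption made for illustration.

```scala
// Hypothetical stand-ins for the Spark types named in the list above,
// so that the sketch compiles without Spark on the classpath.
final class SessionStub
final case class SchemaStub(fieldNames: Seq[String])
final case class BucketSpecStub(numBuckets: Int, bucketColumnNames: Seq[String])
final case class CatalogTableStub(name: String)

// Only the session and className are required; everything else has
// the default stated in the list above. Parameter names are assumed.
final case class DataSourceSketch(
    session: SessionStub,
    className: String,                              // fully-qualified class name or alias
    paths: Seq[String] = Nil,                       // default: Nil
    userSpecifiedSchema: Option[SchemaStub] = None, // default: None
    partitionColumns: Seq[String] = Seq.empty,      // default: empty
    bucketSpec: Option[BucketSpecStub] = None,      // default: None
    options: Map[String, String] = Map.empty,       // default: empty
    catalogTable: Option[CatalogTableStub] = None)  // default: None

// Only the two required arguments are given; the rest fall back to defaults.
val parquetSource = DataSourceSketch(new SessionStub, className = "parquet")
```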

Creating Streaming Source — createSource Method

createSource(metadataPath: String): Source


createSource is used exclusively when MicroBatchExecution is requested to initialize the analyzed logical plan.

Creating Streaming Sink — createSink Method

createSink(outputMode: OutputMode): Sink

createSink creates a streaming sink for StreamSinkProvider or FileFormat data sources.

Internally, createSink creates a new instance of the providingClass and branches off per its type (StreamSinkProvider, FileFormat, or any other type, which is unsupported).

createSink throws an IllegalArgumentException when path option is not specified for a FileFormat data source:

'path' is not specified

createSink throws an AnalysisException when the given OutputMode is different from Append for a FileFormat data source:

Data source [className] does not support [outputMode] output mode

createSink throws an UnsupportedOperationException for unsupported data source formats:

Data source [className] does not support streamed writing

createSink is used exclusively when DataStreamWriter is requested to create and start a streaming query.
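
The branching and the three error cases described in this section can be sketched as a pattern match. This is a simplified model under stated assumptions, not Spark's implementation: all types below (including AnalysisExceptionSketch and FileSinkSketch) are hypothetical stand-ins defined locally, createSinkSketch takes the already-instantiated provider as an argument, and the order of the two FileFormat checks (path first, output mode second) simply follows the order in which the text lists them.

```scala
// Hypothetical, simplified stand-ins for the Spark types named in this section.
sealed trait OutputMode
case object Append extends OutputMode
case object Complete extends OutputMode
case object Update extends OutputMode

trait Sink
trait StreamSinkProvider {
  def createSink(options: Map[String, String], outputMode: OutputMode): Sink
}
trait FileFormat
final case class FileSinkSketch(path: String, format: FileFormat) extends Sink

// Stand-in for Spark's AnalysisException (assumption: only the message matters here).
final class AnalysisExceptionSketch(message: String) extends Exception(message)

def createSinkSketch(
    className: String,
    provider: Any, // stands in for a new instance of the providingClass
    options: Map[String, String],
    outputMode: OutputMode): Sink =
  provider match {
    // A StreamSinkProvider creates the sink itself.
    case p: StreamSinkProvider =>
      p.createSink(options, outputMode)
    // A FileFormat needs a 'path' option and supports the Append output mode only.
    case f: FileFormat =>
      val path = options.getOrElse(
        "path", throw new IllegalArgumentException("'path' is not specified"))
      if (outputMode != Append)
        throw new AnalysisExceptionSketch(
          s"Data source $className does not support $outputMode output mode")
      FileSinkSketch(path, f)
    // Anything else cannot be written to as a stream.
    case _ =>
      throw new UnsupportedOperationException(
        s"Data source $className does not support streamed writing")
  }
```

With these stand-ins, a FileFormat provider with a path option and the Append mode yields a FileSinkSketch, while the other combinations fail with the messages quoted above.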

