sourceSchema(): SourceInfo
DataSource — Pluggable Data Provider Framework
Tip
|
Read up on DataSource — Pluggable Data Provider Framework in The Internals of Spark SQL online book. |
Creating DataSource Instance
DataSource
takes the following to be created:
DataSource
initializes the internal properties.
Generating Metadata of Streaming Source (Data Source API V1) — sourceSchema
Internal Method
sourceSchema
creates a new instance of the data source class and branches off per the type, e.g. StreamSourceProvider, FileFormat and other types.
Note
|
sourceSchema is used exclusively when DataSource is requested for the SourceInfo.
|
StreamSourceProvider
For a StreamSourceProvider, sourceSchema
requests the StreamSourceProvider
for the name and schema (of the streaming source).
In the end, sourceSchema
returns the name and the schema as part of SourceInfo
(with partition columns unspecified).
Creating Streaming Source (Micro-Batch Stream Processing / Data Source API V1) — createSource
Method
createSource(
metadataPath: String): Source
createSource
creates a new instance of the data source class and branches off per the type, e.g. StreamSourceProvider, FileFormat and other types.
Note
|
createSource is used exclusively when MicroBatchExecution is requested to initialize the analyzed logical plan.
|
StreamSourceProvider
For a StreamSourceProvider, createSource
requests the StreamSourceProvider
to create a source.
FileFormat
For a FileFormat
, createSource
creates a new FileStreamSource.
createSource
throws an IllegalArgumentException
when path
option was not specified for a FileFormat
data source:
'path' is not specified
Creating Streaming Sink — createSink
Method
createSink(
outputMode: OutputMode): Sink
createSink
creates a streaming sink for StreamSinkProvider or FileFormat
data sources.
Tip
|
Read up on FileFormat Data Source in The Internals of Spark SQL book. |
Internally, createSink
creates a new instance of the providingClass and branches off per type:
-
For a StreamSinkProvider,
createSink
simply delegates the call and requests it to create a streaming sink -
For a
FileFormat
,createSink
creates a FileStreamSink whenpath
option is specified and the output mode is Append
createSink
throws a IllegalArgumentException
when path
option is not specified for a FileFormat
data source:
'path' is not specified
createSink
throws an AnalysisException
when the given OutputMode is different from Append for a FileFormat
data source:
Data source [className] does not support [outputMode] output mode
createSink
throws an UnsupportedOperationException
for unsupported data source formats:
Data source [className] does not support streamed writing
Note
|
createSink is used exclusively when DataStreamWriter is requested to start a streaming query.
|
Internal Properties
Name | Description |
---|---|
|
java.lang.Class for the className (that can be a fully-qualified class name or an alias of the data source) |
|
Metadata of a Source with the alias (short name), the schema, and optional partitioning columns
Used when:
|