commit(end: Offset): Unit
Source Contract — Streaming Sources for Micro-Batch Stream Processing (Data Source API V1)
Source
is the extension of the BaseStreamingSource contract for streaming sources that work with "continuous" stream of data identified by offsets.
Source
is part of Data Source API V1 and used in Micro-Batch Stream Processing only.
For fault tolerance, Source
must be able to replay an arbitrary sequence of past data in a stream using a range of offsets. This is the assumption so Structured Streaming can achieve end-to-end exactly-once guarantees.
Method | Description | ||
---|---|---|---|
|
Commits data up to the end offset, i.e. informs the source that Spark has completed processing all data for offsets less than or equal to the end offset and will only request offsets greater than the end offset in the future. Used exclusively when MicroBatchExecution stream execution engine (Micro-Batch Stream Processing) is requested to write offsets to a commit log (walCommit phase) while running an activated streaming query. |
||
|
Generating a streaming Start offset can be undefined ( Used when MicroBatchExecution stream execution engine (Micro-Batch Stream Processing) is requested to run an activated streaming query, namely: |
||
|
Latest (maximum) offset of the source (or Used exclusively when MicroBatchExecution stream execution engine (Micro-Batch Stream Processing) is requested for latest offsets of all sources (getOffset phase) while running activated streaming query. |
||
|
Schema of the source
|
Source | Description |
---|---|
Part of kafka data source |