DataSourceReader Contract

DataSourceReader is the abstraction of data source readers in Data Source API V2. A reader can plan the input partitions to read (planInputPartitions) and report the schema to use for reading (readSchema).

Table 1. DataSourceReader Contract
Method Description

planInputPartitions

List<InputPartition<InternalRow>> planInputPartitions()

Used exclusively when the DataSourceV2ScanExec leaf physical operator is requested for the input partitions, which it uses to create the input RDD[InternalRow] (inputRDD). The operator simply delegates to the underlying DataSourceReader.

readSchema

StructType readSchema()

Schema to use for reading (loading) data from a data source

Used when:

  • DataSourceV2Relation factory object is requested to create a DataSourceV2Relation, which happens when DataFrameReader is requested to "load" data (as a DataFrame) from a data source with ReadSupport

  • DataSourceV2Strategy execution planning strategy is requested to apply column pruning (pruneColumns)

  • Spark Structured Streaming’s MicroBatchExecution stream execution is requested to run a single streaming batch

  • Spark Structured Streaming’s ContinuousExecution stream execution is requested to run a streaming query in continuous mode

  • Spark Structured Streaming’s DataStreamReader is requested to "load" data (as a DataFrame)
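The two methods of the contract can be illustrated with a minimal, self-contained sketch. It does not use Spark's classes: PartitionReaderSketch, InputPartitionSketch, DataSourceReaderSketch, RangePartition, RangeReader and ScanSketch are all hypothetical stand-ins, the schema is reduced to a list of field names instead of a StructType, and rows are plain Long values instead of InternalRow. The only point is the shape of the interaction: the reader reports a schema, plans partitions, and a scan drains one reader per partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins (NOT Spark's classes) so the sketch compiles without Spark.
interface PartitionReaderSketch<T> {
    boolean next(); // advance to the next record; false once the partition is exhausted
    T get();        // the current record (valid only after next() returned true)
}

interface InputPartitionSketch<T> {
    PartitionReaderSketch<T> createPartitionReader();
}

interface DataSourceReaderSketch<T> {
    List<String> readSchema(); // stand-in for StructType readSchema()
    List<InputPartitionSketch<T>> planInputPartitions();
}

// One partition covering the half-open range [start, end).
class RangePartition implements InputPartitionSketch<Long> {
    private final long start;
    private final long end;

    RangePartition(long start, long end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public PartitionReaderSketch<Long> createPartitionReader() {
        return new PartitionReaderSketch<Long>() {
            private long current = start - 1;

            @Override
            public boolean next() {
                current++;
                return current < end;
            }

            @Override
            public Long get() {
                return current;
            }
        };
    }
}

// A reader over the numbers [0, total), split into numPartitions partitions.
class RangeReader implements DataSourceReaderSketch<Long> {
    private final long total;
    private final int numPartitions;

    RangeReader(long total, int numPartitions) {
        this.total = total;
        this.numPartitions = numPartitions;
    }

    @Override
    public List<String> readSchema() {
        return Arrays.asList("id"); // a single column named "id"
    }

    @Override
    public List<InputPartitionSketch<Long>> planInputPartitions() {
        List<InputPartitionSketch<Long>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            long start = total * i / numPartitions;
            long end = total * (i + 1) / numPartitions;
            partitions.add(new RangePartition(start, end));
        }
        return partitions;
    }
}

class ScanSketch {
    // Conceptually what a scan does: ask for the partitions, then drain each one's reader.
    static <T> List<T> collect(DataSourceReaderSketch<T> reader) {
        List<T> rows = new ArrayList<>();
        for (InputPartitionSketch<T> partition : reader.planInputPartitions()) {
            PartitionReaderSketch<T> partitionReader = partition.createPartitionReader();
            while (partitionReader.next()) {
                rows.add(partitionReader.get());
            }
        }
        return rows;
    }
}
```

Under these assumptions, ScanSketch.collect(new RangeReader(10, 3)) plans three partitions and gathers the ten values of the range in order. In Spark itself the partitions are shipped to executors and read in parallel rather than drained in a single loop.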

Note

DataSourceReader is an Evolving contract: it is moving towards becoming a stable API, but is not stable yet and can change from one feature release to another.

In other words, using the contract is like "treading on thin ice".

Table 2. DataSourceReaders (Direct Implementations)
DataSourceReader Description

ContinuousReader

DataSourceReaders for Continuous Stream Processing in Spark Structured Streaming

Consult The Internals of Spark Structured Streaming

MicroBatchReader

DataSourceReaders for Micro-Batch Stream Processing in Spark Structured Streaming

Consult The Internals of Spark Structured Streaming

SupportsPushDownFilters

SupportsPushDownRequiredColumns

SupportsReportPartitioning

SupportsReportStatistics

SupportsScanColumnarBatch
