DataSourceV2 — Data Sources in Data Source API V2

DataSourceV2 is the fundamental abstraction of the data sources in the Data Source API V2.

DataSourceV2 defines no methods or values and simply acts as a marker interface.

package org.apache.spark.sql.sources.v2;

public interface DataSourceV2 {}
Note

Implementations should implement at least one of the ReadSupport or WriteSupport interfaces to be readable or writable data sources, respectively.

Otherwise, an AnalysisException is thrown:

org.apache.spark.sql.AnalysisException: dawid is not a valid Spark SQL Data Source.;
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:386)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:208)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided
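The marker-interface idea above can be illustrated with a small, self-contained Java sketch. The interfaces below are hypothetical stand-ins that only mirror the names from this page; they are not the real org.apache.spark.sql.sources.v2 types, their method signatures are simplified assumptions, and the check in MarkerDemo is not Spark's actual resolution logic. The sketch only shows why a class that implements the bare marker, with neither read nor write support, ends up rejected.

```java
// Hypothetical stand-ins: same names as on this page, but NOT the Spark types.
interface DataSourceV2 {}                      // marker interface: no methods

interface ReadSupport extends DataSourceV2 {
    String createReader();                     // simplified, assumed signature
}

interface WriteSupport extends DataSourceV2 {
    String createWriter();                     // simplified, assumed signature
}

// Implements only the marker, so it can neither be read nor written.
final class MarkerOnlySource implements DataSourceV2 {}

final class ReadableSource implements ReadSupport {
    @Override
    public String createReader() {
        return "reader";
    }
}

final class ReadWriteSource implements ReadSupport, WriteSupport {
    @Override
    public String createReader() {
        return "reader";
    }

    @Override
    public String createWriter() {
        return "writer";
    }
}

final class MarkerDemo {
    // Reports what a source supports, or rejects it when it supports nothing.
    static String capabilities(DataSourceV2 source) {
        boolean readable = source instanceof ReadSupport;
        boolean writable = source instanceof WriteSupport;
        if (!readable && !writable) {
            throw new IllegalArgumentException(
                source.getClass().getSimpleName() + " is not a valid data source");
        }
        if (readable && writable) {
            return "readable+writable";
        }
        return readable ? "readable" : "writable";
    }

    public static void main(String[] args) {
        System.out.println(capabilities(new ReadableSource()));
        System.out.println(capabilities(new ReadWriteSource()));
        try {
            capabilities(new MarkerOnlySource());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The marker carries no behaviour of its own, so all useful capabilities come from the mix-in interfaces, and a caller has to test for them explicitly.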
Note

DataSourceV2 is an Evolving contract: it is moving towards becoming a stable API, but is not one yet and can change from one feature release to another.

In other words, using the contract is like treading on thin ice.

Table 1. DataSourceV2s

DataSourceV2             Description
ConsoleSinkProvider      Used in Spark Structured Streaming
ContinuousReadSupport    Used in Spark Structured Streaming
MemorySinkV2             Used in Spark Structured Streaming
MicroBatchReadSupport    Used in Spark Structured Streaming
RateSourceProvider       Used in Spark Structured Streaming
RateSourceProviderV2     Used in Spark Structured Streaming
ReadSupport
ReadSupportWithSchema
SessionConfigSupport
StreamWriteSupport
WriteSupport
