DataSourceV2Relation Leaf Logical Operator

DataSourceV2Relation is a leaf logical operator that represents a data scan (data reading) or data writing in the Data Source API V2.

DataSourceV2Relation is created (indirectly, via the create factory method) exclusively when DataFrameReader is requested to "load" data as a DataFrame from a data source with ReadSupport.

DataSourceV2Relation takes the following to be created:

  • DataSourceV2

  • Output attributes (Seq[AttributeReference])

  • Options (Map[String, String])

  • Optional TableIdentifier (default: undefined, i.e. None)

  • User-defined schema (default: undefined, i.e. None)

When used to represent a data scan (data reading), DataSourceV2Relation is planned (translated) to a ProjectExec with a DataSourceV2ScanExec physical operator (possibly under a FilterExec operator) when the DataSourceV2Strategy execution planning strategy is requested to plan a logical plan.

When used to represent a data write (with the AppendData logical operator), DataSourceV2Relation is planned (translated) to a WriteToDataSourceV2Exec physical operator (with the DataSourceWriter) when the DataSourceV2Strategy execution planning strategy is requested to plan a logical plan.
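The two planning rules above can be pictured with a toy model. The sketch below is not Spark code: every class in it is a simplified stand-in that only borrows the operator names from the text, and the plan function is a hypothetical reduction of what a planning strategy does (no projections, no filter pushdown).

```scala
// Toy model only: these case classes are simplified stand-ins that borrow
// Spark's operator names; they are NOT the real Spark classes.
object PlanningSketch {
  // Logical operators
  sealed trait LogicalPlan
  final case class DataSourceV2Relation(source: String) extends LogicalPlan
  final case class Filter(condition: String, child: LogicalPlan) extends LogicalPlan
  final case class AppendData(table: DataSourceV2Relation, query: LogicalPlan) extends LogicalPlan

  // Physical operators
  sealed trait SparkPlan
  final case class DataSourceV2ScanExec(source: String) extends SparkPlan
  final case class FilterExec(condition: String, child: SparkPlan) extends SparkPlan
  final case class ProjectExec(child: SparkPlan) extends SparkPlan
  final case class WriteToDataSourceV2Exec(target: String, query: SparkPlan) extends SparkPlan

  // Hypothetical planner: translates a logical plan into a physical plan.
  def plan(logical: LogicalPlan): SparkPlan = logical match {
    // Scan: a ProjectExec on top of a DataSourceV2ScanExec
    case DataSourceV2Relation(source) =>
      ProjectExec(DataSourceV2ScanExec(source))
    // Scan with a filter: the FilterExec sits between the project and the scan
    case Filter(condition, DataSourceV2Relation(source)) =>
      ProjectExec(FilterExec(condition, DataSourceV2ScanExec(source)))
    // Any other filter is planned recursively
    case Filter(condition, child) =>
      FilterExec(condition, plan(child))
    // Write: AppendData over a relation becomes a WriteToDataSourceV2Exec
    case AppendData(table, query) =>
      WriteToDataSourceV2Exec(table.source, plan(query))
  }
}
```

With these stand-ins, PlanningSketch.plan(Filter("id > 1", DataSourceV2Relation("events"))) yields a ProjectExec wrapping a FilterExec wrapping the scan, which mirrors the "possibly under a FilterExec operator" wording above.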

The DataSourceV2Relation object defines a SourceHelpers implicit class that extends DataSourceV2 instances with additional extension methods.

Creating DataSourceV2Relation Instance — create Factory Method

create(
  source: DataSourceV2,
  options: Map[String, String],
  tableIdent: Option[TableIdentifier] = None,
  userSpecifiedSchema: Option[StructType] = None): DataSourceV2Relation

create requests the given DataSourceV2 to create a DataSourceReader (with the given options and user-specified schema).

create finds the table in the given options unless the optional tableIdent is defined.

In the end, create creates a DataSourceV2Relation.
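The three steps of create can be sketched as follows. This is a minimal, self-contained model, not the Spark implementation: Source, Reader, Relation and TableIdentifier are hypothetical stand-ins, a schema is reduced to a list of column names, and the "table" and "database" option keys as well as deriving the output from the reader's schema are assumptions made for illustration.

```scala
// Simplified stand-ins for illustration; none of these are Spark's own types.
object CreateSketch {
  type Schema = Seq[String] // a schema reduced to column names

  final case class TableIdentifier(table: String, database: Option[String] = None)

  trait Reader { def readSchema: Schema }

  trait Source {
    def name: String
    def createReader(options: Map[String, String], userSpecifiedSchema: Option[Schema]): Reader
  }

  final case class Relation(
      source: Source,
      output: Schema,
      options: Map[String, String],
      tableIdent: Option[TableIdentifier],
      userSpecifiedSchema: Option[Schema])

  // Assumption: the table is looked up under the "table" and "database" keys.
  def tableFromOptions(options: Map[String, String]): Option[TableIdentifier] =
    options.get("table").map(t => TableIdentifier(t, options.get("database")))

  def create(
      source: Source,
      options: Map[String, String],
      tableIdent: Option[TableIdentifier] = None,
      userSpecifiedSchema: Option[Schema] = None): Relation = {
    // Step 1: request the source to create a reader
    val reader = source.createReader(options, userSpecifiedSchema)
    // Step 2: use the explicit identifier if defined, else look in the options
    val ident = tableIdent.orElse(tableFromOptions(options))
    // Step 3: build the relation (output taken from the reader: an assumption)
    Relation(source, reader.readSchema, options, ident, userSpecifiedSchema)
  }

  // A hypothetical source used only to exercise the sketch
  object DemoSource extends Source {
    val name = "demo"
    def createReader(options: Map[String, String], userSpecifiedSchema: Option[Schema]): Reader =
      new Reader {
        val readSchema: Schema = userSpecifiedSchema.getOrElse(Seq("id", "value"))
      }
  }
}
```

In this model, an explicit tableIdent always wins over the options, and a relation created with neither carries no table identifier at all.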

Note
create is used exclusively when DataFrameReader is requested to "load" data as a DataFrame from a data source with ReadSupport.

Computing Statistics — computeStats Method

computeStats(): Statistics

Note
computeStats is part of the LeafNode Contract to compute a Statistics.

computeStats…​FIXME

Creating DataSourceReader — newReader Method

newReader(): DataSourceReader

newReader simply requests (delegates to) the DataSourceV2 to create a DataSourceReader.

Note
The DataSourceV2Relation object defines the SourceHelpers implicit class to "extend" the marker DataSourceV2 type with the method to create a DataSourceReader.

Note
newReader is used when:

Creating DataSourceWriter — newWriter Method

newWriter(): DataSourceWriter

newWriter simply requests (delegates to) the DataSourceV2 to create a DataSourceWriter.

Note
The DataSourceV2Relation object defines the SourceHelpers implicit class to "extend" the marker DataSourceV2 type with the method to create a DataSourceWriter.

Note
newWriter is used exclusively when the DataSourceV2Strategy execution planning strategy is requested to plan an AppendData logical operator.

SourceHelpers Implicit Class

The DataSourceV2Relation object defines a SourceHelpers implicit class that extends DataSourceV2 instances with additional extension methods.

Table 1. SourceHelpers' Extension Methods
Method Description

asReadSupport

asReadSupport: ReadSupport

Used exclusively by the createReader extension method

asWriteSupport

asWriteSupport: WriteSupport

Used when…​FIXME

name

name: String

Used when…​FIXME

createReader

createReader(
  options: Map[String, String],
  userSpecifiedSchema: Option[StructType]): DataSourceReader

Used when:

createWriter

createWriter(
  options: Map[String, String],
  schema: StructType): DataSourceWriter

Creates a DataSourceWriter

Used when…​FIXME
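The way an implicit class can "extend" a marker type with the methods in Table 1 can be sketched as follows. This is a self-contained illustration of the Scala pattern only: the traits are hypothetical stand-ins that reuse the names from the text, not the Spark interfaces, and the bodies of asReadSupport, name and createReader are assumptions about how such helpers could reasonably behave.

```scala
// Stand-in types for illustration; they are NOT the Spark interfaces.
object SourceHelpersSketch {
  type Schema = Seq[String] // a schema reduced to column names

  trait DataSourceV2 // marker type with no methods of its own
  trait DataSourceReader { def description: String }
  trait ReadSupport extends DataSourceV2 {
    def createReader(options: Map[String, String], schema: Option[Schema]): DataSourceReader
  }
  trait DataSourceRegister { def shortName: String }

  // The implicit class adds extension methods to any DataSourceV2 value.
  implicit class SourceHelpers(source: DataSourceV2) {
    // Assumption: a source without read support is rejected with an exception.
    def asReadSupport: ReadSupport = source match {
      case support: ReadSupport => support
      case _ => throw new UnsupportedOperationException(s"$name does not support reads")
    }

    // Assumption: prefer a registered short name, else fall back to the class name.
    def name: String = source match {
      case registered: DataSourceRegister => registered.shortName
      case _ => source.getClass.getName
    }

    // Delegates to the read-capable view of the source.
    def createReader(
        options: Map[String, String],
        userSpecifiedSchema: Option[Schema]): DataSourceReader =
      asReadSupport.createReader(options, userSpecifiedSchema)
  }

  // Hypothetical sources used only to exercise the sketch
  final class InMemorySource extends ReadSupport with DataSourceRegister {
    def shortName: String = "in-memory"
    def createReader(options: Map[String, String], schema: Option[Schema]): DataSourceReader =
      new DataSourceReader {
        val description: String = s"reader(path=${options.getOrElse("path", "?")})"
      }
  }
  final class MarkerOnlySource extends DataSourceV2
}
```

After import SourceHelpersSketch._, a value typed as the bare marker DataSourceV2 can be asked for its name or for a reader directly, which is the convenience the implicit class provides; a source that lacks read support fails at asReadSupport in this sketch.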
