BaseRelation — Collection of Tuples with Schema

BaseRelation is the contract in Spark SQL to model a collection of tuples (from a data source) with a schema.

Note
The terms "data source", "relation" and "table" are often used as synonyms.

BaseRelation can optionally provide an estimate of its size in bytes (as sizeInBytes), which defaults to the value of the spark.sql.defaultSizeInBytes internal property (Long.MaxValue by default, i.e. effectively infinite).

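BaseRelation indicates whether it needs a conversion of the objects in a Row to Spark SQL's internal representation (as needConversion, which is true by default).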

BaseRelation computes the list of Filters that the data source may not be able to handle (as unhandledFilters, which by default returns all the input filters).
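The optional members described above can be overridden in a custom relation. The following is a minimal sketch only: TinyRelation, its single id column, the 1024-byte estimate and the choice to treat EqualTo filters as handled are all assumptions made for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, GreaterThan}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical relation, for illustration only; it is not part of Spark.
class TinyRelation(val sqlContext: SQLContext) extends BaseRelation {
  def schema: StructType =
    StructType(Seq(StructField("id", LongType, nullable = false)))

  // Report a small size estimate instead of the spark.sql.defaultSizeInBytes fallback.
  override def sizeInBytes: Long = 1024L

  // Assumption: this source evaluates equality filters itself,
  // so only the remaining filters are reported as unhandled.
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(_.isInstanceOf[EqualTo])
}

object TinyRelationDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("TinyRelationDemo")
      .getOrCreate()

    val relation = new TinyRelation(spark.sqlContext)
    val filters: Array[Filter] = Array(EqualTo("id", 1L), GreaterThan("id", 0L))

    // The overridden size estimate
    println(relation.sizeInBytes)
    // Only the GreaterThan filter is left as unhandled
    println(relation.unhandledFilters(filters).mkString(", "))

    spark.stop()
  }
}
```

needConversion is left at its default here. Setting it to false would only be appropriate for a relation whose scan already produces rows in the internal representation.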

Table 1. BaseRelations

BaseRelation        Description
HadoopFsRelation
JDBCRelation
KafkaRelation       Structured Streaming’s BaseRelation for datasets with records from Apache Kafka

Note
BaseRelation is "created" using DataSource's resolveRelation.

Note
BaseRelation is transformed into a DataFrame using SparkSession.baseRelationToDataFrame.
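As a rough illustration of the second note, the sketch below passes a hand-written relation to SparkSession.baseRelationToDataFrame. PeopleRelation and its two sample rows are hypothetical. The relation also mixes in TableScan (from org.apache.spark.sql.sources) so that Spark has a way to read its rows.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical in-memory relation, for illustration only.
class PeopleRelation(val sqlContext: SQLContext) extends BaseRelation with TableScan {
  def schema: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("age", IntegerType)))

  // TableScan: produce all the tuples of the relation as an RDD of Rows.
  def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row("Ann", 34), Row("Bob", 27)))
}

object BaseRelationToDataFrameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("BaseRelationToDataFrameDemo")
      .getOrCreate()

    // Wrap the relation in a DataFrame; the relation's schema becomes the DataFrame's schema.
    val people: DataFrame = spark.baseRelationToDataFrame(new PeopleRelation(spark.sqlContext))
    people.printSchema()
    people.show()

    spark.stop()
  }
}
```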

BaseRelation Contract

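package org.apache.spark.sql.sources

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.StructType

abstract class BaseRelation {
  // only the required (abstract) methods, i.e. the ones with no default implementation
  def schema: StructType
  def sqlContext: SQLContext
}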

Table 2. (Subset of) BaseRelation Contract (in alphabetical order)

Method        Description
schema        StructType
sqlContext    SQLContext
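A relation that implements only the two required members of the contract could look like the sketch below. MinimalRelation and its single value column are hypothetical names chosen for illustration. Everything else is inherited from BaseRelation.

```scala
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, Filter, IsNotNull}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical class; only the two abstract members are implemented.
class MinimalRelation(val sqlContext: SQLContext) extends BaseRelation {
  val schema: StructType = StructType(Seq(StructField("value", StringType)))
}

object MinimalRelationDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("MinimalRelationDemo")
      .getOrCreate()

    val relation = new MinimalRelation(spark.sqlContext)
    val filters: Array[Filter] = Array(IsNotNull("value"))

    println(relation.schema.treeString)
    // Inherited defaults: conversion is needed and every filter is reported as unhandled.
    println(relation.needConversion)
    println(relation.unhandledFilters(filters).length == filters.length)

    spark.stop()
  }
}
```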
