BaseRelation represents a collection of data with a known schema (as StructType) that is bound to a SQLContext. A BaseRelation knows its estimated size (as sizeInBytes), whether its rows need a conversion to the internal representation (as needConversion), and which Filters the data source may not be able to handle (as unhandledFilters).

Table 1. BaseRelation Methods
Name              Behaviour
sqlContext        Returns the current SQLContext.
schema            Returns the schema of the relation (as StructType).
sizeInBytes       Computes an estimated size of this relation in bytes.
needConversion    Whether the relation needs a conversion of the objects in Row to internal representation.
unhandledFilters  Computes the list of Filters that this data source may not be able to handle.

The terms "data source" and "relation" are used interchangeably here.

BaseRelation is an abstract class in the org.apache.spark.sql.sources package.
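To make the methods above concrete, here is a minimal sketch of a custom relation. It assumes spark-sql is on the classpath. DemoRelation, its two-column schema, the 1024-byte size, and the "only equality filters are handled" rule are all hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, GreaterThan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical relation used only for illustration; it is not part of Spark.
class DemoRelation(override val sqlContext: SQLContext) extends BaseRelation {

  // sqlContext (above) and schema are the two abstract members of BaseRelation.
  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType)))

  // Optional overrides; the values below are made up for the demo.
  override def sizeInBytes: Long = 1024L

  override def needConversion: Boolean = false

  // Pretend this source can evaluate only equality filters itself,
  // so every other filter is reported back to Spark as unhandled.
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(_.isInstanceOf[EqualTo])
}

object DemoRelationApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("base-relation-demo")
      .getOrCreate()

    val relation = new DemoRelation(spark.sqlContext)
    val filters: Array[Filter] = Array(EqualTo("id", 1), GreaterThan("id", 0))

    println(relation.schema.simpleString)
    println(relation.unhandledFilters(filters).mkString(", "))

    spark.stop()
  }
}
```

A relation like this only describes the data. To be scannable it would also have to mix in one of the scan traits from the same package (for example TableScan or PrunedFilteredScan), which is left out of the sketch.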


case class HadoopFsRelation(
  location: FileIndex,
  partitionSchema: StructType,
  dataSchema: StructType,
  bucketSpec: Option[BucketSpec],
  fileFormat: FileFormat,
  options: Map[String, String])(val sparkSession: SparkSession)
extends BaseRelation with FileRelation

HadoopFsRelation is a BaseRelation that is created with a SparkSession, through which it gets the current SQLContext.

HadoopFsRelation builds its schema (as StructType) by extending the input dataSchema with the fields of the input partitionSchema.
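The way the two input schemas are combined can be approximated as follows. This is a simplified sketch, not the Spark source: SchemaSketch and combine are hypothetical names, and the real code also deals with case sensitivity and overlapping columns in ways that differ between Spark versions.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper for illustration only; not part of Spark.
object SchemaSketch {

  // Keep every data column, then append the partition columns
  // that the data schema does not already contain (matched by exact name).
  def combine(dataSchema: StructType, partitionSchema: StructType): StructType = {
    val dataColumns = dataSchema.fieldNames.toSet
    StructType(dataSchema.fields ++
      partitionSchema.fields.filterNot(f => dataColumns.contains(f.name)))
  }

  val dataSchema: StructType = StructType(Seq(
    StructField("id", IntegerType),
    StructField("name", StringType)))

  val partitionSchema: StructType = StructType(Seq(
    StructField("year", IntegerType)))

  // Data columns first, partition columns last.
  val full: StructType = combine(dataSchema, partitionSchema)
}
```

Only the types package is needed here, so the sketch can be tried without starting a SparkSession.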

sizeInBytes (from BaseRelation) and inputFiles (from FileRelation) use the input FileIndex to compute the size and the input files, respectively.
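One way to see these properties on a real relation is to pull the HadoopFsRelation out of the logical plan of a file-based DataFrame. The following is a sketch under a few assumptions: spark-sql is on the classpath, Parquet is read through the file-source path that produces a LogicalRelation, and the internal classes in org.apache.spark.sql.execution.datasources look the same in your Spark version. HadoopFsRelationDemo, fileRelations, and the parity column are hypothetical names.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}

// Hypothetical demo object; the classes it inspects are Spark internals.
object HadoopFsRelationDemo {

  // Collects every HadoopFsRelation referenced by the analyzed plan of a DataFrame.
  def fileRelations(df: DataFrame): Seq[HadoopFsRelation] =
    df.queryExecution.analyzed
      .collect { case l: LogicalRelation => l.relation }
      .collect { case r: HadoopFsRelation => r }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hadoop-fs-relation-demo")
      .getOrCreate()

    // Write a tiny partitioned Parquet dataset to a temporary directory.
    val path = Files.createTempDirectory("hadoop-fs-relation").resolve("data").toString
    spark.range(10)
      .selectExpr("id", "id % 2 AS parity")
      .write
      .partitionBy("parity")
      .parquet(path)

    fileRelations(spark.read.parquet(path)).foreach { r =>
      println(s"data schema:      ${r.dataSchema.simpleString}")
      println(s"partition schema: ${r.partitionSchema.simpleString}")
      println(s"full schema:      ${r.schema.simpleString}")
      println(s"size in bytes:    ${r.sizeInBytes}")
      println(s"input files:      ${r.inputFiles.length}")
    }

    spark.stop()
  }
}
```

Matching on the LogicalRelation type rather than destructuring it keeps the sketch independent of the number of constructor parameters, which has changed across Spark releases.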
