# BaseRelation — Collection of Tuples with Schema

`BaseRelation` is the contract of relations (data sources) that describe collections of tuples with a known schema.

```scala
package org.apache.spark.sql.sources

abstract class BaseRelation {
  // Only the required properties (vals and methods) with no implementation
  // are shown here; the others follow.
  def schema: StructType
  def sqlContext: SQLContext
}
```
> **Note:** "Data source", "relation" and "table" are often used as synonyms.
| Method | Description |
|---|---|
| `schema` | `StructType` that describes the schema of tuples |
| `sqlContext` | `SQLContext` of the relation |
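To make the contract concrete, here is a minimal sketch of a relation that supplies only the two required members. It uses simplified stand-in types defined inline instead of Spark's real `StructType` and `SQLContext`, so it compiles without a Spark dependency; the name `PeopleRelation` and the stand-in types are illustrative only.

```scala
// Stand-in types so the sketch is self-contained (Spark's real StructType
// and SQLContext live in the spark-sql module).
case class StructField(name: String, dataType: String)
case class StructType(fields: Seq[StructField])
class SQLContext

// Mirror of the BaseRelation contract: only the two abstract members.
abstract class BaseRelation {
  def schema: StructType
  def sqlContext: SQLContext
}

// A minimal concrete relation with a fixed two-column schema.
class PeopleRelation(override val sqlContext: SQLContext) extends BaseRelation {
  override def schema: StructType =
    StructType(Seq(StructField("id", "long"), StructField("name", "string")))
}

val relation = new PeopleRelation(new SQLContext)
```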
`BaseRelation` is "created" when `DataSource` is requested to resolve a relation.

`BaseRelation` is transformed into a `DataFrame` when `SparkSession` is requested to create a `DataFrame`.

`BaseRelation` uses the `needConversion` flag to control type conversion of objects inside `Row`s to Catalyst types, e.g. `java.lang.String` to `UTF8String`.
> **Note:** It is recommended that custom data sources (outside Spark SQL) leave the `needConversion` flag enabled, i.e. `true`.
`BaseRelation` can optionally give an estimated size (in bytes).
## Should JVM Objects Inside Rows Be Converted to Catalyst Types? — `needConversion` Method

```scala
needConversion: Boolean
```

The `needConversion` flag is enabled (`true`) by default.
> **Note:** It is recommended to leave `needConversion` enabled for data sources outside Spark SQL.
> **Note:** `needConversion` is used exclusively when the `DataSourceStrategy` execution planning strategy is executed (and performs the conversion from `RDD[Row]` to `RDD[InternalRow]`).
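The conversion described above can be sketched with simplified stand-in types (these are not Spark's actual `Row` and `InternalRow` classes): when `needConversion` is `true`, each external row value is mapped to its Catalyst representation, e.g. a `java.lang.String` to a UTF-8 byte form.

```scala
// Simplified model (not Spark's real classes) of the RDD[Row] -> RDD[InternalRow]
// conversion that DataSourceStrategy performs when needConversion is true.
case class Row(values: Seq[Any])
case class InternalRow(values: Seq[Any])

// Stand-in for Catalyst conversions such as java.lang.String -> UTF8String.
def toCatalyst(value: Any): Any = value match {
  case s: String => s.getBytes("UTF-8").toSeq
  case other     => other
}

def convert(rows: Seq[Row], needConversion: Boolean): Seq[InternalRow] =
  if (needConversion) rows.map(r => InternalRow(r.values.map(toCatalyst)))
  else rows.map(r => InternalRow(r.values))

val converted = convert(Seq(Row(Seq("spark", 1L))), needConversion = true)
```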
## Finding Unhandled Filter Predicates — `unhandledFilters` Method

```scala
unhandledFilters(filters: Array[Filter]): Array[Filter]
```

`unhandledFilters` returns the `Filter` predicates that the data source does not support (handle) natively.
> **Note:** By default, `unhandledFilters` returns the input `filters` unchanged, as it is considered safe to double-evaluate filters regardless of whether they could be handled natively.
> **Note:** `unhandledFilters` is used exclusively when the `DataSourceStrategy` execution planning strategy is requested to `selectFilters`.
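As a sketch of the idea (with a simplified `Filter` hierarchy, not Spark's `org.apache.spark.sql.sources.Filter` classes), a hypothetical data source that can only push down equality predicates would report every other predicate as unhandled, and Spark would then re-evaluate those after the scan:

```scala
// Simplified stand-ins for Spark's filter predicates.
sealed trait Filter
case class EqualTo(attribute: String, value: Any) extends Filter
case class GreaterThan(attribute: String, value: Any) extends Filter

// A hypothetical source that handles only equality predicates natively.
def unhandledFilters(filters: Seq[Filter]): Seq[Filter] =
  filters.filterNot {
    case _: EqualTo => true   // pushed down to the source
    case _          => false  // left for Spark to evaluate
  }

val requested = Seq(EqualTo("id", 1), GreaterThan("age", 21))
val unhandled = unhandledFilters(requested)
```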
## Estimated Size of Relation (in Bytes) — `sizeInBytes` Method

```scala
sizeInBytes: Long
```

`sizeInBytes` is the estimated size of a relation in bytes (used in query planning).
> **Note:** `sizeInBytes` defaults to the value of the `spark.sql.defaultSizeInBytes` internal property (i.e. effectively infinite).
> **Note:** `sizeInBytes` is used exclusively when `LogicalRelation` is requested to `computeStats` (and the statistics are not available in the `CatalogTable`).
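A short sketch of why the estimate matters in planning: with the default `spark.sql.defaultSizeInBytes` (effectively `Long.MaxValue`), a relation can never fall below the broadcast-join threshold (`spark.sql.autoBroadcastJoinThreshold`, 10 MB by default), so overriding `sizeInBytes` with a realistic estimate is what makes optimizations such as broadcast joins possible. The helper below models the idea and is illustrative, not Spark's actual planner code.

```scala
// Illustrative only: models the size check a planner performs when deciding
// whether a relation is small enough to broadcast.
val defaultSizeInBytes = Long.MaxValue             // default of spark.sql.defaultSizeInBytes
val autoBroadcastJoinThreshold = 10L * 1024 * 1024 // 10 MB default threshold

def canBroadcast(sizeInBytes: Long): Boolean =
  sizeInBytes >= 0 && sizeInBytes <= autoBroadcastJoinThreshold

val neverBroadcast = canBroadcast(defaultSizeInBytes) // the default estimate is never "small"
val broadcastSmall = canBroadcast(64 * 1024)          // a 64 KB relation qualifies
```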