Data Source Filter Predicate (For Filter Pushdown)

Filter is the contract for filter predicates that can be pushed down to a relation (aka data source).

Filter is used when:

(Data Source API V1) BaseRelation is requested for unhandled filter predicates (and hence BaseRelation implementations, e.g. JDBCRelation)
(Data Source API V1) PrunedFilteredScan is requested to build a scan (and hence PrunedFilteredScan implementations, e.g. JDBCRelation)
FileFormat is requested to buildReader (and hence FileFormat implementations, e.g. OrcFileFormat, CSVFileFormat, JsonFileFormat, TextFileFormat and Spark MLlib’s LibSVMFileFormat)
FileFormat is requested to build a Data Reader with partition column values appended (and hence FileFormat implementations, e.g. OrcFileFormat, ParquetFileFormat)
RowDataSourceScanExec is created (for a simple text representation in a query plan tree)
DataSourceStrategy execution planning strategy is requested to pruneFilterProject (when executed for LogicalRelation logical operators with a PrunedFilteredScan or a PrunedScan)
DataSourceStrategy execution planning strategy is requested to selectFilters
JDBCRDD is created and requested to scanTable
(Data Source API V2) SupportsPushDownFilters is requested to pushFilters and for pushedFilters
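
Several of the cases above come down to a data source deciding which filters it can evaluate itself and which it has to leave to Spark. The following is a minimal, self-contained sketch of that idea. The filter classes are simplified stand-ins for the ones in org.apache.spark.sql.sources, and `compile`, `literal` and the `PushdownSketch` object are hypothetical names made up for illustration, not Spark APIs.

```scala
// Simplified model, NOT the Spark source: a hypothetical SQL-backed source
// that can push down only some filter predicates.
object PushdownSketch {
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class GreaterThan(attribute: String, value: Any) extends Filter
  final case class IsNotNull(attribute: String) extends Filter
  final case class StringContains(attribute: String, value: String) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter

  // Hypothetical helper: renders a value as a SQL literal (strings are quoted).
  private def literal(value: Any): String = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Some(sql) when the source understands the filter, None otherwise.
  def compile(filter: Filter): Option[String] = filter match {
    case EqualTo(a, v)     => Some(s"$a = ${literal(v)}")
    case GreaterThan(a, v) => Some(s"$a > ${literal(v)}")
    case IsNotNull(a)      => Some(s"$a IS NOT NULL")
    // A conjunction is supported only if both sides are.
    case And(l, r) =>
      for (ls <- compile(l); rs <- compile(r)) yield s"($ls) AND ($rs)"
    case _ => None
  }

  // Filters the source cannot evaluate; the query engine must re-apply them.
  def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filter(f => compile(f).isEmpty)

  def main(args: Array[String]): Unit = {
    val filters: Array[Filter] =
      Array(EqualTo("id", 1), StringContains("name", "x"))
    println(unhandledFilters(filters).mkString(", "))
    println(compile(And(IsNotNull("id"), GreaterThan("id", 5))))
  }
}
```

In this sketch, returning None for an unsupported filter keeps the decision in one place: the same function drives both the generated WHERE clause and the list of filters reported back as unhandled.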

package org.apache.spark.sql.sources

abstract class Filter {
  // only required methods that have no implementation
  // the others follow
  def references: Array[String]
}

Table 1. Filter Contract
Method	Description
`references`	Column references, i.e. the list of column names that are referenced by a filter. Used when: (1) `Filter` is requested to find the column references in a value; (2) `And`, `Or` and `Not` filters are requested for the column references
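
To illustrate the contract, here is a small sketch of how composite filters could report their column references by combining those of their children. It uses a simplified stand-in hierarchy (the `FilterReferencesSketch` object is a hypothetical name), so it runs without Spark on the classpath and should not be read as the Spark source.

```scala
// Simplified stand-in for org.apache.spark.sql.sources.Filter (not the Spark source).
object FilterReferencesSketch {
  sealed abstract class Filter {
    // The only abstract method of the contract.
    def references: Array[String]
  }

  // Leaf filters reference exactly one column.
  final case class IsNull(attribute: String) extends Filter {
    def references: Array[String] = Array(attribute)
  }
  final case class StringStartsWith(attribute: String, value: String) extends Filter {
    def references: Array[String] = Array(attribute)
  }

  // Composite filters collect the references of their children.
  final case class And(left: Filter, right: Filter) extends Filter {
    def references: Array[String] = left.references ++ right.references
  }
  final case class Or(left: Filter, right: Filter) extends Filter {
    def references: Array[String] = left.references ++ right.references
  }
  final case class Not(child: Filter) extends Filter {
    def references: Array[String] = child.references
  }

  def main(args: Array[String]): Unit = {
    val filter = And(IsNull("name"), Not(StringStartsWith("city", "War")))
    println(filter.references.mkString(", ")) // prints "name, city"
  }
}
```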

Table 2. Filters
Filter	Description
`And`
`EqualNullSafe`
`EqualTo`
`GreaterThan`
`GreaterThanOrEqual`
`In`
`IsNotNull`
`IsNull`
`LessThan`
`LessThanOrEqual`
`Not`
`Or`
`StringContains`
`StringEndsWith`
`StringStartsWith`

Finding Column References in Any Value — `findReferences` Method

findReferences(value: Any): Array[String]

findReferences returns the column references of the value if the value is itself a Filter. Otherwise, it returns an empty array.

Note	`findReferences` is used when EqualTo, EqualNullSafe, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual and In filters are requested for their column references.
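
The behaviour described above can be sketched as a pattern match on the value. The model below is a simplified assumption that follows the description in this section rather than a copy of the Spark code, and the `FindReferencesSketch` object is a hypothetical name. It shows how value-based filters such as EqualTo and In might combine their own attribute with whatever findReferences discovers in their values.

```scala
// Simplified model of the behaviour described above (not the Spark source).
object FindReferencesSketch {
  sealed abstract class Filter {
    def references: Array[String]

    // References of `value` if it is itself a Filter, otherwise an empty array.
    protected def findReferences(value: Any): Array[String] = value match {
      case f: Filter => f.references
      case _         => Array.empty[String]
    }
  }

  final case class EqualTo(attribute: String, value: Any) extends Filter {
    def references: Array[String] = Array(attribute) ++ findReferences(value)
  }

  final case class In(attribute: String, values: Array[Any]) extends Filter {
    // Every candidate value is inspected for nested references.
    def references: Array[String] =
      Array(attribute) ++ values.flatMap(v => findReferences(v))
  }

  def main(args: Array[String]): Unit = {
    // A plain value contributes no references.
    println(EqualTo("a", 1).references.mkString(", ")) // prints "a"
    // A value that is itself a filter contributes its own references.
    println(EqualTo("a", EqualTo("b", 1)).references.mkString(", ")) // prints "a, b"
  }
}
```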
