JDBCRDD is a RDD of internal binary rows that represents a structured query over a table in a database accessed via JDBC.

JDBCRDD represents a "SELECT requiredColumns FROM table" query.

JDBCRDD is created exclusively when JDBCRDD is requested to scanTable (when JDBCRelation is requested to build a scan).

Table 1. JDBCRDD’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description


Column names

Used when…​FIXME


Filters as a SQL WHERE clause

Used when…​FIXME

Computing Partition (in TaskContext) — compute Method

compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).


resolveTable Method

resolveTable(options: JDBCOptions): StructType


resolveTable is used exclusively when JDBCRelation is requested for the schema.

Creating RDD for Distributed Data Scan — scanTable Object Method

  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]

scanTable takes the url option.

scanTable finds the corresponding JDBC dialect (per the url option) and requests it to quote the column identifiers in the input requiredColumns.

scanTable uses the JdbcUtils object to createConnectionFactory and prune columns from the input schema to include the input requiredColumns only.

In the end, scanTable creates a new JDBCRDD.

scanTable is used exclusively when JDBCRelation is requested to build a distributed data scan with column pruning and filter pushdown.

Creating JDBCRDD Instance

JDBCRDD takes the following when created:

  • SparkContext

  • Function to create a Connection (() ⇒ Connection)

  • Schema (StructType)

  • Array of column names

  • Array of Filter predicates

  • Array of Spark Core’s Partitions

  • Connection URL

  • JDBCOptions

JDBCRDD initializes the internal registries and counters.

getPartitions Method

getPartitions: Array[Partition]
getPartitions is part of Spark Core’s RDD Contract to…​FIXME

getPartitions simply returns the partitions (this JDBCRDD was created with).

pruneSchema Internal Method

pruneSchema(schema: StructType, columns: Array[String]): StructType


pruneSchema is used when…​FIXME

Converting Filter Predicate to SQL Expression — compileFilter Object Method

compileFilter(f: Filter, dialect: JdbcDialect): Option[String]



compileFilter is used when:

results matching ""

    No results matching ""