JDBCRDD

JDBCRDD is an RDD of internal binary rows that represents a structured query over a table in a database accessed via JDBC.

Note	`JDBCRDD` represents a "SELECT requiredColumns FROM table" query.

JDBCRDD is created exclusively when JDBCRDD is requested to scanTable (when JDBCRelation is requested to build a scan).

Table 1. JDBCRDD’s Internal Properties (e.g. Registries, Counters and Flags)
Name	Description
`columnList`	Column names. Used when…FIXME
`filterWhereClause`	Filters as a SQL `WHERE` clause. Used when…FIXME
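
To make the two properties more concrete, the following is a minimal sketch of how a column list and a `WHERE` clause could be combined into the "SELECT requiredColumns FROM table" query. It is a simplified stand-in and not Spark's code: the `SelectQuerySketch` object, its method signatures, the constant `1` used when no columns are requested, and the `AND`-joining of compiled filters are all assumptions of this sketch.

```scala
// Hypothetical, simplified model of the query that a JDBCRDD represents.
object SelectQuerySketch {

  // Joins (already quoted) column names with commas.
  // Assumption: selects the constant 1 when no columns are required.
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1" else columns.mkString(",")

  // Wraps every compiled filter in parentheses and joins them with AND.
  def filterWhereClause(compiledFilters: Seq[String]): String =
    compiledFilters.map(p => s"($p)").mkString(" AND ")

  // Builds the final SELECT statement; the WHERE part is omitted without filters.
  def selectQuery(
      table: String,
      columns: Seq[String],
      compiledFilters: Seq[String]): String = {
    val where = filterWhereClause(compiledFilters)
    val whereSuffix = if (where.isEmpty) "" else s" WHERE $where"
    s"SELECT ${columnList(columns)} FROM $table$whereSuffix"
  }
}
```

For example, `SelectQuerySketch.selectQuery("people", Seq("\"id\""), Seq("\"age\" > 21"))` gives a query with one quoted column and one parenthesized predicate.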

Computing Partition (in TaskContext) — `compute` Method

compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]

Note	`compute` is part of Spark Core’s `RDD` Contract to compute a partition (in a `TaskContext`).

compute…FIXME

`resolveTable` Method

resolveTable(options: JDBCOptions): StructType

resolveTable…FIXME

Note	`resolveTable` is used exclusively when `JDBCRelation` is requested for the schema.

Creating RDD for Distributed Data Scan — `scanTable` Object Method

scanTable(
  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]

scanTable takes the url option from the input JDBCOptions.

scanTable finds the corresponding JDBC dialect (per the url option) and requests it to quote the column identifiers in the input requiredColumns.

scanTable uses the JdbcUtils object to createConnectionFactory and prunes the input schema (using pruneSchema) to include the input requiredColumns only.

In the end, scanTable creates a new JDBCRDD.

Note	`scanTable` is used exclusively when `JDBCRelation` is requested to build a distributed data scan with column pruning and filter pushdown.
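
The dialect lookup and column quoting that scanTable performs can be illustrated with a small self-contained sketch. It does not use Spark's `JdbcDialect` classes: the `Dialect` trait, the two dialect objects, and the URL prefixes below are hypothetical stand-ins chosen only to show that the quoting style depends on the connection URL.

```scala
// Hypothetical model of "find the dialect per url, then quote the column identifiers".
object QuoteColumnsSketch {

  trait Dialect {
    def canHandle(url: String): Boolean
    // Assumption: double quotes are the default quoting style.
    def quoteIdentifier(colName: String): String = "\"" + colName + "\""
  }

  // A dialect that quotes identifiers with backticks (as MySQL does).
  object BacktickDialect extends Dialect {
    def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")
    override def quoteIdentifier(colName: String): String = s"`$colName`"
  }

  // Used when no registered dialect claims the URL.
  object FallbackDialect extends Dialect {
    def canHandle(url: String): Boolean = true
  }

  private val registered: Seq[Dialect] = Seq(BacktickDialect)

  // Picks the first registered dialect that can handle the URL.
  def dialectFor(url: String): Dialect =
    registered.find(_.canHandle(url)).getOrElse(FallbackDialect)

  // Quotes every required column with the dialect chosen for the URL.
  def quoteColumns(url: String, requiredColumns: Array[String]): Array[String] = {
    val dialect = dialectFor(url)
    requiredColumns.map(c => dialect.quoteIdentifier(c))
  }
}
```

The quoted names are what would end up in the column list of the generated query, which is why the quoting has to happen before the JDBCRDD is created.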

Creating JDBCRDD Instance

JDBCRDD takes the following when created:

SparkContext
Function to create a Connection (() ⇒ Connection)
Schema (StructType)
Array of column names
Array of Filter predicates
Array of Spark Core’s Partitions
Connection URL
JDBCOptions

JDBCRDD initializes the internal registries and counters.

`getPartitions` Method

getPartitions: Array[Partition]

Note	`getPartitions` is part of Spark Core’s `RDD` Contract to…FIXME

getPartitions simply returns the partitions (this JDBCRDD was created with).

`pruneSchema` Internal Method

pruneSchema(schema: StructType, columns: Array[String]): StructType

pruneSchema…FIXME

Note	`pruneSchema` is used when…FIXME
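
Since scanTable prunes the input schema down to the required columns, the idea can be sketched without Spark as follows. The `Field` case class stands in for a `StructType` field and is hypothetical, as is the choice to return fields in the order of the requested columns and to fail on an unknown column name.

```scala
// Hypothetical, Spark-free model of pruning a schema to the required columns.
object PruneSchemaSketch {

  // Stand-in for a schema field: a name plus a type name.
  final case class Field(name: String, dataType: String)

  // Keeps only the requested columns, in the order they were requested.
  // Throws NoSuchElementException if a column is not part of the schema.
  def pruneSchema(schema: Seq[Field], columns: Array[String]): Seq[Field] = {
    val fieldsByName = schema.map(f => f.name -> f).toMap
    columns.toSeq.map(name => fieldsByName(name))
  }
}
```

With a schema of `id`, `name` and `age`, requesting `Array("age", "id")` would yield just those two fields, `age` first.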

Converting Filter Predicate to SQL Expression — `compileFilter` Object Method

compileFilter(f: Filter, dialect: JdbcDialect): Option[String]

compileFilter…FIXME

Note	`compileFilter` is used when `JDBCRelation` is requested to find unhandled Filter predicates and when `JDBCRDD` is created.
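
The `Option[String]` return type suggests that a predicate either compiles to a SQL expression or cannot be pushed down at all. The sketch below illustrates that shape with a tiny, hypothetical filter hierarchy; it does not use Spark's `Filter` classes, it leaves out the `JdbcDialect` parameter, and the quoting and value-escaping rules are assumptions of the sketch.

```scala
// Hypothetical model of compiling filter predicates to SQL expressions.
object CompileFilterSketch {

  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class GreaterThan(attribute: String, value: Any) extends Filter
  final case class IsNull(attribute: String) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter
  // Deliberately left unsupported to show the None case.
  final case class StringContains(attribute: String, value: String) extends Filter

  // Assumption: identifiers are quoted with double quotes.
  private def quote(name: String): String = "\"" + name + "\""

  // Strings become SQL literals with single quotes doubled; other values use toString.
  private def compileValue(value: Any): String = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Returns Some(sql) for supported predicates and None otherwise.
  def compileFilter(f: Filter): Option[String] = f match {
    case EqualTo(a, v)     => Some(s"${quote(a)} = ${compileValue(v)}")
    case GreaterThan(a, v) => Some(s"${quote(a)} > ${compileValue(v)}")
    case IsNull(a)         => Some(s"${quote(a)} IS NULL")
    case And(l, r) =>
      // Both sides must compile, otherwise the whole conjunction is unhandled.
      for (lc <- compileFilter(l); rc <- compileFilter(r)) yield s"($lc) AND ($rc)"
    case _: StringContains => None
  }
}
```

A caller can then separate handled from unhandled predicates by checking which filters compile to `None`.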
