JDBCRDD

JDBCRDD is an RDD of internal binary rows that represents a structured query over a table in a database accessed via JDBC.

Note
JDBCRDD represents a "SELECT requiredColumns FROM table" query.

JDBCRDD is created exclusively when JDBCRDD is requested to scanTable (when JDBCRelation is requested to build a scan).
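
For context, here is a hedged spark-shell-style example of a JDBC read that is eventually served by a JDBCRDD under the covers (the URL, table name and credentials are placeholders):

// A JDBC-backed DataFrame; the scan over the table is executed by a JDBCRDD
// (via JDBCRelation). URL, table and credentials below are placeholders.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/mydb")
  .option("dbtable", "public.users")
  .option("user", "spark")
  .option("password", "secret")
  .load()

// Column pruning and filter pushdown end up as the requiredColumns and
// filters of the underlying JDBCRDD.
jdbcDF.select("id", "name").where("id > 100").show()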

Table 1. JDBCRDD’s Internal Properties (e.g. Registries, Counters and Flags)

Name               | Description
columnList         | Column names. Used when…FIXME
filterWhereClause  | Filters as a SQL WHERE clause. Used when…FIXME
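
The page does not describe how these properties are derived. As a hedged sketch (an assumption, not taken from this page), columnList could be the quoted column names joined with commas, and filterWhereClause the compiled filter predicates joined with AND:

import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan}

// Hedged sketch only; url, columns and filters are placeholders.
val url = "jdbc:postgresql://localhost:5432/mydb"
val dialect = JdbcDialects.get(url)

val requiredColumns = Array("id", "name")
val columnList = requiredColumns.map(dialect.quoteIdentifier).mkString(",")
// e.g. "id","name"

// Simplified stand-in for compileFilter (documented below).
def compile(f: Filter): Option[String] = f match {
  case EqualTo(attr, value)     => Some(s"${dialect.quoteIdentifier(attr)} = $value")
  case GreaterThan(attr, value) => Some(s"${dialect.quoteIdentifier(attr)} > $value")
  case _                        => None
}

val filters: Array[Filter] = Array(GreaterThan("id", 100))
val filterWhereClause = filters.flatMap(compile).map(c => s"($c)").mkString(" AND ")
// e.g. ("id" > 100)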

Computing Partition (in TaskContext) — compute Method

compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]
Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute…​FIXME
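
Conceptually, computing a partition means running the "SELECT requiredColumns FROM table" query, restricted to the partition's slice, over a JDBC connection and exposing the result set as an iterator. A hypothetical, heavily simplified sketch (plain values rather than InternalRows, no type conversions or resource management; all names are assumptions):

import java.sql.Connection

// Hypothetical sketch of a per-partition JDBC scan.
def computePartitionSketch(
    getConnection: () => Connection,
    columnList: String,
    table: String,
    whereClause: String): Iterator[Seq[Any]] = {
  val conn = getConnection()
  // The partition's slice of the "SELECT requiredColumns FROM table" query.
  val sql = s"SELECT $columnList FROM $table $whereClause"
  val rs = conn.prepareStatement(sql).executeQuery()
  val columnCount = rs.getMetaData.getColumnCount
  // Wrap the ResultSet cursor in an Iterator.
  new Iterator[Seq[Any]] {
    private var advanced = false
    private var hasRow = false
    def hasNext: Boolean = {
      if (!advanced) { hasRow = rs.next(); advanced = true }
      hasRow
    }
    def next(): Seq[Any] = {
      if (!hasNext) throw new NoSuchElementException("end of result set")
      advanced = false
      (1 to columnCount).map(i => rs.getObject(i))
    }
  }
}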

resolveTable Method

resolveTable(options: JDBCOptions): StructType

resolveTable…​FIXME

Note
resolveTable is used exclusively when JDBCRelation is requested for the schema.
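
In spirit, resolving a table over JDBC means executing a query that returns no rows (e.g. the dialect's getSchemaQuery, typically SELECT * FROM table WHERE 1=0) and translating the ResultSetMetaData into a StructType. A hedged, standalone sketch (a real implementation covers far more JDBC types and honors the JDBC options):

import java.sql.{DriverManager, ResultSetMetaData, Types}
import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.types._

// Hypothetical sketch; url and table are placeholders.
def resolveTableSchemaSketch(url: String, table: String): StructType = {
  val dialect = JdbcDialects.get(url)
  val conn = DriverManager.getConnection(url)
  try {
    // A query that returns no rows but exposes the table's metadata.
    val rs = conn.prepareStatement(dialect.getSchemaQuery(table)).executeQuery()
    val md = rs.getMetaData
    val fields = (1 to md.getColumnCount).map { i =>
      val dataType = md.getColumnType(i) match {
        case Types.INTEGER => IntegerType
        case Types.BIGINT  => LongType
        case Types.VARCHAR => StringType
        case _             => StringType  // grossly simplified fallback
      }
      StructField(md.getColumnLabel(i), dataType,
        md.isNullable(i) != ResultSetMetaData.columnNoNulls)
    }
    StructType(fields)
  } finally {
    conn.close()
  }
}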

Creating RDD for Distributed Data Scan — scanTable Object Method

scanTable(
  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]

scanTable takes the url option.

scanTable finds the corresponding JDBC dialect (per the url option) and requests it to quote the column identifiers in the input requiredColumns.

scanTable uses the JdbcUtils object to createConnectionFactory and prunes the input schema to include the input requiredColumns only.

In the end, scanTable creates a new JDBCRDD.

Note
scanTable is used exclusively when JDBCRelation is requested to build a distributed data scan with column pruning and filter pushdown.
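
Reusing jdbcDF from the introductory example above, the effect of this scan path (column pruning and filter pushdown) shows up in the physical plan; the exact output differs across Spark versions:

jdbcDF.select("id", "name").where("id > 100").explain()
// The JDBC scan node reports something like:
//   PushedFilters: [*IsNotNull(id), *GreaterThan(id,100)]
//   ReadSchema: struct<id:int,name:string>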

Creating JDBCRDD Instance

JDBCRDD takes the following when created:

  • SparkContext

  • Function to create a Connection (() ⇒ Connection)

  • Schema (StructType)

  • Array of column names

  • Array of Filter predicates

  • Array of Spark Core’s Partitions

  • Connection URL

  • JDBCOptions

JDBCRDD initializes the internal registries and counters.
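
A hedged sketch of the constructor shape implied by the list above (parameter names are assumptions; the actual class is internal to Spark SQL and not meant to be instantiated directly):

import java.sql.Connection
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

// Sketch only; mirrors the argument list above.
class SketchJDBCRDD(
    sc: SparkContext,
    getConnection: () => Connection,
    schema: StructType,
    columns: Array[String],
    filters: Array[Filter],
    partitions: Array[Partition],
    url: String,
    options: JDBCOptions)
  extends RDD[InternalRow](sc, Nil) {

  // Simply return the partitions the RDD was created with (see getPartitions below).
  override def getPartitions: Array[Partition] = partitions

  // The real compute runs the partition's JDBC query (see compute above).
  override def compute(part: Partition, context: TaskContext): Iterator[InternalRow] =
    Iterator.empty
}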

getPartitions Method

getPartitions: Array[Partition]
Note
getPartitions is part of Spark Core’s RDD Contract to…​FIXME

getPartitions simply returns the partitions (this JDBCRDD was created with).

pruneSchema Internal Method

pruneSchema(schema: StructType, columns: Array[String]): StructType

pruneSchema…​FIXME

Note
pruneSchema is used when…​FIXME
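
Pruning a schema to a set of columns amounts to keeping only the StructFields whose names are among the requested columns. A hedged, standalone sketch:

import org.apache.spark.sql.types._

// Hedged sketch: keep only the requested columns, in the requested order.
def pruneSchemaSketch(schema: StructType, columns: Array[String]): StructType = {
  val fieldsByName = schema.fields.map(f => f.name -> f).toMap
  StructType(columns.flatMap(fieldsByName.get))
}

val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType),
  StructField("age", IntegerType)))

pruneSchemaSketch(schema, Array("name", "id"))
// StructType(StructField(name,StringType,true), StructField(id,IntegerType,true))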

Converting Filter Predicate to SQL Expression — compileFilter Object Method

compileFilter(f: Filter, dialect: JdbcDialect): Option[String]

compileFilter…​FIXME
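
Given the signature above, a hedged spark-shell-style example of converting data source Filter predicates into SQL expressions (compileFilter is an internal helper, normally exercised indirectly via filter pushdown; its visibility and exact output may vary across Spark versions):

import org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD
import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.sources.{And, EqualTo, GreaterThan}

// Pick a dialect from a (placeholder) connection URL.
val dialect = JdbcDialects.get("jdbc:postgresql://localhost:5432/mydb")

JDBCRDD.compileFilter(EqualTo("name", "Jacek"), dialect)
// something like: Some("name" = 'Jacek')

JDBCRDD.compileFilter(And(GreaterThan("id", 100), EqualTo("name", "Jacek")), dialect)
// something like: Some(("id" > 100) AND ("name" = 'Jacek'))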

Note

compileFilter is used when…FIXME
