JdbcUtils Helper Object

JdbcUtils is a Scala object with methods to support JDBCRDD, JDBCRelation and JdbcRelationProvider.

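Table 1. JdbcUtils API
Name	Description
createConnectionFactory	Used when: `JDBCRDD` is requested to scanTable and resolveTable; `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table; `JdbcUtils` is requested to saveTable
createTable	Used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table
dropTable
getCommonJDBCType
getCustomSchema	Replaces data types in a table schema. Used exclusively when `JDBCRelation` is created (and the customSchema JDBC option was defined)
getInsertStatement	Used when `JdbcUtils` is requested to saveTable
getSchema	Used when `JDBCRDD` is requested to resolveTable
getSchemaOption	Used when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table
resultSetToRows	Used when…FIXME
resultSetToSparkInternalRows	Used when `JDBCRDD` is requested to compute a partition
schemaString	Used exclusively when `JdbcUtils` is requested to create a table
saveTable	Used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table
tableExists	Used when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table
truncateTable	Used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table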

`createConnectionFactory` Method

createConnectionFactory(options: JDBCOptions): () => Connection

createConnectionFactory…FIXME

Note	`createConnectionFactory` is used when: `JDBCRDD` is requested to scanTable (and in turn creates a JDBCRDD) and resolveTable; `JdbcRelationProvider` is requested to create a BaseRelation; `JdbcUtils` is requested to saveTable.
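The signature shows that createConnectionFactory does not open a connection itself: it returns a function (a thunk) that opens one each time it is called. The following minimal sketch illustrates that shape only. SimpleJdbcOptions is a hypothetical stand-in for JDBCOptions, and the thunk simply calls java.sql.DriverManager.getConnection, which is an assumption for illustration and not necessarily how Spark locates and registers the JDBC driver.

```scala
import java.sql.{Connection, DriverManager, SQLException}
import java.util.Properties

// Hypothetical stand-in for JDBCOptions: just the JDBC URL and connection properties.
final case class SimpleJdbcOptions(url: String, properties: Properties)

object ConnectionFactorySketch {
  // Returns a function instead of a Connection: nothing is opened here.
  // Every invocation of the returned function opens a new connection.
  def createConnectionFactory(options: SimpleJdbcOptions): () => Connection =
    () => DriverManager.getConnection(options.url, options.properties)
}
```

Because creating the factory has no side effects, it can be built once (e.g. on the driver) and invoked only where a connection is actually needed, such as once per partition.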

`getCommonJDBCType` Method

getCommonJDBCType(dt: DataType): Option[JdbcType]

getCommonJDBCType…FIXME

Note	`getCommonJDBCType` is used when…FIXME

`getCatalystType` Internal Method

getCatalystType(
  sqlType: Int,
  precision: Int,
  scale: Int,
  signed: Boolean): DataType

getCatalystType…FIXME

Note	`getCatalystType` is used when…FIXME

`getSchemaOption` Method

getSchemaOption(conn: Connection, options: JDBCOptions): Option[StructType]

getSchemaOption…FIXME

Note	`getSchemaOption` is used when…FIXME

`getSchema` Method

getSchema(
  resultSet: ResultSet,
  dialect: JdbcDialect,
  alwaysNullable: Boolean = false): StructType

getSchema…FIXME

Note	`getSchema` is used when…FIXME

`resultSetToRows` Method

resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row]

resultSetToRows…FIXME

Note	`resultSetToRows` is used when…FIXME

`resultSetToSparkInternalRows` Method

resultSetToSparkInternalRows(
  resultSet: ResultSet,
  schema: StructType,
  inputMetrics: InputMetrics): Iterator[InternalRow]

resultSetToSparkInternalRows…FIXME

Note	`resultSetToSparkInternalRows` is used when…FIXME

`schemaString` Method

schemaString(
  df: DataFrame,
  url: String,
  createTableColumnTypes: Option[String] = None): String

schemaString…FIXME

Note	`schemaString` is used exclusively when `JdbcUtils` is requested to create a table.

`parseUserSpecifiedCreateTableColumnTypes` Internal Method

parseUserSpecifiedCreateTableColumnTypes(
  df: DataFrame,
  createTableColumnTypes: String): Map[String, String]

parseUserSpecifiedCreateTableColumnTypes…FIXME

Note	`parseUserSpecifiedCreateTableColumnTypes` is used exclusively when `JdbcUtils` is requested for the schema string (schemaString).

`saveTable` Method

saveTable(
  df: DataFrame,
  tableSchema: Option[StructType],
  isCaseSensitive: Boolean,
  options: JDBCOptions): Unit

saveTable reads the url, table, batchSize and isolationLevel options and creates a connection factory (using createConnectionFactory).

saveTable then builds the INSERT statement (using getInsertStatement).

saveTable takes the numPartitions option and, if it is smaller than the number of partitions of the input DataFrame's RDD, applies the coalesce operator to reduce the number of partitions to numPartitions. Otherwise, the DataFrame is used as is (coalesce can only reduce the number of partitions).

In the end, saveTable requests the possibly-coalesced DataFrame for its RDD (which may differ from the original RDD after coalesce) and executes savePartition for every partition (using RDD.foreachPartition).

Note	`saveTable` is used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table.
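The partition handling in saveTable can be sketched without Spark by modelling a DataFrame's RDD as a vector of partitions. This is a hypothetical, in-memory illustration: the coalesce helper merges whole partitions round-robin and is not Spark's coalesce, and rejecting a non-positive numPartitions is an assumption of this sketch. The savePartition callback stands in for the per-partition write.

```scala
object SaveTableSketch {
  // Coalesce only when the numPartitions option is smaller than the current
  // number of partitions. Rejecting n <= 0 is an assumption of this sketch.
  def targetPartitions(current: Int, numPartitions: Option[Int]): Int = numPartitions match {
    case Some(n) if n <= 0 => throw new IllegalArgumentException(s"Invalid numPartitions: $n")
    case Some(n) if n < current => n
    case _ => current
  }

  // Hypothetical in-memory coalesce: merges whole partitions into n groups
  // (round-robin by partition index) without splitting any partition.
  def coalesce[A](partitions: Vector[Vector[A]], n: Int): Vector[Vector[A]] =
    Vector.tabulate(n) { i =>
      partitions.indices.filter(_ % n == i).toVector.flatMap(j => partitions(j))
    }

  // Coalesces if needed, then calls savePartition once per resulting partition
  // (the role RDD.foreachPartition plays). Returns the number of partitions written.
  def saveTable[A](partitions: Vector[Vector[A]], numPartitions: Option[Int])(
      savePartition: Iterator[A] => Unit): Int = {
    val target = targetPartitions(partitions.size, numPartitions)
    val repartitioned =
      if (target < partitions.size) coalesce(partitions, target) else partitions
    repartitioned.foreach(p => savePartition(p.iterator))
    repartitioned.size
  }
}
```

With four input partitions, numPartitions = 2 leads to two savePartition calls, while numPartitions = 8 (or no option at all) leaves the four partitions untouched.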

Replacing Data Types In Table Schema — `getCustomSchema` Method

getCustomSchema(
  tableSchema: StructType,
  customSchema: String,
  nameEquality: Resolver): StructType

getCustomSchema replaces the data types of the fields in the input tableSchema that are also defined (by name) in the input customSchema, if one is given.

Internally, getCustomSchema branches off per the input customSchema.

If the input customSchema is undefined or empty, getCustomSchema simply returns the input tableSchema unchanged.

Otherwise, getCustomSchema requests CatalystSqlParser to parse the customSchema (a DDL-formatted list of columns and their types, e.g. id DECIMAL(38, 0), name STRING) into a new StructType.

getCustomSchema then uses SchemaUtils.checkColumnNameDuplication to make sure the column names of the user-defined schema are unique (compared using the input nameEquality).

In the end, getCustomSchema replaces the data type of every field in the input tableSchema that has a matching column (per nameEquality) in the parsed user-defined schema. All other fields, as well as the column names and their order, stay as in tableSchema.

Note	`getCustomSchema` is used exclusively when `JDBCRelation` is created (and customSchema JDBC option was defined).
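The steps above can be sketched in plain Scala. This is a minimal sketch under stated assumptions: Field is a hypothetical simplified column (a name plus a type as text) in place of StructField and DataType, parseCustomSchema is a hypothetical, deliberately naive stand-in for CatalystSqlParser, and duplicates are reported with an IllegalArgumentException rather than the exception Spark uses. Resolver has the same shape as Spark's Resolver type, a function of two column names.

```scala
object CustomSchemaSketch {
  // Hypothetical simplified column: a name and its SQL type as plain text.
  final case class Field(name: String, dataType: String)

  // Same shape as Spark's Resolver: do two column names refer to the same column?
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = _.equalsIgnoreCase(_)

  // Hypothetical stand-in for CatalystSqlParser: splits "col TYPE, col TYPE" on
  // top-level commas only, so "DECIMAL(38, 0)" stays in one piece.
  def parseCustomSchema(s: String): Seq[Field] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0
    s.foreach {
      case '(' => depth += 1; current.append('(')
      case ')' => depth -= 1; current.append(')')
      case ',' if depth == 0 => parts += current.toString; current.clear()
      case c => current.append(c)
    }
    parts += current.toString
    parts.toList.map(_.trim).filter(_.nonEmpty).map { p =>
      p.split("\\s+", 2) match {
        case Array(name, tpe) => Field(name, tpe.trim)
        case _ => throw new IllegalArgumentException(s"Cannot parse column definition: $p")
      }
    }
  }

  def getCustomSchema(tableSchema: Seq[Field], customSchema: String, nameEquality: Resolver): Seq[Field] =
    if (customSchema == null || customSchema.isEmpty) tableSchema // nothing to replace
    else {
      val userSchema = parseCustomSchema(customSchema)
      // Reject duplicate column names (compared with nameEquality).
      val names = userSchema.map(_.name)
      names.zipWithIndex.foreach { case (n, i) =>
        if (names.drop(i + 1).exists(other => nameEquality(n, other)))
          throw new IllegalArgumentException(s"Found duplicate column in the customSchema option value: $n")
      }
      // Replace only the data type; names and order come from the table schema.
      tableSchema.map { col =>
        userSchema.find(f => nameEquality(f.name, col.name)) match {
          case Some(f) => col.copy(dataType = f.dataType)
          case None => col
        }
      }
    }
}
```

For a table with columns ID, NAME and price, the custom schema id DECIMAL(38, 0), name STRING changes the types of ID and NAME (matched case-insensitively here) and leaves price alone.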

`dropTable` Method

dropTable(conn: Connection, table: String): Unit

dropTable…FIXME

Note	`dropTable` is used when…FIXME

Creating Table Using JDBC — `createTable` Method

createTable(
  conn: Connection,
  df: DataFrame,
  options: JDBCOptions): Unit

createTable builds the column definitions of the table (strSchema) using schemaString with the input DataFrame and the url and createTableColumnTypes options.

createTable then reads the table and createTableOptions options.

In the end, createTable combines them into a CREATE TABLE [table] ([strSchema]) [createTableOptions] SQL DDL statement and executes it using the input JDBC Connection.

Note	`createTable` is used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table.
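The DDL assembly can be sketched as follows. The sketch takes strSchema, the table name and createTableOptions as plain strings (in JdbcUtils they come from schemaString and JDBCOptions), so the helper names and parameters are hypothetical. Execution uses only standard JDBC calls (Connection.createStatement, Statement.executeUpdate and Statement.close).

```scala
import java.sql.Connection

object CreateTableSketch {
  // Builds CREATE TABLE [table] ([strSchema]) [createTableOptions].
  // An empty createTableOptions leaves a trailing space, which databases ignore.
  def createTableSql(table: String, strSchema: String, createTableOptions: String): String =
    s"CREATE TABLE $table ($strSchema) $createTableOptions"

  // Executes the DDL on the given connection and always closes the statement.
  def createTable(conn: Connection, table: String, strSchema: String, createTableOptions: String): Unit = {
    val statement = conn.createStatement()
    try statement.executeUpdate(createTableSql(table, strSchema, createTableOptions))
    finally statement.close()
  }
}
```

createTableOptions is where database-specific clauses go, e.g. a MySQL table engine: createTableSql("people", "name TEXT, age INTEGER", "ENGINE=InnoDB") gives CREATE TABLE people (name TEXT, age INTEGER) ENGINE=InnoDB.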

`getInsertStatement` Method

getInsertStatement(
  table: String,
  rddSchema: StructType,
  tableSchema: Option[StructType],
  isCaseSensitive: Boolean,
  dialect: JdbcDialect): String

getInsertStatement…FIXME

Note	`getInsertStatement` is used when…FIXME

`getJdbcType` Internal Method

getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType

getJdbcType…FIXME

Note	`getJdbcType` is used when…FIXME

`tableExists` Method

tableExists(conn: Connection, options: JDBCOptions): Boolean

tableExists…FIXME

Note	`tableExists` is used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table.

`truncateTable` Method

truncateTable(conn: Connection, options: JDBCOptions): Unit

truncateTable…FIXME

Note	`truncateTable` is used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table.

Saving Rows (Per Partition) to Table — `savePartition` Method

savePartition(
  getConnection: () => Connection,
  table: String,
  iterator: Iterator[Row],
  rddSchema: StructType,
  insertStmt: String,
  batchSize: Int,
  dialect: JdbcDialect,
  isolationLevel: Int): Iterator[Byte]

savePartition creates a JDBC Connection using the input getConnection function.

savePartition tries to set the input isolationLevel if it is different from TRANSACTION_NONE and the database supports transactions.

savePartition then writes the rows (from the input Iterator[Row]) in batches: every batchSize rows are submitted as one batch, and any remaining rows are submitted as a final, smaller batch.

Note	`savePartition` is used exclusively when `JdbcUtils` is requested to saveTable.
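The isolation-level check and the batching loop can be sketched without a database. In this hypothetical sketch, BatchStatement stands in for the two java.sql.PreparedStatement calls the loop relies on (addBatch and executeBatch), and the three boolean/int inputs of effectiveIsolationLevel correspond to what java.sql.DatabaseMetaData reports through supportsTransactions, supportsTransactionIsolationLevel and getDefaultTransactionIsolation. Falling back to the database's default level when the requested one is unsupported is an assumption of this sketch, not something stated above.

```scala
import java.sql.Connection

object SavePartitionSketch {
  // Hypothetical stand-in for the PreparedStatement calls used by the loop.
  trait BatchStatement[A] {
    def addBatch(row: A): Unit
    def executeBatch(): Unit
  }

  // Isolation level to use. TRANSACTION_NONE means "no transaction".
  // Assumption: an unsupported requested level falls back to the default level.
  def effectiveIsolationLevel(
      requested: Int,
      supportsTransactions: Boolean,
      supportsRequested: Boolean,
      defaultLevel: Int): Int =
    if (requested == Connection.TRANSACTION_NONE || !supportsTransactions) Connection.TRANSACTION_NONE
    else if (supportsRequested) requested
    else defaultLevel

  // Adds rows to the current batch and submits it after every batchSize rows;
  // leftover rows are submitted as a final, smaller batch.
  // Returns the number of submitted batches.
  def writeInBatches[A](rows: Iterator[A], stmt: BatchStatement[A], batchSize: Int): Int = {
    require(batchSize >= 1, "batchSize must be at least 1")
    var rowCount = 0
    var batches = 0
    while (rows.hasNext) {
      stmt.addBatch(rows.next())
      rowCount += 1
      if (rowCount == batchSize) {
        stmt.executeBatch()
        batches += 1
        rowCount = 0
      }
    }
    if (rowCount > 0) {
      stmt.executeBatch()
      batches += 1
    }
    batches
  }
}
```

With 7 rows and batchSize 3, the sketch submits three batches of 3, 3 and 1 rows.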