FindDataSourceTable Logical Evaluation Rule for Resolving UnresolvedCatalogRelations

FindDataSourceTable is a Catalyst rule that the default and Hive-specific logical query plan analyzers use to resolve UnresolvedCatalogRelations in a logical plan in the following cases:

  • InsertIntoTables with UnresolvedCatalogRelation (for data source and Hive tables)

  • "Standalone" UnresolvedCatalogRelations
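
The two cases above can be pictured as a pattern match over the plan tree. The following is a minimal, Spark-free sketch of that dispatch, not the rule's actual source. Every type in it (CatalogTable, the Plan hierarchy, DataSourceRelation) is a simplified stand-in defined only for illustration, and the test used to tell data source tables from Hive tables (a provider other than "hive") is an assumption.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's Catalyst classes.
final case class CatalogTable(name: String, provider: Option[String])

sealed trait Plan
final case class UnresolvedCatalogRelation(table: CatalogTable) extends Plan
final case class HiveTableRelation(table: CatalogTable) extends Plan
// Hypothetical name for whatever a resolved data source table becomes.
final case class DataSourceRelation(table: CatalogTable) extends Plan
final case class InsertIntoTable(table: Plan, query: Plan) extends Plan
final case class SubqueryAlias(alias: String, child: Plan) extends Plan

object FindDataSourceTableSketch {
  // Assumption: a table is a data source table when its provider is set
  // and is anything other than "hive".
  private def isDatasourceTable(t: CatalogTable): Boolean =
    t.provider.exists(_.toLowerCase != "hive")

  // Named after the two internal methods described on this page.
  private def readDataSourceTable(t: CatalogTable): Plan = DataSourceRelation(t)
  private def readHiveTable(t: CatalogTable): Plan = HiveTableRelation(t)

  private def resolve(t: CatalogTable): Plan =
    if (isDatasourceTable(t)) readDataSourceTable(t) else readHiveTable(t)

  def apply(plan: Plan): Plan = plan match {
    // Case 1: the target of an insert is an UnresolvedCatalogRelation.
    case InsertIntoTable(UnresolvedCatalogRelation(t), query) =>
      InsertIntoTable(resolve(t), apply(query))
    // Any other insert target is left alone; only the query is visited.
    case InsertIntoTable(table, query) =>
      InsertIntoTable(table, apply(query))
    case SubqueryAlias(alias, child) =>
      SubqueryAlias(alias, apply(child))
    // Case 2: a "standalone" UnresolvedCatalogRelation.
    case UnresolvedCatalogRelation(t) => resolve(t)
    case other => other
  }
}
```

In this sketch an insert whose target is anything other than an UnresolvedCatalogRelation keeps its target unchanged, while UnresolvedCatalogRelations found in the query side are still replaced.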

Note
UnresolvedCatalogRelation leaf logical operator is a placeholder that ResolveRelations logical evaluation rule adds to a logical plan while resolving UnresolvedRelations leaf logical operators.

FindDataSourceTable is one of the additional rules in the Resolution fixed-point batch of rules.

scala> :type spark
org.apache.spark.sql.SparkSession

// Example: InsertIntoTable with UnresolvedCatalogRelation
// Drop tables to make the example reproducible
val db = spark.catalog.currentDatabase
Seq("t1", "t2").foreach { t =>
  spark.sharedState.externalCatalog.dropTable(db, t, ignoreIfNotExists = true, purge = true)
}

// Create tables
sql("CREATE TABLE t1 (id LONG) USING parquet")
sql("CREATE TABLE t2 (id LONG) USING orc")

import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table("t1").insertInto(tableName = "t2", overwrite = true)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'UnresolvedRelation `t1`

// Transform the logical plan with ResolveRelations logical rule first
// so UnresolvedRelations become UnresolvedCatalogRelations
import spark.sessionState.analyzer.ResolveRelations
val planWithUnresolvedCatalogRelations = ResolveRelations(plan)
scala> println(planWithUnresolvedCatalogRelations.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'SubqueryAlias t1
02    +- 'UnresolvedCatalogRelation `default`.`t1`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

// Let's resolve UnresolvedCatalogRelations then
import org.apache.spark.sql.execution.datasources.FindDataSourceTable
val r = new FindDataSourceTable(spark)
val tablesResolvedPlan = r(planWithUnresolvedCatalogRelations)
// FIXME Why is t2 not resolved?!
scala> println(tablesResolvedPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- SubqueryAlias t1
02    +- Relation[id#10L] parquet

Applying FindDataSourceTable Rule to Logical Plan (and Resolving UnresolvedCatalogRelations in Logical Plan) — apply Method

apply(plan: LogicalPlan): LogicalPlan
Note
apply is part of Rule Contract to apply a rule to a logical plan.

apply…FIXME

readHiveTable Internal Method

readHiveTable(table: CatalogTable): LogicalPlan

readHiveTable simply creates a HiveTableRelation for the input CatalogTable.

Note
readHiveTable is used when FindDataSourceTable is requested to resolve UnresolvedCatalogRelations in a logical plan.

readDataSourceTable Internal Method

readDataSourceTable(table: CatalogTable): LogicalPlan

readDataSourceTable…FIXME

Note
readDataSourceTable is used exclusively when FindDataSourceTable logical evaluation rule is executed (for data source tables).
