val tableName = "h1"
// Make the example reproducible
val db = spark.catalog.currentDatabase
import spark.sharedState.{externalCatalog => extCatalog}
extCatalog.dropTable(
  db, table = tableName, ignoreIfNotExists = true, purge = true)
// sql("CREATE TABLE h1 (id LONG) USING hive")
import org.apache.spark.sql.types.StructType
spark.catalog.createTable(
  tableName,
  source = "hive",
  schema = new StructType().add($"id".long),
  options = Map.empty[String, String])
val h1meta = extCatalog.getTable(db, tableName)
scala> println(h1meta.provider.get)
hive
// Looks like we've got the testing space ready for the experiment
val h1 = spark.table(tableName)
import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table(tableName).insertInto("t2", overwrite = true)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'UnresolvedRelation `h1`
// Apply the ResolveRelations logical rule first to resolve UnresolvedRelations
import spark.sessionState.analyzer.ResolveRelations
val rrPlan = ResolveRelations(plan)
scala> println(rrPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'SubqueryAlias h1
02 +- 'UnresolvedCatalogRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
// Apply the FindDataSourceTable logical rule next to resolve UnresolvedCatalogRelations
import org.apache.spark.sql.execution.datasources.FindDataSourceTable
val findTablesRule = new FindDataSourceTable(spark)
val planWithTables = findTablesRule(rrPlan)
// At long last...
// Note HiveTableRelation in the logical plan
scala> println(planWithTables.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- SubqueryAlias h1
02 +- HiveTableRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#13L]
HiveTableRelation Leaf Logical Operator — Representing Hive Tables in Logical Plan
HiveTableRelation is a leaf logical operator that represents a Hive table in a logical query plan.

HiveTableRelation is created when the FindDataSourceTable logical evaluation rule is requested to resolve UnresolvedCatalogRelations in a logical plan (for Hive tables).
Note: HiveTableRelation can be converted to a HadoopFsRelation based on the spark.sql.hive.convertMetastoreParquet and spark.sql.hive.convertMetastoreOrc properties (and "disappears" from a logical plan when the conversion is enabled).
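Continuing the demo above, one way to get hold of a HiveTableRelation is to collect it from an analyzed plan. The following sketch assumes the h1 table from the demo (a LazySimpleSerDe table, so neither convertMetastore property applies):

// h1 is not parquet- or orc-backed, so the HiveTableRelation stays in the plan
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
val rel = spark.table(tableName).queryExecution.analyzed.collectFirst {
  case r: HiveTableRelation => r
}.get
assert(rel.tableMeta.identifier.table == tableName)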
HiveTableRelation is partitioned when it has at least one partition column.

HiveTableRelation is a MultiInstanceRelation.
HiveTableRelation is resolved (converted) as follows (see the demo after this list):

- To a HiveTableScanExec physical operator in the HiveTableScans execution planning strategy

- To an InsertIntoHiveTable command in the HiveAnalysis logical resolution rule
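For example, planning a plain scan over h1 from the demo gives a HiveTableScanExec leaf. A minimal check (assuming the Spark 2.3-era class location org.apache.spark.sql.hive.execution.HiveTableScanExec):

// The HiveTableScans strategy plans the relation as a HiveTableScanExec
import org.apache.spark.sql.hive.execution.HiveTableScanExec
val scan = spark.table(tableName).queryExecution.executedPlan.collectFirst {
  case s: HiveTableScanExec => s
}
assert(scan.isDefined)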
The metadata of a HiveTableRelation (in a catalog) has to meet the following requirements (verified in the sketch after this list):

- The database is defined

- The partition schema is of the same type as partitionCols

- The data schema is of the same type as dataCols
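These requirements mirror the assertions HiveTableRelation makes when created. Continuing with rel from the earlier snippet, they can be checked directly; a sketch (toStructType comes from the catalyst expressions implicits):

import org.apache.spark.sql.catalyst.expressions._
// The same checks the operator asserts at creation time
assert(rel.tableMeta.identifier.database.isDefined)
assert(rel.tableMeta.partitionSchema.sameType(rel.partitionCols.toStructType))
assert(rel.tableMeta.dataSchema.sameType(rel.dataCols.toStructType))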
Computing Statistics — computeStats Method

computeStats(): Statistics
Note: computeStats is part of the LeafNode contract to compute statistics for the cost-based optimizer.
computeStats takes the table statistics from the table metadata, if defined, and converts them to Spark statistics (with the output columns).

If the table statistics are not available, computeStats reports an IllegalStateException with the following message:

table stats must be specified.
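Continuing the demo, table statistics become available once the table is analyzed, and requesting statistics of the analyzed plan then goes through computeStats. A sketch, assuming Spark 2.3+ where LogicalPlan.stats takes no arguments:

// Record table-level statistics in the metastore first
sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS")
// Requesting statistics of the analyzed plan triggers computeStats on the leaf
val stats = spark.table(tableName).queryExecution.analyzed.stats
println(stats.simpleString)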
Creating HiveTableRelation Instance

HiveTableRelation takes the following when created:

- Table metadata (CatalogTable)

- Data columns (as a collection of AttributeReferences)

- Partition columns (as a collection of AttributeReferences)
Partition Columns

When created, HiveTableRelation is given the partition columns.

The FindDataSourceTable logical evaluation rule creates a HiveTableRelation based on a table specification (from a catalog). The partition columns are exactly the partitions of the table specification.
isPartitioned Method

isPartitioned: Boolean

isPartitioned is true when there is at least one partition column.
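A minimal demo tying the last two sections together. The h1p table name is made up for illustration; the CREATE TABLE syntax assumes the hive source, as in the demo at the top of the page:

// A hypothetical partitioned Hive table to exercise partitionCols and isPartitioned
sql("DROP TABLE IF EXISTS h1p")
sql("CREATE TABLE h1p (id LONG, p LONG) USING hive PARTITIONED BY (p)")
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
val prel = spark.table("h1p").queryExecution.analyzed.collectFirst {
  case r: HiveTableRelation => r
}.get
assert(prel.isPartitioned)
assert(prel.partitionCols.map(_.name) == Seq("p"))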