SessionCatalog — Session-Scoped Catalog of Relational Entities
SessionCatalog is the catalog (registry) of relational entities, i.e. databases, tables, views, partitions, and functions (in a SparkSession).
SessionCatalog uses the ExternalCatalog for the metadata of permanent entities (e.g. tables).
Note
|
SessionCatalog is a layer over ExternalCatalog in a SparkSession which allows for different metastores (i.e. in-memory or hive) to be used.
|
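As a quick way to see this layering in spark-shell, the sketch below reads the configured catalog implementation and compares the databases reported by the session-scoped catalog with those reported by the underlying ExternalCatalog. It assumes a running SparkSession named spark (as spark-shell provides); the value names are illustrative only.

```scala
// Assumes spark-shell, i.e. a SparkSession is available as `spark`
val catalog = spark.sessionState.catalog
val metastore = spark.sharedState.externalCatalog

// Which metastore backs this session: "in-memory" or "hive"
val impl = spark.conf.get("spark.sql.catalogImplementation")

// Permanent databases are served by the ExternalCatalog,
// so both views of them are expected to agree
val sameDatabases = catalog.listDatabases().toSet == metastore.listDatabases().toSet
println(s"catalog implementation: $impl, same databases: $sameDatabases")
```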
SessionCatalog is available through SessionState (of a SparkSession).
scala> :type spark
org.apache.spark.sql.SparkSession
scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog
SessionCatalog is created when BaseSessionStateBuilder is requested for the SessionCatalog (when SessionState is requested for it).
Notable uses of SessionCatalog include creating an Analyzer or a SparkOptimizer.
Name | Description |
---|---|
|
Used when…FIXME |
|
A cache of fully-qualified table names to table relation plans (i.e. Used when |
|
Registry of temporary views (i.e. non-global temporary tables) |
requireTableExists
Internal Method
requireTableExists(name: TableIdentifier): Unit
requireTableExists
…FIXME
Note
|
requireTableExists is used when…FIXME
|
databaseExists
Method
databaseExists(db: String): Boolean
databaseExists
…FIXME
Note
|
databaseExists is used when…FIXME
|
listTables
Method
listTables(db: String): Seq[TableIdentifier] (1)
listTables(db: String, pattern: String): Seq[TableIdentifier]
(1) Uses "*" as the pattern
listTables
…FIXME
Checking Whether Table Is Temporary View — isTemporaryTable
Method
isTemporaryTable(name: TableIdentifier): Boolean
isTemporaryTable
…FIXME
Note
|
isTemporaryTable is used when…FIXME
|
alterPartitions
Method
alterPartitions(tableName: TableIdentifier, parts: Seq[CatalogTablePartition]): Unit
alterPartitions
…FIXME
Note
|
alterPartitions is used when…FIXME
|
listPartitions
Method
listPartitions(
tableName: TableIdentifier,
partialSpec: Option[TablePartitionSpec] = None): Seq[CatalogTablePartition]
listPartitions
…FIXME
Note
|
listPartitions is used when…FIXME
|
listPartitionsByFilter
Method
listPartitionsByFilter(
tableName: TableIdentifier,
predicates: Seq[Expression]): Seq[CatalogTablePartition]
listPartitionsByFilter
…FIXME
Note
|
listPartitionsByFilter is used when…FIXME
|
alterTable
Method
alterTable(tableDefinition: CatalogTable): Unit
alterTable
…FIXME
Note
|
alterTable is used when AlterTableSetPropertiesCommand, AlterTableUnsetPropertiesCommand, AlterTableChangeColumnCommand, AlterTableSerDePropertiesCommand, AlterTableRecoverPartitionsCommand, AlterTableSetLocationCommand, and AlterViewAsCommand (for permanent views) logical commands are executed.
|
Altering Table Statistics in Metastore (and Invalidating Internal Cache) — alterTableStats
Method
alterTableStats(identifier: TableIdentifier, newStats: Option[CatalogStatistics]): Unit
alterTableStats requests ExternalCatalog to alter the statistics of the table (per identifier) and then invalidates the table relation cache.
alterTableStats reports a NoSuchDatabaseException if the database does not exist.
alterTableStats reports a NoSuchTableException if the table does not exist.
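A minimal spark-shell sketch of the call, assuming the signature shown above. The table name t_stats and the statistics values are made up for illustration (they are not computed from the data), and getTableMetadata is used only to read the stored statistics back.

```scala
// Assumes spark-shell, i.e. a SparkSession is available as `spark`
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogStatistics

val catalog = spark.sessionState.catalog

// Hypothetical table used only for this example
spark.range(10).write.mode("overwrite").saveAsTable("t_stats")
val tid = TableIdentifier("t_stats", Some(catalog.getCurrentDatabase))

// Illustrative (hand-written) statistics
val newStats = CatalogStatistics(sizeInBytes = BigInt(1024), rowCount = Some(BigInt(10)))
catalog.alterTableStats(tid, Some(newStats))

// Read the statistics back from the table metadata
val stats = catalog.getTableMetadata(tid).stats
println(stats)
```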
tableExists
Method
tableExists(
name: TableIdentifier): Boolean
tableExists requests the ExternalCatalog to check whether the table exists.
tableExists assumes the current database (default unless changed) unless a database is defined in the input TableIdentifier.
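The database fallback can be checked in spark-shell with a sketch like the following. The table names are hypothetical, and the example assumes that no table called no_such_table_here exists in the current database.

```scala
// Assumes spark-shell, i.e. a SparkSession is available as `spark`
import org.apache.spark.sql.catalyst.TableIdentifier

val catalog = spark.sessionState.catalog

// Hypothetical table used only for this example
spark.range(1).write.mode("overwrite").saveAsTable("t_exists")

// No database in the identifier: the session's current database is used
val unqualified = catalog.tableExists(TableIdentifier("t_exists"))

// Database given explicitly
val currentDb = catalog.getCurrentDatabase
val qualified = catalog.tableExists(TableIdentifier("t_exists", Some(currentDb)))

// A table that was never created
val missing = catalog.tableExists(TableIdentifier("no_such_table_here"))
println(s"unqualified=$unqualified qualified=$qualified missing=$missing")
```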
Note
|
tableExists is used when…FIXME
|
functionExists
Method
functionExists(name: FunctionIdentifier): Boolean
functionExists
…FIXME
listFunctions
Method
listFunctions(
db: String): Seq[(FunctionIdentifier, String)]
listFunctions(
db: String,
pattern: String): Seq[(FunctionIdentifier, String)]
listFunctions
…FIXME
Note
|
listFunctions is used when…FIXME
|
Invalidating Table Relation Cache (aka Refreshing Table) — refreshTable
Method
refreshTable(name: TableIdentifier): Unit
refreshTable
…FIXME
Note
|
refreshTable is used when…FIXME
|
loadFunctionResources
Method
loadFunctionResources(resources: Seq[FunctionResource]): Unit
loadFunctionResources
…FIXME
Note
|
loadFunctionResources is used when…FIXME
|
Altering (Updating) Temporary View (Logical Plan) — alterTempViewDefinition
Method
alterTempViewDefinition(name: TableIdentifier, viewDefinition: LogicalPlan): Boolean
alterTempViewDefinition alters the temporary view by updating an in-memory temporary table (when a database is not specified and the table has already been registered) or a global temporary table (when a database is specified and it is for global temporary tables).
Note
|
"Temporary table" and "temporary view" are synonyms. |
alterTempViewDefinition returns true when the update was applied (and false otherwise).
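The return value can be observed in spark-shell with a sketch like the one below. It assumes the signature shown above (a LogicalPlan as the view definition); the view names are hypothetical, and tv_not_registered is assumed not to exist.

```scala
// Assumes spark-shell, i.e. a SparkSession is available as `spark`
import org.apache.spark.sql.catalyst.TableIdentifier

val catalog = spark.sessionState.catalog

// Register a local temporary view with one row
spark.range(1).createOrReplaceTempView("tv_alter")

// New definition: five rows instead of one
val newPlan = spark.range(5).queryExecution.analyzed

// No database and the view is registered: the definition is replaced
val updated = catalog.alterTempViewDefinition(TableIdentifier("tv_alter"), newPlan)

// No database and nothing registered under that name: nothing to update
val notUpdated = catalog.alterTempViewDefinition(TableIdentifier("tv_not_registered"), newPlan)

val rows = spark.table("tv_alter").count()
println(s"updated=$updated notUpdated=$notUpdated rows=$rows")
```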
Note
|
alterTempViewDefinition is used exclusively when AlterViewAsCommand logical command is executed.
|
Creating (Registering) Or Replacing Local Temporary View — createTempView
Method
createTempView(
name: String,
tableDefinition: LogicalPlan,
overrideIfExists: Boolean): Unit
createTempView
…FIXME
Note
|
createTempView is used when…FIXME
|
Creating (Registering) Or Replacing Global Temporary View — createGlobalTempView
Method
createGlobalTempView(
name: String,
viewDefinition: LogicalPlan,
overrideIfExists: Boolean): Unit
createGlobalTempView simply requests the GlobalTempViewManager to register a global temporary view.
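A minimal spark-shell sketch of the life cycle of a global temporary view, using createGlobalTempView together with getGlobalTempView and dropGlobalTempView (both described elsewhere on this page). The view name gv_demo is hypothetical, and the signatures are assumed to be the ones listed here.

```scala
// Assumes spark-shell, i.e. a SparkSession is available as `spark`
val catalog = spark.sessionState.catalog
val plan = spark.range(3).queryExecution.analyzed

// Register (or replace) the global temporary view
catalog.createGlobalTempView("gv_demo", plan, overrideIfExists = true)
val registered = catalog.getGlobalTempView("gv_demo").isDefined

// Global temporary views live in the database of GlobalTempViewManager
val globalDb = spark.sharedState.globalTempViewManager.database
val rows = spark.table(s"$globalDb.gv_demo").count()

// The first drop removes the view; the second finds nothing to remove
val dropped = catalog.dropGlobalTempView("gv_demo")
val droppedAgain = catalog.dropGlobalTempView("gv_demo")
println(s"registered=$registered rows=$rows dropped=$dropped droppedAgain=$droppedAgain")
```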
createTable
Method
createTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit
createTable
…FIXME
Note
|
createTable is used when…FIXME
|
Creating SessionCatalog Instance
SessionCatalog takes the following when created:
- Hadoop’s Configuration
SessionCatalog initializes the internal registries and counters.
Finding Function by Name (Using FunctionRegistry) — lookupFunction
Method
lookupFunction(
name: FunctionIdentifier,
children: Seq[Expression]): Expression
lookupFunction finds a function by name.
For a function with no database defined that exists in FunctionRegistry, lookupFunction requests FunctionRegistry to find the function (by its unqualified name, i.e. with no database).
If the function (per name) has the database defined or does not exist in FunctionRegistry, lookupFunction uses the fully-qualified function name to check if the function exists in FunctionRegistry (by its fully-qualified name, i.e. with a database).
For other cases, lookupFunction requests ExternalCatalog to find the function and loads its resources. It then creates a corresponding temporary function and looks up the function again.
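The first case (an unqualified name that FunctionRegistry already knows) can be tried in spark-shell with a built-in function. This is only a sketch: it uses the built-in upper function and a made-up literal argument, and it does not exercise the ExternalCatalog path.

```scala
// Assumes spark-shell, i.e. a SparkSession is available as `spark`
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.expressions.{Literal, Upper}

val catalog = spark.sessionState.catalog

// No database in the identifier and "upper" is a built-in function,
// so the lookup is answered by FunctionRegistry
val expr = catalog.lookupFunction(FunctionIdentifier("upper"), Seq(Literal("spark")))

// The result is the Catalyst expression built for the given children
val isUpper = expr.isInstanceOf[Upper]
println(s"$expr (Upper expression: $isUpper)")
```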
Finding Relation (Table or View) in Catalogs — lookupRelation
Method
lookupRelation(name: TableIdentifier): LogicalPlan
lookupRelation finds the name table in the catalogs (i.e. GlobalTempViewManager, ExternalCatalog or registry of temporary views) and gives a SubqueryAlias per table type.
scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog
import spark.sessionState.{catalog => c}
import org.apache.spark.sql.catalyst.TableIdentifier
// Global temp view
val db = spark.sharedState.globalTempViewManager.database
// Make the example reproducible (and so "replace")
spark.range(1).createOrReplaceGlobalTempView("gv1")
val gv1 = TableIdentifier(table = "gv1", database = Some(db))
val plan = c.lookupRelation(gv1)
scala> println(plan.numberedTreeString)
00 SubqueryAlias gv1
01 +- Range (0, 1, step=1, splits=Some(8))
val metastore = spark.sharedState.externalCatalog
// Regular table
val db = spark.catalog.currentDatabase
metastore.dropTable(db, table = "t1", ignoreIfNotExists = true, purge = true)
sql("CREATE TABLE t1 (id LONG) USING parquet")
val t1 = TableIdentifier(table = "t1", database = Some(db))
val plan = c.lookupRelation(t1)
scala> println(plan.numberedTreeString)
00 'SubqueryAlias t1
01 +- 'UnresolvedCatalogRelation `default`.`t1`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
// Regular view (not temporary view!)
// Make the example reproducible
metastore.dropTable(db, table = "v1", ignoreIfNotExists = true, purge = true)
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
val v1 = TableIdentifier(table = "v1", database = Some(db))
import org.apache.spark.sql.types.StructType
val schema = new StructType().add($"id".long)
val storage = CatalogStorageFormat(locationUri = None, inputFormat = None, outputFormat = None, serde = None, compressed = false, properties = Map())
val tableDef = CatalogTable(
identifier = v1,
tableType = CatalogTableType.VIEW,
storage,
schema,
viewText = Some("SELECT 1") /** Required or RuntimeException reported */)
metastore.createTable(tableDef, ignoreIfExists = false)
val plan = c.lookupRelation(v1)
scala> println(plan.numberedTreeString)
00 'SubqueryAlias v1
01 +- View (`default`.`v1`, [id#77L])
02 +- 'Project [unresolvedalias(1, None)]
03 +- OneRowRelation
// Temporary view
spark.range(1).createOrReplaceTempView("v2")
val v2 = TableIdentifier(table = "v2", database = None)
val plan = c.lookupRelation(v2)
scala> println(plan.numberedTreeString)
00 SubqueryAlias v2
01 +- Range (0, 1, step=1, splits=Some(8))
Internally, lookupRelation looks up the name table using:
- GlobalTempViewManager when the database name of the table matches the database name of GlobalTempViewManager
  - Gives SubqueryAlias or reports a NoSuchTableException
- ExternalCatalog when the database name of the table is specified explicitly or the registry of temporary views does not contain the table
  - Gives SubqueryAlias with View when the table is a (permanent) view
  - Gives SubqueryAlias with UnresolvedCatalogRelation otherwise
- The registry of temporary views
  - Gives SubqueryAlias with the logical plan per the table as registered in the registry of temporary views
Note
|
lookupRelation uses the current database (default unless changed) if the name table does not specify the database explicitly.
|
Retrieving Table Metadata from External Catalog (Metastore) — getTableMetadata
Method
getTableMetadata(name: TableIdentifier): CatalogTable
getTableMetadata simply requests the external catalog (metastore) for the table metadata.
Retrieving Table Metadata — getTempViewOrPermanentTableMetadata
Method
getTempViewOrPermanentTableMetadata(name: TableIdentifier): CatalogTable
Internally, getTempViewOrPermanentTableMetadata branches off per database.
When a database name is not specified, getTempViewOrPermanentTableMetadata finds a local temporary view and creates a CatalogTable (with VIEW table type and an undefined storage) or, when there is no such view, retrieves the table metadata from an external catalog.
With the database name of the GlobalTempViewManager, getTempViewOrPermanentTableMetadata requests GlobalTempViewManager for the global view definition and creates a CatalogTable (with the database name of GlobalTempViewManager in the table identifier, VIEW table type and an undefined storage) or reports a NoSuchTableException.
With any other database name, getTempViewOrPermanentTableMetadata simply retrieves the table metadata from an external catalog.
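The first two branches can be observed in spark-shell with a sketch like the following. The view names tv_meta and gv_meta are hypothetical, and the example only inspects the identifier and table type of the returned CatalogTable.

```scala
// Assumes spark-shell, i.e. a SparkSession is available as `spark`
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

val catalog = spark.sessionState.catalog

// No database: a local temporary view is found first
spark.range(1).createOrReplaceTempView("tv_meta")
val localMeta = catalog.getTempViewOrPermanentTableMetadata(TableIdentifier("tv_meta"))

// Database of GlobalTempViewManager: the global view definition is used
spark.range(1).createOrReplaceGlobalTempView("gv_meta")
val globalDb = spark.sharedState.globalTempViewManager.database
val globalMeta = catalog.getTempViewOrPermanentTableMetadata(
  TableIdentifier("gv_meta", Some(globalDb)))

println(s"${localMeta.identifier} is ${localMeta.tableType.name}")
println(s"${globalMeta.identifier} is ${globalMeta.tableType.name}")
```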
Reporting NoSuchDatabaseException When Specified Database Does Not Exist — requireDbExists
Internal Method
requireDbExists(db: String): Unit
requireDbExists reports a NoSuchDatabaseException if the specified database does not exist. Otherwise, requireDbExists does nothing.
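requireDbExists is internal, so it cannot be called from spark-shell directly. The sketch below triggers it indirectly through listTables, on the assumption (true of the Spark 2.x sources, as far as can be told) that listTables performs this check for a regular database. The database name is hypothetical and assumed not to exist.

```scala
// Assumes spark-shell, i.e. a SparkSession is available as `spark`
import scala.util.Try
import org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException

val catalog = spark.sessionState.catalog

// Hypothetical database name that is assumed not to exist
val missingDb = "no_such_db_for_demo"
val exists = catalog.databaseExists(missingDb)

// Listing tables of a missing database is expected to fail the existence check
val attempt = Try(catalog.listTables(missingDb))
val reportedMissingDb =
  attempt.failed.toOption.exists(_.isInstanceOf[NoSuchDatabaseException])
println(s"exists=$exists reportedMissingDb=$reportedMissingDb")
```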
reset
Method
reset(): Unit
reset
…FIXME
Note
|
reset is used exclusively in the Spark SQL internal tests.
|
Dropping Global Temporary View — dropGlobalTempView
Method
dropGlobalTempView(name: String): Boolean
dropGlobalTempView simply requests the GlobalTempViewManager to remove the name global temporary view.
Note
|
dropGlobalTempView is used when…FIXME
|
Dropping Table — dropTable
Method
dropTable(
name: TableIdentifier,
ignoreIfNotExists: Boolean,
purge: Boolean): Unit
dropTable
…FIXME
Looking Up Global Temporary View by Name — getGlobalTempView
Method
getGlobalTempView(
name: String): Option[LogicalPlan]
getGlobalTempView requests the GlobalTempViewManager for the temporary view definition by the input name.
Note
|
getGlobalTempView is used when CatalogImpl is requested to dropGlobalTempView.
|
registerFunction
Method
registerFunction(
funcDefinition: CatalogFunction,
overrideIfExists: Boolean,
functionBuilder: Option[FunctionBuilder] = None): Unit
registerFunction
…FIXME
lookupFunctionInfo
Method
lookupFunctionInfo(name: FunctionIdentifier): ExpressionInfo
lookupFunctionInfo
…FIXME
Note
|
lookupFunctionInfo is used when…FIXME
|