SessionCatalog — Session-Scoped Catalog of Relational Entities

SessionCatalog is the catalog (registry) of the relational entities of a SparkSession, i.e. databases, tables, views, partitions, and functions.

Figure 1. SessionCatalog and Spark SQL Services

SessionCatalog uses the ExternalCatalog for the metadata of permanent entities (i.e. tables).

Note
SessionCatalog is a layer over ExternalCatalog in a SparkSession, which allows different metastores (in-memory or Hive) to be used.

SessionCatalog is available through SessionState (of a SparkSession).

scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog

SessionCatalog is created when BaseSessionStateBuilder is requested for the SessionCatalog (when SessionState is requested for it).

Among the notable uses of SessionCatalog is the creation of an Analyzer or a SparkOptimizer.

Table 1. SessionCatalog’s Internal Properties (e.g. Registries, Counters and Flags)

Name | Description
currentDb | FIXME. Used when…FIXME
tableRelationCache | A cache of fully-qualified table names to table relation plans (i.e. LogicalPlan). Used when SessionCatalog refreshes a table
tempViews | Registry of temporary views (i.e. non-global temporary tables)

requireTableExists Internal Method

requireTableExists(name: TableIdentifier): Unit

requireTableExists…​FIXME

Note
requireTableExists is used when…​FIXME

databaseExists Method

databaseExists(db: String): Boolean

databaseExists…​FIXME

Note
databaseExists is used when…​FIXME

listTables Method

listTables(db: String): Seq[TableIdentifier]  // uses "*" as the pattern
listTables(db: String, pattern: String): Seq[TableIdentifier]

listTables…​FIXME

Note

listTables is used when:

  • ShowTablesCommand logical command is requested to run

  • SessionCatalog is requested to reset (for testing)

  • CatalogImpl is requested to listTables (for testing)

Checking Whether Table Is Temporary View — isTemporaryTable Method

isTemporaryTable(name: TableIdentifier): Boolean

isTemporaryTable…​FIXME

Note
isTemporaryTable is used when…​FIXME

alterPartitions Method

alterPartitions(tableName: TableIdentifier, parts: Seq[CatalogTablePartition]): Unit

alterPartitions…​FIXME

Note
alterPartitions is used when…​FIXME

listPartitions Method

listPartitions(
  tableName: TableIdentifier,
  partialSpec: Option[TablePartitionSpec] = None): Seq[CatalogTablePartition]

listPartitions…​FIXME

Note
listPartitions is used when…​FIXME

listPartitionsByFilter Method

listPartitionsByFilter(
  tableName: TableIdentifier,
  predicates: Seq[Expression]): Seq[CatalogTablePartition]

listPartitionsByFilter…​FIXME

Note
listPartitionsByFilter is used when…​FIXME

alterTable Method

alterTable(tableDefinition: CatalogTable): Unit

alterTable…​FIXME

Note
alterTable is used when the following logical commands are executed:

  • AlterTableSetPropertiesCommand

  • AlterTableUnsetPropertiesCommand

  • AlterTableChangeColumnCommand

  • AlterTableSerDePropertiesCommand

  • AlterTableRecoverPartitionsCommand

  • AlterTableSetLocationCommand

  • AlterViewAsCommand (for permanent views)

Altering Table Statistics in Metastore (and Invalidating Internal Cache) — alterTableStats Method

alterTableStats(identifier: TableIdentifier, newStats: Option[CatalogStatistics]): Unit

alterTableStats requests ExternalCatalog to alter the statistics of the table (per identifier) followed by invalidating the table relation cache.

alterTableStats reports a NoSuchDatabaseException if the database does not exist.

alterTableStats reports a NoSuchTableException if the table does not exist.
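
The sequence described above (existence checks, metastore update, cache invalidation) can be sketched with a toy model in plain Scala. This is a simplified illustration and not Spark's implementation: ToyMetastore, ToyCatalog, TableId, Stats and the two exception classes are hypothetical stand-ins for ExternalCatalog, SessionCatalog, TableIdentifier, CatalogStatistics and Spark's exceptions, and falling back to a current database when the identifier has none is an assumption of the sketch.

```scala
import scala.collection.mutable

object AlterTableStatsSketch {
  // Hypothetical stand-ins for TableIdentifier and CatalogStatistics
  final case class TableId(table: String, database: Option[String] = None)
  final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

  class NoSuchDatabase(db: String) extends Exception(s"Database '$db' not found")
  class NoSuchTable(db: String, table: String)
    extends Exception(s"Table '$table' not found in database '$db'")

  // Stand-in for ExternalCatalog: database -> (table -> optional statistics)
  class ToyMetastore {
    private val dbs = mutable.Map.empty[String, mutable.Map[String, Option[Stats]]]
    def createTable(db: String, table: String): Unit =
      dbs.getOrElseUpdate(db, mutable.Map.empty).update(table, None)
    def databaseExists(db: String): Boolean = dbs.contains(db)
    def tableExists(db: String, table: String): Boolean =
      dbs.get(db).exists(_.contains(table))
    def alterTableStats(db: String, table: String, stats: Option[Stats]): Unit =
      dbs(db).update(table, stats)
    def stats(db: String, table: String): Option[Stats] = dbs(db)(table)
  }

  class ToyCatalog(metastore: ToyMetastore, currentDb: String = "default") {
    // Stand-in for tableRelationCache: (database, table) -> cached plan
    val tableRelationCache = mutable.Map.empty[(String, String), String]

    def alterTableStats(id: TableId, newStats: Option[Stats]): Unit = {
      val db = id.database.getOrElse(currentDb) // assumed fallback
      if (!metastore.databaseExists(db)) throw new NoSuchDatabase(db)
      if (!metastore.tableExists(db, id.table)) throw new NoSuchTable(db, id.table)
      metastore.alterTableStats(db, id.table, newStats) // 1. update the metastore
      tableRelationCache.remove((db, id.table))         // 2. invalidate the cached relation
    }
  }
}
```

In this model, a cached entry for a table disappears as soon as its statistics are altered, so the next lookup would have to rebuild the relation from the updated metadata.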

Note

alterTableStats is used when the following logical commands are executed:

tableExists Method

tableExists(
  name: TableIdentifier): Boolean

tableExists assumes the current database (default unless changed) unless a database is defined in the input TableIdentifier.

Note
tableExists is used when…​FIXME

functionExists Method

functionExists(name: FunctionIdentifier): Boolean

functionExists…​FIXME

Note

functionExists is used in:

listFunctions Method

listFunctions(
  db: String): Seq[(FunctionIdentifier, String)]
listFunctions(
  db: String,
  pattern: String): Seq[(FunctionIdentifier, String)]

listFunctions…​FIXME

Note
listFunctions is used when…​FIXME

Invalidating Table Relation Cache (aka Refreshing Table) — refreshTable Method

refreshTable(name: TableIdentifier): Unit

refreshTable…​FIXME

Note
refreshTable is used when…​FIXME

loadFunctionResources Method

loadFunctionResources(resources: Seq[FunctionResource]): Unit

loadFunctionResources…​FIXME

Note
loadFunctionResources is used when…​FIXME

Altering (Updating) Temporary View (Logical Plan) — alterTempViewDefinition Method

alterTempViewDefinition(name: TableIdentifier, viewDefinition: LogicalPlan): Boolean

alterTempViewDefinition alters a temporary view by updating either a local (in-memory) temporary view, when no database is specified and the view has already been registered, or a global temporary view, when the specified database is the one reserved for global temporary views.

Note
"Temporary table" and "temporary view" are synonyms.

alterTempViewDefinition returns true when the update was executed successfully (and false otherwise).
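
The branching can be illustrated with a small plain-Scala model. It is only a sketch under assumptions: ToyCatalog, TableId and the Plan alias are hypothetical stand-ins (a plan is just a String here), and global_temp is an assumed name for the database of global temporary views.

```scala
import scala.collection.mutable

object AlterTempViewSketch {
  // Hypothetical stand-ins: a table identifier and a "logical plan" as plain text
  final case class TableId(table: String, database: Option[String] = None)
  type Plan = String

  class ToyCatalog(globalTempDb: String = "global_temp") {
    val tempViews = mutable.Map.empty[String, Plan]       // local temporary views
    val globalTempViews = mutable.Map.empty[String, Plan] // global temporary views

    def alterTempViewDefinition(id: TableId, plan: Plan): Boolean = id.database match {
      // No database and the view is already registered: update the local view
      case None if tempViews.contains(id.table) =>
        tempViews(id.table) = plan
        true
      // Database of global temporary views and the view exists: update it
      case Some(db) if db == globalTempDb && globalTempViews.contains(id.table) =>
        globalTempViews(id.table) = plan
        true
      // Anything else (unknown view, or a regular database): nothing is updated
      case _ => false
    }
  }
}
```

A caller can use the Boolean result to decide whether the identifier referred to a temporary view at all or whether a permanent view has to be altered instead.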

Note
alterTempViewDefinition is used exclusively when AlterViewAsCommand logical command is executed.

Creating (Registering) Or Replacing Local Temporary View — createTempView Method

createTempView(
  name: String,
  tableDefinition: LogicalPlan,
  overrideIfExists: Boolean): Unit

createTempView…​FIXME

Note
createTempView is used when…​FIXME

Creating (Registering) Or Replacing Global Temporary View — createGlobalTempView Method

createGlobalTempView(
  name: String,
  viewDefinition: LogicalPlan,
  overrideIfExists: Boolean): Unit

createGlobalTempView simply requests the GlobalTempViewManager to register a global temporary view.

Note

createGlobalTempView is used when:

createTable Method

createTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit

createTable…​FIXME

Note
createTable is used when…​FIXME

Creating SessionCatalog Instance

SessionCatalog takes the following when created:

SessionCatalog initializes the internal registries and counters.

Finding Function by Name (Using FunctionRegistry) — lookupFunction Method

lookupFunction(
  name: FunctionIdentifier,
  children: Seq[Expression]): Expression

lookupFunction finds a function by name.

For a function with no database defined that exists in FunctionRegistry, lookupFunction requests FunctionRegistry to find the function (by its unqualified name, i.e. with no database).

If the function has a database defined or is not found in FunctionRegistry by its unqualified name, lookupFunction checks whether the function exists in FunctionRegistry by its fully-qualified name (i.e. with a database).

For other cases, lookupFunction requests ExternalCatalog to find the function and loads its resources. It then creates a corresponding temporary function and looks up the function again.
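
The three lookup steps can be sketched as a toy model in plain Scala. This is not Spark's code: FuncId, Builder and ToyCatalog are hypothetical names, functions are modelled as simple integer functions, resource loading is left out, and qualifying a name with a current database is an assumption of the sketch.

```scala
import scala.collection.mutable

object LookupFunctionSketch {
  // Hypothetical stand-in for FunctionIdentifier
  final case class FuncId(name: String, database: Option[String] = None)
  // Toy "function builder": evaluates integer arguments directly
  type Builder = Seq[Int] => Int

  class ToyCatalog(currentDb: String = "default") {
    // Stand-in for FunctionRegistry (keys are unqualified or db-qualified names)
    val registry = mutable.Map.empty[String, Builder]
    // Stand-in for ExternalCatalog (keys are db-qualified names)
    val metastore = mutable.Map.empty[String, Builder]

    def lookupFunction(id: FuncId, args: Seq[Int]): Int = {
      val qualified = s"${id.database.getOrElse(currentDb)}.${id.name}"
      if (id.database.isEmpty && registry.contains(id.name)) {
        // 1. No database and registered under the unqualified name
        registry(id.name)(args)
      } else if (registry.contains(qualified)) {
        // 2. Registered under the fully-qualified name
        registry(qualified)(args)
      } else {
        // 3. Fetch from the metastore, register it, then look it up again
        val builder = metastore.getOrElse(
          qualified, throw new NoSuchElementException(s"Undefined function: $qualified"))
        registry(qualified) = builder
        registry(qualified)(args)
      }
    }
  }
}
```

After the third step has run once for a function, later lookups of the same name are served by the second step, since the function is now present in the registry under its qualified name.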

Note

lookupFunction is used when:

Finding Relation (Table or View) in Catalogs — lookupRelation Method

lookupRelation(name: TableIdentifier): LogicalPlan

lookupRelation finds the name table in the catalogs (i.e. GlobalTempViewManager, ExternalCatalog or registry of temporary views) and gives a SubqueryAlias per table type.

scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog

import spark.sessionState.{catalog => c}
import org.apache.spark.sql.catalyst.TableIdentifier

// Global temp view
val db = spark.sharedState.globalTempViewManager.database
// Make the example reproducible (and so "replace")
spark.range(1).createOrReplaceGlobalTempView("gv1")
val gv1 = TableIdentifier(table = "gv1", database = Some(db))
val plan = c.lookupRelation(gv1)
scala> println(plan.numberedTreeString)
00 SubqueryAlias gv1
01 +- Range (0, 1, step=1, splits=Some(8))

val metastore = spark.sharedState.externalCatalog

// Regular table
val db = spark.catalog.currentDatabase
metastore.dropTable(db, table = "t1", ignoreIfNotExists = true, purge = true)
sql("CREATE TABLE t1 (id LONG) USING parquet")
val t1 = TableIdentifier(table = "t1", database = Some(db))
val plan = c.lookupRelation(t1)
scala> println(plan.numberedTreeString)
00 'SubqueryAlias t1
01 +- 'UnresolvedCatalogRelation `default`.`t1`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

// Regular view (not temporary view!)
// Make the example reproducible
metastore.dropTable(db, table = "v1", ignoreIfNotExists = true, purge = true)
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
val v1 = TableIdentifier(table = "v1", database = Some(db))
import org.apache.spark.sql.types.StructType
val schema = new StructType().add($"id".long)
val storage = CatalogStorageFormat(locationUri = None, inputFormat = None, outputFormat = None, serde = None, compressed = false, properties = Map())
val tableDef = CatalogTable(
  identifier = v1,
  tableType = CatalogTableType.VIEW,
  storage,
  schema,
  viewText = Some("SELECT 1") /** Required or RuntimeException reported */)
metastore.createTable(tableDef, ignoreIfExists = false)
val plan = c.lookupRelation(v1)
scala> println(plan.numberedTreeString)
00 'SubqueryAlias v1
01 +- View (`default`.`v1`, [id#77L])
02    +- 'Project [unresolvedalias(1, None)]
03       +- OneRowRelation

// Temporary view
spark.range(1).createOrReplaceTempView("v2")
val v2 = TableIdentifier(table = "v2", database = None)
val plan = c.lookupRelation(v2)
scala> println(plan.numberedTreeString)
00 SubqueryAlias v2
01 +- Range (0, 1, step=1, splits=Some(8))

Internally, lookupRelation looks up the name table using:

  1. GlobalTempViewManager when the database name of the table matches the database name of GlobalTempViewManager

    1. Gives SubqueryAlias or reports a NoSuchTableException

  2. ExternalCatalog when the database name of the table is specified explicitly or the registry of temporary views does not contain the table

    1. Gives SubqueryAlias with View when the table is a (permanent) view

    2. Gives SubqueryAlias with UnresolvedCatalogRelation otherwise

  3. The registry of temporary views

    1. Gives SubqueryAlias with the logical plan per the table as registered in the registry of temporary views
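
The lookup order above can be modelled in a few lines of plain Scala. The sketch is hypothetical throughout: ToyCatalog, TableId, PermanentTable and PermanentView are illustration-only names, plans are rendered as strings, global_temp and default are assumed database names, and a missing table raises a generic NoSuchElementException rather than Spark's NoSuchTableException.

```scala
import scala.collection.mutable

object LookupRelationSketch {
  final case class TableId(table: String, database: Option[String] = None)

  // What the toy metastore knows about a permanent entity
  sealed trait Permanent
  case object PermanentTable extends Permanent
  final case class PermanentView(viewText: String) extends Permanent

  class ToyCatalog(globalTempDb: String = "global_temp", currentDb: String = "default") {
    val globalTempViews = mutable.Map.empty[String, String]       // name -> plan
    val tempViews = mutable.Map.empty[String, String]             // name -> plan
    val metastore = mutable.Map.empty[(String, String), Permanent] // (db, table) -> kind

    private def notFound(db: String, table: String) =
      throw new NoSuchElementException(s"Table or view '$table' not found in database '$db'")

    def lookupRelation(id: TableId): String = {
      val db = id.database.getOrElse(currentDb)
      if (db == globalTempDb) {
        // 1. Global temporary views
        globalTempViews.get(id.table)
          .map(plan => s"SubqueryAlias(${id.table}, $plan)")
          .getOrElse(notFound(db, id.table))
      } else if (id.database.isDefined || !tempViews.contains(id.table)) {
        // 2. Metastore: explicit database, or no temporary view of that name
        metastore.get((db, id.table)) match {
          case Some(PermanentView(text)) => s"SubqueryAlias(${id.table}, View($text))"
          case Some(PermanentTable) =>
            s"SubqueryAlias(${id.table}, UnresolvedCatalogRelation($db.${id.table}))"
          case None => notFound(db, id.table)
        }
      } else {
        // 3. Local temporary views
        s"SubqueryAlias(${id.table}, ${tempViews(id.table)})"
      }
    }
  }
}
```

One consequence visible in the model: when a temporary view and a permanent table share a name, an unqualified identifier resolves to the temporary view, while an identifier with an explicit database resolves to the permanent table.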

Note
lookupRelation uses the current database (default unless changed) if the name table does not specify the database explicitly.

Note

lookupRelation is used when:

Retrieving Table Metadata from External Catalog (Metastore) — getTableMetadata Method

getTableMetadata(name: TableIdentifier): CatalogTable

getTableMetadata simply requests the external catalog (metastore) for the table metadata.

Before requesting the external metastore, getTableMetadata makes sure that the database and table (of the input TableIdentifier) both exist. If either does not exist, getTableMetadata reports a NoSuchDatabaseException or NoSuchTableException, respectively.

Retrieving Table Metadata — getTempViewOrPermanentTableMetadata Method

getTempViewOrPermanentTableMetadata(name: TableIdentifier): CatalogTable

Internally, getTempViewOrPermanentTableMetadata branches off per database.

When a database name is not specified, getTempViewOrPermanentTableMetadata finds a local temporary view and creates a CatalogTable (with VIEW table type and an undefined storage) or retrieves the table metadata from an external catalog.

With the database name of the GlobalTempViewManager, getTempViewOrPermanentTableMetadata requests GlobalTempViewManager for the global view definition and creates a CatalogTable (with the name of GlobalTempViewManager in table identifier, VIEW table type and an undefined storage) or reports a NoSuchTableException.

With the database name not of GlobalTempViewManager, getTempViewOrPermanentTableMetadata simply retrieves the table metadata from an external catalog.
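
A compact plain-Scala model of the three branches follows. It is a sketch only: Meta is a hypothetical, much reduced stand-in for CatalogTable (identifier, table type and an optional storage description), ToyCatalog and TableId are illustration-only names, global_temp and default are assumed database names, and a generic NoSuchElementException replaces Spark's exceptions.

```scala
import scala.collection.mutable

object TableMetadataSketch {
  final case class TableId(table: String, database: Option[String] = None)
  // Reduced stand-in for CatalogTable; storage = None models "undefined storage"
  final case class Meta(id: TableId, tableType: String, storage: Option[String])

  class ToyCatalog(globalTempDb: String = "global_temp", currentDb: String = "default") {
    val tempViews = mutable.Set.empty[String]
    val globalTempViews = mutable.Set.empty[String]
    val metastore = mutable.Map.empty[(String, String), Meta]

    private def permanent(db: String, table: String): Meta =
      metastore.getOrElse((db, table),
        throw new NoSuchElementException(s"Table or view '$table' not found in database '$db'"))

    def getTempViewOrPermanentTableMetadata(id: TableId): Meta = id.database match {
      // No database: a local temporary view wins, otherwise ask the metastore
      case None if tempViews.contains(id.table) =>
        Meta(TableId(id.table), "VIEW", storage = None)
      case None =>
        permanent(currentDb, id.table)
      // Database of global temporary views: the view must exist there
      case Some(db) if db == globalTempDb =>
        if (globalTempViews.contains(id.table))
          Meta(TableId(id.table, Some(globalTempDb)), "VIEW", storage = None)
        else
          throw new NoSuchElementException(s"Global temporary view '${id.table}' not found")
      // Any other database: metastore only
      case Some(db) =>
        permanent(db, id.table)
    }
  }
}
```

Only the global temporary view branch puts a database into the identifier of the synthesized metadata; a local temporary view is described without one.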

Note

getTempViewOrPermanentTableMetadata is used when:

Reporting NoSuchDatabaseException When Specified Database Does Not Exist — requireDbExists Internal Method

requireDbExists(db: String): Unit

requireDbExists reports a NoSuchDatabaseException if the specified database does not exist. Otherwise, requireDbExists does nothing.

reset Method

reset(): Unit

reset…​FIXME

Note
reset is used exclusively in the Spark SQL internal tests.

Dropping Global Temporary View — dropGlobalTempView Method

dropGlobalTempView(name: String): Boolean

dropGlobalTempView simply requests the GlobalTempViewManager to remove the name global temporary view.

Note
dropGlobalTempView is used when…​FIXME

Dropping Table — dropTable Method

dropTable(
  name: TableIdentifier,
  ignoreIfNotExists: Boolean,
  purge: Boolean): Unit

dropTable…​FIXME

Note

dropTable is used when:

Looking Up Global Temporary View by Name — getGlobalTempView Method

getGlobalTempView(
  name: String): Option[LogicalPlan]

getGlobalTempView requests the GlobalTempViewManager for the temporary view definition by the input name.
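
Since createGlobalTempView, dropGlobalTempView and getGlobalTempView all delegate to the GlobalTempViewManager, the delegation can be shown with a toy model in plain Scala. Everything here is hypothetical: ToyGlobalTempViewManager and ToyCatalog are illustration-only classes, a plan is a String, and failing with an IllegalStateException when a view already exists and overrideIfExists is false is an assumption of the sketch.

```scala
import scala.collection.mutable

object GlobalTempViewSketch {
  type Plan = String // stand-in for LogicalPlan

  // Toy registry of global temporary views kept under a dedicated database name
  class ToyGlobalTempViewManager(val database: String) {
    private val views = mutable.Map.empty[String, Plan]

    def create(name: String, plan: Plan, overrideIfExists: Boolean): Unit = {
      if (!overrideIfExists && views.contains(name))
        throw new IllegalStateException(s"Temporary view '$name' already exists")
      views(name) = plan
    }
    def get(name: String): Option[Plan] = views.get(name)
    // true only if a view was actually removed
    def remove(name: String): Boolean = views.remove(name).isDefined
  }

  // The catalog methods only forward to the manager
  class ToyCatalog(manager: ToyGlobalTempViewManager) {
    def createGlobalTempView(name: String, plan: Plan, overrideIfExists: Boolean): Unit =
      manager.create(name, plan, overrideIfExists)
    def getGlobalTempView(name: String): Option[Plan] = manager.get(name)
    def dropGlobalTempView(name: String): Boolean = manager.remove(name)
  }
}
```

In this model, getGlobalTempView gives None for an unknown name instead of failing, which lets a caller check for a view before attempting to drop it.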

Note
getGlobalTempView is used when CatalogImpl is requested to dropGlobalTempView.

registerFunction Method

registerFunction(
  funcDefinition: CatalogFunction,
  overrideIfExists: Boolean,
  functionBuilder: Option[FunctionBuilder] = None): Unit

registerFunction…​FIXME

Note

registerFunction is used when:

  • SessionCatalog is requested to lookupFunction

  • HiveSessionCatalog is requested to lookupFunction0

  • CreateFunctionCommand logical command is executed

lookupFunctionInfo Method

lookupFunctionInfo(name: FunctionIdentifier): ExpressionInfo

lookupFunctionInfo…​FIXME

Note
lookupFunctionInfo is used when…​FIXME

alterTableDataSchema Method

alterTableDataSchema(
  identifier: TableIdentifier,
  newDataSchema: StructType): Unit

alterTableDataSchema…​FIXME

Note
alterTableDataSchema is used when…​FIXME

getCachedTable Method

getCachedTable(
  key: QualifiedTableName): LogicalPlan

getCachedTable…​FIXME

Note
getCachedTable is used when…​FIXME
