ExternalCatalog Contract — External Catalog (Metastore) of Permanent Relational Entities

ExternalCatalog is the contract of an external system catalog (aka metadata registry or metastore) of permanent relational entities, i.e. databases, tables, partitions, and functions.

Table 1. ExternalCatalog’s Features per Relational Entity

| Feature | Database | Function | Partition | Table |
|---|---|---|---|---|
| Alter | alterDatabase | alterFunction | alterPartitions | alterTable, alterTableDataSchema, alterTableStats |
| Create | createDatabase | createFunction | createPartitions | createTable |
| Drop | dropDatabase | dropFunction | dropPartitions | dropTable |
| Get | getDatabase | getFunction | getPartition, getPartitionOption | getTable |
| List | listDatabases | listFunctions | listPartitionNames, listPartitions, listPartitionsByFilter | listTables |
| Load | | | loadDynamicPartitions, loadPartition | loadTable |
| Rename | | renameFunction | renamePartitions | renameTable |
| Check Existence | databaseExists | functionExists | | tableExists |
| Set | setCurrentDatabase | | | |

Table 2. ExternalCatalog Contract (incl. Protected Methods)
Method Description

alterPartitions

alterPartitions(
  db: String,
  table: String,
  parts: Seq[CatalogTablePartition]): Unit

createPartitions

createPartitions(
  db: String,
  table: String,
  parts: Seq[CatalogTablePartition],
  ignoreIfExists: Boolean): Unit

databaseExists

databaseExists(db: String): Boolean

doAlterDatabase

doAlterDatabase(dbDefinition: CatalogDatabase): Unit

doAlterFunction

doAlterFunction(db: String, funcDefinition: CatalogFunction): Unit

doAlterTable

doAlterTable(tableDefinition: CatalogTable): Unit

doAlterTableDataSchema

doAlterTableDataSchema(db: String, table: String, newDataSchema: StructType): Unit

doAlterTableStats

doAlterTableStats(db: String, table: String, stats: Option[CatalogStatistics]): Unit

doCreateDatabase

doCreateDatabase(dbDefinition: CatalogDatabase, ignoreIfExists: Boolean): Unit

doCreateFunction

doCreateFunction(db: String, funcDefinition: CatalogFunction): Unit

doCreateTable

doCreateTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit

doDropDatabase

doDropDatabase(db: String, ignoreIfNotExists: Boolean, cascade: Boolean): Unit

doDropFunction

doDropFunction(db: String, funcName: String): Unit

doDropTable

doDropTable(
  db: String,
  table: String,
  ignoreIfNotExists: Boolean,
  purge: Boolean): Unit

doRenameFunction

doRenameFunction(db: String, oldName: String, newName: String): Unit

doRenameTable

doRenameTable(db: String, oldName: String, newName: String): Unit

dropPartitions

dropPartitions(
  db: String,
  table: String,
  parts: Seq[TablePartitionSpec],
  ignoreIfNotExists: Boolean,
  purge: Boolean,
  retainData: Boolean): Unit

functionExists

functionExists(db: String, funcName: String): Boolean

getDatabase

getDatabase(db: String): CatalogDatabase

getFunction

getFunction(db: String, funcName: String): CatalogFunction

getPartition

getPartition(db: String, table: String, spec: TablePartitionSpec): CatalogTablePartition

getPartitionOption

getPartitionOption(
  db: String,
  table: String,
  spec: TablePartitionSpec): Option[CatalogTablePartition]

getTable

getTable(db: String, table: String): CatalogTable

listDatabases

listDatabases(): Seq[String]
listDatabases(pattern: String): Seq[String]

listFunctions

listFunctions(db: String, pattern: String): Seq[String]

listPartitionNames

listPartitionNames(
  db: String,
  table: String,
  partialSpec: Option[TablePartitionSpec] = None): Seq[String]

listPartitions

listPartitions(
  db: String,
  table: String,
  partialSpec: Option[TablePartitionSpec] = None): Seq[CatalogTablePartition]

listPartitionsByFilter

listPartitionsByFilter(
  db: String,
  table: String,
  predicates: Seq[Expression],
  defaultTimeZoneId: String): Seq[CatalogTablePartition]

listTables

listTables(db: String): Seq[String]
listTables(db: String, pattern: String): Seq[String]

loadDynamicPartitions

loadDynamicPartitions(
  db: String,
  table: String,
  loadPath: String,
  partition: TablePartitionSpec,
  replace: Boolean,
  numDP: Int): Unit

loadPartition

loadPartition(
  db: String,
  table: String,
  loadPath: String,
  partition: TablePartitionSpec,
  isOverwrite: Boolean,
  inheritTableSpecs: Boolean,
  isSrcLocal: Boolean): Unit

loadTable

loadTable(
  db: String,
  table: String,
  loadPath: String,
  isOverwrite: Boolean,
  isSrcLocal: Boolean): Unit

renamePartitions

renamePartitions(
  db: String,
  table: String,
  specs: Seq[TablePartitionSpec],
  newSpecs: Seq[TablePartitionSpec]): Unit

setCurrentDatabase

setCurrentDatabase(db: String): Unit

tableExists

tableExists(db: String, table: String): Boolean
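
The do-prefixed methods in the table above are the protected extension points that a concrete catalog implements, while the public methods of the same name (without the prefix) are what callers use. The following is a minimal, self-contained sketch of that arrangement, assuming the public method wraps its do counterpart between a pre-event and a post-event. MiniCatalog, MiniInMemoryCatalog, Db and the two event classes are hypothetical names made up for illustration and are not Spark classes.

```scala
// Hypothetical, simplified model of the "public method + protected do-method" split.
// None of these types exist in Spark; they only illustrate the pattern.
final case class Db(name: String)

sealed trait CatalogEvent
final case class CreateDbPreEvent(db: String) extends CatalogEvent
final case class CreateDbEvent(db: String) extends CatalogEvent

abstract class MiniCatalog {
  private var listeners = Vector.empty[CatalogEvent => Unit]

  def addListener(listener: CatalogEvent => Unit): Unit = {
    listeners = listeners :+ listener
  }

  private def post(event: CatalogEvent): Unit = listeners.foreach(l => l(event))

  // Public entry point with a fixed workflow: notify, delegate, notify.
  final def createDatabase(db: Db, ignoreIfExists: Boolean): Unit = {
    post(CreateDbPreEvent(db.name))
    doCreateDatabase(db, ignoreIfExists)
    post(CreateDbEvent(db.name))
  }

  // Extension point that every concrete catalog has to provide.
  protected def doCreateDatabase(db: Db, ignoreIfExists: Boolean): Unit

  def databaseExists(db: String): Boolean
}

class MiniInMemoryCatalog extends MiniCatalog {
  private val databases = scala.collection.mutable.Map.empty[String, Db]

  override protected def doCreateDatabase(db: Db, ignoreIfExists: Boolean): Unit = {
    if (!databases.contains(db.name)) {
      databases(db.name) = db
    } else if (!ignoreIfExists) {
      throw new IllegalStateException(s"Database '${db.name}' already exists")
    }
  }

  override def databaseExists(db: String): Boolean = databases.contains(db)
}

val catalog = new MiniInMemoryCatalog
val seen = scala.collection.mutable.ArrayBuffer.empty[CatalogEvent]
catalog.addListener { event => seen += event; () }

catalog.createDatabase(Db("sales"), ignoreIfExists = false)
println(seen.mkString(", "))
```

With this split, the event handling lives in one place and a concrete catalog only supplies the storage-specific part.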

ExternalCatalog is available as externalCatalog of SharedState (in SparkSession).

scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.sharedState.externalCatalog
org.apache.spark.sql.catalyst.catalog.ExternalCatalog
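
As a quick illustration of the Get, List and Check Existence features, the following sketch queries the external catalog of a local session. It assumes Spark SQL is on the classpath (as in spark-shell). The application name is a made-up value, and the sketch relies on a database named default being present.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session (e.g. in spark-shell) or creates a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("external-catalog-demo") // hypothetical name
  .getOrCreate()

val externalCatalog = spark.sharedState.externalCatalog

// Check Existence
val hasDefault: Boolean = externalCatalog.databaseExists("default")

// List: database and table names are returned as plain strings
val databases: Seq[String] = externalCatalog.listDatabases()
val tables: Seq[String] = externalCatalog.listTables("default")

// Get: the metadata of a database is returned as a CatalogDatabase
val defaultDb = externalCatalog.getDatabase("default")

println(s"databases: $databases")
println(s"tables in default: $tables")
println(s"location of default: ${defaultDb.locationUri}")
```

ExternalCatalog is an internal API, so day-to-day code would normally go through spark.catalog or SQL instead.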

ExternalCatalog comes in two flavors: an ephemeral in-memory catalog and a persistent Hive-aware catalog.

Table 3. ExternalCatalogs

| ExternalCatalog | Alias | Description |
|---|---|---|
| HiveExternalCatalog | hive | A persistent system catalog using a Hive metastore. |
| InMemoryCatalog | in-memory | An in-memory (ephemeral) system catalog that does not require setting up external systems (like a Hive metastore). It is intended for testing or exploration purposes only and therefore should not be used in production. |

The concrete ExternalCatalog is selected by the spark.sql.catalogImplementation configuration property. Builder.enableHiveSupport enables Hive support by setting the property to hive, provided the Hive classes are available.

import org.apache.spark.sql.internal.StaticSQLConf
val catalogType = spark.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION.key)
scala> println(catalogType)
hive

scala> spark.sessionState.conf.getConf(StaticSQLConf.CATALOG_IMPLEMENTATION)
res1: String = hive

Tip

Set spark.sql.catalogImplementation to in-memory when starting spark-shell to use InMemoryCatalog external catalog.

// spark-shell --conf spark.sql.catalogImplementation=in-memory

import org.apache.spark.sql.internal.StaticSQLConf
scala> spark.sessionState.conf.getConf(StaticSQLConf.CATALOG_IMPLEMENTATION)
res0: String = in-memory

Important

You cannot change the ExternalCatalog once a SparkSession has been created, because the spark.sql.catalogImplementation configuration property is a static configuration.

import org.apache.spark.sql.internal.StaticSQLConf
scala> spark.conf.set(StaticSQLConf.CATALOG_IMPLEMENTATION.key, "hive")
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.catalogImplementation;
  at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:144)
  at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:41)
  ... 49 elided

ExternalCatalog is a ListenerBus of ExternalCatalogEventListener listeners that handle ExternalCatalogEvent events.

Tip

Use addListener and removeListener to register and de-register ExternalCatalogEventListener listeners, respectively.

Read ListenerBus Event Bus Contract in Mastering Apache Spark 2 gitbook to learn more about Spark Core’s ListenerBus interface.
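
The following sketch registers a listener, triggers a catalog change and then de-registers the listener. It assumes Spark SQL is on the classpath and that CREATE DATABASE ends up calling the external catalog. The database name, application name and temporary warehouse directory are made-up values for illustration.

```scala
import java.nio.file.Files

import scala.collection.mutable.ArrayBuffer

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.{ExternalCatalogEvent, ExternalCatalogEventListener}

// A throwaway warehouse directory so the demo leaves nothing in the working directory.
val warehouse = Files.createTempDirectory("spark-warehouse").toString

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("catalog-listener-demo") // hypothetical name
  .config("spark.sql.warehouse.dir", warehouse)
  .getOrCreate()

// Collects every event the external catalog posts while the listener is registered.
val events = ArrayBuffer.empty[ExternalCatalogEvent]
val listener = new ExternalCatalogEventListener {
  override def onEvent(event: ExternalCatalogEvent): Unit = events.synchronized {
    events += event
    ()
  }
}

val externalCatalog = spark.sharedState.externalCatalog
externalCatalog.addListener(listener)

spark.sql("CREATE DATABASE IF NOT EXISTS listener_demo_db")

// Events posted after this point are no longer collected.
externalCatalog.removeListener(listener)
spark.sql("DROP DATABASE IF EXISTS listener_demo_db")

events.foreach(event => println(event))
```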

Altering Table Statistics — alterTableStats Method

alterTableStats(db: String, table: String, stats: Option[CatalogStatistics]): Unit

alterTableStats…​FIXME

Note
alterTableStats is used exclusively when SessionCatalog is requested for altering the statistics of a table in a metastore (that can happen when any logical command is executed that could change the table statistics).
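
One way to observe the effect is to compare the statistics stored in the external catalog before and after a command that recomputes them. The sketch below assumes that ANALYZE TABLE ... COMPUTE STATISTICS is such a command and that Spark SQL is on the classpath. The table name, application name and temporary warehouse directory are made-up values for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// A throwaway warehouse directory so the demo leaves nothing in the working directory.
val warehouse = Files.createTempDirectory("spark-warehouse").toString

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("alter-table-stats-demo") // hypothetical name
  .config("spark.sql.warehouse.dir", warehouse)
  .getOrCreate()

// A small managed table with 10 rows.
spark.range(10).write.mode("overwrite").saveAsTable("stats_demo")

val externalCatalog = spark.sharedState.externalCatalog
val statsBefore = externalCatalog.getTable("default", "stats_demo").stats

// Scans the table and stores the computed statistics in the catalog.
spark.sql("ANALYZE TABLE stats_demo COMPUTE STATISTICS")

val statsAfter = externalCatalog.getTable("default", "stats_demo").stats
val rowCount = statsAfter.flatMap(_.rowCount)

println(s"stats before: $statsBefore")
println(s"stats after:  $statsAfter")
```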

Altering Table — alterTable Method

alterTable(tableDefinition: CatalogTable): Unit

alterTable…​FIXME

Note
alterTable is used exclusively when SessionCatalog is requested to alter the metadata of a table in a metastore.

createTable Method

createTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit

createTable…​FIXME

Note
createTable is used when…​FIXME

alterTableDataSchema Method

alterTableDataSchema(db: String, table: String, newDataSchema: StructType): Unit

alterTableDataSchema…​FIXME

Note
alterTableDataSchema is used exclusively when SessionCatalog is requested to alterTableDataSchema.
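
The sketch below shows a data schema change as seen from the external catalog. It assumes that ALTER TABLE ... ADD COLUMNS is routed through SessionCatalog to alterTableDataSchema, and that Spark SQL is on the classpath. The table name, column name, application name and temporary warehouse directory are made-up values for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// A throwaway warehouse directory so the demo leaves nothing in the working directory.
val warehouse = Files.createTempDirectory("spark-warehouse").toString

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("alter-table-data-schema-demo") // hypothetical name
  .config("spark.sql.warehouse.dir", warehouse)
  .getOrCreate()

// A managed table with a single data column called id.
spark.range(3).write.mode("overwrite").saveAsTable("schema_demo")

val externalCatalog = spark.sharedState.externalCatalog
val columnsBefore =
  externalCatalog.getTable("default", "schema_demo").dataSchema.fieldNames.toList

// Appends a new data column to the table definition kept in the catalog.
spark.sql("ALTER TABLE schema_demo ADD COLUMNS (note STRING)")

val columnsAfter =
  externalCatalog.getTable("default", "schema_demo").dataSchema.fieldNames.toList

println(s"columns before: $columnsBefore")
println(s"columns after:  $columnsAfter")
```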
