Catalog — Metastore Management Interface

Catalog is the interface for managing a metastore (aka metadata catalog) of relational entities (e.g. databases, tables, functions, table columns and temporary views).

Catalog is available through the SparkSession.catalog property.

scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.catalog
org.apache.spark.sql.catalog.Catalog
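
As a quick sketch (assuming a running spark-shell session, so a SparkSession is available as spark), the Catalog can be used to inspect the current state of the metastore:

```scala
// Assumes a spark-shell session with `spark: SparkSession` in scope.
// Show the current database and the tables registered in it.
println(spark.catalog.currentDatabase)

// listTables returns a Dataset[Table], so the usual Dataset operators apply.
spark.catalog.listTables().show(truncate = false)
```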
Table 1. Catalog Contract
Method Description

cacheTable

cacheTable(tableName: String): Unit
cacheTable(tableName: String, storageLevel: StorageLevel): Unit

Caches the specified table in memory

Used for SQL’s CACHE TABLE statement and the AlterTableRenameCommand logical command.

clearCache

clearCache(): Unit
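
A minimal sketch of the caching lifecycle, assuming a spark-shell session (the table name t1 is a hypothetical example):

```scala
import org.apache.spark.storage.StorageLevel

// Register a managed table to have something to cache (hypothetical name).
spark.range(5).write.saveAsTable("t1")

// Cache with an explicit storage level (the single-argument overload
// defaults to MEMORY_AND_DISK).
spark.catalog.cacheTable("t1", StorageLevel.MEMORY_ONLY)
assert(spark.catalog.isCached("t1"))

spark.catalog.uncacheTable("t1")  // drop just this table's cached data
spark.catalog.clearCache()        // or remove all cached tables at once
```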

createTable

createTable(tableName: String, path: String): DataFrame
createTable(
  tableName: String,
  source: String,
  options: java.util.Map[String, String]): DataFrame
createTable(
  tableName: String,
  source: String,
  options: Map[String, String]): DataFrame
createTable(tableName: String, path: String, source: String): DataFrame
createTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: java.util.Map[String, String]): DataFrame
createTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame
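
A sketch of the schema-taking overload, assuming a spark-shell session; the table name, path and schema below are made-up examples:

```scala
import org.apache.spark.sql.types._

// Hypothetical schema for a CSV-backed table.
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", IntegerType)))

// Registers the table in the metastore and returns it as a DataFrame.
val people = spark.catalog.createTable(
  tableName = "people",
  source = "csv",
  schema = schema,
  options = Map("path" -> "/tmp/people.csv", "header" -> "true"))
```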

currentDatabase

currentDatabase: String

databaseExists

databaseExists(dbName: String): Boolean
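
For example (a sketch assuming a spark-shell session; the database name sandbox is an assumption):

```scala
// Create a database to probe for (hypothetical name).
spark.sql("CREATE DATABASE IF NOT EXISTS sandbox")

assert(spark.catalog.databaseExists("sandbox"))
assert(!spark.catalog.databaseExists("no_such_db"))

// listDatabases returns a Dataset[Database] with name, description and location.
spark.catalog.listDatabases().show(truncate = false)
```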

dropGlobalTempView

dropGlobalTempView(viewName: String): Boolean

dropTempView

dropTempView(viewName: String): Boolean
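
A short sketch of the temporary-view lifecycle, assuming a spark-shell session (the view name nums is a hypothetical example):

```scala
// Temporary views are session-scoped and live in the metastore catalog
// only for the lifetime of the session.
spark.range(3).createOrReplaceTempView("nums")
assert(spark.catalog.tableExists("nums"))

// dropTempView returns true if the view existed and was dropped.
assert(spark.catalog.dropTempView("nums"))
assert(!spark.catalog.dropTempView("nums"))  // already gone
```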

functionExists

functionExists(functionName: String): Boolean
functionExists(dbName: String, functionName: String): Boolean

getDatabase

getDatabase(dbName: String): Database

getFunction

getFunction(functionName: String): Function
getFunction(dbName: String, functionName: String): Function

getTable

getTable(tableName: String): Table
getTable(dbName: String, tableName: String): Table

isCached

isCached(tableName: String): Boolean

listColumns

listColumns(tableName: String): Dataset[Column]
listColumns(dbName: String, tableName: String): Dataset[Column]
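
For example (a sketch assuming a spark-shell session; the table name t1 is an assumption):

```scala
// Register a table to inspect (hypothetical name).
spark.range(5).withColumnRenamed("id", "n").write.saveAsTable("t1")

// Each row is an org.apache.spark.sql.catalog.Column describing a column's
// name, data type, nullability, and partition/bucket membership.
spark.catalog.listColumns("t1").show(truncate = false)
```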

listDatabases

listDatabases(): Dataset[Database]

listFunctions

listFunctions(): Dataset[Function]
listFunctions(dbName: String): Dataset[Function]

listTables

listTables(): Dataset[Table]
listTables(dbName: String): Dataset[Table]

recoverPartitions

recoverPartitions(tableName: String): Unit

refreshByPath

refreshByPath(path: String): Unit

refreshTable

refreshTable(tableName: String): Unit

setCurrentDatabase

setCurrentDatabase(dbName: String): Unit

tableExists

tableExists(tableName: String): Boolean
tableExists(dbName: String, tableName: String): Boolean
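
A sketch of both overloads, assuming a spark-shell session with the default database (table names are assumptions):

```scala
assert(!spark.catalog.tableExists("no_such_table"))

// Register a table, then resolve it by bare name (current database)
// or by an explicit database/table pair.
spark.range(1).write.saveAsTable("t2")
assert(spark.catalog.tableExists("t2"))
assert(spark.catalog.tableExists("default", "t2"))
```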

uncacheTable

uncacheTable(tableName: String): Unit
Note: CatalogImpl is the one and only known implementation of the Catalog contract in Apache Spark.
