CatalogTable — Table Specification (Native Table Metadata)

CatalogTable is the table specification, i.e. the metadata of a table that is stored in a session-scoped catalog of relational entities (SessionCatalog).

scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog

// Using high-level user-friendly catalog interface
scala> spark.catalog.listTables.filter($"name" === "t1").show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
|  t1| default|       null|  MANAGED|      false|
+----+--------+-----------+---------+-----------+

// Using low-level internal SessionCatalog interface to access CatalogTables
val t1Tid = spark.sessionState.sqlParser.parseTableIdentifier("t1")
val t1Metadata = spark.sessionState.catalog.getTempViewOrPermanentTableMetadata(t1Tid)
scala> :type t1Metadata
org.apache.spark.sql.catalyst.catalog.CatalogTable

CatalogTable is created when:

The readable text representation of a CatalogTable (aka simpleString) is…​FIXME

Note
simpleString is used exclusively when ShowTablesCommand logical command is executed (with a partition specification).

CatalogTable uses the following text representation (i.e. toString)…​FIXME

CatalogTable is created with the optional bucketing specification that is used for the following:

Table Statistics for Query Planning (Auto Broadcast Joins and Cost-Based Optimization)

You manage table metadata using the catalog interface (aka metastore). One of the management tasks is getting the statistics of a table (which are used for cost-based query optimization).

scala> t1Metadata.stats.foreach(println)
CatalogStatistics(714,Some(2),Map(p1 -> ColumnStat(2,Some(0),Some(1),0,4,4,None), id -> ColumnStat(2,Some(0),Some(1),0,4,4,None)))

scala> t1Metadata.stats.map(_.simpleString).foreach(println)
714 bytes, 2 rows

Note
The CatalogStatistics are optional when CatalogTable is created.

Caution
FIXME When are stats specified? What if there are none?

If no CatalogStatistics are available in the table metadata (in a catalog) for a non-streaming file data source table, DataSource creates a HadoopFsRelation with the table size given by the spark.sql.defaultSizeInBytes internal property (default: Long.MaxValue). The size is used for query planning of joins (and to decide whether the table can be broadcast automatically).
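To see the sizes that take part in this decision, you can compare the configured values with the estimate of an optimized plan. The following is a minimal sketch, assuming Spark 2.3 or later (where stats takes no arguments) and a local session. The size_demo table name is hypothetical and only used for illustration.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell the `spark` value already exists; getOrCreate reuses that session
val spark = SparkSession.builder().master("local[*]").appName("size-estimates").getOrCreate()

// Size assumed for a relation when no statistics are available
val defaultSize: Long = spark.sessionState.conf.defaultSizeInBytes
// Upper bound on the size of a table that can be broadcast automatically
val broadcastThreshold: Long = spark.sessionState.conf.autoBroadcastJoinThreshold

// size_demo is a hypothetical table created only for this example
spark.range(5).write.mode("overwrite").saveAsTable("size_demo")

// Size estimate that the planner sees for the table
val estimatedSize: BigInt =
  spark.table("size_demo").queryExecution.optimizedPlan.stats.sizeInBytes

println(s"defaultSizeInBytes          = $defaultSize")
println(s"autoBroadcastJoinThreshold  = $broadcastThreshold")
println(s"estimated size of size_demo = $estimatedSize")
```

With the default settings, the default size is far above the broadcast threshold, which is why a relation that falls back to it is not broadcast.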

Internally, Spark alters table statistics using ExternalCatalog.doAlterTableStats.

If no CatalogStatistics are available in the table metadata (in a catalog) for a HiveTableRelation (and the hive provider), the DetermineTableStats logical resolution rule either computes the table size using HDFS (if the spark.sql.statistics.fallBackToHdfs property is turned on) or assumes spark.sql.defaultSizeInBytes (which effectively disables table broadcasting).

You can use the AnalyzeColumnCommand, AnalyzePartitionCommand and AnalyzeTableCommand commands to record statistics in a catalog.

The table statistics can be updated automatically (after executing commands like AlterTableAddPartitionCommand) when the spark.sql.statistics.size.autoUpdate.enabled property is turned on.

You can use the DESCRIBE SQL command to show the histogram of a column if it is stored in a catalog.
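The SQL side of recording and showing statistics could look as follows. This is a sketch rather than a reference: it assumes Spark 2.3 or later (for the spark.sql.statistics.histogram.enabled property and DESC EXTENDED with a column name) and a local session. The stats_demo table is hypothetical and stands in for t1.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

// In spark-shell the `spark` value already exists; getOrCreate reuses that session
val spark = SparkSession.builder().master("local[*]").appName("stats-demo").getOrCreate()

// stats_demo is a hypothetical two-row table with the columns id and p1
spark.range(2).selectExpr("id", "cast(id as int) as p1")
  .write.mode("overwrite").saveAsTable("stats_demo")

// Collect histograms together with the other column statistics
spark.conf.set("spark.sql.statistics.histogram.enabled", true)

// Record table-level statistics (size in bytes and row count)
spark.sql("ANALYZE TABLE stats_demo COMPUTE STATISTICS")
// Record column-level statistics
spark.sql("ANALYZE TABLE stats_demo COMPUTE STATISTICS FOR COLUMNS id, p1")

// Read the statistics back from the table metadata
val statsDemo = spark.sessionState.catalog.getTableMetadata(TableIdentifier("stats_demo"))
statsDemo.stats.map(_.simpleString).foreach(println)

// Show the statistics of a single column, including the histogram if stored
spark.sql("DESC EXTENDED stats_demo id").show(truncate = false)
```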

dataSchema Method

dataSchema: StructType

dataSchema…​FIXME

Note
dataSchema is used when…​FIXME

partitionSchema Method

partitionSchema: StructType

partitionSchema…​FIXME

Note
partitionSchema is used when…​FIXME
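Both methods can be explored without a running session by creating a CatalogTable directly. The sketch below assumes that partitionSchema is made up of the trailing fields of the schema (one per partition column name) and that dataSchema holds the remaining fields; verify that against the Spark version you use. The demo table and its columns are hypothetical.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.{IntegerType, LongType, StructType}

// Hypothetical table with one data column (id) and one partition column (p1).
// Partition columns are placed at the end of the schema.
val partitioned = CatalogTable(
  identifier = TableIdentifier("demo", Some("default")),
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty,
  schema = new StructType().add("id", LongType).add("p1", IntegerType),
  provider = Some("parquet"),
  partitionColumnNames = Seq("p1"))

// Fields that are not partition columns
val dataFields = partitioned.dataSchema.fieldNames.toList
// Fields that are partition columns
val partitionFields = partitioned.partitionSchema.fieldNames.toList

println(s"dataSchema:      $dataFields")
println(s"partitionSchema: $partitionFields")
```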

Converting Table Specification to LinkedHashMap — toLinkedHashMap Method

toLinkedHashMap: mutable.LinkedHashMap[String, String]

toLinkedHashMap converts the table specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

Note

toLinkedHashMap is used when:

Creating CatalogTable Instance

CatalogTable takes the following when created:

  • TableIdentifier

  • CatalogTableType (i.e. EXTERNAL, MANAGED or VIEW)

  • CatalogStorageFormat

  • Schema

  • Name of the table provider (optional)

  • Partition column names

  • Optional Bucketing specification (default: None)

  • Owner

  • Create time

  • Last access time

  • Create version

  • Properties

  • Optional table statistics

  • Optional view text

  • Optional comment

  • Unsupported features

  • tracksPartitionsInCatalog flag

  • schemaPreservesCase flag

  • Ignored properties
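A minimal sketch of creating a CatalogTable by hand, assuming the constructor parameter names of the Spark source code (identifier, tableType, storage, schema, provider, bucketSpec, stats and so on) and that only the first four have no default value. The demo table is hypothetical.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.{LongType, StructType}

// Only the arguments without default values are given
val minimal = CatalogTable(
  identifier = TableIdentifier("demo", Some("default")),
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty,
  schema = new StructType().add("id", LongType))

// The optional parts are left undefined or empty
println(s"provider:   ${minimal.provider}")
println(s"bucketSpec: ${minimal.bucketSpec}")
println(s"stats:      ${minimal.stats}")
println(s"partitions: ${minimal.partitionColumnNames}")

// CatalogTable is a case class, so copy gives a modified specification,
// here with a provider and a bucketing specification (4 buckets by id, no sort columns)
val bucketed = minimal.copy(
  provider = Some("parquet"),
  bucketSpec = Some(BucketSpec(4, Seq("id"), Seq.empty)))

println(s"bucketSpec: ${bucketed.bucketSpec}")
```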

database Method

database: String

database simply returns the database (of the TableIdentifier) or throws an AnalysisException:

table [identifier] did not specify database

Note
database is used when…​FIXME
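The two outcomes of database can be sketched as follows. The demo table is hypothetical, and the failure is captured with Try so that the example does not depend on the exact exception type or message of a given Spark version.

```scala
import scala.util.Try

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.{LongType, StructType}

// Table identifier with the database specified
val qualified = CatalogTable(
  identifier = TableIdentifier("demo", Some("default")),
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty,
  schema = new StructType().add("id", LongType))

// The same specification, but with no database in the table identifier
val unqualified = qualified.copy(identifier = TableIdentifier("demo"))

println(qualified.database) // default

// database throws when the identifier has no database
val attempt = Try(unqualified.database)
println(attempt)
```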
