spark.range(1).createOrReplaceTempView("demo")
// DESC view
scala> sql("DESC EXTENDED demo").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
| id| bigint| null|
+--------+---------+-------+
// DESC table
// Make the demo reproducible
spark.sharedState.externalCatalog.dropTable(
db = "default",
table = "bucketed",
ignoreIfNotExists = true,
purge = true)
spark.range(10).write.bucketBy(5, "id").saveAsTable("bucketed")
assert(spark.catalog.tableExists("bucketed"))
// EXTENDED to include Detailed Table Information
// Note no partitions used
// Could also be FORMATTED
scala> sql("DESC EXTENDED bucketed").show(numRows = 50, truncate = false)
+----------------------------+-----------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+-----------------------------------------------------------------------------+-------+
|id |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |bucketed | |
|Owner |jacek | |
|Created Time |Sun Sep 30 20:57:22 CEST 2018 | |
|Last Access |Thu Jan 01 01:00:00 CET 1970 | |
|Created By |Spark 2.3.1 | |
|Type |MANAGED | |
|Provider |parquet | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[] | |
|Table Properties |[transient_lastDdlTime=1538333842] | |
|Statistics |3740 bytes | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/bucketed| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+-----------------------------------------------------------------------------+-------+
// Make the demo reproducible
val tableName = "partitioned_bucketed_sorted"
val partCol = "part"
spark.sharedState.externalCatalog.dropTable(
db = "default",
table = tableName,
ignoreIfNotExists = true,
purge = true)
spark.range(10)
.withColumn("part", $"id" % 2) // extra column for partitions
.write
.partitionBy(partCol)
.bucketBy(5, "id")
.sortBy("id")
.saveAsTable(tableName)
assert(spark.catalog.tableExists(tableName))
scala> sql(s"DESC EXTENDED $tableName").show(numRows = 50, truncate = false)
+----------------------------+------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+------------------------------------------------------------------------------------------------+-------+
|id |bigint |null |
|part |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|part |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |partitioned_bucketed_sorted | |
|Owner |jacek | |
|Created Time |Mon Oct 01 10:05:32 CEST 2018 | |
|Last Access |Thu Jan 01 01:00:00 CET 1970 | |
|Created By |Spark 2.3.1 | |
|Type |MANAGED | |
|Provider |parquet | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[`id`] | |
|Table Properties |[transient_lastDdlTime=1538381132] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
+----------------------------+------------------------------------------------------------------------------------------------+-------+
scala> sql(s"DESCRIBE EXTENDED $tableName PARTITION ($partCol=1)").show(numRows = 50, truncate = false)
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
|id |bigint |null |
|part |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|part |bigint |null |
| | | |
|# Detailed Partition Information| | |
|Database |default | |
|Table |partitioned_bucketed_sorted | |
|Partition Values |[part=1] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted/part=1 | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[path=file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted, serialization.format=1]| |
|Partition Parameters |{totalSize=1870, numFiles=5, transient_lastDdlTime=1538381329} | |
|Partition Statistics |1870 bytes | |
| | | |
|# Storage Information | | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[`id`] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
DescribeTableCommand Logical Command

DescribeTableCommand is a logical command that executes a DESCRIBE TABLE SQL statement.
DescribeTableCommand is created exclusively when SparkSqlAstBuilder is requested to parse a DESCRIBE TABLE SQL statement (with no column specified).
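You can confirm this in the spark-shell session above (the demo temporary view is registered at the top), in the assert style of the demos:

import org.apache.spark.sql.execution.command.DescribeTableCommand
val plan = sql("DESC EXTENDED demo").queryExecution.logical
assert(plan.isInstanceOf[DescribeTableCommand])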
DescribeTableCommand uses the following output schema:

- col_name as the name of the column
- data_type as the data type of the column
- comment as the comment of the column
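In spark-shell the output schema shows up as three string columns (output from Spark 2.3, the version used in the demos; other versions may differ slightly):

scala> sql("DESC demo").printSchema
root
 |-- col_name: string (nullable = false)
 |-- data_type: string (nullable = false)
 |-- comment: string (nullable = true)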
Executing Logical Command — run Method

run(sparkSession: SparkSession): Seq[Row]

Note: run is part of the RunnableCommand Contract to execute (run) a logical command.
run uses the SessionCatalog (of the SessionState of the input SparkSession) and branches off per the type of the table to display.

For a temporary view, run requests the SessionCatalog to lookupRelation to access the schema, and then describeSchema.

For all other table types, run does the following (a quick check follows the list):

- Requests the SessionCatalog to retrieve the table metadata from the external catalog (metastore) (as a CatalogTable) and describeSchema (with the schema)
- describeDetailedPartitionInfo if the TablePartitionSpec is available, or describeFormattedTableInfo when the isExtended flag is on
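The branching is easy to observe with the objects from the demos above (demo is a temporary view, bucketed is a catalog table):

// A temporary view yields the schema rows only, even with EXTENDED,
// while a catalog table also gets the detailed-information rows
val viewRows = sql("DESC EXTENDED demo").count()
val tableRows = sql("DESC EXTENDED bucketed").count()
assert(tableRows > viewRows)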
Describing Detailed Partition and Storage Information — describeFormattedDetailedPartitionInfo Internal Method

describeFormattedDetailedPartitionInfo(
  tableIdentifier: TableIdentifier,
  table: CatalogTable,
  partition: CatalogTablePartition,
  buffer: ArrayBuffer[Row]): Unit
describeFormattedDetailedPartitionInfo simply adds the following entries (rows) to the input mutable buffer (see the sketch after the note below):

- A new line
- # Detailed Partition Information
- Database with the database of the given table
- Table with the table of the given tableIdentifier
- A new line
- # Storage Information
- Bucketing specification of the table (if defined)
- Storage specification of the table
Note: describeFormattedDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to describeDetailedPartitionInfo with a non-empty partitionSpec and the isExtended flag on.
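A self-contained Scala paraphrase of those appends, assembled from the list above and the DESC ... PARTITION demo output (a sketch, not the exact Spark source; the partition-details loop between Table and # Storage Information is an assumption inferred from the demo output):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTablePartition}

def describeFormattedDetailedPartitionInfoSketch(
    tableIdentifier: TableIdentifier,
    table: CatalogTable,
    partition: CatalogTablePartition,
    buffer: ArrayBuffer[Row]): Unit = {
  buffer += Row("", "", "")  // a new line
  buffer += Row("# Detailed Partition Information", "", "")
  buffer += Row("Database", table.database, "")
  buffer += Row("Table", tableIdentifier.table, "")
  // partition details (Partition Values, Location, Serde Library, ...)
  partition.toLinkedHashMap.foreach { case (k, v) => buffer += Row(k, v, "") }
  buffer += Row("", "", "")  // a new line
  buffer += Row("# Storage Information", "", "")
  // bucketing specification of the table (if defined)
  table.bucketSpec.foreach { spec =>
    spec.toLinkedHashMap.foreach { case (k, v) => buffer += Row(k, v, "") }
  }
  // storage specification of the table
  table.storage.toLinkedHashMap.foreach { case (k, v) => buffer += Row(k, v, "") }
}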
Describing Detailed Table Information — describeFormattedTableInfo Internal Method

describeFormattedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit

describeFormattedTableInfo …FIXME
Note: describeFormattedTableInfo is used exclusively when DescribeTableCommand is requested to run for a non-temporary table and the isExtended flag on.
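Pending the FIXME above, the "# Detailed Table Information" section of the DESC EXTENDED demo output hints at the shape. A hedged, self-contained sketch (an assumption based on that output, not the exact Spark source):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.catalog.CatalogTable

def describeFormattedTableInfoSketch(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
  buffer += Row("", "", "")  // a new line
  buffer += Row("# Detailed Table Information", "", "")
  // one row per table-metadata entry, e.g. Database, Table, Owner,
  // Created Time, Type, Provider, Num Buckets, Location, ...
  table.toLinkedHashMap.foreach { case (k, v) => buffer += Row(k, v, "") }
}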
describeDetailedPartitionInfo Internal Method

describeDetailedPartitionInfo(
  spark: SparkSession,
  catalog: SessionCatalog,
  metadata: CatalogTable,
  result: ArrayBuffer[Row]): Unit
describeDetailedPartitionInfo …FIXME

Note: describeDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to run with a non-empty partitionSpec.
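Pending the FIXME above, a hedged sketch of the likely flow, assuming the method resolves the partition in the SessionCatalog and delegates to describeFormattedDetailedPartitionInfo for DESC EXTENDED (a paraphrase under those assumptions, not the exact Spark source; it reuses the sketch function defined earlier on this page):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType, SessionCatalog}

def describeDetailedPartitionInfoSketch(
    catalog: SessionCatalog,
    tableId: TableIdentifier,
    metadata: CatalogTable,
    partitionSpec: Map[String, String],
    isExtended: Boolean,
    result: ArrayBuffer[Row]): Unit = {
  // DESC PARTITION makes no sense for views (Spark reports an AnalysisException)
  require(metadata.tableType != CatalogTableType.VIEW,
    s"DESC PARTITION is not allowed on a view: $tableId")
  // resolve the partition in the session catalog
  val partition = catalog.getPartition(tableId, partitionSpec)
  if (isExtended) {
    // delegate to the sketch defined earlier on this page
    describeFormattedDetailedPartitionInfoSketch(tableId, metadata, partition, result)
  }
}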
Creating DescribeTableCommand Instance

DescribeTableCommand takes the following when created:

- TableIdentifier of the table to describe
- TablePartitionSpec with the partition to describe (possibly empty)
- isExtended flag

DescribeTableCommand initializes the internal registries and counters.
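For reference, a sketch of the shape of the command as a Scala case class (matching the list above; in practice instances are created by the SQL parser rather than by hand):

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec

// shape only; the real class lives in org.apache.spark.sql.execution.command
case class DescribeTableCommand(
    table: TableIdentifier,
    partitionSpec: TablePartitionSpec,
    isExtended: Boolean)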