[DESC|DESCRIBE] TABLE? [EXTENDED|FORMATTED] table_name column_name
DescribeColumnCommand Logical Command for DESCRIBE TABLE SQL Command with Column
DescribeColumnCommand is a logical command for the DESCRIBE TABLE SQL command with a single column only (i.e. no PARTITION specification).
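The syntax line at the top of the page allows several spellings of the same statement. As a rough illustration only, the following Spark-free sketch assembles the statement text from those alternatives. DescribeColumnSql, Detail, longForm and tableKeyword are hypothetical names made up for this sketch and are not part of Spark.

```scala
// Hypothetical helper (not part of Spark): assembles the statement text following
// [DESC|DESCRIBE] TABLE? [EXTENDED|FORMATTED] table_name column_name
object DescribeColumnSql {
  // The optional EXTENDED or FORMATTED keyword
  sealed abstract class Detail(val keyword: Option[String])
  case object Basic extends Detail(None)
  case object Extended extends Detail(Some("EXTENDED"))
  case object Formatted extends Detail(Some("FORMATTED"))

  def statement(
      table: String,
      column: String,
      detail: Detail = Basic,
      longForm: Boolean = false,     // use DESCRIBE instead of DESC
      tableKeyword: Boolean = false  // include the optional TABLE keyword
  ): String = {
    val verb = if (longForm) "DESCRIBE" else "DESC"
    val parts: Seq[Option[String]] = Seq(
      Some(verb),
      if (tableKeyword) Some("TABLE") else None,
      detail.keyword,
      Some(table),
      Some(column))
    // Drop the alternatives that were not selected and join the rest with spaces
    parts.flatten.mkString(" ")
  }
}
```

For example, DescribeColumnSql.statement("t1", "p1", DescribeColumnSql.Extended) gives the DESC EXTENDED t1 p1 statement used in the example that follows.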
// Make the example reproducible
val tableName = "t1"
import org.apache.spark.sql.catalyst.TableIdentifier
val tableId = TableIdentifier(tableName)
val sessionCatalog = spark.sessionState.catalog
sessionCatalog.dropTable(tableId, ignoreIfNotExists = true, purge = true)
val df = Seq((0, 0.0, "zero"), (1, 1.4, "one")).toDF("id", "p1", "p2")
df.write.saveAsTable("t1")
// DescribeColumnCommand represents DESC EXTENDED tableName colName SQL command
val descExtSQL = "DESC EXTENDED t1 p1"
val plan = spark.sql(descExtSQL).queryExecution.logical
import org.apache.spark.sql.execution.command.DescribeColumnCommand
val cmd = plan.asInstanceOf[DescribeColumnCommand]
scala> println(cmd)
DescribeColumnCommand `t1`, [p1], true
scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|      NULL|
|           max|      NULL|
|     num_nulls|      NULL|
|distinct_count|      NULL|
|   avg_col_len|      NULL|
|   max_col_len|      NULL|
|     histogram|      NULL|
+--------------+----------+
// Run ANALYZE TABLE...FOR COLUMNS SQL command to compute the column statistics
val allCols = df.columns.mkString(",")
val analyzeTableSQL = s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS $allCols"
spark.sql(analyzeTableSQL)
scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|       0.0|
|           max|       1.4|
|     num_nulls|         0|
|distinct_count|         2|
|   avg_col_len|         8|
|   max_col_len|         8|
|     histogram|      NULL|
+--------------+----------+
DescribeColumnCommand defines the output schema with the following columns:
- info_name with "name of the column info" comment
- info_value with "value of the column info" comment
Note: DescribeColumnCommand is described by the describeTable labeled alternative in the statement expression in SqlBase.g4 and is parsed using SparkSqlParser.
Executing Logical Command (Describing Column with Optional Statistics): run Method
run(session: SparkSession): Seq[Row]
Note: run is part of the RunnableCommand Contract to execute (run) a logical command.
run resolves the column name in the table and makes sure that it is a "flat" field (i.e. not of a nested data type).
run requests the SessionCatalog for the table metadata.
Note: run uses the input SparkSession to access the SessionState that in turn is used to access the SessionCatalog.
run takes the column statistics from the table statistics if available.
Note: Column statistics are available (in the table statistics) only after the ANALYZE TABLE FOR COLUMNS SQL command has been run.
run adds the comment metadata if available for the column.
run gives the following rows (in that order):
- col_name
- data_type
- comment
If DescribeColumnCommand was executed with the EXTENDED or FORMATTED option, run gives the following additional rows (in that order):
- min
- max
- num_nulls
- distinct_count
- avg_col_len
- max_col_len
- histogram (as in the DESC EXTENDED output above)
run gives NULL for the value of the comment and statistics if not available.
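The row assembly described above can be modelled without Spark. The following is a simplified sketch and not Spark's implementation: ColumnInfo, ColumnStats and DescribeColumnSketch are hypothetical names, all values are plain strings or longs, and histogram handling is left out.

```scala
// Simplified, Spark-free model of the rows run is described to give.
// All names here are hypothetical and exist only for this sketch.
final case class ColumnStats(
    min: Option[String] = None,
    max: Option[String] = None,
    numNulls: Option[Long] = None,
    distinctCount: Option[Long] = None,
    avgColLen: Option[Long] = None,
    maxColLen: Option[Long] = None)

final case class ColumnInfo(
    name: String,
    dataType: String,
    comment: Option[String] = None,
    stats: Option[ColumnStats] = None) // None until statistics were computed

object DescribeColumnSketch {
  // Render an optional value, falling back to NULL when it is not available
  private def show[A](value: Option[A]): String =
    value.map(_.toString).getOrElse("NULL")

  def describeColumn(col: ColumnInfo, isExtended: Boolean): Seq[(String, String)] = {
    // Rows that are always present, in that order
    val basic = Seq(
      "col_name" -> col.name,
      "data_type" -> col.dataType,
      "comment" -> show(col.comment))
    val stats = col.stats
    // Statistics rows are added only for EXTENDED or FORMATTED
    val extended: Seq[(String, String)] =
      if (isExtended) Seq(
        "min" -> show(stats.flatMap(_.min)),
        "max" -> show(stats.flatMap(_.max)),
        "num_nulls" -> show(stats.flatMap(_.numNulls)),
        "distinct_count" -> show(stats.flatMap(_.distinctCount)),
        "avg_col_len" -> show(stats.flatMap(_.avgColLen)),
        "max_col_len" -> show(stats.flatMap(_.maxColLen)))
      else Seq.empty
    basic ++ extended
  }
}
```

Calling DescribeColumnSketch.describeColumn(ColumnInfo("p1", "double"), isExtended = true) yields NULL for the comment and for every statistic, which mirrors the first DESC EXTENDED output earlier on this page (before ANALYZE TABLE was run).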
histogramDescription Internal Method
histogramDescription(histogram: Histogram): Seq[Row]
histogramDescription …FIXME
Note: histogramDescription is used exclusively when DescribeColumnCommand is executed with the EXTENDED or FORMATTED option turned on.
Creating DescribeColumnCommand Instance
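DescribeColumnCommand takes the following when created (compare the printed command `t1`, [p1], true in the example above):
- TableIdentifier of the table
- Column name (as name parts)
- isExtended flag that indicates whether the EXTENDED or FORMATTED option was used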