[DESC|DESCRIBE] TABLE? [EXTENDED|FORMATTED] table_name column_name
DescribeColumnCommand Logical Command for DESCRIBE TABLE SQL Command with Column
DescribeColumnCommand is a logical command for DESCRIBE TABLE SQL command with a single column only (i.e. no PARTITION specification).
// Make the example reproducible
val tableName = "t1"
import org.apache.spark.sql.catalyst.TableIdentifier
val tableId = TableIdentifier(tableName)
val sessionCatalog = spark.sessionState.catalog
sessionCatalog.dropTable(tableId, ignoreIfNotExists = true, purge = true)
val df = Seq((0, 0.0, "zero"), (1, 1.4, "one")).toDF("id", "p1", "p2")
df.write.saveAsTable("t1")
// DescribeColumnCommand represents DESC EXTENDED tableName colName SQL command
val descExtSQL = "DESC EXTENDED t1 p1"
val plan = spark.sql(descExtSQL).queryExecution.logical
import org.apache.spark.sql.execution.command.DescribeColumnCommand
val cmd = plan.asInstanceOf[DescribeColumnCommand]
scala> println(cmd)
DescribeColumnCommand `t1`, [p1], true
scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|      NULL|
|           max|      NULL|
|     num_nulls|      NULL|
|distinct_count|      NULL|
|   avg_col_len|      NULL|
|   max_col_len|      NULL|
|     histogram|      NULL|
+--------------+----------+
// Run ANALYZE TABLE...FOR COLUMNS SQL command to compute the column statistics
val allCols = df.columns.mkString(",")
val analyzeTableSQL = s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS $allCols"
spark.sql(analyzeTableSQL)
scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|       0.0|
|           max|       1.4|
|     num_nulls|         0|
|distinct_count|         2|
|   avg_col_len|         8|
|   max_col_len|         8|
|     histogram|      NULL|
+--------------+----------+
DescribeColumnCommand defines the output schema with the following columns:
- info_name with "name of the column info" comment
- info_value with "value of the column info" comment
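The output schema can be inspected from the result of the SQL command itself. The following is a minimal sketch, assuming a spark-shell session (with `spark` in scope) and the `t1` table created in the example above. Whether the comments show up through `getComment` depends on the column metadata being carried over to the result schema, which may differ between Spark versions.

```scala
// Assumes a spark-shell session and the t1 table from the example above.
val schema = spark.sql("DESC EXTENDED t1 p1").schema

// Print every output column with its data type and comment (if any).
schema.fields.foreach { f =>
  val comment = f.getComment().getOrElse("(none)")
  println(s"${f.name}: ${f.dataType.simpleString}, comment = $comment")
}
```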
Note: DescribeColumnCommand is described by the describeTable labeled alternative in the statement expression in SqlBase.g4 and parsed using SparkSqlParser.
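As a rough illustration of the parsing step, the session's SQL parser can be asked for the logical plan directly. This sketch assumes a spark-shell session. Which plan node the parser returns is version-dependent (newer Spark versions may produce an unresolved plan that is only later converted to DescribeColumnCommand), so the code pattern-matches instead of casting.

```scala
import org.apache.spark.sql.execution.command.DescribeColumnCommand

// Parse only (no analysis or execution).
val parsed = spark.sessionState.sqlParser.parsePlan("DESC EXTENDED t1 p1")

parsed match {
  case cmd: DescribeColumnCommand =>
    println(s"DescribeColumnCommand with isExtended = ${cmd.isExtended}")
  case other =>
    // Other Spark versions may give a different (unresolved) logical plan here.
    println(s"Parser returned ${other.nodeName}")
}
```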
Executing Logical Command (Describing Column with Optional Statistics) — run Method
run(session: SparkSession): Seq[Row]
Note: run is part of the RunnableCommand contract to execute (run) a logical command.
run resolves the column name in the table and makes sure that it is a "flat" field (i.e. not of a nested data type).
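The "flat" field requirement can be observed by describing a field nested inside a struct column. This is a hedged sketch, assuming a spark-shell session. The table name `t_nested` is hypothetical and used only for illustration, and the exact exception type and message vary across Spark versions.

```scala
import scala.util.Try

// Hypothetical table with a single struct column s that has one field a.
spark.sql("DROP TABLE IF EXISTS t_nested")
spark.range(1)
  .selectExpr("named_struct('a', id) AS s")
  .write
  .saveAsTable("t_nested")

// Describing the nested field s.a is expected to be rejected.
val attempt = Try(spark.sql("DESC t_nested s.a").collect())
println(attempt)
```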
run requests the SessionCatalog for the table metadata.
Note: run uses the input SparkSession to access the SessionState that in turn is used to access the SessionCatalog.
run takes the column statistics from the table statistics if available.
Note: Column statistics are available (in the table statistics) only after the ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS SQL command has been run.
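The lookup of the table metadata and the column statistics can be approximated by hand. The sketch below assumes a spark-shell session and the `t1` table from the example above. It uses `getTableMetadata`, which is not necessarily the exact method that run calls internally, and the type of the individual column statistics differs between Spark versions, so the value is only printed.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// SparkSession -> SessionState -> SessionCatalog -> table metadata
val metadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1"))

// Table statistics are optional; column statistics are typically
// empty until ANALYZE TABLE ... FOR COLUMNS has been run.
val colStats = metadata.stats.map(_.colStats).getOrElse(Map.empty)
println(colStats.get("p1"))
```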
run adds comment metadata if available for the column.
run gives the following rows (in that order):
- col_name
- data_type
- comment
If DescribeColumnCommand was executed with the EXTENDED or FORMATTED option, run gives the following additional rows (in that order):
- min
- max
- num_nulls
- distinct_count
- avg_col_len
- max_col_len
- histogram
run gives NULL for the value of the comment and statistics if not available.
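The difference between the plain and the extended variants can be checked by comparing the info_name values of both results. A minimal sketch, assuming a spark-shell session and the `t1` table from the example above (`infoNames` is a helper introduced here for illustration only):

```scala
// Collects the first column (info_name) of a DESC ... result.
def infoNames(sqlText: String): Seq[String] =
  spark.sql(sqlText).collect().map(_.getString(0)).toSeq

val basic = infoNames("DESC t1 p1")
val extended = infoNames("DESC EXTENDED t1 p1")

println(basic.mkString(", "))
println(extended.mkString(", "))
```

The extended result is expected to start with the same three rows as the plain one, followed by the statistics rows.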
histogramDescription Internal Method
histogramDescription(histogram: Histogram): Seq[Row]
histogramDescription…FIXME
Note: histogramDescription is used exclusively when DescribeColumnCommand is executed with the EXTENDED or FORMATTED option turned on.
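In the examples above the histogram row is NULL even after ANALYZE TABLE. One likely reason is that histogram collection is disabled by default. The following sketch assumes a spark-shell session, the `t1` table from the example above, and a Spark version that supports the `spark.sql.statistics.histogram.enabled` configuration property. The exact shape of the histogram rows is not shown here since it depends on the Spark version and the data.

```scala
// Enable equi-height histogram collection for subsequent ANALYZE TABLE runs.
spark.conf.set("spark.sql.statistics.histogram.enabled", "true")

// Recompute the column statistics, this time including a histogram.
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS p1")

// The histogram part of the output should no longer be a single NULL row.
spark.sql("DESC EXTENDED t1 p1").show(truncate = false)
```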
Creating DescribeColumnCommand Instance
DescribeColumnCommand takes the following when created:
- isExtended flag that indicates whether the EXTENDED or FORMATTED option was used or not