DescribeColumnCommand Logical Command for DESCRIBE TABLE SQL Command with Column

DescribeColumnCommand is a logical command for the DESCRIBE TABLE SQL command with a single column only (i.e. no PARTITION specification).

(DESC | DESCRIBE) TABLE? (EXTENDED | FORMATTED)? table_name column_name
// Make the example reproducible
val tableName = "t1"
import org.apache.spark.sql.catalyst.TableIdentifier
val tableId = TableIdentifier(tableName)

val sessionCatalog = spark.sessionState.catalog
sessionCatalog.dropTable(tableId, ignoreIfNotExists = true, purge = true)

val df = Seq((0, 0.0, "zero"), (1, 1.4, "one")).toDF("id", "p1", "p2")
df.write.saveAsTable(tableName)

// DescribeColumnCommand represents DESC EXTENDED tableName colName SQL command
val descExtSQL = "DESC EXTENDED t1 p1"
val plan = spark.sql(descExtSQL).queryExecution.logical
import org.apache.spark.sql.execution.command.DescribeColumnCommand
val cmd = plan.asInstanceOf[DescribeColumnCommand]
scala> println(cmd)
DescribeColumnCommand `t1`, [p1], true

scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|      NULL|
|           max|      NULL|
|     num_nulls|      NULL|
|distinct_count|      NULL|
|   avg_col_len|      NULL|
|   max_col_len|      NULL|
|     histogram|      NULL|
+--------------+----------+

// Run ANALYZE TABLE...FOR COLUMNS SQL command to compute the column statistics
val allCols = df.columns.mkString(",")
val analyzeTableSQL = s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS $allCols"
spark.sql(analyzeTableSQL)

scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|       0.0|
|           max|       1.4|
|     num_nulls|         0|
|distinct_count|         2|
|   avg_col_len|         8|
|   max_col_len|         8|
|     histogram|      NULL|
+--------------+----------+

DescribeColumnCommand defines the output schema with the following columns:

  • info_name with "name of the column info" comment

  • info_value with "value of the column info" comment

Note
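DescribeColumnCommand is described by the describeTable labeled alternative of the statement rule in SqlBase.g4 and is created by SparkSqlParser (SparkSqlAstBuilder.visitDescribeTable) when the statement includes a column name and no PARTITION specification.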
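How the parts of the statement map to the command's arguments can be illustrated with a toy parser. This is only a sketch: Spark uses the ANTLR grammar in SqlBase.g4, not a regular expression. DescribeColumnSyntax, DescColumn and parse are hypothetical names, and returning None (instead of raising a parse error) for unsupported forms is a simplification.

```scala
// Toy model of the DESCRIBE column syntax; NOT Spark's parser (hypothetical names).
object DescribeColumnSyntax {

  // Hypothetical stand-in for the command's arguments: table, column name parts, isExtended
  final case class DescColumn(table: String, colNameParts: Seq[String], isExtended: Boolean)

  private val Ident = "[A-Za-z_][A-Za-z0-9_]*"

  // (DESC | DESCRIBE) TABLE? (EXTENDED | FORMATTED)? table_name column_name
  // The negative lookahead stops a keyword from being taken as the table name.
  private val Pattern =
    s"(?i)^\\s*(?:DESCRIBE|DESC)(?:\\s+TABLE)?(?:\\s+(EXTENDED|FORMATTED))?\\s+(?!(?:EXTENDED|FORMATTED|TABLE)\\b)($Ident(?:\\.$Ident)?)\\s+($Ident(?:\\.$Ident)*)\\s*$$".r

  def parse(sql: String): Option[DescColumn] = sql match {
    case Pattern(option, table, column) =>
      // Either EXTENDED or FORMATTED turns the isExtended flag on
      Some(DescColumn(table, column.split('.').toSeq, isExtended = option != null))
    case _ =>
      None // no column name, a PARTITION spec, or not a DESCRIBE statement
  }

  def main(args: Array[String]): Unit = {
    println(parse("DESC EXTENDED t1 p1"))
    println(parse("DESCRIBE TABLE t1 p1"))
  }
}
```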

Executing Logical Command (Describing Column with Optional Statistics) — run Method

run(session: SparkSession): Seq[Row]
Note
run is part of RunnableCommand Contract to execute (run) a logical command.

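run resolves the column name against the table and makes sure it refers to a top-level ("flat") column, not a nested field (e.g. a field of a struct column). An unknown or nested column name fails with an AnalysisException.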
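The column check can be sketched as a small model. Assumptions: the table schema is a plain Map from top-level column name to type, names match case-insensitively, and table-qualified names are not handled (Spark resolves against the analyzed plan with the session's resolver). ColumnResolution and its error messages are hypothetical and only illustrative. A name with more than one part, such as s.a for a field inside a struct column s, is rejected as nested.

```scala
// Simplified model of resolving a DESC column name (hypothetical, not Spark's code).
object ColumnResolution {

  // schema: top-level column name -> data type as a string
  def resolve(schema: Map[String, String], nameParts: Seq[String]): Either[String, (String, String)] = {
    require(nameParts.nonEmpty, "column name must not be empty")
    val colName = nameParts.mkString(".")
    schema.find { case (name, _) => name.equalsIgnoreCase(nameParts.head) } match {
      case None =>
        Left(s"Column $colName does not exist")
      case Some(_) if nameParts.size > 1 =>
        // e.g. s.a: a field inside a struct column, not a flat top-level column
        Left(s"Nested column $colName is not supported")
      case Some(field) =>
        Right(field)
    }
  }
}
```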

run requests the SessionCatalog for the table metadata.

Note
run uses the input SparkSession to access SessionState that in turn is used to access the SessionCatalog.

run takes the column statistics from the table statistics if available.

Note
Column statistics are available (in the table statistics) only after the ANALYZE TABLE ... FOR COLUMNS SQL command has been run.
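The table metadata and its column statistics can be inspected directly through the SessionCatalog. A spark-shell sketch, assuming the session from the example above (including tableId) and that the ANALYZE TABLE command for all columns has already been run; it performs a similar lookup to the one described here, not necessarily the exact code of run.

```scala
// Continues the spark-shell session above: assumes `spark`, `tableId` (TableIdentifier("t1"))
// and that ANALYZE TABLE ... FOR COLUMNS has been run for all columns of t1.
val catalog = spark.sessionState.catalog
val catalogTable = catalog.getTempViewOrPermanentTableMetadata(tableId)

// Table statistics are optional; column statistics inside them are keyed by column name
val statsColumns = catalogTable.stats.map(_.colStats.keySet).getOrElse(Set.empty[String])
println(statsColumns)
```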

run uses the column's comment (the comment key in the field's metadata) if available.
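A column comment can be declared when a table is created, after which the comment row is no longer NULL. A spark-shell sketch; t2 is a hypothetical extra table used only for this illustration.

```scala
// Continues the spark-shell session above (assumes `spark` is in scope).
// t2 is a hypothetical table whose p1 column carries a comment.
spark.sql("DROP TABLE IF EXISTS t2")
spark.sql("CREATE TABLE t2 (id INT, p1 DOUBLE COMMENT 'probability') USING parquet")

// Pick the value of the comment row from the DESC output
val commentValue = spark.sql("DESC t2 p1")
  .where("info_name = 'comment'")
  .head()
  .getString(1)
println(commentValue)
```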

run returns the following rows (in that order):

  1. col_name

  2. data_type

  3. comment

If DescribeColumnCommand is executed with the EXTENDED or FORMATTED option, run returns the following additional rows (in that order):

  1. min

  2. max

  3. num_nulls

  4. distinct_count

  5. avg_col_len

  6. max_col_len

  7. histogram

run uses the string NULL as the value of the comment and of every statistic that is not available.
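The row layout above can be summarized in a small, self-contained Scala model. It is a sketch, not Spark's code: DescribeColumnRows and ColStats are hypothetical names (ColStats stands in for Spark's column statistics), and histograms are left out, so the histogram row is always NULL here.

```scala
// Simplified model of the rows returned by DescribeColumnCommand.run (hypothetical names).
object DescribeColumnRows {

  // Hypothetical stand-in for the column statistics computed by ANALYZE TABLE
  final case class ColStats(
      min: Option[String],
      max: Option[String],
      nullCount: Long,
      distinctCount: Long,
      avgLen: Long,
      maxLen: Long)

  // Placeholder shown for a missing comment or statistic
  private val NullValue = "NULL"

  def rows(
      colName: String,
      dataType: String,
      comment: Option[String],
      stats: Option[ColStats],
      isExtended: Boolean): Seq[(String, String)] = {
    val basic = Seq(
      "col_name" -> colName,
      "data_type" -> dataType,
      "comment" -> comment.getOrElse(NullValue))
    if (!isExtended) basic
    else basic ++ Seq(
      "min" -> stats.flatMap(_.min).getOrElse(NullValue),
      "max" -> stats.flatMap(_.max).getOrElse(NullValue),
      "num_nulls" -> stats.map(_.nullCount.toString).getOrElse(NullValue),
      "distinct_count" -> stats.map(_.distinctCount.toString).getOrElse(NullValue),
      "avg_col_len" -> stats.map(_.avgLen.toString).getOrElse(NullValue),
      "max_col_len" -> stats.map(_.maxLen.toString).getOrElse(NullValue),
      // Histograms are not modeled here
      "histogram" -> NullValue)
  }
}
```

With the statistics from the example above, `rows("p1", "double", None, Some(ColStats(Some("0.0"), Some("1.4"), 0L, 2L, 8L, 8L)), isExtended = true)` produces the same ten name/value pairs as the second DESC EXTENDED output.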

histogramDescription Internal Method

histogramDescription(histogram: Histogram): Seq[Row]

histogramDescription… FIXME

Note
histogramDescription is used exclusively when DescribeColumnCommand is executed with the EXTENDED or FORMATTED option and the column statistics include a histogram.
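For a histogram to exist, it has to be computed when the column statistics are collected. A spark-shell sketch, assuming Spark 2.3 or later (where the spark.sql.statistics.histogram.enabled property is available and off by default) and the session from the example above. After re-running the ANALYZE TABLE command, the histogram row for the numeric p1 column should describe the histogram instead of showing NULL.

```scala
// Continues the spark-shell session above (assumes `spark`, `analyzeTableSQL`, `descExtSQL`).
// Ask ANALYZE TABLE ... FOR COLUMNS to also build histograms (Spark 2.3+).
spark.conf.set("spark.sql.statistics.histogram.enabled", true)
spark.sql(analyzeTableSQL)

// Show the extended description without truncating the histogram text
spark.sql(descExtSQL).show(truncate = false)
```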

Creating DescribeColumnCommand Instance

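DescribeColumnCommand takes the following when created:

  • TableIdentifier of the table

  • Column name parts (Seq[String]), e.g. [p1]

  • isExtended flag (turned on for the EXTENDED or FORMATTED option)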

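DescribeColumnCommand can also be created and executed directly, which skips the SQL parser. A spark-shell sketch, assuming the session and t1 table from the example above and the three-argument signature (table, column name parts, isExtended flag) visible in the printed plan; DescribeColumnCommand is internal API and may differ between Spark versions.

```scala
// Continues the spark-shell session above (assumes `spark` and table t1).
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.DescribeColumnCommand

// Without EXTENDED/FORMATTED only col_name, data_type and comment are returned
val descCmd = DescribeColumnCommand(TableIdentifier("t1"), Seq("p1"), isExtended = false)
val descRows = descCmd.run(spark)
descRows.foreach(println)
```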