# AnalyzePartitionCommand Logical Command — Computing Partition-Level Statistics (Total Size and Row Count)

`AnalyzePartitionCommand` is a logical command that computes statistics (i.e. total size and row count) for table partitions and stores the stats in a metastore.

`AnalyzePartitionCommand` is created exclusively for the `ANALYZE TABLE` SQL statement with a `PARTITION` specification only (i.e. with no `FOR COLUMNS` clause).

```scala
// Seq((0, 0, "zero"), (1, 1, "one")).toDF("id", "p1", "p2").write.partitionBy("p1", "p2").saveAsTable("t1")
val analyzeTable = "ANALYZE TABLE t1 PARTITION (p1, p2) COMPUTE STATISTICS"
val plan = spark.sql(analyzeTable).queryExecution.logical

import org.apache.spark.sql.execution.command.AnalyzePartitionCommand
val cmd = plan.asInstanceOf[AnalyzePartitionCommand]

scala> println(cmd)
AnalyzePartitionCommand `t1`, Map(p1 -> None, p2 -> None), false
```
## Executing Logical Command (Computing Partition-Level Statistics and Altering Metastore) — `run` Method

```scala
run(sparkSession: SparkSession): Seq[Row]
```
NOTE: `run` is part of the `RunnableCommand` contract to execute (run) a logical command.
`run` requests the session-specific `SessionCatalog` for the metadata of the table and makes sure that it is not a view.
NOTE: `run` uses the input `SparkSession` to access the session-specific `SessionState` that in turn is used to access the current `SessionCatalog`.
`run` then gets the partition specification of the table (using the internal `getPartitionSpec` method).
`run` requests the session-specific `SessionCatalog` for the partitions per the partition specification.
`run` finishes early when the table has no partitions defined in the metastore.
`run` computes row count statistics per partition unless the `noscan` flag was enabled.
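For comparison, the `NOSCAN` option is what enables the `noscan` flag, so only total size statistics are computed. A sketch in `spark-shell`, continuing the example above (the `t1` table is assumed to exist):

```scala
// NOSCAN requests total size only (no row count scan)
val noscanPlan = spark.sql(
  "ANALYZE TABLE t1 PARTITION (p1, p2) COMPUTE STATISTICS NOSCAN")
  .queryExecution.logical

// The last field printed is the noscan flag (now true)
scala> println(noscanPlan)
AnalyzePartitionCommand `t1`, Map(p1 -> None, p2 -> None), true
```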
`run` calculates the total size (in bytes) (aka _partition location size_) for every table partition and creates a `CatalogStatistics` with the current statistics if different from the statistics recorded in the metastore (with a new row count statistic computed earlier).

In the end, `run` alters table partition metadata for partitions with the statistics changed.
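Once the metastore has been altered, the per-partition statistics can be inspected with `DESCRIBE EXTENDED` on a concrete partition (a sketch, assuming `t1` from the example above has already been analyzed):

```scala
scala> spark.sql("DESC EXTENDED t1 PARTITION (p1=0, p2=0)").show(numRows = 50, truncate = false)
// Look for the "Partition Statistics" row, which carries
// the total size in bytes and (unless NOSCAN was used) the row count
```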
`run` reports a `NoSuchPartitionException` when the partition specification does not match any partitions in the metastore.

`run` reports an `AnalysisException` when executed on a view, since `ANALYZE TABLE` is not supported on views.
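The view case can be reproduced with a permanent view over the table (hypothetical `v1` name; a sketch continuing the example above):

```scala
scala> spark.sql("CREATE VIEW v1 AS SELECT * FROM t1")

scala> spark.sql("ANALYZE TABLE v1 PARTITION (p1, p2) COMPUTE STATISTICS")
// org.apache.spark.sql.AnalysisException: ANALYZE TABLE is not supported on views.
```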
## Computing Row Count Statistics Per Partition — `calculateRowCountsPerPartition` Internal Method

```scala
calculateRowCountsPerPartition(
  sparkSession: SparkSession,
  tableMeta: CatalogTable,
  partitionValueSpec: Option[TablePartitionSpec]): Map[TablePartitionSpec, BigInt]
```
`calculateRowCountsPerPartition`...FIXME
NOTE: `calculateRowCountsPerPartition` is used exclusively when `AnalyzePartitionCommand` is executed.
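Conceptually, the result of `calculateRowCountsPerPartition` is a row count keyed by partition spec (`Map[TablePartitionSpec, BigInt]`), as if rows were grouped by their partition column values and counted. A minimal plain-Scala sketch of that shape (not Spark's actual implementation, which runs an aggregation over the table; the sample rows are hypothetical):

```scala
// TablePartitionSpec is Map[String, String] in Spark's catalog API
type TablePartitionSpec = Map[String, String]

// Hypothetical rows of t1, reduced to their partition column values
val partitionValues: Seq[TablePartitionSpec] = Seq(
  Map("p1" -> "0", "p2" -> "0"),
  Map("p1" -> "1", "p2" -> "1"),
  Map("p1" -> "1", "p2" -> "1"))

// Group identical partition specs and count rows per partition
val rowCounts: Map[TablePartitionSpec, BigInt] =
  partitionValues.groupBy(identity).map { case (spec, rows) => spec -> BigInt(rows.size) }

println(rowCounts(Map("p1" -> "1", "p2" -> "1")))  // 2
```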
## `getPartitionSpec` Internal Method

```scala
getPartitionSpec(table: CatalogTable): Option[TablePartitionSpec]
```
`getPartitionSpec`...FIXME
NOTE: `getPartitionSpec` is used exclusively when `AnalyzePartitionCommand` is executed.
## Creating AnalyzePartitionCommand Instance

`AnalyzePartitionCommand` takes the following when created:

- `TableIdentifier` of the table to analyze
- Partition specification (`Map[String, Option[String]]`)
- `noscan` flag (enabled by default) that indicates whether the `NOSCAN` option was used or not