AnalyzePartitionCommand Logical Command — Computing Partition-Level Statistics (Total Size and Row Count)

AnalyzePartitionCommand is a logical command that computes statistics (i.e. total size and row count) for table partitions and stores the stats in a metastore.

AnalyzePartitionCommand is created exclusively for ANALYZE TABLE with PARTITION specification only (i.e. no FOR COLUMNS clause).

// Seq((0, 0, "zero"), (1, 1, "one")).toDF("id", "p1", "p2").write.partitionBy("p1", "p2").saveAsTable("t1")
val analyzeTable = "ANALYZE TABLE t1 PARTITION (p1, p2) COMPUTE STATISTICS"
val plan = spark.sql(analyzeTable).queryExecution.logical
import org.apache.spark.sql.execution.command.AnalyzePartitionCommand
val cmd = plan.asInstanceOf[AnalyzePartitionCommand]
scala> println(cmd)
AnalyzePartitionCommand `t1`, Map(p1 -> None, p2 -> None), false

Executing Logical Command (Computing Partition-Level Statistics and Altering Metastore) — run Method

run(sparkSession: SparkSession): Seq[Row]
Note
run is part of RunnableCommand Contract to execute (run) a logical command.

run requests the session-specific SessionCatalog for the metadata of the table and makes sure that it is not a view.

Note
run uses the input SparkSession to access the session-specific SessionState that in turn is used to access the current SessionCatalog.

run requests the session-specific SessionCatalog for the partitions per the partition specification.

run finishes when the table has no partitions defined in a metastore.

run calculates total size (in bytes) (aka partition location size) for every table partition and creates a CatalogStatistics with the current statistics if different from the statistics recorded in the metastore (with a new row count statistic computed earlier).

In the end, run alters table partition metadata for partitions with the statistics changed.

run reports a NoSuchPartitionException when partitions do not match the metastore.

run reports an AnalysisException when executed on a view.

ANALYZE TABLE is not supported on views.

Computing Row Count Statistics Per Partition — calculateRowCountsPerPartition Internal Method

calculateRowCountsPerPartition(
  sparkSession: SparkSession,
  tableMeta: CatalogTable,
  partitionValueSpec: Option[TablePartitionSpec]): Map[TablePartitionSpec, BigInt]

calculateRowCountsPerPartition…​FIXME

Note
calculateRowCountsPerPartition is used exclusively when AnalyzePartitionCommand is executed.

getPartitionSpec Internal Method

getPartitionSpec(table: CatalogTable): Option[TablePartitionSpec]

getPartitionSpec…​FIXME

Note
getPartitionSpec is used exclusively when AnalyzePartitionCommand is executed.

Creating AnalyzePartitionCommand Instance

AnalyzePartitionCommand takes the following when created:

  • TableIdentifier

  • Partition specification

  • noscan flag (enabled by default) that indicates whether NOSCAN option was used or not

results matching ""

    No results matching ""