BucketSpec — Bucketing Specification of Table
BucketSpec is the bucketing specification of a table, i.e. the metadata that describes how a table is bucketed.

BucketSpec includes the following:

- numBuckets — the number of buckets
- bucketColumnNames — the names of the bucket columns
- sortColumnNames — the names of the sort columns
The number of buckets has to be between 0 and 100000 exclusive (or an AnalysisException is thrown).
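The constraint can be sketched as follows (a simplified stand-in for the check that BucketSpec performs on creation, not Spark's actual code — Spark throws an AnalysisException rather than the IllegalArgumentException that require produces):

```scala
// A simplified sketch of the numBuckets validation (not Spark's actual code);
// BucketSpec itself reports a violation with an AnalysisException.
def validateNumBuckets(numBuckets: Int): Unit =
  require(
    numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets should be greater than 0 but less than 100000. Got `$numBuckets`")

validateNumBuckets(8)     // fine
// validateNumBuckets(0)  // fails the check in this sketch
```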
BucketSpec is created when:

- DataFrameWriter is requested to saveAsTable (and does getBucketSpec)
- HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps
- HiveClientImpl is requested to retrieve a table metadata
- SparkSqlAstBuilder is requested to visitBucketSpec (for a CREATE TABLE SQL statement with CLUSTERED BY and INTO n BUCKETS clauses, and an optional SORTED BY clause)
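The first of these paths can be sketched with the public bucketing API (assuming a DataFrame `df` with columns col1 and col2; the table name `bucketed_table` is hypothetical):

```scala
// A sketch of the DataFrameWriter path (assumes a SparkSession and a
// DataFrame `df` with columns col1 and col2):
df.write
  .bucketBy(8, "col1")            // numBuckets and bucketColumnNames
  .sortBy("col2")                 // sortColumnNames
  .saveAsTable("bucketed_table")  // getBucketSpec builds BucketSpec(8, Seq("col1"), Seq("col2"))
```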
BucketSpec uses the following text representation (i.e. toString):

[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]
import org.apache.spark.sql.catalyst.catalog.BucketSpec
val bucketSpec = BucketSpec(
  numBuckets = 8,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq("col2"))
scala> println(bucketSpec)
8 buckets, bucket columns: [col1], sort columns: [col2]
Converting Bucketing Specification to LinkedHashMap — toLinkedHashMap Method
toLinkedHashMap: mutable.LinkedHashMap[String, String]
toLinkedHashMap converts the bucketing specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

- Num Buckets with the numBuckets
- Bucket Columns with the bucketColumnNames
- Sort Columns with the sortColumnNames
toLinkedHashMap quotes the column names.
scala> println(bucketSpec.toLinkedHashMap)
Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])