BucketSpec — Bucketing Specification of Table

BucketSpec is the bucketing specification of a table, i.e. the metadata of the bucketing of a table.

BucketSpec includes the following:

- Number of buckets (numBuckets)
- Bucket column names (bucketColumnNames)
- Sort column names (sortColumnNames)

The number of buckets has to be between 0 and 100000 exclusive (or an AnalysisException is thrown).
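The bucket-count constraint can be sketched with a standalone case class (an approximation for illustration, not Spark's actual code; the class name and error message are hypothetical):

```scala
// Hypothetical sketch mirroring the 0 < numBuckets < 100000 rule
// described above. Spark raises AnalysisException; a plain require
// (IllegalArgumentException) stands in for it here.
case class BucketSpecSketch(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  require(numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets should be greater than 0 but less than 100000. Got $numBuckets")
}
```

Constructing `BucketSpecSketch(0, Seq("col1"), Nil)` fails the check, while any count strictly between the bounds is accepted.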
BucketSpec is created when:

1. DataFrameWriter is requested to saveAsTable (and does getBucketSpec)
2. HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps
3. HiveClientImpl is requested to retrieve a table metadata
4. SparkSqlAstBuilder is requested to visitBucketSpec (for CREATE TABLE SQL statement with CLUSTERED BY and INTO n BUCKETS with optional SORTED BY clauses)
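As an illustration of the first case above, bucketing a write with DataFrameWriter leads to a BucketSpec being built internally (a sketch assuming a local SparkSession; the app and table names are made up):

```scala
// Sketch: bucketBy/sortBy before saveAsTable makes DataFrameWriter
// build a BucketSpec (via getBucketSpec). Requires spark-sql on the
// classpath; table name "bucketed_ids" is hypothetical.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("bucketspec-demo")
  .getOrCreate()

spark.range(10)
  .write
  .bucketBy(8, "id")   // numBuckets = 8, bucketColumnNames = Seq("id")
  .sortBy("id")        // sortColumnNames = Seq("id")
  .saveAsTable("bucketed_ids")
```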
BucketSpec uses the following text representation (i.e. toString):

[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]
import org.apache.spark.sql.catalyst.catalog.BucketSpec
val bucketSpec = BucketSpec(
numBuckets = 8,
bucketColumnNames = Seq("col1"),
sortColumnNames = Seq("col2"))
scala> println(bucketSpec)
8 buckets, bucket columns: [col1], sort columns: [col2]
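The text representation above can be reproduced with a minimal standalone sketch (an approximation of the toString format, not Spark's actual implementation; the function name is made up):

```scala
// Standalone approximation of BucketSpec's text representation.
// The sort-columns part is appended only when sort columns exist.
def bucketSpecToString(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]): String = {
  val bucketString = s"bucket columns: [${bucketColumnNames.mkString(", ")}]"
  val sortString =
    if (sortColumnNames.nonEmpty) s", sort columns: [${sortColumnNames.mkString(", ")}]"
    else ""
  s"$numBuckets buckets, $bucketString$sortString"
}

println(bucketSpecToString(8, Seq("col1"), Seq("col2")))
// 8 buckets, bucket columns: [col1], sort columns: [col2]
```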
Converting Bucketing Specification to LinkedHashMap — toLinkedHashMap Method
toLinkedHashMap: mutable.LinkedHashMap[String, String]

toLinkedHashMap converts the bucketing specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

- Num Buckets with the numBuckets
- Bucket Columns with the bucketColumnNames
- Sort Columns with the sortColumnNames

toLinkedHashMap quotes the column names.
scala> println(bucketSpec.toLinkedHashMap)
Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])
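The behavior shown above can be sketched with a self-contained helper (an approximation of toLinkedHashMap for illustration; the function name is made up): insertion order is preserved by the LinkedHashMap and column names are backtick-quoted.

```scala
import scala.collection.mutable

// Standalone approximation of toLinkedHashMap: fixed field order,
// backtick-quoted column names rendered inside brackets.
def toLinkedHashMapSketch(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]): mutable.LinkedHashMap[String, String] = {
  def quoted(names: Seq[String]): String =
    names.map(n => s"`$n`").mkString("[", ", ", "]")
  mutable.LinkedHashMap(
    "Num Buckets" -> numBuckets.toString,
    "Bucket Columns" -> quoted(bucketColumnNames),
    "Sort Columns" -> quoted(sortColumnNames))
}
```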