BucketSpec — Bucketing Specification of Table

BucketSpec is the bucketing specification of a table, i.e. the metadata that describes how the data of a table is bucketed.

BucketSpec includes the following:

  • Number of buckets

  • Bucket column names: the names of the columns used for bucketing (at least one)

  • Sort column names: the names of the columns used to sort data within buckets

The number of buckets must be greater than 0 and less than 100000; otherwise an AnalysisException is thrown.
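The range check can be sketched as follows. BucketSpecModel and BucketRangeDemo are hypothetical names, not Spark classes, and the sketch throws IllegalArgumentException (through require) instead of Spark's AnalysisException so that it runs without Spark on the classpath.

```scala
// Hypothetical stand-in for BucketSpec (not Spark's class) that mirrors
// the documented range check on the number of buckets.
final case class BucketSpecModel(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  // Spark throws AnalysisException; require throws IllegalArgumentException,
  // which keeps this sketch free of any Spark dependency.
  require(numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets must be greater than 0 and less than 100000, got $numBuckets")
}

object BucketRangeDemo {
  // Returns true when the given number of buckets passes the range check
  def isValid(n: Int): Boolean =
    try { BucketSpecModel(n, Seq("col1"), Nil); true }
    catch { case _: IllegalArgumentException => false }

  def main(args: Array[String]): Unit =
    Seq(0, 1, 8, 99999, 100000).foreach(n => println(s"$n -> ${isValid(n)}"))
}
```

Both bounds are exclusive, so 0 and 100000 are rejected while 1 and 99999 are accepted.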

BucketSpec is created when:

  1. DataFrameWriter is requested to saveAsTable (and does getBucketSpec)

  2. HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps

  3. HiveClientImpl is requested to retrieve table metadata

  4. SparkSqlAstBuilder is requested to visitBucketSpec (for a CREATE TABLE SQL statement with CLUSTERED BY and INTO n BUCKETS clauses, and an optional SORTED BY clause)
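For the DataFrameWriter path, the idea behind getBucketSpec can be sketched with a Spark-free model. WriterSketch and SpecSketch are hypothetical names; the sketch assumes that a bucketing specification exists only when bucketBy was used, that sort columns default to none, and that sortBy without bucketBy is rejected (Spark reports this as an AnalysisException, replaced here with IllegalStateException).

```scala
// Hypothetical, Spark-free model of turning bucketBy/sortBy settings
// into an optional bucketing specification. Names are illustrative only.
final case class SpecSketch(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String])

final class WriterSketch {
  private var numBuckets: Option[Int] = None
  private var bucketColumnNames: Option[Seq[String]] = None
  private var sortColumnNames: Option[Seq[String]] = None

  // Records the number of buckets and the bucket columns (at least one)
  def bucketBy(n: Int, colName: String, colNames: String*): WriterSketch = {
    numBuckets = Some(n)
    bucketColumnNames = Some(colName +: colNames)
    this
  }

  // Records the columns used to sort data within each bucket
  def sortBy(colName: String, colNames: String*): WriterSketch = {
    sortColumnNames = Some(colName +: colNames)
    this
  }

  // No spec without bucketBy; sorting without bucketing is rejected
  def getBucketSpec: Option[SpecSketch] = {
    if (sortColumnNames.isDefined && numBuckets.isEmpty)
      throw new IllegalStateException("sortBy must be used together with bucketBy")
    numBuckets.map(n => SpecSketch(n, bucketColumnNames.get, sortColumnNames.getOrElse(Nil)))
  }
}
```

Returning an Option keeps "no bucketing" distinct from an empty specification, which matters because a bucketing specification needs at least one bucket column.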

BucketSpec uses the following text representation (i.e. toString):

[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]

import org.apache.spark.sql.catalyst.catalog.BucketSpec
val bucketSpec = BucketSpec(
  numBuckets = 8,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq("col2"))
scala> println(bucketSpec)
8 buckets, bucket columns: [col1], sort columns: [col2]
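To see how the template maps to the fields, here is a minimal stand-in that renders the same text. BucketSpecText is a hypothetical class, not Spark's BucketSpec, and joining several column names with ", " is an assumption, since the example above uses a single column per list.

```scala
// Hypothetical stand-in (not Spark's BucketSpec) that renders the
// text representation following the template above.
final case class BucketSpecText(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  // Always renders the sort-columns part, as the template does.
  // Assumption: multiple column names are joined with ", ".
  override def toString: String =
    s"$numBuckets buckets, " +
      s"bucket columns: [${bucketColumnNames.mkString(", ")}], " +
      s"sort columns: [${sortColumnNames.mkString(", ")}]"
}
```

For example, BucketSpecText(8, Seq("col1"), Seq("col2")).toString gives the same line as the REPL output above.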

Converting Bucketing Specification to LinkedHashMap — toLinkedHashMap Method

toLinkedHashMap: mutable.LinkedHashMap[String, String]

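toLinkedHashMap converts the bucketing specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

  • Num Buckets: the number of buckets

  • Bucket Columns: the bucket column names

  • Sort Columns: the sort column names

toLinkedHashMap quotes the column names with backticks and wraps each list of names in square brackets.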

scala> println(bucketSpec.toLinkedHashMap)
Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])
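The conversion can be approximated with a small helper. BucketSpecProps is hypothetical, and its quote function is an assumption about how names are quoted (wrap in backticks, double any embedded backtick); only the keys and the output for the example above come from the text.

```scala
import scala.collection.mutable

// Hypothetical helper mirroring the documented toLinkedHashMap output.
object BucketSpecProps {
  // Assumption: wraps a name in backticks and doubles any embedded backtick
  def quote(name: String): String = "`" + name.replace("`", "``") + "`"

  def toLinkedHashMap(
      numBuckets: Int,
      bucketColumnNames: Seq[String],
      sortColumnNames: Seq[String]): mutable.LinkedHashMap[String, String] =
    // LinkedHashMap preserves insertion order, so keys print in this order
    mutable.LinkedHashMap(
      "Num Buckets" -> numBuckets.toString,
      "Bucket Columns" -> bucketColumnNames.map(quote).mkString("[", ", ", "]"),
      "Sort Columns" -> sortColumnNames.map(quote).mkString("[", ", ", "]"))
}
```

A LinkedHashMap (rather than a plain Map) is what keeps the three entries in a stable, readable order when the map is printed.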

toLinkedHashMap is used when:
