HashClusteredDistribution

HashClusteredDistribution is a Distribution that creates a HashPartitioning for the hash expressions and a requested number of partitions.

HashClusteredDistribution specifies None for the required number of partitions.

Note
None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property, which defaults to 200).

HashClusteredDistribution is created when physical operators (e.g. CoGroupExec, ShuffledHashJoinExec, SortMergeJoinExec and Spark Structured Streaming’s StreamingSymmetricHashJoinExec) are requested for the partition requirements of their child operator(s).

HashClusteredDistribution takes hash expressions when created.

HashClusteredDistribution requires that the hash expressions are not empty (i.e. not Nil).
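The creation rules above can be tried out with a minimal sketch like the following. It assumes a Spark version that still includes HashClusteredDistribution on the classpath (e.g. in spark-shell). The id attribute is a hypothetical hash expression made up for illustration.

```scala
import scala.util.Try
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
import org.apache.spark.sql.catalyst.plans.physical.HashClusteredDistribution
import org.apache.spark.sql.types.LongType

// Hypothetical hash expression: a single attribute named "id"
val id: Expression = AttributeReference("id", LongType)()

// A distribution is created from a non-empty list of hash expressions
val dist = HashClusteredDistribution(Seq(id))

// No specific number of partitions is required (None)
val noRequiredNumPartitions = dist.requiredNumPartitions.isEmpty

// An empty list of hash expressions (Nil) is rejected at creation time
val emptyAttempt = Try(HashClusteredDistribution(Nil))
```

Here noRequiredNumPartitions should be true and emptyAttempt should be a Failure, since the non-empty requirement is checked when the distribution is created.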

HashClusteredDistribution is used when:

  • EnsureRequirements is requested to add an ExchangeCoordinator for Adaptive Query Execution

  • HashPartitioning is requested whether it satisfies a required distribution (satisfies)
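The second case can be sketched as follows, assuming a Spark version that still includes HashClusteredDistribution (e.g. in spark-shell). The id and name attributes and the partition count of 8 are hypothetical values chosen for illustration.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.{HashClusteredDistribution, HashPartitioning}
import org.apache.spark.sql.types.{LongType, StringType}

// Hypothetical attributes used as hash expressions
val id = AttributeReference("id", LongType)()
val name = AttributeReference("name", StringType)()

// The distribution a child operator is required to have
val required = HashClusteredDistribution(Seq(id))

// A partitioning on the same hash expressions and one on a different expression
val matching = HashPartitioning(Seq(id), 8)
val other = HashPartitioning(Seq(name), 8)

val matchingSatisfies = matching.satisfies(required)
val otherSatisfies = other.satisfies(required)
```

Only the partitioning whose expressions match the hash expressions of the distribution, in the same order, is expected to satisfy it.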

createPartitioning Method

createPartitioning(
  numPartitions: Int): Partitioning

Note
createPartitioning is part of the Distribution Contract to create a Partitioning for a given number of partitions.

createPartitioning creates a HashPartitioning for the hash expressions and the input numPartitions.
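As a minimal sketch of this behaviour, assuming a Spark version that still includes HashClusteredDistribution (e.g. in spark-shell), with a hypothetical id attribute and a partition count of 8:

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.{HashClusteredDistribution, HashPartitioning}
import org.apache.spark.sql.types.LongType

// Hypothetical hash expression
val id = AttributeReference("id", LongType)()

val dist = HashClusteredDistribution(Seq(id))

// Request a partitioning for 8 partitions
val partitioning = dist.createPartitioning(8)

// The result is a HashPartitioning over the same hash expressions
val isHashPartitioning = partitioning == HashPartitioning(Seq(id), 8)
val numPartitions = partitioning.numPartitions
```

The resulting partitioning carries the hash expressions of the distribution together with the requested number of partitions.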
