Distribution Contract — Data Distribution Across Partitions

Distribution is the contract of…​FIXME

package org.apache.spark.sql.catalyst.plans.physical

sealed trait Distribution {
  def requiredNumPartitions: Option[Int]
  def createPartitioning(numPartitions: Int): Partitioning
}
Note
Distribution is a Scala sealed contract which means that all possible distributions are all in the same compilation unit (file).
Table 1. Distribution Contract
Method Description

requiredNumPartitions

Gives the required number of partitions for a distribution.

Used exclusively when EnsureRequirements physical optimization is requested to enforce partition requirements of a physical operator (and a child operator’s output partitioning does not satisfy a required child distribution that leads to inserting a ShuffleExchangeExec operator to a physical plan).

Note
None for the required number of partitions indicates to use any number of partitions (possibly spark.sql.shuffle.partitions configuration property with the default of 200 partitions).

createPartitioning

Creates a Partitioning for a given number of partitions.

Used exclusively when EnsureRequirements physical optimization is requested to enforce partition requirements of a physical operator (and creates a ShuffleExchangeExec physical operator with a required Partitioning).

Table 2. Distributions
Distribution Description

AllTuples

BroadcastDistribution

ClusteredDistribution

HashClusteredDistribution

OrderedDistribution

UnspecifiedDistribution

results matching ""

    No results matching ""