createPartitioning(numPartitions: Int): Partitioning
ClusteredDistribution
ClusteredDistribution is a Distribution that creates a HashPartitioning for the clustering expressions and a requested number of partitions.
ClusteredDistribution requires that the clustering expressions should not be empty (i.e. Nil).
ClusteredDistribution is created when the following physical operators are requested for a required child distribution:
- MapGroupsExec, HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec, WindowExec
- Spark Structured Streaming’s FlatMapGroupsWithStateExec, StateStoreRestoreExec, StateStoreSaveExec, StreamingDeduplicateExec, StreamingSymmetricHashJoinExec
- SparkR’s FlatMapGroupsInRExec
- PySpark’s FlatMapGroupsInPandasExec
ClusteredDistribution is used when:
- DataSourcePartitioning, SinglePartition, HashPartitioning, and RangePartitioning are requested to satisfies
- EnsureRequirements is requested to add an ExchangeCoordinator for Adaptive Query Execution
createPartitioning Method
Note: createPartitioning is part of the Distribution Contract to create a Partitioning for a given number of partitions.
createPartitioning creates a HashPartitioning for the clustering expressions and the input numPartitions.
createPartitioning reports an AssertionError when the required number of partitions is defined and differs from the input numPartitions:
This ClusteredDistribution requires [requiredNumPartitions] partitions, but the actual number of partitions is [numPartitions].
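The check described above can be sketched without a Spark dependency. This is a minimal model, not Spark's code: SketchClusteredDistribution and SketchHashPartitioning are hypothetical stand-ins, and clustering expressions are simplified to plain strings.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
final case class SketchHashPartitioning(expressions: Seq[String], numPartitions: Int)

final case class SketchClusteredDistribution(
    clustering: Seq[String],
    requiredNumPartitions: Option[Int] = None) {

  // Mirrors the requirement that the clustering expressions are not empty.
  require(clustering.nonEmpty, "The clustering expressions should not be empty.")

  def createPartitioning(numPartitions: Int): SketchHashPartitioning = {
    // Passes when no number is required, or when the required number matches the input.
    // The message is evaluated lazily, so .get is only called when a number is required.
    assert(
      requiredNumPartitions.forall(_ == numPartitions),
      s"This ClusteredDistribution requires ${requiredNumPartitions.get} partitions, but " +
        s"the actual number of partitions is $numPartitions.")
    SketchHashPartitioning(clustering, numPartitions)
  }
}

object ClusteredDistributionSketch {
  def main(args: Array[String]): Unit = {
    val pinned = SketchClusteredDistribution(Seq("id"), requiredNumPartitions = Some(8))
    println(pinned.createPartitioning(8))

    // A mismatch raises java.lang.AssertionError, captured here as a Failure.
    println(scala.util.Try(pinned.createPartitioning(4)))
  }
}
```

With requiredNumPartitions left as None, the assertion always passes and any numPartitions is accepted.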
Creating ClusteredDistribution Instance
ClusteredDistribution takes the following when created:
- Clustering expressions
- Required number of partitions (default: None)
Note: None for the required number of partitions means that any number of partitions can be used (possibly the value of the spark.sql.shuffle.partitions configuration property, 200 by default).
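The following spark-shell sketch shows how the two arguments could be used. It assumes the Catalyst classes are on the classpath and that this internal API matches the description above. The constructor parameters may differ between Spark versions, which is why named arguments are used. The column name id is made up for the example.

```scala
import scala.util.Try
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution
import org.apache.spark.sql.types.LongType

// A clustering expression; the column name "id" is an assumption for this example.
val id = AttributeReference("id", LongType)()

// requiredNumPartitions is left at its default (None), so any number of partitions is accepted.
val anyNumber = ClusteredDistribution(clustering = Seq(id))
println(anyNumber.createPartitioning(4))

// Pinning the number of partitions: a matching request succeeds...
val exactlyEight = ClusteredDistribution(clustering = Seq(id), requiredNumPartitions = Some(8))
println(exactlyEight.createPartitioning(8))

// ...while a different number is expected to fail the assertion (captured as a Failure).
println(Try(exactlyEight.createPartitioning(4)).isFailure)

// Empty clustering expressions are expected to be rejected when the instance is created.
println(Try(ClusteredDistribution(clustering = Nil)).isFailure)
```

ClusteredDistribution is an internal Catalyst class, so code like this is mostly useful for exploring the behavior in a shell rather than in application code.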