CoalesceExec Unary Physical Operator

CoalesceExec is a unary physical operator (i.e. with one child physical operator) to…FIXME…with numPartitions number of partitions and a child physical plan (SparkPlan).

CoalesceExec represents the Repartition logical operator at execution time when shuffle is disabled (see the BasicOperators execution planning strategy). When executed, it executes the input child and calls coalesce on the resulting RDD with shuffle disabled.

Please note that since physical operators present themselves without the Exec suffix, CoalesceExec is the Coalesce in the Physical Plan section of the following example:

scala> df.rdd.getNumPartitions
res6: Int = 8

scala> df.coalesce(1).rdd.getNumPartitions
res7: Int = 1

scala> df.coalesce(1).explain(extended = true)
== Parsed Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]

== Analyzed Logical Plan ==
value: int
Repartition 1, false
+- LocalRelation [value#1]

== Optimized Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]

== Physical Plan ==
Coalesce 1
+- LocalTableScan [value#1]
output collection of Attribute matches the child's output (since CoalesceExec is about changing the number of partitions, not the internal representation of rows).
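To make the "number of partitions, not the representation" point concrete, here is a plain-Scala sketch that needs no Spark. It models partitions as vectors of rows and merges neighbouring partitions without moving any row across group boundaries, which is roughly what coalescing without a shuffle amounts to. The object CoalesceModel and its coalescePartitions method are hypothetical illustrations, not Spark API; Spark's actual coalescing RDD uses a locality-aware partition coalescer rather than this simple contiguous split.

```scala
// Hypothetical, simplified model of coalescing partitions without a shuffle.
// Not Spark's API: Spark groups parent partitions with a locality-aware
// partition coalescer instead of this contiguous, evenly sized split.
object CoalesceModel {
  type Partition = Vector[Int]

  /** Merges parent partitions into at most `numPartitions` groups.
    * Rows are never re-ordered or re-distributed, only concatenated,
    * so the "schema" (here: Int rows) is unchanged. */
  def coalescePartitions(parents: Vector[Partition], numPartitions: Int): Vector[Partition] = {
    require(numPartitions > 0, "numPartitions must be positive")
    // Without a shuffle the partition count can only stay the same or go down.
    val target = math.min(numPartitions, parents.size)
    if (target == 0) Vector.empty
    else {
      // Spread parent partitions as evenly as possible over `target` groups:
      // the first `extra` groups take one more parent than the rest.
      val base = parents.size / target
      val extra = parents.size % target
      val sizes = Vector.tabulate(target)(i => base + (if (i < extra) 1 else 0))
      // Start index of each group in `parents`.
      val starts = sizes.scanLeft(0)(_ + _)
      sizes.indices.toVector.map { i =>
        parents.slice(starts(i), starts(i) + sizes(i)).flatten
      }
    }
  }
}
```

For example, coalescing eight single-row partitions to 1 (the 8 to 1 change in the REPL session) yields one partition holding all eight rows in their original order, while asking for 16 leaves the count at 8, in line with the documented behaviour of coalesce without a shuffle.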
outputPartitioning returns SinglePartition when numPartitions is 1, and an UnknownPartitioning (with numPartitions partitions) in all other cases.
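The outputPartitioning rule can be sketched in plain Scala as below. The Partitioning, SinglePartition and UnknownPartitioning definitions here are simplified, hypothetical stand-ins that only borrow Spark's names (the real classes carry more members), and CoalesceExecModel is an illustrative name, not a Spark class. One plausible reading of the design is that merging partitions says nothing about how rows are distributed by key, so any count other than 1 can only be reported as "unknown".

```scala
// Hypothetical stand-ins for Spark's physical Partitioning classes;
// only the partition count is modelled here.
sealed trait Partitioning { def numPartitions: Int }
case object SinglePartition extends Partitioning { val numPartitions = 1 }
final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

object CoalesceExecModel {
  // A single output partition is reported as SinglePartition,
  // any other count as UnknownPartitioning(numPartitions).
  def outputPartitioning(numPartitions: Int): Partitioning =
    if (numPartitions == 1) SinglePartition
    else UnknownPartitioning(numPartitions)
}
```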