scala> df.rdd.getNumPartitions
res6: Int = 8
scala> df.coalesce(1).rdd.getNumPartitions
res7: Int = 1
scala> df.coalesce(1).explain(extended = true)
== Parsed Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]
== Analyzed Logical Plan ==
value: int
Repartition 1, false
+- LocalRelation [value#1]
== Optimized Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]
== Physical Plan ==
Coalesce 1
+- LocalTableScan [value#1]
CoalesceExec Unary Physical Operator
CoalesceExec is a unary physical operator (i.e. with one child physical operator) to…FIXME…with numPartitions number of partitions and a child spark plan.
CoalesceExec represents the Repartition logical operator at execution time when shuffle is disabled (see the BasicOperators execution planning strategy). When executed, it executes the input child and calls coalesce on the resulting RDD (with shuffle disabled).
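The effect of calling coalesce with shuffle disabled can be seen directly at the RDD level. The following is a minimal sketch, not the operator's own code: the local session setup, the application name, and the partition counts are assumptions chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("coalesce-sketch") // hypothetical name
  .getOrCreate()

// An RDD with an explicit 8 partitions (the numbers are illustrative).
val rdd = spark.sparkContext.parallelize(1 to 100, numSlices = 8)

// With shuffle disabled, existing partitions are merged without moving data
// through a shuffle, so the partition count can only go down.
val narrowed = rdd.coalesce(2, shuffle = false)
println(narrowed.getNumPartitions)

// Asking for more partitions without a shuffle leaves the count unchanged.
val notWidened = rdd.coalesce(16, shuffle = false)
println(notWidened.getNumPartitions)
```

Because no shuffle takes place, a request for more partitions than the child RDD has is not honoured; growing the partition count requires a shuffle.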
Note that physical operators present themselves without the Exec suffix, so CoalesceExec appears as Coalesce in the Physical Plan section of the explain output (e.g. Coalesce 1 for df.coalesce(1)).
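The naming difference can be checked by comparing the operator's class name with its nodeName. This is a small sketch under assumptions: the session setup and the use of spark.range as input are illustrative and not taken from the example above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nodename-sketch") // hypothetical name
  .getOrCreate()

// Illustrative input: 100 rows spread over 8 partitions.
val ds = spark.range(0L, 100L, 1L, 8)

// sparkPlan is the physical plan before the preparation rules are applied.
val plan = ds.coalesce(1).queryExecution.sparkPlan

println(plan.getClass.getSimpleName) // the class name, with the Exec suffix
println(plan.nodeName)               // the name used in explain output
```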
The output collection of Attribute matches the child's (since CoalesceExec changes the number of partitions, not the internal representation).
outputPartitioning returns SinglePartition when the input numPartitions is 1, and an UnknownPartitioning partitioning scheme otherwise.
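Both properties can be inspected on the planned physical operator. The sketch below assumes a local session and uses spark.range as an illustrative input; the partition counts 1 and 4 are arbitrary choices to exercise the two cases.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.physical.{SinglePartition, UnknownPartitioning}

// Assumption: a local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partitioning-sketch") // hypothetical name
  .getOrCreate()

// Illustrative input: 100 rows spread over 8 partitions.
val ds = spark.range(0L, 100L, 1L, 8)

// Physical plans (before preparation rules) for the two cases.
val one = ds.coalesce(1).queryExecution.sparkPlan
val four = ds.coalesce(4).queryExecution.sparkPlan

// The output attributes are passed through from the child unchanged.
println(one.output == one.children.head.output)

// numPartitions == 1 gives SinglePartition; other values give UnknownPartitioning.
println(one.outputPartitioning == SinglePartition)
println(four.outputPartitioning.isInstanceOf[UnknownPartitioning])
```

Since UnknownPartitioning makes no promise about how rows are distributed, a downstream operator that requires a specific distribution may still cause the planner to add an exchange.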