scala> df.rdd.getNumPartitions
res6: Int = 8

scala> df.coalesce(1).rdd.getNumPartitions
res7: Int = 1

scala> df.coalesce(1).explain(extended = true)
== Parsed Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]

== Analyzed Logical Plan ==
value: int
Repartition 1, false
+- LocalRelation [value#1]

== Optimized Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]

== Physical Plan ==
Coalesce 1
+- LocalTableScan [value#1]
CoalesceExec is a unary physical operator with
numPartitions number of partitions and a
child spark plan.
CoalesceExec represents the Repartition logical operator at execution time
when shuffle is disabled (see the BasicOperators strategy). When executed, it
executes the input child and calls coalesce on the resulting RDD (with
shuffle disabled).
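
The execution step described above can be approximated from the public API. What follows is a minimal sketch, not the operator's actual implementation: it assumes a throwaway local SparkSession, and the application name, the `local[8]` master, the 8-partition range dataset, and all variable names are illustrative choices that do not come from the text.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created only for this sketch.
val spark = SparkSession.builder()
  .master("local[8]")
  .appName("coalesce-exec-sketch")
  .getOrCreate()

// Hypothetical input: 100 rows explicitly split into 8 partitions.
val df = spark.range(0, 100, 1, 8).toDF("value")

// Mirrors the description: take the child's RDD and coalesce it without a shuffle.
val childRdd = df.rdd
val coalesced = childRdd.coalesce(numPartitions = 1, shuffle = false)

val partitionsBefore = childRdd.getNumPartitions
val partitionsAfter = coalesced.getNumPartitions

// The physical plan of Dataset.coalesce, rendered as text for inspection.
val coalescePlan = df.coalesce(1).queryExecution.executedPlan.toString

println(s"partitions before: $partitionsBefore, after: $partitionsAfter")
println(coalescePlan)

spark.stop()
```

Comparing this with `df.repartition(1)`, which requests a shuffle, is a quick way to see when the planner picks a shuffle exchange instead of Coalesce.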
Note that physical operators are displayed without the Exec suffix, so
CoalesceExec appears as
Coalesce in the Physical Plan section of the example above.
The output collection of Attribute matches the child's (since
CoalesceExec is about changing the number of partitions, not the internal representation).
outputPartitioning returns
SinglePartition when the input numPartitions is
1, and
UnknownPartitioning in all other cases.
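
The rule for outputPartitioning can be written down as a small, self-contained sketch. The types below are simplified local stand-ins that only borrow the names used in the text (they are not Spark's classes), and `coalesceOutputPartitioning` is a hypothetical helper introduced for illustration.

```scala
// Local stand-ins for the partitioning types named in the text;
// defined here for illustration only, not imported from Spark.
sealed trait Partitioning
case object SinglePartition extends Partitioning
final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

// Hypothetical helper: a single partition is reported as SinglePartition,
// any other partition count as UnknownPartitioning.
def coalesceOutputPartitioning(numPartitions: Int): Partitioning =
  if (numPartitions == 1) SinglePartition
  else UnknownPartitioning(numPartitions)

println(coalesceOutputPartitioning(1)) // prints "SinglePartition"
println(coalesceOutputPartitioning(4)) // prints "UnknownPartitioning(4)"
```

Against a real session, the corresponding value can be inspected with `df.coalesce(1).queryExecution.executedPlan.outputPartitioning`.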