ShuffledHashJoinExec Binary Physical Operator

ShuffledHashJoinExec is a binary physical operator for hash-based joins.

ShuffledHashJoinExec is created for joins with joining keys and one of the following holds:

Start spark-shell with ShuffledHashJoinExec's selection requirements

./bin/spark-shell \
    -c spark.sql.join.preferSortMergeJoin=false \
    -c spark.sql.autoBroadcastJoinThreshold=1

scala> spark.conf.get("spark.sql.join.preferSortMergeJoin")
res0: String = false

scala> spark.conf.get("spark.sql.autoBroadcastJoinThreshold")
res1: String = 1

scala> spark.conf.get("spark.sql.shuffle.partitions")
res2: String = 200

val dataset = Seq(
  (0, "playing"),
  (1, "with"),
  (2, "ShuffledHashJoinExec")
).toDF("id", "token")
val query = dataset.join(dataset, Seq("id"), "leftsemi")

scala> query.queryExecution.optimizedPlan.stats(spark.sessionState.conf).sizeInBytes
res3: BigInt = 72

scala> query.explain
== Physical Plan ==
ShuffledHashJoin [id#15], [id#20], LeftSemi, BuildRight
:- Exchange hashpartitioning(id#15, 200)
:  +- LocalTableScan [id#15, token#16]
+- Exchange hashpartitioning(id#20, 200)
   +- LocalTableScan [id#20]

The ShuffledHashJoinExec operator is chosen by the JoinSelection execution planning strategy.

Table 1. ShuffledHashJoinExec’s SQLMetrics
Name | Description
data size of build side |
time to build hash map |
number of output rows |

Figure 1. ShuffledHashJoinExec in web UI (Details for Query)

Table 2. ShuffledHashJoinExec’s Required Child Output Distributions
Left Child | Right Child
ClusteredDistribution (per left join key expressions) | ClusteredDistribution (per right join key expressions)
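
The ClusteredDistribution requirement in Table 2 means that rows sharing a join key must end up in the same partition on both sides, which is why the physical plan above shows an Exchange hashpartitioning under each child. The following is a minimal sketch of that idea in plain Scala, not Spark's code: `ClusteredDistributionSketch`, `partitionOf` and `partitionBy` are hypothetical names, and the modulo-based partitioner is a simplified stand-in for Spark's own hash function.

```scala
// Illustrative only: a simplified model of clustering rows by join key.
// The names and the modulo-based hash are assumptions, not Spark internals.
object ClusteredDistributionSketch {

  // Maps a join key to a partition index in [0, numPartitions).
  // floorMod keeps the index non-negative for negative keys.
  def partitionOf(key: Int, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Groups (key, value) rows by the partition their key hashes to.
  def partitionBy[V](rows: Seq[(Int, V)], numPartitions: Int): Map[Int, Seq[(Int, V)]] =
    rows.groupBy { case (key, _) => partitionOf(key, numPartitions) }
}
```

Because both children would be partitioned with the same function and the same number of partitions, a given key maps to the same partition index on the left and on the right, so each pair of partitions can be joined independently.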

Executing ShuffledHashJoinExec — doExecute Method

doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan contract to produce the result of a structured query as an RDD of internal binary rows.
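
Conceptually, a shuffled hash join works partition by partition: it builds an in-memory hash map from the build side and probes it with rows of the other (streamed) side. The sketch below models this for the LeftSemi join from the example above, using plain Scala collections. It is an assumption-laden illustration, not the operator's implementation: `ShuffledHashJoinSketch`, `buildHashMap`, `leftSemiJoin` and `joinPartitions` are hypothetical names, rows are simple tuples rather than internal binary rows, and a `Map` stands in for the hashed relation.

```scala
// Illustrative only: a per-partition build-and-probe hash join (left semi).
// All names here are hypothetical; Spark works on internal rows instead.
object ShuffledHashJoinSketch {
  type Row = (Int, String) // (join key, payload)

  // Build phase: index the build-side rows of one partition by join key.
  def buildHashMap(buildIter: Iterator[Row]): Map[Int, Seq[Row]] =
    buildIter.toSeq.groupBy(_._1)

  // Probe phase for a left semi join: keep a streamed row only if
  // its key has at least one match on the build side.
  def leftSemiJoin(streamIter: Iterator[Row], hashed: Map[Int, Seq[Row]]): Iterator[Row] =
    streamIter.filter { case (key, _) => hashed.contains(key) }

  // Joins co-partitioned inputs pairwise: partition i of the streamed side
  // is joined only with partition i of the build side.
  def joinPartitions(streamed: Seq[Seq[Row]], build: Seq[Seq[Row]]): Seq[Row] = {
    require(streamed.size == build.size, "both sides need the same number of partitions")
    streamed.zip(build).flatMap { case (streamPart, buildPart) =>
      leftSemiJoin(streamPart.iterator, buildHashMap(buildPart.iterator))
    }
  }
}
```

The pairwise join is only correct because both inputs are assumed to be clustered by the join keys beforehand, which is what the required child output distributions guarantee.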

buildHashedRelation Internal Method


Creating ShuffledHashJoinExec Instance

ShuffledHashJoinExec takes the following when created:
