HashJoin — Contract for Hash-based Join Physical Operators

HashJoin is the contract for hash-based join physical operators (e.g. BroadcastHashJoinExec and ShuffledHashJoinExec).

package org.apache.spark.sql.execution.joins

trait HashJoin {
  // only required methods that have no implementation
  // the others follow
  val leftKeys: Seq[Expression]
  val rightKeys: Seq[Expression]
  val joinType: JoinType
  val buildSide: BuildSide
  val condition: Option[Expression]
  val left: SparkPlan
  val right: SparkPlan
}
Table 1. HashJoin Contract
Method Description

buildSide

Left or right build side

Used when:

joinType

JoinType

Table 2. HashJoin’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

boundCondition

buildKeys

Build join keys (as Catalyst expressions)

buildPlan

streamedKeys

Streamed join keys (as Catalyst expressions)

streamedPlan

join Method

join(
  streamedIter: Iterator[InternalRow],
  hashed: HashedRelation,
  numOutputRows: SQLMetric,
  avgHashProbe: SQLMetric): Iterator[InternalRow]

join branches off per joinType to create a join iterator of internal rows (i.e. Iterator[InternalRow]) for the input streamedIter and hashed:

join requests TaskContext to add a TaskCompletionListener to update the input avg hash probe SQL metric. The TaskCompletionListener is executed on a task completion (regardless of the task status: success, failure, or cancellation) and uses getAverageProbesPerLookup from the input hashed to set the input avg hash probe.

In the end, for every row in the join iterator of internal rows join increments the input numOutputRows SQL metric and applies the result projection.

join reports a IllegalArgumentException when the joinType is incorrect.

[x] JoinType is not supported
Note
join is used when BroadcastHashJoinExec and ShuffledHashJoinExec are executed.

innerJoin Internal Method

innerJoin(
  streamIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]

innerJoin…​FIXME

Note
innerJoin is used when…​FIXME

outerJoin Internal Method

outerJoin(
  streamedIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]

outerJoin…​FIXME

Note
outerJoin is used when…​FIXME

semiJoin Internal Method

semiJoin(
  streamIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]

semiJoin…​FIXME

Note
semiJoin is used when…​FIXME

antiJoin Internal Method

antiJoin(
  streamIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]

antiJoin…​FIXME

Note
antiJoin is used when…​FIXME

existenceJoin Internal Method

existenceJoin(
  streamIter: Iterator[InternalRow],
  hashedRelation: HashedRelation): Iterator[InternalRow]

existenceJoin…​FIXME

Note
existenceJoin is used when…​FIXME

createResultProjection Method

createResultProjection(): (InternalRow) => InternalRow

createResultProjection…​FIXME

Note
createResultProjection is used when…​FIXME

results matching ""

    No results matching ""