package org.apache.spark.sql.execution.joins
trait HashJoin {
// only required methods that have no implementation
// the others follow
val leftKeys: Seq[Expression]
val rightKeys: Seq[Expression]
val joinType: JoinType
val buildSide: BuildSide
val condition: Option[Expression]
val left: SparkPlan
val right: SparkPlan
}
HashJoin — Contract for Hash-based Join Physical Operators
HashJoin is the contract for hash-based join physical operators (e.g. BroadcastHashJoinExec and ShuffledHashJoinExec).
| Method | Description |
|---|---|
Left or right build side Used when:
|
|
| Name | Description |
|---|---|
Build join keys (as Catalyst expressions) |
|
Streamed join keys (as Catalyst expressions) |
|
join Method
join(
streamedIter: Iterator[InternalRow],
hashed: HashedRelation,
numOutputRows: SQLMetric,
avgHashProbe: SQLMetric): Iterator[InternalRow]
join branches off per joinType to create a join iterator of internal rows (i.e. Iterator[InternalRow]) for the input streamedIter and hashed:
-
outerJoin for a LeftOuter or a RightOuter join
-
existenceJoin for a ExistenceJoin join
join requests TaskContext to add a TaskCompletionListener to update the input avg hash probe SQL metric. The TaskCompletionListener is executed on a task completion (regardless of the task status: success, failure, or cancellation) and uses getAverageProbesPerLookup from the input hashed to set the input avg hash probe.
join createResultProjection.
In the end, for every row in the join iterator of internal rows join increments the input numOutputRows SQL metric and applies the result projection.
join reports a IllegalArgumentException when the joinType is incorrect.
[x] JoinType is not supported
|
Note
|
join is used when BroadcastHashJoinExec and ShuffledHashJoinExec are executed.
|
innerJoin Internal Method
innerJoin(
streamIter: Iterator[InternalRow],
hashedRelation: HashedRelation): Iterator[InternalRow]
innerJoin…FIXME
|
Note
|
innerJoin is used when…FIXME
|
outerJoin Internal Method
outerJoin(
streamedIter: Iterator[InternalRow],
hashedRelation: HashedRelation): Iterator[InternalRow]
outerJoin…FIXME
|
Note
|
outerJoin is used when…FIXME
|
semiJoin Internal Method
semiJoin(
streamIter: Iterator[InternalRow],
hashedRelation: HashedRelation): Iterator[InternalRow]
semiJoin…FIXME
|
Note
|
semiJoin is used when…FIXME
|
antiJoin Internal Method
antiJoin(
streamIter: Iterator[InternalRow],
hashedRelation: HashedRelation): Iterator[InternalRow]
antiJoin…FIXME
|
Note
|
antiJoin is used when…FIXME
|