package org.apache.spark.sql.execution.joins
trait HashJoin {
// only required methods that have no implementation
// the others follow
val leftKeys: Seq[Expression]
val rightKeys: Seq[Expression]
val joinType: JoinType
val buildSide: BuildSide
val condition: Option[Expression]
val left: SparkPlan
val right: SparkPlan
}
HashJoin — Contract for Hash-based Join Physical Operators
HashJoin
is the contract for hash-based join physical operators (e.g. BroadcastHashJoinExec and ShuffledHashJoinExec).
Method | Description |
---|---|
Left or right build side Used when:
|
|
Name | Description |
---|---|
Build join keys (as Catalyst expressions) |
|
Streamed join keys (as Catalyst expressions) |
|
join
Method
join(
streamedIter: Iterator[InternalRow],
hashed: HashedRelation,
numOutputRows: SQLMetric,
avgHashProbe: SQLMetric): Iterator[InternalRow]
join
branches off per joinType to create a join iterator of internal rows (i.e. Iterator[InternalRow]
) for the input streamedIter
and hashed
:
-
outerJoin for a LeftOuter or a RightOuter join
-
existenceJoin for a ExistenceJoin join
join
requests TaskContext
to add a TaskCompletionListener
to update the input avg hash probe SQL metric. The TaskCompletionListener
is executed on a task completion (regardless of the task status: success, failure, or cancellation) and uses getAverageProbesPerLookup from the input hashed
to set the input avg hash probe.
join
createResultProjection.
In the end, for every row in the join iterator of internal rows join
increments the input numOutputRows
SQL metric and applies the result projection.
join
reports a IllegalArgumentException
when the joinType is incorrect.
[x] JoinType is not supported
Note
|
join is used when BroadcastHashJoinExec and ShuffledHashJoinExec are executed.
|
innerJoin
Internal Method
innerJoin(
streamIter: Iterator[InternalRow],
hashedRelation: HashedRelation): Iterator[InternalRow]
innerJoin
…FIXME
Note
|
innerJoin is used when…FIXME
|
outerJoin
Internal Method
outerJoin(
streamedIter: Iterator[InternalRow],
hashedRelation: HashedRelation): Iterator[InternalRow]
outerJoin
…FIXME
Note
|
outerJoin is used when…FIXME
|
semiJoin
Internal Method
semiJoin(
streamIter: Iterator[InternalRow],
hashedRelation: HashedRelation): Iterator[InternalRow]
semiJoin
…FIXME
Note
|
semiJoin is used when…FIXME
|
antiJoin
Internal Method
antiJoin(
streamIter: Iterator[InternalRow],
hashedRelation: HashedRelation): Iterator[InternalRow]
antiJoin
…FIXME
Note
|
antiJoin is used when…FIXME
|