HashedRelation

HashedRelation is the contract for "relations" with values hashed by some key.

HashedRelation is a KnownSizeEstimation.

package org.apache.spark.sql.execution.joins

trait HashedRelation extends KnownSizeEstimation {
  // only required methods that have no implementation
  // the others follow
  def asReadOnlyCopy(): HashedRelation
  def close(): Unit
  def get(key: InternalRow): Iterator[InternalRow]
  def getAverageProbesPerLookup: Double
  def getValue(key: InternalRow): InternalRow
  def keyIsUnique: Boolean
}
Note
HashedRelation is a private[execution] contract.
Table 1. HashedRelation Contract
Method Description

asReadOnlyCopy

Gives a read-only copy of this HashedRelation to be safely used in a separate thread.

Used exclusively when BroadcastHashJoinExec is requested to execute (and transform every partition of the streamedPlan physical operator using the broadcast variable of the buildPlan physical operator).

get

Gives the internal rows for the given key, or null if the key has no matching rows.

Used when HashJoin is requested to innerJoin, outerJoin, semiJoin, existenceJoin and antiJoin.

getValue

Gives the value (an internal row) for the given key.

Note
HashedRelation has two variants of getValue: one that accepts an InternalRow and another that accepts a Long. getValue with an InternalRow does not seem to be used at all.

getAverageProbesPerLookup

Used when…​FIXME
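The lookup semantics in the table above can be illustrated with a small, self-contained model. This is only a sketch in plain Scala and not Spark's implementation: `Row`, `ToyHashedRelation` and `keyOf` are hypothetical names, rows are modelled as `Seq[Any]` rather than `InternalRow`, and returning null from `getValue` for a missing key is an assumption made here for symmetry with `get`.

```scala
// Toy model of the contract: NOT Spark's implementation.
// Row, ToyHashedRelation and keyOf are hypothetical names for illustration.
object ToyHashedRelationDemo {
  type Row = Seq[Any]

  final class ToyHashedRelation(rows: Seq[Row], keyOf: Row => Any) {
    // Group the build-side rows by their join key.
    private val byKey: Map[Any, Seq[Row]] = rows.groupBy(keyOf)

    // All rows for the key, or null when the key is absent.
    def get(key: Any): Iterator[Row] =
      byKey.get(key).map(_.iterator).orNull

    // One row for the key, or null when absent (an assumption of this sketch).
    def getValue(key: Any): Row =
      byKey.get(key).map(_.head).orNull

    // True when no key maps to more than one row.
    def keyIsUnique: Boolean = byKey.values.forall(_.size == 1)
  }

  def main(args: Array[String]): Unit = {
    val relation = new ToyHashedRelation(
      Seq(Seq(1, "a"), Seq(1, "b"), Seq(2, "c")),
      row => row.head)

    assert(relation.get(1).toList == List(Seq(1, "a"), Seq(1, "b")))
    assert(relation.get(3) == null)
    assert(relation.getValue(2) == Seq(2, "c"))
    assert(!relation.keyIsUnique)
  }
}
```

Callers of `get` therefore have to check for null before iterating, which is why the sketch asserts on `relation.get(3) == null` instead of treating it as an empty iterator.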

getValue Method

getValue(key: Long): InternalRow
Note
This is getValue that takes a long key. There is the more generic getValue that takes an internal row instead.

getValue simply throws an UnsupportedOperationException (and expects concrete HashedRelations to provide a more meaningful implementation).

Note
getValue is used exclusively when LongHashedRelation is requested to get the value for a given key.
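The "throw by default, override where supported" pattern described above could look roughly like the following. It is a minimal sketch with hypothetical names (`ToyRelation`, `ToyLongRelation`, `Row`), not the actual Spark classes, and the Map-backed lookup is an assumption chosen only to keep the example short.

```scala
// Sketch of a default method that throws until a subclass overrides it.
// ToyRelation and ToyLongRelation are hypothetical, not Spark classes.
object GetValueDefaultDemo {
  type Row = Seq[Any]

  trait ToyRelation {
    // Default: unsupported unless a concrete relation provides an implementation.
    def getValue(key: Long): Row =
      throw new UnsupportedOperationException
  }

  // A relation that does not support long keys keeps the default.
  final class ToyGenericRelation extends ToyRelation

  // A long-keyed relation overrides the default with a real lookup.
  final class ToyLongRelation(values: Map[Long, Row]) extends ToyRelation {
    override def getValue(key: Long): Row = values.get(key).orNull
  }

  def main(args: Array[String]): Unit = {
    val longRelation = new ToyLongRelation(Map(7L -> Seq(7L, "seven")))
    assert(longRelation.getValue(7L) == Seq(7L, "seven"))
    assert(longRelation.getValue(8L) == null)

    val failed = scala.util.Try(new ToyGenericRelation().getValue(7L))
    assert(failed.failed.get.isInstanceOf[UnsupportedOperationException])
  }
}
```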

Creating Concrete HashedRelation Instance (for Build Side of Hash-based Join) — apply Factory Method

apply(
  input: Iterator[InternalRow],
  key: Seq[Expression],
  sizeEstimate: Int = 64,
  taskMemoryManager: TaskMemoryManager = null): HashedRelation

apply creates a LongHashedRelation when the input key collection has a single expression of type long, or an UnsafeHashedRelation otherwise.
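The selection rule can be sketched as a pattern match over the key expressions. The types below (`KeyType`, `Choice` and their members) are hypothetical stand-ins for Catalyst expressions and the two relation classes; the sketch only models the decision, not how either relation is actually built.

```scala
// Sketch of the dispatch rule only; KeyType and Choice are hypothetical
// stand-ins, not Catalyst or Spark types.
object ApplyDispatchDemo {
  sealed trait KeyType
  case object LongKey extends KeyType
  case object StringKey extends KeyType

  sealed trait Choice
  case object UseLongHashedRelation extends Choice
  case object UseUnsafeHashedRelation extends Choice

  def choose(key: Seq[KeyType]): Choice = key match {
    // Exactly one key expression, and it is of type long.
    case Seq(LongKey) => UseLongHashedRelation
    // Anything else: several keys, or a single non-long key.
    case _ => UseUnsafeHashedRelation
  }

  def main(args: Array[String]): Unit = {
    assert(choose(Seq(LongKey)) == UseLongHashedRelation)
    assert(choose(Seq(StringKey)) == UseUnsafeHashedRelation)
    assert(choose(Seq(LongKey, LongKey)) == UseUnsafeHashedRelation)
  }
}
```

Note that two long keys still fall through to the generic case in this sketch, since the rule asks for a single expression.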

Note

The input key expressions are:

Note

apply is used when:
