ShuffleManager Contract — Pluggable Shuffle Systems

ShuffleManager is the contract for shuffle systems that manage shuffle metadata on the driver (by shuffle ID or ShuffleHandle) and let tasks running on executors access that metadata.

Note

SortShuffleManager (short name: sort or tungsten-sort) is the default implementation of the ShuffleManager Contract in Apache Spark.

Use the spark.shuffle.manager configuration property to set up a custom ShuffleManager.
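As a minimal sketch of the configuration step, the property could be set on a SparkConf before the SparkContext is created. This assumes Spark is on the classpath; the class name com.example.MyShuffleManager, the application name, and the local master URL are hypothetical placeholders, not part of Spark.

```scala
import org.apache.spark.SparkConf

object ShuffleManagerConfSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("custom-shuffle-demo") // hypothetical application name
      .setMaster("local[*]")             // assumption: local mode, for illustration only
      // Either a short name (sort or tungsten-sort) or a fully-qualified class name.
      // com.example.MyShuffleManager is a hypothetical custom implementation.
      .set("spark.shuffle.manager", "com.example.MyShuffleManager")

    // Show the value that a SparkContext built from this conf would pick up.
    println(conf.get("spark.shuffle.manager"))
  }
}
```

The property is read when SparkEnv is created, so changing it on a running application is not expected to have any effect.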

Table 1. ShuffleManager Contract
Method Description

getReader

getReader[K, C](
  handle: ShuffleHandle,
  startPartition: Int,
  endPartition: Int,
  context: TaskContext): ShuffleReader[K, C]

Gives ShuffleReader to read shuffle data in the ShuffleHandle

Used when the following RDDs are requested to compute a partition:

getWriter

getWriter[K, V](
  handle: ShuffleHandle,
  mapId: Int,
  context: TaskContext): ShuffleWriter[K, V]

Gives ShuffleWriter to write shuffle data in the ShuffleHandle

Used exclusively when ShuffleMapTask is requested to run (and requests the ShuffleWriter to write records for a partition)

registerShuffle

registerShuffle[K, V, C](
  shuffleId: Int,
  numMaps: Int,
  dependency: ShuffleDependency[K, V, C]): ShuffleHandle

Registers a shuffle (by the given shuffleId and ShuffleDependency) and returns a ShuffleHandle

Used exclusively when ShuffleDependency is created (and registers itself with the shuffle system)

shuffleBlockResolver

shuffleBlockResolver: ShuffleBlockResolver

Gives ShuffleBlockResolver of the shuffle system

Used when:

stop

stop(): Unit

Stops the shuffle system

Used exclusively when SparkEnv is requested to stop

unregisterShuffle

unregisterShuffle(shuffleId: Int): Boolean

Unregisters a shuffle (given shuffleId)

Used exclusively when BlockManagerSlaveEndpoint receives a RemoveShuffle message
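The order in which the contract methods are called can be illustrated with a toy, in-memory model. The sketch below does not use Spark's types: ToyHandle, ToyShuffleManager and ToyShuffleDemo are hypothetical names, the writer and reader are reduced to plain functions and iterators, and the partition range, TaskContext and ShuffleBlockResolver are left out entirely.

```scala
import scala.collection.mutable

// Hypothetical stand-in for ShuffleHandle: just enough metadata to find the data.
final case class ToyHandle(shuffleId: Int, numMaps: Int)

// Hypothetical, heavily simplified model of the ShuffleManager contract.
final class ToyShuffleManager {
  // shuffleId -> handle (the metadata the shuffle system keeps)
  private val handles = mutable.Map.empty[Int, ToyHandle]
  // (shuffleId, mapId) -> records written by that map task
  private val blocks = mutable.Map.empty[(Int, Int), Vector[(String, Int)]]

  // Registers a shuffle and returns the handle that tasks use later.
  def registerShuffle(shuffleId: Int, numMaps: Int): ToyHandle = {
    val handle = ToyHandle(shuffleId, numMaps)
    handles.update(shuffleId, handle)
    handle
  }

  // Returns a "writer": a function that stores the output of one map task.
  def getWriter(handle: ToyHandle, mapId: Int): Iterator[(String, Int)] => Unit = {
    records => blocks.update((handle.shuffleId, mapId), records.toVector)
  }

  // Returns a "reader": an iterator over the output of all map tasks of the shuffle.
  def getReader(handle: ToyHandle): Iterator[(String, Int)] =
    (0 until handle.numMaps).iterator.flatMap { mapId =>
      blocks.getOrElse((handle.shuffleId, mapId), Vector.empty[(String, Int)])
    }

  // Drops the data and metadata of a shuffle; true if the shuffle was registered.
  def unregisterShuffle(shuffleId: Int): Boolean = {
    val keys = blocks.keys.filter(_._1 == shuffleId).toList
    keys.foreach(key => blocks.remove(key))
    handles.remove(shuffleId).isDefined
  }
}

object ToyShuffleDemo {
  def main(args: Array[String]): Unit = {
    val manager = new ToyShuffleManager
    val handle = manager.registerShuffle(shuffleId = 0, numMaps = 2)

    // Map side: each map task asks for a writer and writes its records.
    manager.getWriter(handle, mapId = 0)(Iterator("a" -> 1))
    manager.getWriter(handle, mapId = 1)(Iterator("b" -> 2))

    // Reduce side: a task asks for a reader and consumes the records.
    println(manager.getReader(handle).toList) // List((a,1), (b,2))

    // Cleanup once the shuffle is no longer needed.
    println(manager.unregisterShuffle(0)) // true
  }
}
```

The model keeps the property the contract is built around: everything a task needs to write or read shuffle data is reachable from the handle returned at registration time.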

Accessing ShuffleManager using SparkEnv

The driver and executors access the ShuffleManager instance using SparkEnv.shuffleManager.

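import org.apache.spark.SparkEnv

// SparkEnv.get returns the SparkEnv of the current driver or executor JVM
val shuffleManager = SparkEnv.get.shuffleManager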

Further Reading or Watching

  1. (slides) Spark shuffle introduction by Raymond Liu (aka colorant).
