ShuffleManager Contract — Pluggable Shuffle Systems

ShuffleManager is the contract of shuffle systems that manage shuffle metadata on the driver (using shuffle IDs or ShuffleHandle) and allow running tasks on executors to access the metadata.


SortShuffleManager (short name: sort or tungsten-sort) is the default implementation of the ShuffleManager Contract in Apache Spark.

Use spark.shuffle.manager configuration property to set up a custom ShuffleManager.

Table 1. ShuffleManager Contract
Method Description


getReader[K, C](
  handle: ShuffleHandle,
  startPartition: Int,
  endPartition: Int,
  context: TaskContext): ShuffleReader[K, C]

Gives ShuffleReader to read shuffle data in the ShuffleHandle

Used when the following RDDs are requested to compute a partition:


getWriter[K, V](
  handle: ShuffleHandle,
  mapId: Int,
  context: TaskContext): ShuffleWriter[K, V]

Gives ShuffleWriter to write shuffle data in the ShuffleHandle

Used exclusively when ShuffleMapTask is requested to run (and requests the ShuffleWriter to write records for a partition)


registerShuffle[K, V, C](
  shuffleId: Int,
  numMaps: Int,
  dependency: ShuffleDependency[K, V, C]): ShuffleHandle

Registers a shuffle (by the given shuffleId and ShuffleDependency) and returns a ShuffleHandle

Used exclusively when ShuffleDependency is created (and registers itself with the shuffle system)


shuffleBlockResolver: ShuffleBlockResolver

Gives ShuffleBlockResolver of the shuffle system

Used when:


stop(): Unit

Stops the shuffle system

Used exclusively when SparkEnv is requested to stop


unregisterShuffle(shuffleId: Int): Boolean

Unregisters a shuffle (given shuffleId)

Used exclusively when BlockManagerSlaveEndpoint is requested to receives a RemoveShuffle message

Accessing ShuffleManager using SparkEnv

The driver and executor access the ShuffleManager instance using SparkEnv.shuffleManager.

val shuffleManager = SparkEnv.get.shuffleManager

Further Reading or Watching

  1. (slides) Spark shuffle introduction by Raymond Liu (aka colorant).

results matching ""

    No results matching ""