MapStatus — Shuffle Map Output Status

There are two types of MapStatus:

  • CompressedMapStatus that compresses the estimated map output size to 8 bits (Byte) for efficient reporting.

  • HighlyCompressedMapStatus that stores the average size of non-empty blocks, and a compressed bitmap for tracking which blocks are empty.
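The two encodings above can be illustrated with a small standalone sketch. It is not Spark's code: the object and method names are hypothetical, the logarithm base of 1.1 is an assumption, and java.util.BitSet stands in for the compressed bitmap mentioned in the text.

```scala
import java.util.BitSet

// Standalone sketch; all names here are illustrative, not Spark's internal API.
object MapStatusEncodings {
  // Assumed logarithm base for the one-byte encoding.
  val LogBase = 1.1

  // CompressedMapStatus idea: store each block size in one byte on a log scale.
  def compressSize(size: Long): Byte =
    if (size == 0) 0
    else if (size <= 1L) 1
    else math.min(255, math.ceil(math.log(size.toDouble) / math.log(LogBase)).toInt).toByte

  // Decode the byte back into an approximate size in bytes.
  def decompressSize(compressed: Byte): Long =
    if (compressed == 0) 0L
    else math.pow(LogBase, (compressed & 0xFF).toDouble).toLong

  // HighlyCompressedMapStatus idea: one average size plus a bitmap of empty blocks.
  final case class AverageSizes(emptyBlocks: BitSet, avgSize: Long) {
    def getSizeForBlock(reduceId: Int): Long =
      if (emptyBlocks.get(reduceId)) 0L else avgSize
  }

  // Build the average-plus-bitmap summary from the exact block sizes.
  def summarize(uncompressedSizes: Array[Long]): AverageSizes = {
    val empties = new BitSet(uncompressedSizes.length)
    var total = 0L
    var nonEmpty = 0
    for (i <- uncompressedSizes.indices) {
      if (uncompressedSizes(i) > 0) { total += uncompressedSizes(i); nonEmpty += 1 }
      else empties.set(i)
    }
    val avg = if (nonEmpty > 0) total / nonEmpty else 0L
    AverageSizes(empties, avg)
  }
}
```

In this sketch the one-byte encoding rounds up, so a decoded size is an estimate that is at least as large as the original (until the 255 cap is reached), while the averaged form reports the same size for every non-empty block and 0 for empty ones.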

HighlyCompressedMapStatus is chosen when the number of blocks (the size of uncompressedSizes) is greater than 2000; otherwise CompressedMapStatus is used.

FIXME What exactly is 2000? Is this the number of tasks in a job?
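The selection rule can be sketched as follows. This is a minimal illustration, not Spark's implementation: the object, the constant name, and the string return values are hypothetical, and only the threshold of 2000 and the uncompressedSizes name come from the text.

```scala
// Standalone sketch; names are illustrative only.
object MapStatusChoice {
  // Threshold quoted in the text above.
  val HighlyCompressedThreshold = 2000

  // Pick the MapStatus flavour from the number of blocks in one map output.
  def chooseStatusType(uncompressedSizes: Array[Long]): String =
    if (uncompressedSizes.length > HighlyCompressedThreshold) "HighlyCompressedMapStatus"
    else "CompressedMapStatus"
}
```

The comparison is strict, so exactly 2000 blocks still yields CompressedMapStatus. Because the count is the length of uncompressedSizes, this reading suggests the threshold applies to the blocks of a single map output, one per reduceId accepted by getSizeForBlock.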

MapStatus Contract

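trait MapStatus {
  // BlockManager where the ShuffleMapTask ran and its result is stored
  def location: BlockManagerId
  // Estimated size (in bytes) of the block for the given reduceId
  def getSizeForBlock(reduceId: Int): Long
}

MapStatus is a private[spark] contract.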
Table 1. MapStatus Contract

Method             Description
location           The BlockManager where a ShuffleMapTask ran and the result is stored.
getSizeForBlock    The estimated size for the reduce block (in bytes).
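A short sketch of how the two contract methods might be used together. Since MapStatus is private[spark], the types below are stand-ins defined only for this example: the simplified BlockManagerId case class, the ExactMapStatus implementation, and the bytesByLocation helper are all hypothetical.

```scala
// Stand-ins for illustration only; they mirror the contract but are not Spark's classes.
final case class BlockManagerId(executorId: String, host: String, port: Int)

trait MapStatus {
  def location: BlockManagerId
  def getSizeForBlock(reduceId: Int): Long
}

// Hypothetical implementation that keeps exact sizes, one per reduce block.
final class ExactMapStatus(val location: BlockManagerId, sizes: Array[Long]) extends MapStatus {
  def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
}

object ContractDemo {
  // Estimated bytes one reducer would fetch, grouped by where the map outputs are stored.
  def bytesByLocation(statuses: Seq[MapStatus], reduceId: Int): Map[BlockManagerId, Long] =
    statuses
      .groupBy(_.location)
      .map { case (loc, ss) => loc -> ss.map(_.getSizeForBlock(reduceId)).sum }
}
```

Grouping by location and summing getSizeForBlock for one reduceId gives a rough picture of how much data that reducer would pull from each BlockManager.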
