ShuffleExternalSorter — Cache-Efficient Sorter

ShuffleExternalSorter is a specialized cache-efficient sorter that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, ShuffleExternalSorter can fit more of the array into cache.

ShuffleExternalSorter is created exclusively when UnsafeShuffleWriter is created (and requested to open).

ShuffleExternalSorter is a concrete MemoryConsumer that can spill to disk to free up execution memory.

ShuffleExternalSorter uses the page size to be the minimum of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES and pageSizeBytes, and Tungsten memory mode).

Tip

Enable ALL logging levels for org.apache.spark.shuffle.sort.ShuffleExternalSorter logger to see what happens in ShuffleExternalSorter.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.shuffle.sort.ShuffleExternalSorter=ALL

Refer to Logging.

getMemoryUsage Internal Method

long getMemoryUsage()

getMemoryUsage…​FIXME

Note
getMemoryUsage is used when…​FIXME

closeAndGetSpills Method

Caution
FIXME

insertRecord Method

Caution
FIXME

freeMemory Method

Caution
FIXME

getPeakMemoryUsedBytes Method

Caution
FIXME

writeSortedFile Internal Method

void writeSortedFile(boolean isLastFile)

writeSortedFile…​FIXME

Note
writeSortedFile is used when…​FIXME

cleanupResources Method

Caution
FIXME

Creating ShuffleExternalSorter Instance

ShuffleExternalSorter takes the following to be created:

ShuffleExternalSorter uses spark.shuffle.file.buffer (for fileBufferSizeBytes) and spark.shuffle.spill.numElementsForceSpillThreshold (for numElementsForSpillThreshold) Spark properties.

ShuffleExternalSorter creates a ShuffleInMemorySorter (with spark.shuffle.sort.useRadixSort Spark property enabled by default).

ShuffleExternalSorter initializes the internal registries and counters.

Spilling To Disk (Freeing Up Execution Memory) — spill Method

long spill(
  long size,
  MemoryConsumer trigger)
Note
spill is part of the MemoryConsumer Contract to sort and spill the current records due to memory pressure.

spill prints out the following INFO message to the logs:

Thread [threadId] spilling sort data of [memoryUsage] to disk ([spillsSize] [time|times] so far)

spill writeSortedFile (with the isLastFile flag disabled).

spill frees execution memory (and records the memory bytes spilled as spillSize).

spill then requests the ShuffleInMemorySorter to reset followed by requesting the TaskMetrics (of the TaskContext) to increase the memory bytes spilled.

In the end, spill returns the memory bytes spilled (spill size).

Note

spill returns 0 when one of the following holds:

Internal Properties

Table 1. ShuffleExternalSorter’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

allocatedPages

(LinkedList<MemoryBlock>)

Used when…​FIXME

results matching ""

    No results matching ""