ExternalSorter

ExternalSorter is a Spillable of WritablePartitionedPairCollection of K-key / C-value pairs.

When created ExternalSorter expects three different types of data defined, i.e. K, V, C, for keys, values, and combiner (partial) values, respectively.

Tip

Enable INFO or WARN logging levels for org.apache.spark.util.collection.ExternalSorter logger to see what happens in ExternalSorter.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.util.collection.ExternalSorter=INFO

Refer to Logging.

stop Method

Caution
FIXME

writePartitionedFile Method

Caution
FIXME

Creating ExternalSorter Instance

ExternalSorter takes the following:

  1. TaskContext

  2. Optional Aggregator

  3. Optional Partitioner

  4. Optional Scala’s Ordering

  5. Optional Serializer

Note
ExternalSorter uses SparkEnv to access the default Serializer.

spillMemoryIteratorToDisk Internal Method

spillMemoryIteratorToDisk(inMemoryIterator: WritablePartitionedIterator): SpilledFile
Caution
FIXME

spill Method

spill(collection: WritablePartitionedPairCollection[K, C]): Unit
Note
spill is part of Spillable contract.
Caution
FIXME

maybeSpillCollection Internal Method

maybeSpillCollection(usingMap: Boolean): Unit
Caution
FIXME

insertAll Method

insertAll(records: Iterator[Product2[K, V]]): Unit
Caution
FIXME

Settings

Table 1. Spark Properties
Spark Property Default Value Description

spark.shuffle.file.buffer

32k

Size of the in-memory buffer for each shuffle file output stream. In bytes unless the unit is specified.

These buffers reduce the number of disk seeks and system calls made in creating intermediate shuffle files.

Used in ExternalSorter, BypassMergeSortShuffleWriter and ExternalAppendOnlyMap (for fileBufferSize) and in ShuffleExternalSorter (for fileBufferSizeBytes).

NOTE: spark.shuffle.file.buffer was previously known as spark.shuffle.file.buffer.kb.

spark.shuffle.spill.batchSize

10000

Size of object batches when reading/writing from serializers.

results matching ""

    No results matching ""