ReceivedBlockHandlers
ReceivedBlockHandler
represents how to handle the storage of blocks received by receivers.
Note
|
It is used by ReceiverSupervisorImpl (as the internal receivedBlockHandler). |
ReceivedBlockHandler Contract
ReceivedBlockHandler
is a private[streaming] trait
. It comes with two methods:
-
storeBlock(blockId: StreamBlockId, receivedBlock: ReceivedBlock): ReceivedBlockStoreResult
to store a received block asblockId
. -
cleanupOldBlocks(threshTime: Long)
to clean up blocks older thanthreshTime
.
Note
|
cleanupOldBlocks implies that there is a relation between blocks and the time they arrived.
|
Implementations of ReceivedBlockHandler Contract
There are two implementations of ReceivedBlockHandler
contract:
-
BlockManagerBasedBlockHandler
that stores received blocks in Spark’s BlockManager with the specified StorageLevel.Read BlockManagerBasedBlockHandler in this document.
-
WriteAheadLogBasedBlockHandler
that stores received blocks in a write ahead log and Spark’s BlockManager. It is a more advanced option comparing to a simpler BlockManagerBasedBlockHandler.Read WriteAheadLogBasedBlockHandler in this document.
BlockManagerBasedBlockHandler
BlockManagerBasedBlockHandler
is the default ReceivedBlockHandler
in Spark Streaming.
It uses BlockManager and a receiver’s StorageLevel.
cleanupOldBlocks
is not used as blocks are cleared by some other means (FIXME)
putResult
returns BlockManagerBasedStoreResult
. It uses BlockManager.putIterator
to store ReceivedBlock
.
WriteAheadLogBasedBlockHandler
WriteAheadLogBasedBlockHandler
is used when spark.streaming.receiver.writeAheadLog.enable is true
.
It uses BlockManager, a receiver’s streamId
and StorageLevel, SparkConf for additional configuration settings, Hadoop Configuration, the checkpoint directory.