ReceivedBlockHandlers
ReceivedBlockHandler represents how to handle the storage of blocks received by receivers.
|
Note
|
It is used by ReceiverSupervisorImpl (as the internal receivedBlockHandler). |
ReceivedBlockHandler Contract
ReceivedBlockHandler is a private[streaming] trait. It comes with two methods:
-
storeBlock(blockId: StreamBlockId, receivedBlock: ReceivedBlock): ReceivedBlockStoreResultto store a received block asblockId. -
cleanupOldBlocks(threshTime: Long)to clean up blocks older thanthreshTime.
|
Note
|
cleanupOldBlocks implies that there is a relation between blocks and the time they arrived.
|
Implementations of ReceivedBlockHandler Contract
There are two implementations of ReceivedBlockHandler contract:
-
BlockManagerBasedBlockHandlerthat stores received blocks in Spark’s BlockManager with the specified StorageLevel.Read BlockManagerBasedBlockHandler in this document.
-
WriteAheadLogBasedBlockHandlerthat stores received blocks in a write ahead log and Spark’s BlockManager. It is a more advanced option comparing to a simpler BlockManagerBasedBlockHandler.Read WriteAheadLogBasedBlockHandler in this document.
BlockManagerBasedBlockHandler
BlockManagerBasedBlockHandler is the default ReceivedBlockHandler in Spark Streaming.
It uses BlockManager and a receiver’s StorageLevel.
cleanupOldBlocks is not used as blocks are cleared by some other means (FIXME)
putResult returns BlockManagerBasedStoreResult. It uses BlockManager.putIterator to store ReceivedBlock.
WriteAheadLogBasedBlockHandler
WriteAheadLogBasedBlockHandler is used when spark.streaming.receiver.writeAheadLog.enable is true.
It uses BlockManager, a receiver’s streamId and StorageLevel, SparkConf for additional configuration settings, Hadoop Configuration, the checkpoint directory.