ReceiverInputDStreams - Input Streams with Receivers

Receiver Input Streams (ReceiverInputDStreams) are specialized input streams that use receivers to receive data (and hence the name which stands for an InputDStream with a receiver).

Note	Receiver input streams run receivers as long-running tasks that occupy a core per stream.

ReceiverInputDStream abstract class defines the following abstract method that custom implementations use to create receivers:

def getReceiver(): Receiver[T]

The receiver is then sent to and run on workers (when ReceiverTracker is started).

Note

A fine example of a very minimalistic yet still useful implementation of ReceiverInputDStream class is the pluggable input stream org.apache.spark.streaming.dstream.PluggableInputDStream (the sources on GitHub). It requires a Receiver to be given (by a developer) and simply returns it in getReceiver.

PluggableInputDStream is used by StreamingContext.receiverStream() method.

ReceiverInputDStream uses ReceiverRateController when spark.streaming.backpressure.enabled is enabled.

Note	Both, `start()` and `stop` methods are implemented in `ReceiverInputDStream`, but do nothing. `ReceiverInputDStream` management is left to ReceiverTracker. Read ReceiverTrackerEndpoint.startReceiver for more details.

The source code of ReceiverInputDStream is here at GitHub.

Generate RDDs for Batch Interval — `compute` Method

The abstract compute(validTime: Time): Option[RDD[T]] method (from DStream) uses start time of DStreamGraph, i.e. the start time of StreamingContext, to check whether validTime input parameter is really valid.

If the time to generate RDDs (validTime) is earlier than the start time of StreamingContext, an empty BlockRDD is generated.

Otherwise, ReceiverTracker is requested for all the blocks that have been allocated to this stream for this batch (using ReceiverTracker.getBlocksOfBatch).

The number of records received for the batch for the input stream (as StreamInputInfo) is registered with InputInfoTracker.

If all BlockIds have WriteAheadLogRecordHandle, a WriteAheadLogBackedBlockRDD is generated. Otherwise, a BlockRDD is.

Back Pressure

Caution

FIXME

Back pressure for input dstreams with receivers can be configured using spark.streaming.backpressure.enabled setting.

Note	Back pressure is disabled by default.

ReceiverInputDStreams