MemoryStream — Streaming Reader for Micro-Batch Stream Processing

MemoryStream is a concrete streaming source of the memory data source that supports reading in Micro-Batch Stream Processing.

Tip: Enable ALL logging level for the org.apache.spark.sql.execution.streaming.MemoryStream logger to see what happens inside. Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.streaming.MemoryStream=ALL

Refer to Logging.
Creating MemoryStream Instance
MemoryStream takes the following to be created:

- ID (a unique identifier of the stream)
- Implicit SQLContext

MemoryStream initializes the internal properties.
Creating MemoryStream Instance — apply Object Factory
apply[A : Encoder](
implicit sqlContext: SQLContext): MemoryStream[A]
apply uses the memoryStreamId internal counter to create a new MemoryStream with a unique ID and the implicit SQLContext.
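The factory in action can be sketched as follows; the local SparkSession setup is an assumption for illustration (any active session would do), and only MemoryStream.apply itself comes from this section:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.streaming.MemoryStream

// Assumed setup: a local SparkSession (any active session works)
val spark = SparkSession.builder
  .master("local[*]")
  .appName("memory-stream-demo")
  .getOrCreate()
import spark.implicits._ // provides Encoder[Int]

// apply picks up the implicit SQLContext and assigns the next unique ID
implicit val sqlContext = spark.sqlContext
val intStream = MemoryStream[Int]
```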
Adding Data to Source — addData Method
addData(
data: TraversableOnce[A]): Offset
addData adds the given data to the stream (as a new batch) and returns the offset of that batch.
Internally, addData prints out the following DEBUG message to the logs:
Adding: [data]
In the end, addData increments the current offset and adds the data to the batches internal registry.
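The flow above can be sketched end to end; the SparkSession setup and the memory-sink query name (intsOut) are assumptions made up for the example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.streaming.MemoryStream

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._
implicit val sqlContext = spark.sqlContext

val ints = MemoryStream[Int]
// addData registers a new batch, increments the current offset and returns it
val offset = ints.addData(1, 2, 3)

// Consume the batch with a memory-sink query (the query name is arbitrary)
val query = ints.toDS.writeStream
  .format("memory")
  .queryName("intsOut")
  .start()
query.processAllAvailable()
spark.table("intsOut").show()
query.stop()
```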
Generating Next Streaming Batch — getBatch Method
Note: getBatch is a part of the Streaming Source contract.
When executed, getBatch uses the internal batches collection to return the batches between the requested start and end offsets.
You should see the following DEBUG message in the logs:
DEBUG MemoryStream: MemoryBatch [[startOrdinal], [endOrdinal]]: [newBlocks]
Logical Plan — logicalPlan Internal Property
logicalPlan: LogicalPlan
Note: logicalPlan is part of the MemoryStreamBase contract for the logical query plan of the memory stream.
logicalPlan is simply a StreamingExecutionRelation (for this memory source and the attributes).
MemoryStream uses the StreamingExecutionRelation logical plan to build Datasets or DataFrames when requested.
scala> val ints = MemoryStream[Int]
ints: org.apache.spark.sql.execution.streaming.MemoryStream[Int] = MemoryStream[value#13]
scala> ints.toDS.queryExecution.logical.isStreaming
res14: Boolean = true
scala> ints.toDS.queryExecution.logical
res15: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = MemoryStream[value#13]
Textual Representation — toString Method
toString: String
Note: toString is part of the java.lang.Object contract for the string representation of the object.
toString uses the output schema to return the following textual representation:
MemoryStream[[output]]
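Continuing the ints REPL session above (the attribute ID, e.g. value#13, varies per session):

```
scala> ints.toString
res16: String = MemoryStream[value#13]
```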
Plan Input Partitions — planInputPartitions Method
planInputPartitions(): java.util.List[InputPartition[InternalRow]]
Note: planInputPartitions is part of the DataSourceReader contract in Spark SQL for the number of InputPartitions to use as RDD partitions (when DataSourceV2ScanExec physical operator is requested for the partitions of the input RDD).
planInputPartitions…FIXME
planInputPartitions prints out a DEBUG message to the logs with the generateDebugString (with the batches after the last committed offset).
planInputPartitions…FIXME
generateDebugString Internal Method
generateDebugString(
rows: Seq[UnsafeRow],
startOrdinal: Int,
endOrdinal: Int): String
generateDebugString resolves and binds the encoder for the data.
In the end, generateDebugString returns the following string:
MemoryBatch [[startOrdinal], [endOrdinal]]: [rows]
Note: generateDebugString is used exclusively when MemoryStream is requested to planInputPartitions.
Internal Properties
| Name | Description |
|---|---|
| batches | Batches of data added to the stream (the internal registry) |
| currentOffset | Current offset (as LongOffset) |
| lastOffsetCommitted | Last committed offset (as LongOffset) |
| output | Output schema. Used exclusively for toString |