StreamingRelation Leaf Logical Operator for Streaming Source

StreamingRelation is a leaf logical operator (i.e. LogicalPlan) that represents a streaming source in a logical plan.

StreamingRelation is created when DataStreamReader is requested to load data from a streaming source and creates a streaming Dataset.

Figure 1. StreamingRelation Represents Streaming Source
val rate = spark.
  readStream.     // <-- creates a DataStreamReader
  load("hello")   // <-- creates a StreamingRelation
scala> println(rate.queryExecution.logical.numberedTreeString)
00 StreamingRelation DataSource(org.apache.spark.sql.SparkSession@4e5dcc50,rate,List(),None,List(),None,Map(path -> hello),None), rate, [timestamp#0, value#1L]

isStreaming flag is always enabled (i.e. true).

import org.apache.spark.sql.execution.streaming.StreamingRelation
val relation = rate.queryExecution.logical.asInstanceOf[StreamingRelation]
scala> relation.isStreaming
res1: Boolean = true

toString gives the source name.

scala> println(relation)
StreamingRelation is resolved (aka planned) to StreamingExecutionRelation (right after StreamExecution starts running batches).

Creating StreamingRelation for DataSource — apply Object Method

apply(dataSource: DataSource): StreamingRelation

apply creates a StreamingRelation for the given DataSource (that represents a streaming source).

apply is used exclusively when DataStreamReader is requested for a streaming DataFrame.

Creating StreamingRelation Instance

StreamingRelation takes the following when created:

  • DataSource

  • Short name of the streaming source

  • Output attributes of the schema of the streaming source

results matching ""

    No results matching ""