val plan = dataset.queryExecution.logical
LogicalPlan is a QueryPlan that corresponds to a logical operator and represents the entire structured query to execute in Spark SQL.
A logical plan can be analyzed, which is to say that the plan (including its children) has gone through analysis and verification.
scala> plan.analyzed
res1: Boolean = true
A logical plan can also be resolved to a specific schema.
scala> plan.resolved
res2: Boolean = true
A logical plan knows the size of objects that are results of query operators, which it exposes as a Statistics object.
scala> val stats = plan.statistics
stats: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(8,false)
A logical plan knows the maximum number of records it can compute.
scala> val maxRows = plan.maxRows
maxRows: Option[Long] = None
LeafNode: Logical operator with no child operators
UnaryNode: Logical operator with a single child operator
BinaryNode: Logical operator with two child operators
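The split by number of children can be illustrated with a small, self-contained toy model. The names below (ToyPlan, ToyLeaf, ToyUnary, ToyBinary, size) are hypothetical and are not Spark's classes; the sketch only assumes that each operator exposes its children as a sequence.

```scala
// Toy model only: these are NOT Spark's classes, just an illustration
// of how the number of children distinguishes the three node kinds.
object ToyPlans {
  sealed trait ToyPlan {
    def children: Seq[ToyPlan]
  }

  // No child operators, e.g. a relation that is read directly
  final case class ToyLeaf(name: String) extends ToyPlan {
    val children: Seq[ToyPlan] = Nil
  }

  // Exactly one child operator, e.g. a filter or a projection
  final case class ToyUnary(name: String, child: ToyPlan) extends ToyPlan {
    val children: Seq[ToyPlan] = Seq(child)
  }

  // Exactly two child operators, e.g. a join
  final case class ToyBinary(name: String, left: ToyPlan, right: ToyPlan) extends ToyPlan {
    val children: Seq[ToyPlan] = Seq(left, right)
  }

  // Counts all operators in a plan by walking the tree recursively
  def size(plan: ToyPlan): Int = 1 + plan.children.map(size).sum
}
```

A join of a filtered relation with another relation would then be built as ToyBinary("join", ToyUnary("filter", ToyLeaf("t1")), ToyLeaf("t2")), a tree of four operators.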
Known commands are:
RunnableCommand is the base trait for logical commands that are executed for their side-effects.
RunnableCommand defines one abstract method, run, that computes a collection of Rows.
run(sparkSession: SparkSession): Seq[Row]
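To make the idea of a command that is run for its side-effects more concrete, here is a minimal sketch that mirrors the shape of run without depending on Spark. Everything in it is a hypothetical stand-in: ToySession plays the role of SparkSession, ToyRow the role of Row, and ToyCreateTable and ToyShowTables are invented commands, not Spark's.

```scala
import scala.collection.mutable

// Toy model only: hypothetical stand-ins, NOT Spark's API.
object ToyCommands {
  // Stand-in for SparkSession: holds the state that commands mutate
  final class ToySession {
    val tables: mutable.Set[String] = mutable.Set.empty
  }

  // Stand-in for Row
  type ToyRow = Seq[Any]

  // Mirrors the single abstract method described in the text
  trait ToyRunnableCommand {
    def run(session: ToySession): Seq[ToyRow]
  }

  // Executed purely for its side-effect: registers a table name, returns no rows
  final case class ToyCreateTable(name: String) extends ToyRunnableCommand {
    def run(session: ToySession): Seq[ToyRow] = {
      session.tables += name
      Seq.empty
    }
  }

  // Returns one row per registered table name, sorted alphabetically
  case object ToyShowTables extends ToyRunnableCommand {
    def run(session: ToySession): Seq[ToyRow] =
      session.tables.toSeq.sorted.map(n => Seq(n))
  }
}
```

The point of the sketch is that the returned rows are secondary: a command such as ToyCreateTable is useful only because of what it changes in the session.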
isStreaming is a part of the public API of LogicalPlan and is enabled (i.e. true) when a logical plan is a streaming source.
By default, it walks over subtrees and calls itself, i.e. isStreaming, on every child node to find a streaming source.
val spark: SparkSession = ...

// Regular dataset
scala> val ints = spark.createDataset(0 to 9)
ints: org.apache.spark.sql.Dataset[Int] = [value: int]

scala> ints.queryExecution.logical.isStreaming
res1: Boolean = false

// Streaming dataset
scala> val logs = spark.readStream.format("text").load("logs/*.out")
logs: org.apache.spark.sql.DataFrame = [value: string]

scala> logs.queryExecution.logical.isStreaming
res2: Boolean = true
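The default recursive behavior of isStreaming can be sketched with a toy tree. This is an illustration under stated assumptions, not Spark's implementation: Node, BatchSource, StreamSource and Operator are hypothetical names, and the only idea carried over from the text is that a node asks each of its children whether it is streaming.

```scala
// Toy model only: hypothetical classes, NOT Spark's LogicalPlan.
object ToyStreaming {
  sealed trait Node {
    def children: Seq[Node]
    // Default: a node is streaming if any of its children is streaming,
    // so the check recurses down the tree until it reaches the sources
    def isStreaming: Boolean = children.exists(_.isStreaming)
  }

  // A batch source has no children and therefore falls back to false
  final case class BatchSource(name: String) extends Node {
    val children: Seq[Node] = Nil
  }

  // A streaming source overrides the default and reports true
  final case class StreamSource(name: String) extends Node {
    val children: Seq[Node] = Nil
    override def isStreaming: Boolean = true
  }

  // Any other operator simply inherits the recursive default
  final case class Operator(name: String, children: Seq[Node]) extends Node
}
```

With this model, a single StreamSource anywhere in the tree is enough to make the root operator report itself as streaming, while a tree built only from BatchSource leaves does not.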