LocalTableScanExec Leaf Physical Operator

LocalTableScanExec is a leaf physical operator (i.e. no children) and producedAttributes being outputSet.

LocalTableScanExec is created when BasicOperators execution planning strategy resolves LocalRelation and Spark Structured Streaming’s MemoryPlan logical operators.

Read on MemoryPlan logical operator in the Spark Structured Streaming gitbook.
val names = Seq("Jacek", "Agata").toDF("name")
val optimizedPlan = names.queryExecution.optimizedPlan

scala> println(optimizedPlan.numberedTreeString)
00 LocalRelation [name#9]

// Physical plan with LocalTableScanExec operator (shown as LocalTableScan)
scala> names.explain
== Physical Plan ==
LocalTableScan [name#9]

// Going fairly low-level...you've been warned

val plan = names.queryExecution.executedPlan
import org.apache.spark.sql.execution.LocalTableScanExec
val ltse = plan.asInstanceOf[LocalTableScanExec]

val ltseRDD = ltse.execute()
scala> :type ltseRDD

scala> println(ltseRDD.toDebugString)
(2) MapPartitionsRDD[1] at execute at <console>:30 []
 |  ParallelCollectionRDD[0] at execute at <console>:30 []

// no computation on the source dataset has really occurred yet
// Let's trigger a RDD action
scala> ltseRDD.first
res6: org.apache.spark.sql.catalyst.InternalRow = [0,1000000005,6b6563614a]

// Low-level "show"
scala> ltseRDD.foreach(println)

// High-level show
scala> names.show
| name|
Table 1. LocalTableScanExec’s Performance Metrics
Key Name (in web UI) Description


number of output rows


It appears that when no Spark job is used to execute a LocalTableScanExec the numOutputRows metric is not displayed in the web UI.

val names = Seq("Jacek", "Agata").toDF("name")

// The following query gives no numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.show
| name|

// The query gives numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.groupBy(length($"name")).count.show
|           5|    2|

// The (type-preserving) query does also give numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.as[String].map(_.toUpperCase).show

When executed, LocalTableScanExec…​FIXME

spark sql LocalTableScanExec webui query details.png
Figure 1. LocalTableScanExec in web UI (Details for Query)
Table 2. LocalTableScanExec’s Internal Properties
Name Description


Internal binary rows for…​FIXME



Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

doExecute(): RDD[InternalRow]
doExecute is part of SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).


Creating LocalTableScanExec Instance

LocalTableScanExec takes the following when created:

results matching ""

    No results matching ""