LocalTableScanExec Physical Operator

LocalTableScanExec is a leaf physical operator (i.e. it has no children) whose producedAttributes is its outputSet.

LocalTableScanExec is created when the BasicOperators execution planning strategy plans LocalRelation and Spark Structured Streaming’s MemoryPlan logical operators.

Tip
Read up on the MemoryPlan logical operator in the Spark Structured Streaming gitbook.

val names = Seq("Jacek", "Agata").toDF("name")
val optimizedPlan = names.queryExecution.optimizedPlan

scala> println(optimizedPlan.numberedTreeString)
00 LocalRelation [name#9]

// Physical plan with LocalTableScanExec operator (shown as LocalTableScan)
scala> names.explain
== Physical Plan ==
LocalTableScan [name#9]

// Going fairly low-level...you've been warned

val plan = names.queryExecution.executedPlan
import org.apache.spark.sql.execution.LocalTableScanExec
val ltse = plan.asInstanceOf[LocalTableScanExec]

val ltseRDD = ltse.execute()
scala> :type ltseRDD
org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.InternalRow]

scala> println(ltseRDD.toDebugString)
(2) MapPartitionsRDD[1] at execute at <console>:30 []
 |  ParallelCollectionRDD[0] at execute at <console>:30 []

// no computation on the source dataset has really occurred yet
// Let's trigger an RDD action
scala> ltseRDD.first
res6: org.apache.spark.sql.catalyst.InternalRow = [0,1000000005,6b6563614a]

// Low-level "show"
scala> ltseRDD.foreach(println)
[0,1000000005,6b6563614a]
[0,1000000005,6174616741]

// High-level show
scala> names.show
+-----+
| name|
+-----+
|Jacek|
|Agata|
+-----+
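
The leaf-operator description at the top of this page can be checked directly against a planned query. The following is a minimal sketch, not part of the original example: it assumes a local SparkSession (spark-shell already provides `spark`, so the builder line is only for standalone use) and locates the operator with collectFirst rather than casting the root of the executed plan, in case the plan is wrapped by another operator.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.LocalTableScanExec

// Assumption: a local session for illustration; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("ltse-leaf-check").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")

// Find the operator in the executed plan instead of casting the root node
val ltse = names.queryExecution.executedPlan
  .collectFirst { case scan: LocalTableScanExec => scan }
  .getOrElse(sys.error("No LocalTableScanExec in the executed plan"))

// A leaf operator has no child operators
assert(ltse.children.isEmpty)

// For a leaf operator, the produced attributes are exactly its output attributes
assert(ltse.producedAttributes == ltse.outputSet)
```

Both assertions should pass silently if the operator behaves as described above.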

Table 1. LocalTableScanExec’s Performance Metrics

Key            Name (in web UI)        Description
numOutputRows  number of output rows

Note

It appears that when no Spark job is used to execute a LocalTableScanExec, the numOutputRows metric is not displayed in the web UI.

val names = Seq("Jacek", "Agata").toDF("name")

// The following query gives no numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.show
+-----+
| name|
+-----+
|Jacek|
|Agata|
+-----+

// The query gives numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.groupBy(length($"name")).count.show
+------------+-----+
|length(name)|count|
+------------+-----+
|           5|    2|
+------------+-----+

// The (type-preserving) query also gives the numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.as[String].map(_.toUpperCase).show
+-----+
|value|
+-----+
|JACEK|
|AGATA|
+-----+
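
The metric can also be read programmatically, which may help when the web UI does not display it. This is a sketch under assumptions: it uses a local SparkSession and the public SparkPlan.metrics map, and it runs a Spark job over the operator's RDD so that rows actually flow through the operator. The value reported may differ between Spark versions and execution paths, so no expected metric value is given.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.LocalTableScanExec

// Assumption: a local session for illustration; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("ltse-metrics").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")
val ltse = names.queryExecution.executedPlan
  .collectFirst { case scan: LocalTableScanExec => scan }
  .getOrElse(sys.error("No LocalTableScanExec in the executed plan"))

// Run a Spark job over the operator's RDD (an RDD action triggers the job)
val rowCount = ltse.execute().count()

// SparkPlan.metrics maps metric keys to SQLMetric accumulators
val numOutputRows = ltse.metrics("numOutputRows").value
println(s"rows counted by the job: $rowCount, numOutputRows metric: $numOutputRows")
```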

When executed, LocalTableScanExec…​FIXME

Figure 1. LocalTableScanExec in web UI (Details for Query)

Table 2. LocalTableScanExec’s Internal Properties

Name            Description
unsafeRows      Internal binary rows for…FIXME
numParallelism
rdd

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

doExecute(): RDD[InternalRow]
Note
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute…​FIXME

Creating LocalTableScanExec Instance

LocalTableScanExec takes the following when created:
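
The text above does not list the constructor arguments, so the following sketch inspects an already-planned instance instead of constructing one. It assumes the operator is a case class that exposes `output` (the schema attributes) and `rows` (the internal rows) as fields. Those two names come from the Spark source as I know it, not from this page, and other versions may take additional arguments.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.LocalTableScanExec

// Assumption: a local session for illustration; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("ltse-fields").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")
val ltse = names.queryExecution.executedPlan
  .collectFirst { case scan: LocalTableScanExec => scan }
  .getOrElse(sys.error("No LocalTableScanExec in the executed plan"))

// Assumed field: the output schema attributes of the scan
val attributeNames = ltse.output.map(_.name)

// Assumed field: the local collection of internal rows held by the operator
val localRowCount = ltse.rows.size

println(s"attributes: $attributeNames, local rows: $localRowCount")
```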
