val names = Seq("Jacek", "Agata").toDF("name")
val optimizedPlan = names.queryExecution.optimizedPlan
scala> println(optimizedPlan.numberedTreeString)
00 LocalRelation [name#9]
// Physical plan with LocalTableScanExec operator (shown as LocalTableScan)
scala> names.explain
== Physical Plan ==
LocalTableScan [name#9]
// Going fairly low-level...you've been warned
val plan = names.queryExecution.executedPlan
import org.apache.spark.sql.execution.LocalTableScanExec
val ltse = plan.asInstanceOf[LocalTableScanExec]
val ltseRDD = ltse.execute()
scala> :type ltseRDD
org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.InternalRow]
scala> println(ltseRDD.toDebugString)
(2) MapPartitionsRDD[1] at execute at <console>:30 []
| ParallelCollectionRDD[0] at execute at <console>:30 []
// no computation on the source dataset has really occurred yet
// Let's trigger an RDD action
scala> ltseRDD.first
res6: org.apache.spark.sql.catalyst.InternalRow = [0,1000000005,6b6563614a]
// Low-level "show"
scala> ltseRDD.foreach(println)
[0,1000000005,6b6563614a]
[0,1000000005,6174616741]
// High-level show
scala> names.show
+-----+
| name|
+-----+
|Jacek|
|Agata|
+-----+
LocalTableScanExec Leaf Physical Operator
LocalTableScanExec is a leaf physical operator (i.e. it has no children) with producedAttributes being outputSet.
LocalTableScanExec is created when the BasicOperators execution planning strategy resolves LocalRelation and Spark Structured Streaming’s MemoryPlan logical operators.
Tip
Read up on the MemoryPlan logical operator in the Spark Structured Streaming gitbook.
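The planning step described above can be sketched by applying the strategy to a logical plan by hand. This is only an illustration: it assumes a local Spark session (in spark-shell, `getOrCreate` simply returns the existing one) and reaches BasicOperators through `spark.sessionState.planner`, which is an internal, unstable API whose shape may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.LocalTableScanExec

// Assumption: a local session is acceptable for the demo
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")

// The optimized logical plan of a local Dataset is a LocalRelation
val logicalPlan = names.queryExecution.optimizedPlan

// Apply the BasicOperators strategy directly (internal API, may change)
val physicalPlans = spark.sessionState.planner.BasicOperators(logicalPlan)

// The strategy is expected to answer with a LocalTableScanExec
val isLocalScan = physicalPlans.headOption.exists(_.isInstanceOf[LocalTableScanExec])
println(s"Planned as LocalTableScanExec: $isLocalScan")
```

In day-to-day use the planner runs all strategies for you; calling one strategy directly is useful only for exploring which operator a given logical node maps to.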
Key | Name (in web UI) | Description |
---|---|---|
| number of output rows | |
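One way to inspect the operator's metrics without the web UI is to read the `metrics` map of the physical operator. The sketch below assumes a local Spark session and that the executed plan of a local Dataset contains a LocalTableScanExec, which may not hold for every Spark version or configuration. It prints whatever keys and display names the operator registers rather than assuming them.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.LocalTableScanExec

// Assumption: a local session is acceptable for the demo
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")

// Find the LocalTableScanExec operators in the physical plan
val scans = names.queryExecution.executedPlan.collect {
  case scan: LocalTableScanExec => scan
}

// metrics maps an internal key to a SQLMetric; pair each key with its display name
val metricNames = scans.flatMap { scan =>
  scan.metrics.map { case (key, metric) => key -> metric.name.getOrElse("") }
}
metricNames.foreach { case (key, name) => println(s"$key: $name") }
```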
Note
It appears that when no Spark job is used to execute a
When executed, LocalTableScanExec
…FIXME
Figure 1. LocalTableScanExec in web UI (Details for Query)
Name | Description |
---|---|
| Internal binary rows for…FIXME |
Executing Physical Operator (Generating RDD[InternalRow]): doExecute Method
doExecute(): RDD[InternalRow]
Note
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
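To make the contract more concrete, the following sketch calls the public `execute()` method (which in turn relies on doExecute) and turns the resulting internal binary rows into plain strings. It assumes a local Spark session and a single string column at ordinal 0. The conversion happens inside `map`, before `collect`, so that only ordinary String values are brought back to the driver.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session is acceptable for the demo
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val names = Seq("Jacek", "Agata").toDF("name")

// execute() is the public entry point that ends up calling doExecute
val rows = names.queryExecution.executedPlan.execute()

// Decode column 0 of every InternalRow into a regular String
val values = rows.map(_.getUTF8String(0).toString).collect()
values.foreach(println)
```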
doExecute
…FIXME
Creating LocalTableScanExec Instance
LocalTableScanExec takes the following when created:
- Output schema attributes