DataSourceScanExec Contract — Leaf Physical Operators to Scan Over BaseRelation

`DataSourceScanExec` is the contract of leaf physical operators that represent scans over a BaseRelation.

```scala
package org.apache.spark.sql.execution

trait DataSourceScanExec extends LeafExecNode with CodegenSupport {
  // only required vals and methods that have no implementation
  // the others follow
  def metadata: Map[String, String]
  val relation: BaseRelation
  val tableIdentifier: Option[TableIdentifier]
}
```

Note: There are two `DataSourceScanExec`s, i.e. FileSourceScanExec and RowDataSourceScanExec, with a scan over data in a HadoopFsRelation and a generic BaseRelation, respectively.
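To see which concrete scan operator a structured query uses, pattern-match on the executed physical plan. A minimal sketch, assuming `spark-shell` (so `spark` is in scope) and a hypothetical parquet file path:

```scala
import org.apache.spark.sql.execution.FileSourceScanExec

// File-based data sources are planned as FileSourceScanExec
// (the path is hypothetical; use any parquet file you have)
val q = spark.read.parquet("/tmp/demo.parquet")
val scans = q.queryExecution.executedPlan.collect {
  case scan: FileSourceScanExec => scan
}
scans.foreach(scan => println(scan.simpleString))
```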
`DataSourceScanExec` supports Java code generation (aka codegen).
Property | Description
---|---
`metadata` | Metadata (as a collection of key-value pairs) that describes the scan when requested for the simple text representation.
`relation` | BaseRelation that is used in the node name and…FIXME
`tableIdentifier` | Optional TableIdentifier that is used in the node name.
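For the `RowDataSourceScanExec` built later on this page (see `basicDataSourceScanExec`), the metadata carries the `PushedFilters` and `ReadSchema` entries that show up in the simple text representation:

```scala
// Assumes scanExec from the basicDataSourceScanExec example below
// (entry order may differ since metadata is a Map)
scala> scanExec.metadata.foreach { case (key, value) => println(s"$key: $value") }
ReadSchema: struct<>
PushedFilters: []
```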
Note: The prefix for variable names for `DataSourceScanExec` operators in generated Java source code is `scan`.
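One way to see the `scan`-prefixed variables is to print the generated Java code for a file-based query with Spark SQL's debugging facility (a sketch for `spark-shell`, with a hypothetical path):

```scala
import org.apache.spark.sql.execution.debug._

// Prints the Java sources generated for the whole-stage-codegen subtrees;
// look for scan_-prefixed variables in the output
spark.read.parquet("/tmp/demo.parquet").debugCodegen
```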
The default node name prefix is an empty string (it is used in the simple node description).
`DataSourceScanExec` uses the BaseRelation and the TableIdentifier as the node name in the following format:

```
Scan [relation] [tableIdentifier]
```
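With the `scanExec` built later on this page, the node name starts with `Scan` followed by the relation (an anonymous `BaseRelation`, hence its default `toString`), while the table identifier is `None` and so is left out:

```scala
// Assumes scanExec from the basicDataSourceScanExec example below
scala> println(scanExec.nodeName)
Scan $line143.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1@57d94b26
```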
DataSourceScanExec | Description
---|---
FileSourceScanExec | Scan over data in a HadoopFsRelation
RowDataSourceScanExec | Scan over data in a generic BaseRelation
Simple (Basic) Text Node Description (in Query Plan Tree) — simpleString Method

```scala
simpleString: String
```
Note: `simpleString` is part of the QueryPlan contract to give the simple text description of a TreeNode in a query plan tree.
`simpleString` creates a text representation of every key-value entry in the metadata…FIXME
Internally, `simpleString` sorts the metadata and concatenates the keys and the values (separated by `: `). While doing so, `simpleString` redacts sensitive information in every value and abbreviates it to the first 100 characters.

`simpleString` uses Spark Core's `Utils` and its `truncatedString` method.
In the end, `simpleString` returns a text representation that is made up of the nodeNamePrefix, the nodeName, the output (schema attributes) and the metadata, and is of the following format:

```
[nodeNamePrefix][nodeName][[output]][metadata]
```
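The following is a minimal sketch (not Spark's exact code) of how the pieces are assembled; the `redact` parameter and the plain `take(100)` stand in for Spark's redaction and `StringUtils.abbreviate`, and the joins approximate `Utils.truncatedString`:

```scala
// A sketch of simpleString's assembly logic
def simpleStringSketch(
    nodeNamePrefix: String,
    nodeName: String,
    output: Seq[String],
    metadata: Map[String, String],
    redact: String => String): String = {
  // sort the metadata and render every entry as "key: value",
  // redacting each value and keeping only its first 100 characters
  val metadataEntries = metadata.toSeq.sorted.map { case (key, value) =>
    key + ": " + redact(value).take(100)
  }
  // [nodeNamePrefix][nodeName][[output]][metadata]
  nodeNamePrefix + nodeName +
    output.mkString("[", ",", "]") +
    metadataEntries.mkString(" ", ", ", "")
}
```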
```
val scanExec = basicDataSourceScanExec
scala> println(scanExec.simpleString)
Scan $line143.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1@57d94b26 [] PushedFilters: [], ReadSchema: struct<>
```
```scala
// Builds a bare-bones RowDataSourceScanExec for demos; meant for spark-shell,
// where `spark` (SparkSession) and `sc` (SparkContext) are in scope
def basicDataSourceScanExec = {
  import org.apache.spark.sql.catalyst.expressions.AttributeReference
  val output = Seq.empty[AttributeReference]
  val requiredColumnsIndex = output.indices
  import org.apache.spark.sql.sources.Filter
  val filters, handledFilters = Set.empty[Filter]
  import org.apache.spark.sql.catalyst.InternalRow
  import org.apache.spark.sql.catalyst.expressions.UnsafeRow
  val row: InternalRow = new UnsafeRow(0)
  // the RDD import has to come before its first use below
  import org.apache.spark.rdd.RDD
  val rdd: RDD[InternalRow] = sc.parallelize(row :: Nil)
  import org.apache.spark.sql.sources.{BaseRelation, TableScan}
  val baseRelation: BaseRelation = new BaseRelation with TableScan {
    import org.apache.spark.sql.SQLContext
    val sqlContext: SQLContext = spark.sqlContext
    import org.apache.spark.sql.types.StructType
    val schema: StructType = new StructType()
    import org.apache.spark.sql.Row
    def buildScan(): RDD[Row] = ???
  }
  val tableIdentifier = None
  import org.apache.spark.sql.execution.RowDataSourceScanExec
  RowDataSourceScanExec(
    output, requiredColumnsIndex, filters, handledFilters, rdd, baseRelation, tableIdentifier)
}
```
verboseString Method

```scala
verboseString: String
```
Note: `verboseString` is part of the QueryPlan contract to…FIXME.
`verboseString` simply redacts sensitive information in the default text representation, i.e. the `verboseString` of the parent QueryPlan.
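For the leaf scan in the demo above, QueryPlan's own `verboseString` is the `simpleString`, so the (redacted) outputs match (a sketch, reusing `scanExec`):

```scala
// Assumes scanExec from the basicDataSourceScanExec example above;
// for QueryPlan, verboseString defaults to simpleString
scala> println(scanExec.verboseString)
Scan $line143.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1@57d94b26 [] PushedFilters: [], ReadSchema: struct<>
```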
Text Representation of All Nodes in Tree — treeString Method

```scala
treeString(verbose: Boolean, addSuffix: Boolean): String
```
Note: `treeString` is part of the TreeNode contract to…FIXME.
`treeString` simply redacts sensitive information in the text representation of all the nodes (in the query plan tree) of the parent TreeNode.
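Since the demo scan is a leaf physical operator, its "tree" is a single node and `treeString` prints just that node (a sketch, reusing `scanExec` from above):

```scala
// verbose = true prints each node's verboseString; a leaf tree has one node
scala> println(scanExec.treeString(verbose = true, addSuffix = false))
Scan $line143.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1@57d94b26 [] PushedFilters: [], ReadSchema: struct<>
```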