val pairsRDD = sc.parallelize((0, "zero") :: (1, "one") :: (2, "two") :: Nil)
// A tuple of Int and String is a product type
scala> :type pairsRDD
org.apache.spark.rdd.RDD[(Int, String)]
val pairsDF = spark.createDataFrame(pairsRDD)
// ExternalRDD represents the pairsRDD in the query plan
val logicalPlan = pairsDF.queryExecution.logical
scala> println(logicalPlan.numberedTreeString)
00 SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#10, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false) AS _2#11]
01 +- ExternalRDD [obj#9]
ExternalRDD

ExternalRDD is a leaf logical operator that is a logical representation of (the data from) an RDD in a logical query plan.

ExternalRDD is created when:

- SparkSession is requested to create a DataFrame from an RDD of product types (e.g. Scala case classes, tuples) or a Dataset from an RDD of a given type
- ExternalRDD is requested to create a new instance

ExternalRDD is a MultiInstanceRelation and an ObjectProducer.
Note: ExternalRDD is resolved to ExternalRDDScanExec when the BasicOperators execution planning strategy is executed.
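Continuing the pairsDF example above, the resolution can be observed in the physical plan. A sketch (the exact rendering of the scan node may differ between Spark versions):

```scala
// The physical counterpart of the ExternalRDD leaf is an ExternalRDDScanExec,
// chosen by the BasicOperators execution planning strategy.
val sparkPlan = pairsDF.queryExecution.sparkPlan
println(sparkPlan.numberedTreeString)
```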
newInstance Method

newInstance(): ExternalRDD.this.type
Note: newInstance is part of the MultiInstanceRelation contract to…FIXME.

newInstance…FIXME
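Although the section above is a placeholder, the purpose of newInstance can be seen with self-joins: the analyzer asks a MultiInstanceRelation for a copy of the relation with fresh expression IDs, so the two sides of the join do not share attribute references. A minimal sketch, continuing the pairsDF example above:

```scala
import org.apache.spark.sql.functions.col

// Self-joining a DataFrame backed by an ExternalRDD makes the analyzer
// request a new instance of the relation, so that attribute expression IDs
// on the left and right sides of the join are distinct.
val selfJoined = pairsDF.as("l").join(pairsDF.as("r"), col("l._1") === col("r._1"))
println(selfJoined.queryExecution.analyzed.numberedTreeString)
```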
Computing Statistics — computeStats Method

computeStats(): Statistics
Note: computeStats is part of the LeafNode contract to compute statistics for the cost-based optimizer.

computeStats…FIXME
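While the section above is yet to be filled in, the statistics that computeStats produces can be inspected through the optimized logical plan. A sketch, continuing the pairsDF example above (for an RDD-backed leaf the size estimate typically falls back to the session's default size, since the RDD's actual size is not known up front):

```scala
// Statistics of the leaf surface through the optimized logical plan;
// for an ExternalRDD the sizeInBytes is an estimate based on
// spark.sql.defaultSizeInBytes, as the RDD's real size is unknown.
val stats = pairsDF.queryExecution.optimizedPlan.stats
println(stats.sizeInBytes)
```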
Creating ExternalRDD — apply Factory Method

apply[T: Encoder](rdd: RDD[T], session: SparkSession): LogicalPlan

apply…FIXME
Note: apply is used when SparkSession is requested to create a DataFrame from an RDD of product types (e.g. Scala case classes, tuples) or a Dataset from an RDD of a given type.
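Both code paths mentioned in the note can be exercised directly. A sketch (the session setup is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val sc = spark.sparkContext

// A DataFrame from an RDD of a product type (tuples)...
val df = spark.createDataFrame(sc.parallelize(Seq((0, "zero"), (1, "one"))))

// ...and a Dataset from an RDD of a given type (requires an implicit encoder).
import spark.implicits._
val ds = spark.createDataset(sc.parallelize(Seq("zero", "one")))

// Either way, the logical plan should contain an ExternalRDD leaf.
println(ds.queryExecution.logical.numberedTreeString)
```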