ExternalRDD is a leaf logical operator that is a logical representation of (the data from) an RDD in a logical query plan.

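ExternalRDD is created when SparkSession is requested to create a DataFrame from an RDD of product types (e.g. Scala case classes, tuples) or a Dataset from an RDD of a given type, as in the following example: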

val pairsRDD = sc.parallelize((0, "zero") :: (1, "one") :: (2, "two") :: Nil)

// A tuple of Int and String is a product type
scala> :type pairsRDD
org.apache.spark.rdd.RDD[(Int, String)]

val pairsDF = spark.createDataFrame(pairsRDD)

// ExternalRDD represents the pairsRDD in the query plan
val logicalPlan = pairsDF.queryExecution.logical
scala> println(logicalPlan.numberedTreeString)
00 SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#10, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false) AS _2#11]
01 +- ExternalRDD [obj#9]

ExternalRDD is a MultiInstanceRelation and an ObjectProducer.

ExternalRDD is resolved to the ExternalRDDScanExec physical operator when the BasicOperators execution planning strategy is executed.
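
The planning step can be observed in the physical plan. The following is a minimal sketch, assuming spark-sql is on the classpath (e.g. pasted into spark-shell); the `local[*]` master and the application name are illustrative choices only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.ExternalRDDScanExec

// Assumption: a local session for illustration; in spark-shell this reuses the existing session
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-planning").getOrCreate()

val pairsRDD = spark.sparkContext.parallelize(Seq((0, "zero"), (1, "one"), (2, "two")))
val pairsDF = spark.createDataFrame(pairsRDD)

// sparkPlan is the physical plan produced by the execution planning strategies
val sparkPlan = pairsDF.queryExecution.sparkPlan

// Collect the physical operators that the ExternalRDD logical operator was planned as
val scans = sparkPlan.collect { case scan: ExternalRDDScanExec[_] => scan }

println(sparkPlan.numberedTreeString)
println(s"ExternalRDDScanExec operators found: ${scans.size}")
```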

newInstance Method

newInstance(): ExternalRDD.this.type
newInstance is part of the MultiInstanceRelation Contract to…FIXME.
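
A minimal sketch for exploring newInstance from a Spark session. It assumes that, as is usual for a MultiInstanceRelation, the copy carries fresh expression IDs for its output attributes while pointing at the same RDD; this is an assumption to verify against your Spark version, so the code prints the values for inspection. The session settings are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
import org.apache.spark.sql.catalyst.plans.logical.ObjectProducer
import org.apache.spark.sql.execution.ExternalRDD

// Assumption: a local session for illustration
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-newinstance").getOrCreate()

val pairsRDD = spark.sparkContext.parallelize(Seq((0, "zero"), (1, "one"), (2, "two")))
val pairsDF = spark.createDataFrame(pairsRDD)

// Find the ExternalRDD leaf in the logical plan
val externalRDD = pairsDF.queryExecution.logical.collectFirst { case e: ExternalRDD[_] => e }.get

// The two traits ExternalRDD mixes in
println(s"MultiInstanceRelation: ${externalRDD.isInstanceOf[MultiInstanceRelation]}")
println(s"ObjectProducer: ${externalRDD.isInstanceOf[ObjectProducer]}")

// Request a new instance and compare the output attributes' expression IDs
val copy = externalRDD.newInstance()
println(s"original exprIds: ${externalRDD.output.map(_.exprId)}")
println(s"copy exprIds:     ${copy.output.map(_.exprId)}")
println(s"same RDD: ${copy.rdd eq externalRDD.rdd}")
```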


Computing Statistics — computeStats Method

computeStats(): Statistics
computeStats is part of the LeafNode Contract to compute statistics for the cost-based optimizer.
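
A minimal sketch for inspecting the statistics ExternalRDD reports. My understanding is that ExternalRDD has no size information about the underlying RDD and falls back to the session's default size estimate (`spark.sql.defaultSizeInBytes`); treat that as an assumption, which is why the code prints both values instead of asserting that they match. `spark.sessionState` is an unstable, internal-facing API, and the session settings are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.ExternalRDD

// Assumption: a local session for illustration
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-stats").getOrCreate()

val pairsRDD = spark.sparkContext.parallelize(Seq((0, "zero"), (1, "one"), (2, "two")))
val pairsDF = spark.createDataFrame(pairsRDD)

// Find the ExternalRDD leaf in the logical plan
val externalRDD = pairsDF.queryExecution.logical.collectFirst { case e: ExternalRDD[_] => e }.get

// Statistics as computed by the leaf operator itself
val stats = externalRDD.computeStats()
println(s"ExternalRDD sizeInBytes: ${stats.sizeInBytes}")

// The session-wide default size estimate, for comparison
println(s"defaultSizeInBytes:      ${spark.sessionState.conf.defaultSizeInBytes}")
```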


Creating ExternalRDD Instance

ExternalRDD takes the following when created: an output object attribute (outputObjAttr), an RDD of T, and a SparkSession.

Creating ExternalRDD — apply Factory Method

apply[T: Encoder](rdd: RDD[T], session: SparkSession): LogicalPlan


apply is used when SparkSession is requested to create a DataFrame from an RDD of product types (e.g. Scala case classes, tuples) or a Dataset from an RDD of a given type.
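
A minimal sketch of calling the apply factory method directly, which is normally done by SparkSession on your behalf. The encoder is supplied explicitly through `Encoders.product` to satisfy the `T: Encoder` context bound; the session settings and the name `pairEncoder` are illustrative.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.ExternalRDD

// Assumption: a local session for illustration
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-apply").getOrCreate()

val pairsRDD = spark.sparkContext.parallelize(Seq((0, "zero"), (1, "one"), (2, "two")))

// apply requires an implicit Encoder for the RDD's element type
implicit val pairEncoder: Encoder[(Int, String)] = Encoders.product[(Int, String)]

// The factory returns a LogicalPlan, not a bare ExternalRDD
val plan: LogicalPlan = ExternalRDD(pairsRDD, spark)
println(plan.numberedTreeString)

// The ExternalRDD itself sits at the leaf of the returned plan
val leaves = plan.collectLeaves()
println(s"leaf operators: ${leaves.map(_.nodeName)}")
```

The printed tree should have the same shape as the logical plan of pairsDF shown earlier: a SerializeFromObject operator with ExternalRDD as its only child.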
