RDDConversions Helper Object
RDDConversions is a Scala object that provides the productToRowRdd and rowToRowRdd methods.
productToRowRdd Method
productToRowRdd[A <: Product](data: RDD[A], outputTypes: Seq[DataType]): RDD[InternalRow]
productToRowRdd…FIXME
Note: productToRowRdd is used when…FIXME
Converting Scala Objects In Rows to Values Of Catalyst Types — rowToRowRdd Method
rowToRowRdd(data: RDD[Row], outputTypes: Seq[DataType]): RDD[InternalRow]
rowToRowRdd maps over the partitions of the input RDD[Row] (using the RDD.mapPartitions operator), which creates a MapPartitionsRDD with a "map" function.
Tip: Use RDD.toDebugString to see the additional MapPartitionsRDD in an RDD lineage.
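The following is a minimal sketch of how RDD.mapPartitions adds a MapPartitionsRDD to a lineage and how RDD.toDebugString shows it. The LineageDemo object, the lineage helper, the local[1] master and the sample data are assumptions made for illustration only; they are not part of RDDConversions.

```scala
import org.apache.spark.sql.SparkSession

object LineageDemo {
  // Hypothetical helper: builds a small RDD, applies mapPartitions
  // and returns the textual lineage of the result.
  def lineage(spark: SparkSession): String = {
    val base = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 1)
    // mapPartitions wraps the parent RDD in a MapPartitionsRDD
    val mapped = base.mapPartitions(iterator => iterator.map(_ + 1))
    mapped.toDebugString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("lineage-demo")
      .getOrCreate()
    try println(lineage(spark))
    finally spark.stop()
  }
}
```

The printed lineage should list a MapPartitionsRDD on top of the ParallelCollectionRDD created by parallelize.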
The "map" function takes a Scala Iterator of Row objects and does the following:
- Creates a GenericInternalRow (of the size that is the number of columns per the input Seq[DataType])
- Creates a converter function for every DataType in Seq[DataType]
- For every Row object in the partition (iterator), applies the converter function per position and adds the result value to the GenericInternalRow
- In the end, returns a GenericInternalRow for every row
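The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Spark's own source: RowToRowSketch and convertPartition are hypothetical names, and the sketch assumes that the converter functions come from CatalystTypeConverters.createToCatalystConverter and that a single GenericInternalRow is reused as a buffer for every row. Both classes are internal Catalyst APIs and may differ between Spark versions.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

object RowToRowSketch {
  // Hypothetical stand-in for the per-partition "map" function.
  def convertPartition(
      rows: Iterator[Row],
      outputTypes: Seq[DataType]): Iterator[InternalRow] = {
    val numColumns = outputTypes.length
    // One row buffer sized to the number of columns (assumption: it is reused)
    val buffer = new GenericInternalRow(numColumns)
    // One Scala-to-Catalyst converter function per column
    val converters = outputTypes
      .map(dt => CatalystTypeConverters.createToCatalystConverter(dt))
      .toArray
    rows.map { row =>
      var i = 0
      while (i < numColumns) {
        // Convert the value at position i and store it in the buffer
        buffer.update(i, converters(i)(row.get(i)))
        i += 1
      }
      buffer
    }
  }

  def main(args: Array[String]): Unit = {
    val types = Seq(StringType, IntegerType)
    val input = Iterator(Row("a", 1), Row("b", 2))
    // Read each converted row before advancing the iterator
    convertPartition(input, types).foreach { r =>
      println(s"${r.getUTF8String(0)} ${r.getInt(1)}")
    }
  }
}
```

Because this sketch hands back the same buffer for every element, a consumer that wants to keep rows around would need to read or copy the values before moving to the next element.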
Note: rowToRowRdd is used exclusively when the DataSourceStrategy execution planning strategy is executed (and requested to toCatalystRDD).