RDDConversions Helper Object
RDDConversions is a Scala object that provides the productToRowRdd and rowToRowRdd methods.
productToRowRdd Method
productToRowRdd[A <: Product](data: RDD[A], outputTypes: Seq[DataType]): RDD[InternalRow]
productToRowRdd …FIXME
Note: productToRowRdd is used when…FIXME
Converting Scala Objects In Rows to Values Of Catalyst Types: rowToRowRdd Method
rowToRowRdd(data: RDD[Row], outputTypes: Seq[DataType]): RDD[InternalRow]
rowToRowRdd maps over the partitions of the input RDD[Row] (using the RDD.mapPartitions operator), which creates a MapPartitionsRDD with a "map" function.
Tip: Use RDD.toDebugString to see the additional MapPartitionsRDD in an RDD lineage.
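To see the extra lineage entry in practice, the following minimal sketch applies mapPartitions to a small RDD and returns its debug string. It assumes Spark is on the classpath and uses a local master. The LineageDemo object, the lineage method, and the doubling function are hypothetical names chosen for illustration only. The sketch does not call rowToRowRdd itself.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object, not part of Spark.
object LineageDemo {
  // Builds a tiny RDD, applies mapPartitions, and returns the lineage as text.
  def lineage(): String = {
    val spark = SparkSession.builder()
      .master("local[1]")          // assumption: run locally with one thread
      .appName("lineage-demo")
      .getOrCreate()
    try {
      val input = spark.sparkContext.parallelize(Seq(1, 2, 3))
      // mapPartitions receives one Iterator per partition and returns a new Iterator
      val mapped = input.mapPartitions(iter => iter.map(_ * 2))
      // The debug string lists the RDD produced by mapPartitions and its parent
      mapped.toDebugString
    } finally {
      spark.stop()
    }
  }
}
```

Printing `LineageDemo.lineage()` should show a MapPartitionsRDD entry above the parent RDD created by parallelize.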
The "map" function takes a Scala Iterator of Row objects and does the following:
- Creates a GenericInternalRow (sized to the number of columns in the input Seq[DataType])
- Creates a converter function for every DataType in the Seq[DataType]
- For every Row object in the partition (iterator), applies the converter function per position and stores the result value at the same position in the GenericInternalRow
- In the end, returns a GenericInternalRow for every row
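The steps above can be sketched in plain Scala without Spark on the classpath. This is an illustrative approximation under stated assumptions, not Spark's implementation: SimpleType, StringT, IntT, CatalystString, SimpleRow, converterFor and convertPartition are all hypothetical stand-ins for DataType, the Catalyst value types, Row, the converter functions and the "map" function. An Array[Any] plays the role of the GenericInternalRow.

```scala
// Hypothetical, Spark-free stand-ins used only to illustrate the steps.
object RowConversionSketch {
  sealed trait SimpleType                      // stand-in for DataType
  case object StringT extends SimpleType
  case object IntT extends SimpleType

  final case class CatalystString(value: String) // stand-in for an internal string value
  type SimpleRow = Seq[Any]                        // stand-in for Row

  // Assumed conversions: strings are wrapped, ints pass through unchanged.
  def converterFor(t: SimpleType): Any => Any = t match {
    case StringT => (v: Any) => CatalystString(v.asInstanceOf[String])
    case IntT    => (v: Any) => v
  }

  // Stand-in for the per-partition "map" function.
  def convertPartition(
      rows: Iterator[SimpleRow],
      outputTypes: Seq[SimpleType]): Iterator[Array[Any]] = {
    val numColumns = outputTypes.length
    val buffer = new Array[Any](numColumns)        // created once per partition
    val converters = outputTypes.map(converterFor) // one converter per column
    rows.map { row =>
      var i = 0
      while (i < numColumns) {
        buffer(i) = converters(i)(row(i))          // convert value at position i
        i += 1
      }
      buffer                                       // the same instance for every row
    }
  }
}
```

A possible use is `RowConversionSketch.convertPartition(Iterator(Seq("a", 1), Seq("b", 2)), Seq(RowConversionSketch.StringT, RowConversionSketch.IntT)).map(_.toList).toList`. Because this sketch reuses a single buffer, each result is copied (here with `toList`) before the iterator advances.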
Note: rowToRowRdd is used exclusively when the DataSourceStrategy execution planning strategy is executed (and requested to toCatalystRDD).