RDDConversions Helper Object

RDDConversions is a Scala object that is used to productToRowRdd and rowToRowRdd methods.

productToRowRdd Method

productToRowRdd[A <: Product](data: RDD[A], outputTypes: Seq[DataType]): RDD[InternalRow]

productToRowRdd…​FIXME

Note
productToRowRdd is used when…​FIXME

Converting Scala Objects In Rows to Values Of Catalyst Types — rowToRowRdd Method

rowToRowRdd(data: RDD[Row], outputTypes: Seq[DataType]): RDD[InternalRow]

rowToRowRdd maps over partitions of the input RDD[Row] (using RDD.mapPartitions operator) that creates a MapPartitionsRDD with a "map" function.

Tip
Use RDD.toDebugString to see the additional MapPartitionsRDD in an RDD lineage.

The "map" function takes a Scala Iterator of Row objects and does the following:

  1. Creates a GenericInternalRow (of the size that is the number of columns per the input Seq[DataType])

  2. Creates a converter function for every DataType in Seq[DataType]

  3. For every Row object in the partition (iterator), applies the converter function per position and adds the result value to the GenericInternalRow

  4. In the end, returns a GenericInternalRow for every row

Note
rowToRowRdd is used exclusively when DataSourceStrategy execution planning strategy is executed (and requested to toCatalystRDD).

results matching ""

    No results matching ""