RowEncoder — Encoder for DataFrames

RowEncoder is part of the Encoder framework and acts as the encoder for DataFrames, i.e. Dataset[Row] — Datasets of Rows.

Note
DataFrame type is a mere type alias for Dataset[Row] that expects a Encoder[Row] available in scope which is indeed RowEncoder itself.

RowEncoder is an object in Scala with apply and other factory methods.

RowEncoder can create ExpressionEncoder[Row] from a schema (using apply method).

import org.apache.spark.sql.types._
val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

import org.apache.spark.sql.catalyst.encoders.RowEncoder
scala> val encoder = RowEncoder(schema)
encoder: org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[org.apache.spark.sql.Row] = class[id[0]: bigint, name[0]: string]

// RowEncoder is never flat
scala> encoder.flat
res0: Boolean = false

RowEncoder object belongs to org.apache.spark.sql.catalyst.encoders package.

Creating ExpressionEncoder For Row Type — apply method

apply(schema: StructType): ExpressionEncoder[Row]

apply builds ExpressionEncoder of Row, i.e. ExpressionEncoder[Row], from the input StructType (as schema).

Internally, apply creates a BoundReference for the Row type and returns a ExpressionEncoder[Row] for the input schema, a CreateNamedStruct serializer (using serializerFor internal method), a deserializer for the schema, and the Row type.

serializerFor Internal Method

serializerFor(inputObject: Expression, inputType: DataType): Expression

serializerFor creates an Expression that is assumed to be CreateNamedStruct.

serializerFor takes the input inputType and:

  1. Returns the input inputObject as is for native types, i.e. NullType, BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, BinaryType, CalendarIntervalType.

    Caution
    FIXME What does being native type mean?
  2. For UserDefinedTypes, it takes the UDT class from the SQLUserDefinedType annotation or UDTRegistration object and returns an expression with Invoke to call serialize method on a NewInstance of the UDT class.

  3. For TimestampType, it returns an expression with a StaticInvoke to call fromJavaTimestamp on DateTimeUtils class.

  4. …​FIXME

Caution
FIXME Describe me.

results matching ""

    No results matching ""