GenerateUnsafeProjection

GenerateUnsafeProjection is a CodeGenerator that generates the bytecode for a UnsafeProjection for given expressions (i.e. CodeGenerator[Seq[Expression], UnsafeProjection]).

GenerateUnsafeProjection: Seq[Expression] => UnsafeProjection
Tip

Enable DEBUG logging level for org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection=DEBUG

Refer to Logging.

Generating UnsafeProjection — generate Method

generate(
  expressions: Seq[Expression],
  subexpressionEliminationEnabled: Boolean): UnsafeProjection

generate canonicalize the input expressions followed by generating a JVM bytecode for a UnsafeProjection for the expressions.

Note

generate is used when:

canonicalize Method

canonicalize(in: Seq[Expression]): Seq[Expression]

canonicalize removes unnecessary Alias expressions.

Internally, canonicalize uses ExpressionCanonicalizer rule executor (that in turn uses just one CleanExpressions expression rule).

Generating JVM Bytecode For UnsafeProjection For Given Expressions (With Optional Subexpression Elimination) — create Method

create(
  expressions: Seq[Expression],
  subexpressionEliminationEnabled: Boolean): UnsafeProjection
create(references: Seq[Expression]): UnsafeProjection (1)
  1. Calls the former create with subexpressionEliminationEnabled flag off

create first creates a CodegenContext and an Java source code for the input expressions.

create creates a code body with public java.lang.Object generate(Object[] references) method that creates a SpecificUnsafeProjection.

public java.lang.Object generate(Object[] references) {
  return new SpecificUnsafeProjection(references);
}

class SpecificUnsafeProjection extends UnsafeProjection {
  ...
}

create creates a CodeAndComment with the code body and comment placeholders.

You should see the following DEBUG message in the logs:

DEBUG GenerateUnsafeProjection: code for [expressions]:
[code]
Tip

Enable DEBUG logging level for org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator logger to see the message above.

log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator=DEBUG

create requests CodegenContext for references and requests the compiled class to create a SpecificUnsafeProjection for the input references that in the end is the final UnsafeProjection.

Note
(Single-argument) create is part of CodeGenerator Contract.

Creating ExprCode for Expressions (With Optional Subexpression Elimination) — createCode Method

createCode(
  ctx: CodegenContext,
  expressions: Seq[Expression],
  useSubexprElimination: Boolean = false): ExprCode

createCode requests the input CodegenContext to generate a Java source code for code-generated evaluation of every expression in the input expressions.

createCode…​FIXME

import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext

// Use Catalyst DSL
import org.apache.spark.sql.catalyst.dsl.expressions._
val expressions = "hello".expr.as("world") :: "hello".expr.as("world") :: Nil

import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
val eval = GenerateUnsafeProjection.createCode(ctx, expressions, useSubexprElimination = true)

scala> println(eval.code)

        mutableStateArray1[0].reset();

        mutableStateArray2[0].write(0, ((UTF8String) references[0] /* literal */));


            mutableStateArray2[0].write(1, ((UTF8String) references[1] /* literal */));
        mutableStateArray[0].setTotalSize(mutableStateArray1[0].totalSize());


scala> println(eval.value)
mutableStateArray[0]
Note

createCode is used when:

results matching ""

    No results matching ""