Generator Expression to Generate Zero Or More Rows (aka Lateral Views)

Generator is a contract for Catalyst expressions that can produce zero or more rows given a single input row.

Note
Generator corresponds to SQL’s LATERAL VIEW.

dataType in Generator is simply an ArrayType of elementSchema.

Generator is not foldable and not nullable by default.

Generator supports Java code generation (aka whole-stage codegen) conditionally, i.e. only when a physical operator is not marked as CodegenFallback.

Generator uses terminate to inform that there are no more rows to process, clean up code, and additional rows can be made here.

terminate(): TraversableOnce[InternalRow] = Nil
Table 1. Generators
Name Description

CollectionGenerator

ExplodeBase

Explode

GeneratorOuter

HiveGenericUDTF

Inline

Corresponds to inline and inline_outer functions.

JsonTuple

PosExplode

Stack

UnresolvedGenerator

Represents an unresolved generator.

Created when AstBuilder creates Generate unary logical operator for LATERAL VIEW that corresponds to the following:

LATERAL VIEW (OUTER)?
generatorFunctionName (arg1, arg2, ...)
tblName
AS? col1, col2, ...
Note
UnresolvedGenerator is resolved to Generator by ResolveFunctions logical evaluation rule.

UserDefinedGenerator

Used exclusively in the deprecated explode operator

Note

You can only have one generator per select clause that is enforced by ExtractGenerator logical evaluation rule, e.g.

scala> xys.select(explode($"xs"), explode($"ys")).show
org.apache.spark.sql.AnalysisException: Only one generator allowed per select clause but found 2: explode(xs), explode(ys);
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$20.applyOrElse(Analyzer.scala:1670)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$20.applyOrElse(Analyzer.scala:1662)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)

If you want to have more than one generator in a structured query you should use LATERAL VIEW which is supported in SQL only, e.g.

val arrayTuple = (Array(1,2,3), Array("a","b","c"))
val ncs = Seq(arrayTuple).toDF("ns", "cs")

scala> ncs.show
+---------+---------+
|       ns|       cs|
+---------+---------+
|[1, 2, 3]|[a, b, c]|
+---------+---------+

scala> ncs.createOrReplaceTempView("ncs")

val q = """
  SELECT n, c FROM ncs
  LATERAL VIEW explode(ns) nsExpl AS n
  LATERAL VIEW explode(cs) csExpl AS c
"""

scala> sql(q).show
+---+---+
|  n|  c|
+---+---+
|  1|  a|
|  1|  b|
|  1|  c|
|  2|  a|
|  2|  b|
|  2|  c|
|  3|  a|
|  3|  b|
|  3|  c|
+---+---+

Generator Contract

package org.apache.spark.sql.catalyst.expressions

trait Generator extends Expression {
  // only required methods that have no implementation
  def elementSchema: StructType
  def eval(input: InternalRow): TraversableOnce[InternalRow]
}
Table 2. (Subset of) Generator Contract (in alphabetical order)
Method Description

elementSchema

Schema of the elements to be generated

eval

results matching ""

    No results matching ""