Generator

Generator is a contract for structured query expressions that produce zero or more records given a single input record.

Note

You can only have one generator per select clause that is enforced by ExtractGenerator in Analyzer, e.g.

scala> xys.select(explode($"xs"), explode($"ys")).show
org.apache.spark.sql.AnalysisException: Only one generator allowed per select clause but found 2: explode(xs), explode(ys);
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$20.applyOrElse(Analyzer.scala:1670)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$20.applyOrElse(Analyzer.scala:1662)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)

If you want to have more than one generator in a structured query you should use LATERAL VIEW which is supported in SQL only, e.g.

val arrayTuple = (Array(1,2,3), Array("a","b","c"))
val ncs = Seq(arrayTuple).toDF("ns", "cs")

scala> ncs.show
+---------+---------+
|       ns|       cs|
+---------+---------+
|[1, 2, 3]|[a, b, c]|
+---------+---------+

scala> ncs.createOrReplaceTempView("ncs")

val q = """
  SELECT n, c FROM ncs
  LATERAL VIEW explode(ns) nsExpl AS n
  LATERAL VIEW explode(cs) csExpl AS c
"""

scala> sql(q).show
+---+---+
|  n|  c|
+---+---+
|  1|  a|
|  1|  b|
|  1|  c|
|  2|  a|
|  2|  b|
|  2|  c|
|  3|  a|
|  3|  b|
|  3|  c|
+---+---+
Table 1. Generators (in alphabetical order)
Name Description

CollectionGenerator

ExplodeBase

Explode

GeneratorOuter

HiveGenericUDTF

Inline

Corresponds to inline and inline_outer functions.

JsonTuple

PosExplode

Stack

UnresolvedGenerator

Represents an unresolved generator.

Created when AstBuilder creates Generate for LATERAL VIEW that corresponds to the following:

LATERAL VIEW (OUTER)?
generatorFunctionName (arg1, arg2, ...)
tblName
AS? col1, col2, ...
Note
UnresolvedGenerator is resolved to Generator by ResolveFunctions (in Analyzer).

UserDefinedGenerator

Explode Generator Unary Expression

Explode is a unary expression that produces a sequence of records for each value in the array or map.

Explode is a result of executing explode function (in SQL and functions)

scala> sql("SELECT explode(array(10,20))").explain
== Physical Plan ==
Generate explode([10,20]), false, false, [col#68]
+- Scan OneRowRelation[]

scala> sql("SELECT explode(array(10,20))").queryExecution.optimizedPlan.expressions(0)
res18: org.apache.spark.sql.catalyst.expressions.Expression = explode([10,20])

val arrayDF = Seq(Array(0,1)).toDF("array")
scala> arrayDF.withColumn("num", explode('array)).explain
== Physical Plan ==
Generate explode(array#93), true, false, [array#93, num#102]
+- LocalTableScan [array#93]

PosExplode

Caution
FIXME

ExplodeBase Unary Expression

ExplodeBase is the base class for Explode and PosExplode.

ExplodeBase is UnaryExpression and Generator with CodegenFallback.

results matching ""

    No results matching ""