AstBuilder — ANTLR-based SQL Parser

AstBuilder converts SQL statements into Spark SQL’s relational entities (i.e. data types, Catalyst expressions, logical plans or TableIdentifiers) using visit callback methods.

AstBuilder is the AST builder of AbstractSqlParser (i.e. the base SQL parsing infrastructure in Spark SQL).

Tip

Spark SQL supports SQL statements as described in SqlBase.g4. Reading the grammar file tells you (almost) exactly what Spark SQL supports at any given time.

"Almost" because a SQL statement that the grammar accepts can still be reported as not allowed by AstBuilder, e.g.

scala> sql("EXPLAIN FORMATTED SELECT * FROM myTable").show
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: EXPLAIN FORMATTED(line 1, pos 0)

== SQL ==
EXPLAIN FORMATTED SELECT * FROM myTable
^^^

  at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitExplain$1.apply(SparkSqlParser.scala:275)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitExplain$1.apply(SparkSqlParser.scala:273)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitExplain(SparkSqlParser.scala:273)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitExplain(SparkSqlParser.scala:53)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ExplainContext.accept(SqlBaseParser.java:480)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:65)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:62)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:61)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:90)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:61)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
  ... 48 elided

AstBuilder is an ANTLR AbstractParseTreeVisitor (as SqlBaseBaseVisitor) that is generated from the SqlBase.g4 ANTLR grammar for Spark SQL.

Note

SqlBaseBaseVisitor is an ANTLR-specific base class that is auto-generated at build time from the ANTLR grammar in SqlBase.g4.

SqlBaseBaseVisitor is an ANTLR AbstractParseTreeVisitor.
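
As a minimal sketch of the parsing flow described above, the following uses CatalystSqlParser (an AbstractSqlParser in the same package that relies on AstBuilder) to turn SQL text into an unresolved logical plan. The table t and the column id are made up for illustration, and the exact tree printed depends on the Spark version in use.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}

// Parsing only: no catalog lookup happens, so table t does not have to exist
val plan: LogicalPlan = CatalystSqlParser.parsePlan("SELECT id FROM t WHERE id > 1")

// The WHERE clause shows up as a Filter operator in the (still unresolved) plan
val filters = plan.collect { case f: Filter => f }

println(plan.numberedTreeString)
```

The sql method of SparkSession goes through SparkSqlParser instead (as the stack trace above shows), which extends the same visit callbacks with Spark-specific commands.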

Table 1. AstBuilder’s Visit Callback Methods
Callback Method | ANTLR rule / labeled alternative | Spark SQL Entity

visitAliasedQuery

visitColumnReference

visitDereference

visitExists

#exists labeled alternative

Exists expression

visitExplain

explain rule

Note

The logical plan to explain can be a OneRowRelation when EXPLAIN is used for a DescribeTableCommand logical command (created from a DESCRIBE TABLE SQL statement), which cannot be explained.

val q = sql("EXPLAIN DESCRIBE TABLE t")
scala> println(q.queryExecution.logical.numberedTreeString)
00 ExplainCommand OneRowRelation$, false, false, false

visitFirst

#first labeled alternative

First aggregate function expression

FIRST '(' expression (IGNORE NULLS)? ')'

visitFromClause

fromClause

Supports multiple comma-separated relations (that all together build a condition-less INNER JOIN) with optional LATERAL VIEW.

A relation can be one of the following or a combination thereof:

  • Table identifier

  • Inline table using VALUES exprs AS tableIdent

  • Table-valued function (currently only range is supported)
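
The condition-less INNER JOIN that visitFromClause builds for comma-separated relations can be checked with a short sketch like the one below. It assumes CatalystSqlParser as the entry point, and the relation names a and b are hypothetical.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.Inner
import org.apache.spark.sql.catalyst.plans.logical.Join

// Two comma-separated relations in the FROM clause
val commaPlan = CatalystSqlParser.parsePlan("SELECT * FROM a, b")

// Pick the Join operator that the FROM clause was turned into
val commaJoin = commaPlan.collectFirst { case j: Join => j }.get

println(commaJoin.joinType)   // the join type of the comma-separated relations
println(commaJoin.condition)  // the (optional) join condition
```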

visitFunctionCall

#functionCall labeled alternative

Tip
See the function examples below.

visitInlineTable

inlineTable rule

UnresolvedInlineTable unary logical operator (as the child of SubqueryAlias for tableAlias)

VALUES expression (',' expression)* tableAlias

expression can be as follows:

Column names can be specified explicitly in tableAlias or default to colN for every column (with N starting from 1).

visitInsertIntoTable

#insertIntoTable labeled alternative

InsertIntoTable (indirectly)

A 3-element tuple with a TableIdentifier, optional partition keys and the exists flag disabled

INSERT INTO TABLE? tableIdentifier partitionSpec?
Note
insertIntoTable is part of insertInto that is in turn used only as a helper labeled alternative in singleInsertQuery and multiInsertQueryBody rules.

visitInsertOverwriteTable

#insertOverwriteTable labeled alternative

InsertIntoTable (indirectly)

A 3-element tuple with a TableIdentifier, optional partition keys and the exists flag

INSERT OVERWRITE TABLE tableIdentifier (partitionSpec (IF NOT EXISTS)?)?

In a way, visitInsertOverwriteTable is a more general version of visitInsertIntoTable, with the exists flag turned on or off depending on whether IF NOT EXISTS is used. The main difference is that dynamic partitions are supported only when IF NOT EXISTS is not used.

Note
insertOverwriteTable is part of insertInto that is in turn used only as a helper labeled alternative in singleInsertQuery and multiInsertQueryBody rules.

visitMultiInsertQuery

multiInsertQueryBody

A logical operator with an InsertIntoTable (and UnresolvedRelation leaf operator)

FROM relation (',' relation)* lateralView*
INSERT OVERWRITE TABLE ...

FROM relation (',' relation)* lateralView*
INSERT INTO TABLE? ...

visitNamedExpression

namedExpression

  • Alias (for a single alias)

  • MultiAlias (for a parenthesis-enclosed alias list)

  • a bare Expression
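
The three outcomes of visitNamedExpression listed above can be observed with parseExpression. This is only a sketch that assumes CatalystSqlParser as the entry point; the column names (a, m) and aliases (b, k, v) are made up.

```scala
import org.apache.spark.sql.catalyst.analysis.MultiAlias
import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// A single alias
val single = CatalystSqlParser.parseExpression("a AS b")

// A parenthesis-enclosed alias list
val multi = CatalystSqlParser.parseExpression("explode(m) AS (k, v)")

// No alias at all: the expression is returned as-is
val bare = CatalystSqlParser.parseExpression("a + 1")

Seq(single, multi, bare).foreach(e => println(s"${e.nodeName}: $e"))
```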

visitNamedQuery

SubqueryAlias

visitQuerySpecification

querySpecification

OneRowRelation or LogicalPlan

Note

visitQuerySpecification creates a OneRowRelation for a SELECT without a FROM clause.

val q = sql("select 1")
scala> println(q.queryExecution.logical.numberedTreeString)
00 'Project [unresolvedalias(1, None)]
01 +- OneRowRelation$

visitPredicated

predicated

Expression

visitRelation

relation

LogicalPlan for a FROM clause.

visitRowConstructor

visitSetOperation

setOperation

visitSingleDataType

singleDataType

DataType
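
A small sketch of visitSingleDataType at work, assuming the parseDataType method of CatalystSqlParser as the entry point (the type string is an arbitrary example):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.{ArrayType, DataType}

// Text with a (possibly nested) data type is converted to a DataType
val dt: DataType = CatalystSqlParser.parseDataType("array<struct<id:int,name:string>>")

println(dt.simpleString)
```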

visitSingleExpression

singleExpression

Expression

Takes the named expression and relays it to visitNamedExpression.

visitSingleInsertQuery

#singleInsertQuery labeled alternative

A logical operator with an InsertIntoTable

INSERT INTO TABLE? tableIdentifier partitionSpec? #insertIntoTable

INSERT OVERWRITE TABLE tableIdentifier (partitionSpec (IF NOT EXISTS)?)? #insertOverwriteTable

visitSortItem

sortItem

SortOrder unevaluable unary expression

sortItem
    : expression ordering=(ASC | DESC)? (NULLS nullOrder=(LAST | FIRST))? ;

ORDER BY order+=sortItem (',' order+=sortItem)*
SORT BY sort+=sortItem (',' sort+=sortItem)*

((ORDER | SORT) BY sortItem (',' sortItem)*)?
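
To see the SortOrder expressions that visitSortItem creates, the following sketch parses an ORDER BY query and collects them from the Sort operator. It assumes CatalystSqlParser as the entry point; the table t and the columns a and b are hypothetical.

```scala
import org.apache.spark.sql.catalyst.expressions.{Descending, NullsLast, SortOrder}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.Sort

val sorted = CatalystSqlParser.parsePlan("SELECT * FROM t ORDER BY a DESC NULLS LAST, b")

// One SortOrder per sortItem, in the order they appear in the SQL text
val orders: Seq[SortOrder] = sorted.collect { case s: Sort => s.order }.flatten

orders.foreach(o => println(s"${o.child} ${o.direction} ${o.nullOrdering}"))
```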

visitSingleStatement

singleStatement

LogicalPlan from a single statement

Note
A single statement can be quite involved.

visitSingleTableIdentifier

singleTableIdentifier

TableIdentifier
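
A sketch of visitSingleTableIdentifier in action, assuming the parseTableIdentifier method of CatalystSqlParser as the entry point (db and tbl are made-up names):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// With and without the database part
val qualified: TableIdentifier = CatalystSqlParser.parseTableIdentifier("db.tbl")
val unqualified: TableIdentifier = CatalystSqlParser.parseTableIdentifier("tbl")

println(qualified)
println(unqualified)
```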

visitStar

#star labeled alternative

UnresolvedStar

visitStruct

visitSubqueryExpression

#subqueryExpression labeled alternative

ScalarSubquery

visitWindowDef

#windowDef labeled alternative

'(' CLUSTER BY partition+=expression (',' partition+=expression)* windowFrame? ')'

'(' ((PARTITION | DISTRIBUTE) BY partition+=expression (',' partition+=expression)*)?
  ((ORDER | SORT) BY sortItem (',' sortItem)*)?
  windowFrame? ')'

Table 2. AstBuilder’s Parsing Handlers
Parsing Handler | LogicalPlan Added

withAggregation

  • GroupingSets for GROUP BY … GROUPING SETS (…)

  • Aggregate for GROUP BY … (WITH CUBE | WITH ROLLUP)?

withGenerate

Generate with an UnresolvedGenerator and join flag turned on for LATERAL VIEW (in SELECT or FROM clauses).

withHints

Hint for /*+ hint */ in SELECT queries.

Tip
Note the + (plus) right after the opening /* of the comment.

hint is of the format name or name (param1, param2, …).

/*+ BROADCAST (table) */
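
The following sketch shows a hint making it into the parsed plan. It assumes CatalystSqlParser as the entry point and a hypothetical table t. Since the name of the hint operator differs across Spark versions, the code matches on the node name rather than on a specific class.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

val hinted = CatalystSqlParser.parsePlan("SELECT /*+ BROADCAST (t) */ * FROM t")

// Collect the logical operators whose name ends with "Hint"
val hintNodes = hinted.collect { case p if p.nodeName.endsWith("Hint") => p }

println(hinted.numberedTreeString)
```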

withInsertInto

withJoinRelations

Join for a FROM clause and relation alone.

The following join types are supported:

  • INNER (default)

  • CROSS

  • LEFT (with optional OUTER)

  • LEFT SEMI

  • RIGHT (with optional OUTER)

  • FULL (with optional OUTER)

  • ANTI (optionally prefixed with LEFT)

The following join criteria are supported:

  • ON booleanExpression

  • USING '(' identifier (',' identifier)* ')'

Joins can be NATURAL (with no join criteria).
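
A few of the join types and join criteria above can be inspected with the sketch below. It assumes CatalystSqlParser as the entry point; joinOf is a helper introduced only for this example, and the relations a and b (with the id column) are hypothetical.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, UsingJoin}
import org.apache.spark.sql.catalyst.plans.logical.Join

// Hypothetical helper: parses the SQL text and returns the first Join operator
def joinOf(sqlText: String): Join =
  CatalystSqlParser.parsePlan(sqlText).collectFirst { case j: Join => j }.get

val innerJoin = joinOf("SELECT * FROM a JOIN b ON a.id = b.id")
val leftJoin = joinOf("SELECT * FROM a LEFT OUTER JOIN b ON a.id = b.id")
val usingJoin = joinOf("SELECT * FROM a JOIN b USING (id)")

Seq(innerJoin, leftJoin, usingJoin).foreach { j =>
  println(s"${j.joinType} with condition ${j.condition}")
}
```

With USING, the columns are carried by the join type (UsingJoin) rather than by a join condition.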

withQueryResultClauses

withQuerySpecification

Adds a query specification to a logical operator.

For transform SELECT (with TRANSFORM, MAP or REDUCE qualifiers), withQuerySpecification does…​FIXME


For regular SELECT (no TRANSFORM, MAP or REDUCE qualifiers), withQuerySpecification adds (in that order):

  1. Generate unary logical operators (if used in the parsed SQL text)

  2. Filter unary logical operator (if used in the parsed SQL text)

  3. GroupingSets or Aggregate unary logical operators (if used in the parsed SQL text)

  4. Project and/or Filter unary logical operators

  5. WithWindowDefinition unary logical operator (if used in the parsed SQL text)

  6. UnresolvedHint unary logical operator (if used in the parsed SQL text)
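
The order in which operators are added for a regular SELECT can be seen by listing the node names of a parsed plan from the top down. This is a sketch that assumes CatalystSqlParser as the entry point; the table t and the columns a and b are made up.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

val aggregated = CatalystSqlParser.parsePlan(
  "SELECT a, count(*) FROM t WHERE b > 0 GROUP BY a")

// Pre-order traversal: operators added later sit closer to the top of the plan
val nodeNames: Seq[String] = aggregated.collect { case p => p.nodeName }

println(nodeNames.mkString(" <- "))
```

The Aggregate operator (added in step 3) appears above the Filter operator for the WHERE clause (added in step 2).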

withPredicate

  • NOT? IN '(' query ')' gives an In predicate expression with a ListQuery subquery expression

  • NOT? IN '(' expression (',' expression)* ')' gives an In predicate expression
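
A sketch of the IN predicates above, assuming CatalystSqlParser as the entry point (the column a, the table t and the literals are arbitrary). The top-level expression created for the subquery variant may differ across Spark versions, so the code only looks for the ListQuery inside it.

```scala
import org.apache.spark.sql.catalyst.expressions.{In, ListQuery, Not}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

val inList = CatalystSqlParser.parseExpression("a IN (1, 2)")
val notInList = CatalystSqlParser.parseExpression("a NOT IN (1, 2)")
val inQuery = CatalystSqlParser.parseExpression("a IN (SELECT id FROM t)")

// The subquery variant carries a ListQuery subquery expression
val listQuery = inQuery.find(_.isInstanceOf[ListQuery])

Seq(inList, notInList, inQuery).foreach(println)
```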

withWindows

WithWindowDefinition for window aggregates (given WINDOW definitions).

Used in withQueryResultClauses and withQuerySpecification when window definitions are present.

WINDOW identifier AS windowSpec
  (',' identifier AS windowSpec)*
Tip
Consult windows, namedWindow, windowSpec, windowFrame, and frameBound (with windowRef and windowDef) ANTLR parsing rules for Spark SQL in SqlBase.g4.
Note
AstBuilder belongs to the org.apache.spark.sql.catalyst.parser package.

Function Examples

The examples are handled by visitFunctionCall.

import spark.sessionState.sqlParser

scala> sqlParser.parseExpression("foo()")
res0: org.apache.spark.sql.catalyst.expressions.Expression = 'foo()

scala> sqlParser.parseExpression("foo() OVER windowSpecRef")
res1: org.apache.spark.sql.catalyst.expressions.Expression = unresolvedwindowexpression('foo(), WindowSpecReference(windowSpecRef))

scala> sqlParser.parseExpression("foo() OVER (CLUSTER BY field)")
res2: org.apache.spark.sql.catalyst.expressions.Expression = 'foo() windowspecdefinition('field, UnspecifiedFrame)

aliasPlan Internal Method

aliasPlan(alias: ParserRuleContext, plan: LogicalPlan): LogicalPlan

aliasPlan…​FIXME

Note
aliasPlan is used when…​FIXME

mayApplyAliasPlan Internal Method

mayApplyAliasPlan(tableAlias: TableAliasContext, plan: LogicalPlan): LogicalPlan

mayApplyAliasPlan…​FIXME

Note
mayApplyAliasPlan is used when…​FIXME
