AstBuilder — ANTLR-based SQL Parser

AstBuilder converts a SQL string into Spark SQL’s corresponding entity (i.e. DataType, Expression, LogicalPlan or TableIdentifier) using visit callback methods.

AstBuilder is the AST builder of AbstractSqlParser (i.e. the base SQL parsing infrastructure in Spark SQL).
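
The four target entities can be seen by calling a parser directly. The following is a minimal sketch that assumes spark-catalyst is on the classpath (e.g. inside spark-shell) and uses CatalystSqlParser, which this page does not otherwise mention, as the AbstractSqlParser. The column and table names are made up, and nothing is resolved against a catalog.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.IntegerType

// Each parse* method drives AstBuilder through a different entry rule of the grammar.
val dataType   = CatalystSqlParser.parseDataType("int")           // DataType
val expression = CatalystSqlParser.parseExpression("a + 1")       // Expression
val plan       = CatalystSqlParser.parsePlan("SELECT a FROM t")   // LogicalPlan
val tableId    = CatalystSqlParser.parseTableIdentifier("db.t")   // TableIdentifier

println(plan.numberedTreeString)
```

The plan is unresolved at this point: parsing only builds the tree, and the analyzer resolves names later.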

Tip

Spark SQL supports the SQL statements described in the SqlBase.g4 grammar. Reading that file tells you (almost) exactly what Spark SQL supports at any given time.

"Almost" because AstBuilder can still reject a SQL statement that the grammar accepts, e.g.

scala> sql("EXPLAIN FORMATTED SELECT * FROM myTable").show
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: EXPLAIN FORMATTED(line 1, pos 0)

== SQL ==
EXPLAIN FORMATTED SELECT * FROM myTable
^^^

  at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitExplain$1.apply(SparkSqlParser.scala:275)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitExplain$1.apply(SparkSqlParser.scala:273)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitExplain(SparkSqlParser.scala:273)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitExplain(SparkSqlParser.scala:53)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ExplainContext.accept(SqlBaseParser.java:480)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:65)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:62)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:61)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:90)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:61)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
  ... 48 elided
Note

Technically, AstBuilder is an ANTLR AbstractParseTreeVisitor (by extending SqlBaseBaseVisitor, which is generated from the SqlBase.g4 ANTLR grammar for Spark SQL).

SqlBaseBaseVisitor is an ANTLR-specific base class that is auto-generated at build time from the ANTLR grammar in SqlBase.g4.

SqlBaseBaseVisitor is an ANTLR AbstractParseTreeVisitor.
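
The visitor mechanics can be illustrated without ANTLR. The following toy sketch is not Spark code: all names in it (Ctx, ToyVisitor, ToyAstBuilder, Lit, Plus) are hypothetical stand-ins for the generated context classes, the generated base visitor and AstBuilder respectively.

```scala
object VisitorSketch {
  // Stand-ins for ANTLR's generated parse-tree context classes.
  sealed trait Ctx { def accept[T](v: ToyVisitor[T]): T }
  final case class NumberCtx(text: String) extends Ctx {
    def accept[T](v: ToyVisitor[T]): T = v.visitNumber(this)
  }
  final case class AddCtx(left: Ctx, right: Ctx) extends Ctx {
    def accept[T](v: ToyVisitor[T]): T = v.visitAdd(this)
  }

  // Stand-in for a generated base visitor: one callback per grammar rule.
  trait ToyVisitor[T] {
    def visit(ctx: Ctx): T = ctx.accept(this)
    def visitNumber(ctx: NumberCtx): T
    def visitAdd(ctx: AddCtx): T
  }

  // The entities that the builder produces.
  sealed trait Expr
  final case class Lit(value: Int) extends Expr
  final case class Plus(left: Expr, right: Expr) extends Expr

  // Stand-in for AstBuilder: every callback maps a parse-tree node to an entity.
  object ToyAstBuilder extends ToyVisitor[Expr] {
    def visitNumber(ctx: NumberCtx): Expr = Lit(ctx.text.toInt)
    def visitAdd(ctx: AddCtx): Expr = Plus(visit(ctx.left), visit(ctx.right))
  }

  val ast: Expr = ToyAstBuilder.visit(AddCtx(NumberCtx("1"), NumberCtx("2")))
}

println(VisitorSketch.ast)
```

The parse tree only knows how to dispatch to the matching callback (accept), while all the knowledge about the target entities lives in the visitor.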

Table 1. AstBuilder’s Visit Callback Methods (in alphabetical order)
Callback Method | ANTLR rule / labeled alternative | Spark SQL Entity

visitDescribeTable

describeTable

DescribeTableCommand logical command for DESCRIBE TABLE

visitExplain

explain

Note

The explained plan can be a OneRowRelation when EXPLAIN is used for a command that cannot be explained, e.g. the DescribeTableCommand logical command created from SQL's DESCRIBE TABLE.

val q = sql("EXPLAIN DESCRIBE TABLE t")
scala> println(q.queryExecution.logical.numberedTreeString)
00 ExplainCommand OneRowRelation$, false, false, false

visitFromClause

fromClause

Supports multiple comma-separated relations (that all together build a condition-less INNER JOIN) with optional LATERAL VIEW.

A relation can be one of the following or a combination thereof:

  • Table identifier

  • Inline table using VALUES exprs AS tableIdent

  • Table-valued function (currently only range is supported)
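
A minimal sketch of the comma-separated case, assuming spark-catalyst on the classpath and CatalystSqlParser as the parser. The relations a, b and c are hypothetical and do not have to exist because the plan is only parsed, not analyzed.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.Inner
import org.apache.spark.sql.catalyst.plans.logical.Join

// Three comma-separated relations should fold into two nested joins.
val plan = CatalystSqlParser.parsePlan("SELECT * FROM a, b, c")
val joins = plan.collect { case j: Join => j }

println(plan.numberedTreeString)
```

Every collected Join is expected to be an INNER join with no join condition.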

visitFunctionCall

functionCall labeled alternative

  • UnresolvedFunction for a bare function (with no window specification)

  • UnresolvedWindowExpression for a function evaluated in a windowed context with a WindowSpecReference

  • WindowExpression for a function over a window

Tip
See the function examples below.

visitNamedExpression

namedExpression

  • Alias (for a single alias)

  • MultiAlias (for a parenthesis-enclosed alias list)

  • a bare Expression
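
The three outcomes can be checked with parseExpression, which goes through the singleExpression rule and so ends up in visitNamedExpression. This is a sketch that assumes spark-catalyst on the classpath and CatalystSqlParser as the parser; the names one, m, k and v are made up.

```scala
import org.apache.spark.sql.catalyst.analysis.MultiAlias
import org.apache.spark.sql.catalyst.expressions.{Alias, Literal}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

val single = CatalystSqlParser.parseExpression("1 AS one")             // single alias
val multi  = CatalystSqlParser.parseExpression("explode(m) AS (k, v)") // alias list
val bare   = CatalystSqlParser.parseExpression("1")                    // no alias

Seq(single, multi, bare).foreach(e => println(e.getClass.getSimpleName))
```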

visitQuerySpecification

querySpecification

OneRowRelation or LogicalPlan

Note

visitQuerySpecification creates a OneRowRelation for a SELECT without a FROM clause.

val q = sql("select 1")
scala> println(q.queryExecution.logical.numberedTreeString)
00 'Project [unresolvedalias(1, None)]
01 +- OneRowRelation$

visitRelation

relation

LogicalPlan for a FROM clause.

visitSingleDataType

singleDataType

DataType

visitSingleExpression

singleExpression

Expression

Takes the named expression and relays to visitNamedExpression

visitSingleStatement

singleStatement

LogicalPlan from a single statement

Note
A single statement can be quite involved.

visitSingleTableIdentifier

singleTableIdentifier

TableIdentifier

visitWindowDef

windowDef labeled alternative

'(' CLUSTER BY partition+=expression (',' partition+=expression)* windowFrame? ')'

'(' ((PARTITION | DISTRIBUTE) BY partition+=expression (',' partition+=expression)*)?
  ((ORDER | SORT) BY sortItem (',' sortItem)*)?
  windowFrame? ')'
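
A sketch of both windowDef forms, assuming spark-catalyst on the classpath and CatalystSqlParser as the parser. The helper windowSpecOf and the column names are hypothetical.

```scala
import org.apache.spark.sql.catalyst.expressions.WindowExpression
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Hypothetical helper: parse an expression and return its window specification.
def windowSpecOf(sql: String) = CatalystSqlParser.parseExpression(sql) match {
  case w: WindowExpression => w.windowSpec
  case other               => sys.error(s"Not a window expression: $other")
}

// CLUSTER BY form: partition expressions only.
val clustered = windowSpecOf("sum(x) OVER (CLUSTER BY a)")
// PARTITION BY / ORDER BY form: partition and sort expressions.
val partitioned = windowSpecOf("sum(x) OVER (PARTITION BY a, b ORDER BY c)")

println(clustered)
println(partitioned)
```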
Table 2. AstBuilder’s Parsing Handlers (in alphabetical order)
Parsing Handler | LogicalPlan Added

withAggregation

  • GroupingSets for GROUP BY … GROUPING SETS (…)

  • Aggregate for GROUP BY … (WITH CUBE | WITH ROLLUP)?
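
A sketch of the Aggregate case, assuming spark-catalyst on the classpath and CatalystSqlParser as the parser. The table t and column a are hypothetical. How the ROLLUP itself is represented inside the grouping expressions may differ between Spark versions, so the example only prints it.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.Aggregate

val plan = CatalystSqlParser.parsePlan(
  "SELECT a, count(*) FROM t GROUP BY a WITH ROLLUP")

// Collect the Aggregate operators that the GROUP BY clause added.
val aggregates = plan.collect { case agg: Aggregate => agg }
aggregates.foreach(agg => println(agg.groupingExpressions))
```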

withGenerate

Generate with UnresolvedGenerator and join flag turned on for LATERAL VIEW (in SELECT or FROM clauses).
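
A sketch of LATERAL VIEW in a FROM clause, assuming spark-catalyst on the classpath and CatalystSqlParser as the parser. The names t, arr, x and e are made up.

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedGenerator
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.Generate

val plan = CatalystSqlParser.parsePlan(
  "SELECT e FROM t LATERAL VIEW explode(arr) x AS e")

// The LATERAL VIEW should show up as a Generate operator in the parsed plan.
val generates = plan.collect { case g: Generate => g }
println(plan.numberedTreeString)
```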

withHints

Hint for /*+ hint */ in SELECT.

Tip
Note the + (plus) right after the opening /*

A hint has the format name or name (params), where name is BROADCAST, BROADCASTJOIN or MAPJOIN.

/*+ BROADCAST (table) */
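
A sketch of a hinted SELECT, assuming a Spark version with hint support, spark-catalyst on the classpath and CatalystSqlParser as the parser. This page refers to the operator both as Hint and as UnresolvedHint, so the example matches on the node name suffix rather than on a concrete class. The table t is hypothetical.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

val plan = CatalystSqlParser.parsePlan("SELECT /*+ BROADCAST(t) */ * FROM t")

// Collect the names of all operators whose class name ends with "Hint".
val hintNodes = plan.collect { case p if p.nodeName.endsWith("Hint") => p.nodeName }
println(plan.numberedTreeString)
```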

withJoinRelations

Join for the relations of a FROM clause or for a single relation.

The following join types are supported:

  • INNER (default)

  • CROSS

  • LEFT (with optional OUTER)

  • LEFT SEMI

  • RIGHT (with optional OUTER)

  • FULL (with optional OUTER)

  • ANTI (optionally prefixed with LEFT)

The following join criteria are supported:

  • ON booleanExpression

  • USING '(' identifier (',' identifier)* ')'

Joins can be NATURAL (with no join criteria).
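
A sketch that maps a few join clauses to the join type found in the parsed plan, assuming spark-catalyst on the classpath and CatalystSqlParser as the parser. The helper joinTypeOf and the relations a and b are hypothetical.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql.catalyst.plans.logical.Join

// Hypothetical helper: the join type of the first Join operator in the parsed plan.
def joinTypeOf(joinClause: String): JoinType =
  CatalystSqlParser.parsePlan(s"SELECT * FROM a $joinClause")
    .collect { case j: Join => j.joinType }
    .head

val inner     = joinTypeOf("JOIN b ON a.id = b.id")
val leftOuter = joinTypeOf("LEFT OUTER JOIN b ON a.id = b.id")
val leftSemi  = joinTypeOf("LEFT SEMI JOIN b ON a.id = b.id")
val anti      = joinTypeOf("ANTI JOIN b ON a.id = b.id")
val using     = joinTypeOf("FULL JOIN b USING (id)")
val natural   = joinTypeOf("NATURAL JOIN b")

Seq(inner, leftOuter, leftSemi, anti, using, natural).foreach(println)
```

USING and NATURAL joins are expected to come back wrapped (as UsingJoin and NaturalJoin) because they can only be turned into regular joins once the analyzer knows the columns of both sides.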

withQueryResultClauses

withQuerySpecification

Adds a query specification to a logical plan.

For transform SELECT (with TRANSFORM, MAP or REDUCE qualifiers), withQuerySpecification does…​FIXME

---

For regular SELECT (no TRANSFORM, MAP or REDUCE qualifiers), withQuerySpecification adds (in that order):

1. Generate unary logical operators if used

2. Filter unary logical plan if used

3. GroupingSets or Aggregate unary logical operators if used

4. Project and/or Filter unary logical operators

5. WithWindowDefinition unary logical operator if used

6. UnresolvedHint unary logical operator if used
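
The order of the first few steps can be observed in a parsed plan, since an operator that is added later ends up higher in the tree. This is a sketch that assumes spark-catalyst on the classpath and CatalystSqlParser as the parser; the table t and column a are hypothetical.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter}

val plan = CatalystSqlParser.parsePlan(
  "SELECT a, count(*) FROM t WHERE a > 0 GROUP BY a")

// WHERE is added before GROUP BY, so Filter should sit right below Aggregate.
println(plan.numberedTreeString)
```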

withWindows

WithWindowDefinition for window aggregates (given WINDOW definitions).

Used for withQueryResultClauses and withQuerySpecification with windows definition.

WINDOW identifier AS windowSpec
  (',' identifier AS windowSpec)*
Tip
Consult windows, namedWindow, windowSpec, windowFrame, and frameBound (with windowRef and windowDef) ANTLR parsing rules for Spark SQL in SqlBase.g4.
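
A sketch of a named window, assuming spark-catalyst on the classpath and CatalystSqlParser as the parser. The window name w, the table t and the columns are made up.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.WithWindowDefinition

val plan = CatalystSqlParser.parsePlan(
  "SELECT sum(x) OVER w FROM t WINDOW w AS (PARTITION BY a ORDER BY b)")

// The WINDOW clause should be kept as a name-to-specification map on WithWindowDefinition.
val windowDefs = plan.collect { case w: WithWindowDefinition => w.windowDefinitions }
windowDefs.foreach(println)
```

The reference to w in the SELECT list stays unresolved after parsing (compare the WindowSpecReference in the function examples) and is matched against this map later.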
Note
AstBuilder belongs to org.apache.spark.sql.catalyst.parser package.

Function Examples

The examples are handled by visitFunctionCall.

import spark.sessionState.sqlParser

scala> sqlParser.parseExpression("foo()")
res0: org.apache.spark.sql.catalyst.expressions.Expression = 'foo()

scala> sqlParser.parseExpression("foo() OVER windowSpecRef")
res1: org.apache.spark.sql.catalyst.expressions.Expression = unresolvedwindowexpression('foo(), WindowSpecReference(windowSpecRef))

scala> sqlParser.parseExpression("foo() OVER (CLUSTER BY field)")
res2: org.apache.spark.sql.catalyst.expressions.Expression = 'foo() windowspecdefinition('field, UnspecifiedFrame)
