AbstractSqlParser — Base SQL Parsing Infrastructure

AbstractSqlParser is the base of ParserInterface implementations that use an AstBuilder to parse SQL text and convert it to Spark SQL entities, i.e. DataType, StructType, Expression, LogicalPlan, TableIdentifier and FunctionIdentifier.

AbstractSqlParser is the foundation of the SQL parsing infrastructure.

package org.apache.spark.sql.catalyst.parser

abstract class AbstractSqlParser extends ParserInterface {
  // only required properties (vals and methods) that have no implementation
  // the others follow
  def astBuilder: AstBuilder
}
Table 1. AbstractSqlParser Contract
Method Description

astBuilder

AstBuilder for parsing SQL statements.

Used in all the parse methods, i.e. parseDataType, parseExpression, parseFunctionIdentifier, parsePlan, parseTableIdentifier, and parseTableSchema.

Table 2. AbstractSqlParser’s Implementations
Name Description

SparkSqlParser

The default SQL parser in SessionState available as sqlParser property.

val spark: SparkSession = ...
spark.sessionState.sqlParser

CatalystSqlParser

Creates a DataType or a StructType (schema) from their canonical string representation.

Setting Up SqlBaseLexer and SqlBaseParser for Parsing — parse Method

parse[T](command: String)(toResult: SqlBaseParser => T): T

parse sets up a proper ANTLR parsing infrastructure with SqlBaseLexer and SqlBaseParser (which are the ANTLR-specific classes of Spark SQL that are auto-generated at build time from the SqlBase.g4 grammar).

Tip
Review the definition of ANTLR grammar for Spark SQL in sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4.

Internally, parse first prints out the following INFO message to the logs:

INFO SparkSqlParser: Parsing command: [command]
Tip
Enable INFO logging level for the AbstractSqlParser implementation in use, i.e. SparkSqlParser or CatalystSqlParser, to see the above INFO message.

parse then creates and sets up a SqlBaseLexer and a SqlBaseParser, and passes the parser on to the input toResult function, where the parsing finally happens.

Note
parse first tries the (potentially faster) SLL prediction mode and falls back to LL mode only if that attempt fails.
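The lexer and parser setup and the SLL-then-LL fallback described above can be sketched roughly as follows. This is a simplified illustration and not Spark's actual implementation: parseSketch is a hypothetical name, and the use of BailErrorStrategy to trigger the fallback follows ANTLR's general two-stage parsing idiom, which is an assumption about the mechanics. The sketch assumes ANTLR 4.7 or later (for CharStreams) and Spark's generated SqlBaseLexer and SqlBaseParser on the classpath. It leaves out logging, Spark's error listeners and the case-insensitive character stream, so keywords have to be given in upper case.

```scala
import org.antlr.v4.runtime.{BailErrorStrategy, CharStreams, CommonTokenStream, DefaultErrorStrategy}
import org.antlr.v4.runtime.atn.PredictionMode
import org.antlr.v4.runtime.misc.ParseCancellationException
import org.apache.spark.sql.catalyst.parser.{SqlBaseLexer, SqlBaseParser}

// Hypothetical helper (not part of Spark) that mirrors the shape of parse
def parseSketch[T](command: String)(toResult: SqlBaseParser => T): T = {
  // Lexer turns the raw text into tokens; parser consumes the token stream
  val lexer = new SqlBaseLexer(CharStreams.fromString(command))
  val tokens = new CommonTokenStream(lexer)
  val parser = new SqlBaseParser(tokens)

  try {
    // Stage 1: SLL prediction, bailing out on the first syntax error
    parser.setErrorHandler(new BailErrorStrategy)
    parser.getInterpreter.setPredictionMode(PredictionMode.SLL)
    toResult(parser)
  } catch {
    case _: ParseCancellationException =>
      // Stage 2: rewind the input and retry with full LL prediction
      tokens.seek(0)
      parser.reset()
      parser.setErrorHandler(new DefaultErrorStrategy)
      parser.getInterpreter.setPredictionMode(PredictionMode.LL)
      toResult(parser)
  }
}

// Usage: the caller picks the grammar rule to start from
val tree = parseSketch("SELECT 1")(_.singleStatement())
println(tree.getText)
```

The toResult function decides which grammar rule to apply, which is how a single parse method can serve all the parse methods.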

In case of parsing errors, parse reports a ParseException.
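As a small illustration of the error path, the following sketch feeds a deliberately malformed data type string to CatalystSqlParser and handles the resulting ParseException. The input string and the val name are made up for the example.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

// "array<" is deliberately malformed: the element type and the closing bracket are missing
val result = Try(CatalystSqlParser.parseDataType("array<"))

result match {
  case Failure(e: ParseException) => println(s"Parsing failed: ${e.getMessage}")
  case Failure(other)             => throw other
  case Success(dataType)          => println(s"Parsed: $dataType")
}
```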

Note
parse is used in all the parse methods, i.e. parseDataType, parseExpression, parseFunctionIdentifier, parsePlan, parseTableIdentifier, and parseTableSchema.

parseDataType Method

parseDataType(sqlText: String): DataType
Note
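parseDataType is part of the ParserInterface contract to create a DataType from a SQL text.

parseDataType uses parse (with the astBuilder) to convert the given sqlText to a DataType.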
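A minimal usage sketch, assuming the CatalystSqlParser implementation listed earlier (the type strings and val names are chosen for the example):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types._

// Each call turns the textual type representation into a DataType instance
val intType     = CatalystSqlParser.parseDataType("int")
val arrayType   = CatalystSqlParser.parseDataType("array<int>")
val decimalType = CatalystSqlParser.parseDataType("decimal(10,2)")

Seq(intType, arrayType, decimalType).foreach(t => println(t.simpleString))
```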

parseExpression Method

parseExpression(sqlText: String): Expression
Note
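parseExpression is part of the ParserInterface contract to create an Expression from a SQL text.

parseExpression uses parse (with the astBuilder) to convert the given sqlText to an Expression.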
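A minimal usage sketch, assuming CatalystSqlParser and a made-up expression text. The result is not bound to any table, so it is expected to be unresolved until analysis.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Expression}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// "a" is a hypothetical column name; no table or schema is involved at parse time
val expr: Expression = CatalystSqlParser.parseExpression("a + 1")

println(expr.getClass.getSimpleName)
println(expr.resolved)
```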

parseFunctionIdentifier Method

parseFunctionIdentifier(sqlText: String): FunctionIdentifier
Note
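parseFunctionIdentifier is part of the ParserInterface contract to create a FunctionIdentifier from a SQL text.

parseFunctionIdentifier uses parse (with the astBuilder) to convert the given sqlText to a FunctionIdentifier.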
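A minimal usage sketch, assuming CatalystSqlParser. The database and function names are hypothetical and do not have to exist, since nothing is looked up at parse time.

```scala
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// "db" and "my_func" are made-up names used only for illustration
val funcId: FunctionIdentifier = CatalystSqlParser.parseFunctionIdentifier("db.my_func")

println(funcId.funcName)
println(funcId.database)
```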

parseTableIdentifier Method

parseTableIdentifier(sqlText: String): TableIdentifier
Note
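parseTableIdentifier is part of the ParserInterface contract to create a TableIdentifier from a SQL text.

parseTableIdentifier uses parse (with the astBuilder) to convert the given sqlText to a TableIdentifier.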
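A minimal usage sketch, assuming CatalystSqlParser. The database and table names are hypothetical; the parser only splits the qualified name and does not check the catalog.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// "db" and "people" are made-up names used only for illustration
val tableId: TableIdentifier = CatalystSqlParser.parseTableIdentifier("db.people")

println(tableId.table)
println(tableId.database)
```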

parseTableSchema Method

parseTableSchema(sqlText: String): StructType
Note
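parseTableSchema is part of the ParserInterface contract to create a StructType (schema) from a SQL text.

parseTableSchema uses parse (with the astBuilder) to convert the given sqlText to a StructType.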
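A minimal usage sketch, assuming CatalystSqlParser and a made-up, comma-separated list of column definitions as the schema text:

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types._

// Column names and types are hypothetical and chosen for the example
val schema: StructType = CatalystSqlParser.parseTableSchema("id INT, name STRING")

schema.fields.foreach(f => println(s"${f.name}: ${f.dataType.simpleString}"))
```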

parsePlan Method

parsePlan(sqlText: String): LogicalPlan
Note
parsePlan is part of the ParserInterface contract to create a LogicalPlan for a SQL text.

parsePlan creates a LogicalPlan for a given SQL textual statement.

Internally, parsePlan builds a SqlBaseParser and requests AstBuilder to parse a single SQL statement.

If a SQL statement could not be parsed, parsePlan throws a ParseException:

Unsupported SQL statement
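A minimal usage sketch of parsePlan, assuming CatalystSqlParser and a made-up query. The table t does not have to exist: parsing only builds the plan tree, which is expected to stay unresolved until the analyzer runs.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// "t" and "id" are hypothetical names; nothing is looked up at parse time
val plan: LogicalPlan = CatalystSqlParser.parsePlan("SELECT id FROM t WHERE id > 1")

// Prints the tree of logical operators produced by the parser
println(plan.treeString)
println(plan.resolved)
```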
