SessionState — State Separation Layer Between SparkSessions

SessionState is the state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, SQL parser, and everything else that depends on a SQLConf.

SessionState is available as the sessionState property of a SparkSession.

scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.sessionState
org.apache.spark.sql.internal.SessionState

SessionState is created when SparkSession is requested to instantiateSessionState, which happens when the SessionState is requested for the first time, with the builder chosen per the spark.sql.catalogImplementation configuration property.

Figure 1. Creating SessionState
Note

When requested for the SessionState, SparkSession uses spark.sql.catalogImplementation configuration property to load and create a BaseSessionStateBuilder that is then requested to create a SessionState instance.

There are two BaseSessionStateBuilders available:

SessionStateBuilder for the default in-memory catalog

HiveSessionStateBuilder for the hive catalog

The hive catalog is used when the SparkSession was created with Hive support enabled (using Builder.enableHiveSupport).
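To make the distinction concrete, the following sketch (with a hypothetical application name, and assuming Spark with the Hive classes on the classpath) shows how enabling Hive support selects the hive catalog:

```scala
import org.apache.spark.sql.SparkSession

// A sketch: enableHiveSupport sets spark.sql.catalogImplementation to "hive",
// so the session's SessionState is built by the Hive-aware builder.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sessionstate-demo") // hypothetical name
  .enableHiveSupport()
  .getOrCreate()

// Reads back the catalog implementation: "hive" here, "in-memory" by default
println(spark.conf.get("spark.sql.catalogImplementation"))
```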

Table 1. SessionState’s (Lazily-Initialized) Attributes

| Name | Type | Description |
|------|------|-------------|
| analyzer | Analyzer | Spark Analyzer. Initialized lazily (i.e. only when requested the first time) using the analyzerBuilder factory function. Used when…​FIXME |
| catalog | SessionCatalog | Metastore of tables and databases. Used when…​FIXME |
| conf | SQLConf | FIXME. Used when…​FIXME |
| experimentalMethods | ExperimentalMethods | FIXME. Used when…​FIXME |
| functionRegistry | FunctionRegistry | FIXME. Used when…​FIXME |
| functionResourceLoader | FunctionResourceLoader | FIXME. Used when…​FIXME |
| listenerManager | ExecutionListenerManager | FIXME. Used when…​FIXME |
| optimizer | Optimizer | Logical query plan optimizer. Used exclusively when QueryExecution creates an optimized logical plan. |
| resourceLoader | SessionResourceLoader | FIXME. Used when…​FIXME |
| sqlParser | ParserInterface | FIXME. Used when…​FIXME |
| streamingQueryManager | StreamingQueryManager | Used to manage streaming queries in Spark Structured Streaming. |
| udfRegistration | UDFRegistration | Interface to register user-defined functions. Used when…​FIXME |
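Most of these attributes are reachable from spark.sessionState; a short exploratory sketch (assuming an active SparkSession named spark):

```scala
// Assuming an active SparkSession named spark:
val ss = spark.sessionState

ss.conf       // SQLConf with the session's SQL configuration
ss.catalog    // SessionCatalog, the metastore of tables and databases
ss.analyzer   // lazily-created Analyzer
ss.optimizer  // lazily-created Optimizer

// e.g. use the session's parser to turn SQL text into a logical plan
val plan = ss.sqlParser.parsePlan("SELECT 1 AS id")
```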

Note
SessionState is a private[sql] class and, given the package org.apache.spark.sql.internal, SessionState should be considered internal.

Creating SessionState Instance

SessionState takes the following when created:

apply Factory Methods

Caution
FIXME
apply(sparkSession: SparkSession): SessionState (1)
apply(sparkSession: SparkSession, sqlConf: SQLConf): SessionState
  1. Passes sparkSession to the other apply with a new SQLConf

Note
apply is used when SparkSession is requested for SessionState.

clone Method

Caution
FIXME
Note
clone is used when…​

createAnalyzer Internal Method

createAnalyzer(
  sparkSession: SparkSession,
  catalog: SessionCatalog,
  sqlConf: SQLConf): Analyzer

createAnalyzer creates a logical query plan Analyzer with rules specific to a non-Hive SessionState.

Table 2. Analyzer’s Evaluation Rules for non-Hive SessionState (in the order of execution)

| Method | Rules | Description |
|--------|-------|-------------|
| extendedResolutionRules | FindDataSourceTable | Replaces InsertIntoTable (with CatalogRelation) and CatalogRelation logical plans with LogicalRelation. |
| | ResolveSQLOnFile | |
| postHocResolutionRules | PreprocessTableCreation | |
| | PreprocessTableInsertion | |
| | DataSourceAnalysis | |
| extendedCheckRules | PreWriteCheck | |
| | HiveOnlyCheck | |

Note
createAnalyzer is used when SessionState is created or cloned.
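The rules above are wired into the session's Analyzer; one way to see them at work is to analyze an unresolved logical plan by hand (a sketch, assuming an active SparkSession named spark):

```scala
// Parse SQL text into an unresolved logical plan, then run the Analyzer on it.
val parsed = spark.sessionState.sqlParser.parsePlan("SELECT id FROM range(3)")

// Analyzer.execute applies its rule batches, including the
// extendedResolutionRules and postHocResolutionRules listed in Table 2.
val analyzed = spark.sessionState.analyzer.execute(parsed)
println(analyzed.numberedTreeString)
```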

"Executing" Logical Plan (Creating QueryExecution For LogicalPlan) — executePlan Method

executePlan(plan: LogicalPlan): QueryExecution

executePlan executes the createQueryExecution function on the input logical plan, which creates a QueryExecution for the current SparkSession and the input logical plan.
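For example (a sketch, assuming an active SparkSession named spark), executePlan gives access to the whole plan lifecycle through the resulting QueryExecution:

```scala
// Wrap a parsed logical plan in a QueryExecution.
val plan = spark.sessionState.sqlParser.parsePlan("SELECT 1 AS id")
val qe = spark.sessionState.executePlan(plan)

qe.logical        // the input (unresolved) logical plan
qe.analyzed       // after the Analyzer
qe.optimizedPlan  // after the Optimizer
qe.executedPlan   // the physical plan ready for execution
```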

refreshTable Method

refreshTable is…​

addJar Method

addJar is…​

analyze Method

analyze is…​

Creating New Hadoop Configuration — newHadoopConf Method

newHadoopConf(): Configuration
newHadoopConf(hadoopConf: Configuration, sqlConf: SQLConf): Configuration

newHadoopConf returns a Hadoop Configuration (with the SparkContext.hadoopConfiguration and all the configuration properties of the SQLConf).

Note
newHadoopConf is used by ScriptTransformation, ParquetRelation, StateStoreRDD, and SessionState itself, among a few other places.
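A sketch of the effect (assuming an active SparkSession named spark): session-level SQL properties become visible in the returned Hadoop Configuration.

```scala
// SQLConf entries are copied onto a clone of SparkContext.hadoopConfiguration.
spark.conf.set("spark.sql.shuffle.partitions", "8")

val hadoopConf = spark.sessionState.newHadoopConf()

// The SQL property set above is now a Hadoop property too.
println(hadoopConf.get("spark.sql.shuffle.partitions"))
```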

Creating New Hadoop Configuration With Extra Options — newHadoopConfWithOptions Method

newHadoopConfWithOptions(options: Map[String, String]): Configuration

newHadoopConfWithOptions creates a new Hadoop Configuration with the input options set (except path and paths options that are skipped).

Note

newHadoopConfWithOptions is used when:
