SessionState — State Separation Layer Between SparkSessions

SessionState is the state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, SQL parser, and everything else that depends on a SQLConf.

SessionState is available as the sessionState property of a SparkSession.

scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.sessionState
org.apache.spark.sql.internal.SessionState

SessionState is created when SparkSession is requested to instantiateSessionState, which happens when the SessionState is requested for the first time, with the builder chosen per the spark.sql.catalogImplementation configuration property.

Figure 1. Creating SessionState
Note

When requested for the SessionState, SparkSession uses spark.sql.catalogImplementation configuration property to load and create a BaseSessionStateBuilder that is then requested to create a SessionState instance.

There are two BaseSessionStateBuilders available:

SessionStateBuilder for the default in-memory catalog

HiveSessionStateBuilder for the hive catalog

The hive catalog is used when the SparkSession was created with Hive support enabled (using Builder.enableHiveSupport).
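To make the distinction concrete, the following sketch (with a hypothetical application name, and assuming Spark with the Hive classes on the classpath) shows how enabling Hive support selects the hive catalog:

```scala
import org.apache.spark.sql.SparkSession

// A sketch: enableHiveSupport sets spark.sql.catalogImplementation to "hive",
// so the session's SessionState is built by the Hive-aware builder.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sessionstate-demo") // hypothetical name
  .enableHiveSupport()
  .getOrCreate()

// Reads back the catalog implementation: "hive" here, "in-memory" by default
println(spark.conf.get("spark.sql.catalogImplementation"))
```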

Table 1. SessionState’s (Lazily-Initialized) Attributes

| Name | Type | Description |
|------|------|-------------|
| analyzer | Analyzer | Spark Analyzer. Initialized lazily (i.e. only when requested the first time) using the analyzerBuilder factory function. Used when…​FIXME |
| catalog | SessionCatalog | Metastore of tables and databases. Used when…​FIXME |
| conf | SQLConf | FIXME. Used when…​FIXME |
| experimentalMethods | ExperimentalMethods | FIXME. Used when…​FIXME |
| functionRegistry | FunctionRegistry | FIXME. Used when…​FIXME |
| functionResourceLoader | FunctionResourceLoader | FIXME. Used when…​FIXME |
| listenerManager | ExecutionListenerManager | FIXME. Used when…​FIXME |
| optimizer | Optimizer | Logical query plan optimizer. Used exclusively when QueryExecution creates an optimized logical plan. |
| resourceLoader | SessionResourceLoader | FIXME. Used when…​FIXME |
| sqlParser | ParserInterface | FIXME. Used when…​FIXME |
| streamingQueryManager | StreamingQueryManager | Used to manage streaming queries in Spark Structured Streaming. |
| udfRegistration | UDFRegistration | Interface to register user-defined functions. Used when…​FIXME |
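Most of these attributes are reachable from spark.sessionState; a short exploratory sketch (assuming an active SparkSession named spark):

```scala
// Assuming an active SparkSession named spark:
val ss = spark.sessionState

ss.conf       // SQLConf with the session's SQL configuration
ss.catalog    // SessionCatalog, the metastore of tables and databases
ss.analyzer   // lazily-created Analyzer
ss.optimizer  // lazily-created Optimizer

// e.g. use the session's parser to turn SQL text into a logical plan
val plan = ss.sqlParser.parsePlan("SELECT 1 AS id")
```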

Note
SessionState is a private[sql] class and, given the package org.apache.spark.sql.internal, SessionState should be considered internal.

Creating SessionState Instance

SessionState takes the following when created:

apply Factory Methods

Caution
FIXME
apply(sparkSession: SparkSession): SessionState (1)
apply(sparkSession: SparkSession, sqlConf: SQLConf): SessionState
  1. Passes sparkSession to the other apply with a new SQLConf

Note
apply is used when SparkSession is requested for SessionState.

clone Method

Caution
FIXME
Note
clone is used when…​

createAnalyzer Internal Method

createAnalyzer(
  sparkSession: SparkSession,
  catalog: SessionCatalog,
  sqlConf: SQLConf): Analyzer

createAnalyzer creates a logical query plan Analyzer with rules specific to a non-Hive SessionState.

Table 2. Analyzer’s Evaluation Rules for non-Hive SessionState (in the order of execution)

| Method | Rules | Description |
|--------|-------|-------------|
| extendedResolutionRules | FindDataSourceTable | Replaces InsertIntoTable (with CatalogRelation) and CatalogRelation logical plans with LogicalRelation. |
| | ResolveSQLOnFile | |
| postHocResolutionRules | PreprocessTableCreation | |
| | PreprocessTableInsertion | |
| | DataSourceAnalysis | |
| extendedCheckRules | PreWriteCheck | |
| | HiveOnlyCheck | |

Note
createAnalyzer is used when SessionState is created or cloned.
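The rules above are wired into the session's Analyzer; one way to see them at work is to analyze an unresolved logical plan by hand (a sketch, assuming an active SparkSession named spark):

```scala
// Parse SQL text into an unresolved logical plan, then run the Analyzer on it.
val parsed = spark.sessionState.sqlParser.parsePlan("SELECT id FROM range(3)")

// Analyzer.execute applies its rule batches, including the
// extendedResolutionRules and postHocResolutionRules listed in Table 2.
val analyzed = spark.sessionState.analyzer.execute(parsed)
println(analyzed.numberedTreeString)
```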

"Executing" Logical Plan (Creating QueryExecution For LogicalPlan) — executePlan Method

executePlan(plan: LogicalPlan): QueryExecution

executePlan executes the createQueryExecution function on the input logical plan, which creates a QueryExecution for the current SparkSession and the input logical plan.
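For example (a sketch, assuming an active SparkSession named spark), executePlan gives access to the whole plan lifecycle through the resulting QueryExecution:

```scala
// Wrap a parsed logical plan in a QueryExecution.
val plan = spark.sessionState.sqlParser.parsePlan("SELECT 1 AS id")
val qe = spark.sessionState.executePlan(plan)

qe.logical        // the input (unresolved) logical plan
qe.analyzed       // after the Analyzer
qe.optimizedPlan  // after the Optimizer
qe.executedPlan   // the physical plan ready for execution
```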

refreshTable Method

refreshTable is…​

addJar Method

addJar is…​

analyze Method

analyze is…​

Creating New Hadoop Configuration — newHadoopConf Method

newHadoopConf(): Configuration
newHadoopConf(hadoopConf: Configuration, sqlConf: SQLConf): Configuration

newHadoopConf returns a Hadoop Configuration (with the SparkContext.hadoopConfiguration and all the configuration properties of the SQLConf).

Note
newHadoopConf is used by ScriptTransformation, ParquetRelation, StateStoreRDD, and SessionState itself, among a few other places.
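A sketch of the effect (assuming an active SparkSession named spark): session-level SQL properties become visible in the returned Hadoop Configuration.

```scala
// SQLConf entries are copied onto a clone of SparkContext.hadoopConfiguration.
spark.conf.set("spark.sql.shuffle.partitions", "8")

val hadoopConf = spark.sessionState.newHadoopConf()

// The SQL property set above is now a Hadoop property too.
println(hadoopConf.get("spark.sql.shuffle.partitions"))
```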

Creating New Hadoop Configuration With Extra Options — newHadoopConfWithOptions Method

newHadoopConfWithOptions(options: Map[String, String]): Configuration

newHadoopConfWithOptions creates a new Hadoop Configuration with the input options set (except path and paths options that are skipped).

Note

newHadoopConfWithOptions is used when:
