SharedState — State Shared Across SparkSessions

SharedState holds the shared state across multiple SparkSessions.

Table 1. SharedState’s Properties

Name | Type | Description
cacheManager | CacheManager |
externalCatalog | ExternalCatalog | Metastore of permanent relational entities, i.e. databases, tables, partitions, and functions. Note: externalCatalog is initialized lazily on the first access.
globalTempViewManager | GlobalTempViewManager | Management interface of global temporary views
jarClassLoader | NonClosableMutableURLClassLoader |
sparkContext | SparkContext | Spark Core’s SparkContext
statusStore | SQLAppStatusStore |
warehousePath | String | Warehouse path

SharedState is available as the sharedState property of a SparkSession.

scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.sharedState
org.apache.spark.sql.internal.SharedState

SharedState is shared across SparkSessions.

scala> spark.newSession.sharedState == spark.sharedState
res1: Boolean = true

SharedState is created only when the sharedState property of a SparkSession is first accessed.

Tip

Enable INFO logging level for org.apache.spark.sql.internal.SharedState logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.internal.SharedState=INFO

Refer to Logging.

warehousePath Property

warehousePath: String

warehousePath is the warehouse path with the value of:

  1. hive.metastore.warehouse.dir if defined and spark.sql.warehouse.dir is not

  2. spark.sql.warehouse.dir otherwise (including when both properties are defined)

You should see the following INFO message in the logs when SharedState is created:

INFO Warehouse path is '[warehousePath]'.

warehousePath is used exclusively when SharedState initializes ExternalCatalog (and creates the default database in the metastore).

While being initialized, warehousePath does the following:

  1. Loads hive-site.xml if available on CLASSPATH, i.e. adds it as a configuration resource to Hadoop’s Configuration (of SparkContext).

  2. Removes hive.metastore.warehouse.dir from SparkConf (of SparkContext), so that the property can only be defined using the Hadoop configuration resources.

  3. Sets spark.sql.warehouse.dir or hive.metastore.warehouse.dir in the Hadoop configuration (of SparkContext):

    1. If hive.metastore.warehouse.dir has been defined in any of the Hadoop configuration resources but spark.sql.warehouse.dir has not, spark.sql.warehouse.dir becomes the value of hive.metastore.warehouse.dir.

      You should see the following INFO message in the logs:

      spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('[hiveWarehouseDir]').

    2. Otherwise, the Hadoop configuration’s hive.metastore.warehouse.dir is set to spark.sql.warehouse.dir.

      You should see the following INFO message in the logs:

      Setting hive.metastore.warehouse.dir ('[hiveWarehouseDir]') to the value of spark.sql.warehouse.dir ('[sparkWarehouseDir]').
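
The precedence between the two properties can be modelled without Spark as a small pure function. This is only a sketch of the rules described above, not SharedState's actual code: the object name, the resolve method and the AssumedDefault placeholder (standing in for whatever default spark.sql.warehouse.dir has when it is not set) are hypothetical.

```scala
// Hypothetical, Spark-free model of the warehouse path precedence rules.
object WarehousePathSketch {
  // Placeholder for the default of spark.sql.warehouse.dir (an assumption of this sketch).
  val AssumedDefault = "spark-warehouse"

  /**
   * @param hiveWarehouseDir  hive.metastore.warehouse.dir from the Hadoop configuration resources
   * @param sparkWarehouseDir spark.sql.warehouse.dir if explicitly set
   * @return the path that both properties end up pointing at
   */
  def resolve(hiveWarehouseDir: Option[String], sparkWarehouseDir: Option[String]): String =
    (hiveWarehouseDir, sparkWarehouseDir) match {
      // Only the Hive property is defined: it becomes spark.sql.warehouse.dir.
      case (Some(hiveDir), None) => hiveDir
      // Otherwise spark.sql.warehouse.dir (explicit or default) wins and
      // overrides hive.metastore.warehouse.dir.
      case (_, sparkDir) => sparkDir.getOrElse(AssumedDefault)
    }
}
```

For example, WarehousePathSketch.resolve(Some("/hive/wh"), None) gives /hive/wh, while WarehousePathSketch.resolve(Some("/hive/wh"), Some("/spark/wh")) gives /spark/wh.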

externalCatalog Property

externalCatalog: ExternalCatalog

externalCatalog is created reflectively per the spark.sql.catalogImplementation internal configuration property (with the current Hadoop Configuration, i.e. SparkContext.hadoopConfiguration).

While being initialized, externalCatalog:

  1. Creates the default database (with default database description and warehousePath location) if it doesn’t exist.

  2. Registers an ExternalCatalogEventListener that propagates external catalog events to the Spark listener bus.

externalCatalogClassName Internal Method

externalCatalogClassName(conf: SparkConf): String

externalCatalogClassName gives the name of the class of the ExternalCatalog per the spark.sql.catalogImplementation configuration property.

Note
externalCatalogClassName is used exclusively when SharedState is requested for the ExternalCatalog.
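
The lookup can be pictured as a simple pattern match on the property value. The following is a Spark-free sketch, not the method itself: it takes the property value as a plain String rather than a SparkConf, the object name is hypothetical, and the fallback case is an addition for illustration. The two class names are the Hive-backed and in-memory catalogs as they appear in the Spark source tree; check them against the Spark version you use.

```scala
// Hypothetical stand-in for the mapping from spark.sql.catalogImplementation
// to the fully-qualified ExternalCatalog class name.
object ExternalCatalogClassNameSketch {
  val HiveExternalCatalogClass = "org.apache.spark.sql.hive.HiveExternalCatalog"
  val InMemoryCatalogClass = "org.apache.spark.sql.catalyst.catalog.InMemoryCatalog"

  def externalCatalogClassName(catalogImplementation: String): String =
    catalogImplementation match {
      case "hive"      => HiveExternalCatalogClass
      case "in-memory" => InMemoryCatalogClass
      // Fallback added for this sketch only, so that unknown values fail loudly.
      case other =>
        throw new IllegalArgumentException(s"Unsupported catalog implementation: $other")
    }
}
```

The resulting class name is what a reflective instantiation (e.g. via Class.forName) would start from.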

Accessing Management Interface of Global Temporary Views — globalTempViewManager Property

globalTempViewManager: GlobalTempViewManager

When accessed for the very first time, globalTempViewManager gets the name of the global temporary view database (as the value of spark.sql.globalTempDatabase internal static configuration property).

In the end, globalTempViewManager creates a new GlobalTempViewManager (with the database name).

globalTempViewManager throws a SparkException when the global temporary view database already exists in the ExternalCatalog:

[globalTempDB] is a system preserved database, please rename your existing database to resolve the name conflict, or set a different value for spark.sql.globalTempDatabase, and launch your Spark application again.

Note
globalTempViewManager is used when BaseSessionStateBuilder and HiveSessionStateBuilder are requested for the SessionCatalog.
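
The name-conflict check performed on first access can be sketched without Spark as follows. Everything here is a hypothetical stand-in: the case class only mimics GlobalTempViewManager, databaseExists plays the role of asking the ExternalCatalog whether a database exists, and IllegalStateException replaces SparkException to keep the example free of Spark dependencies.

```scala
// Hypothetical, Spark-free model of how globalTempViewManager is created.
object GlobalTempViewManagerSketch {
  // Stand-in for Spark's GlobalTempViewManager; it only remembers the database name.
  final case class GlobalTempViewManager(database: String)

  /**
   * @param globalTempDB   value of spark.sql.globalTempDatabase
   * @param databaseExists stand-in for the ExternalCatalog's database lookup
   */
  def create(globalTempDB: String, databaseExists: String => Boolean): GlobalTempViewManager = {
    // The global temporary view database must not clash with a regular database.
    if (databaseExists(globalTempDB)) {
      throw new IllegalStateException(
        s"$globalTempDB is a system preserved database, please rename your existing database " +
          "to resolve the name conflict, or set a different value for " +
          "spark.sql.globalTempDatabase, and launch your Spark application again.")
    }
    GlobalTempViewManager(globalTempDB)
  }
}
```

Since a Set[String] is also a String => Boolean, GlobalTempViewManagerSketch.create("global_temp", Set("default")) succeeds, whereas passing Set("global_temp") makes the call throw.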
