Builder — Building SparkSession using Fluent API

Builder is the fluent API to create a SparkSession.

Table 1. Builder API
Method Description

appName

appName(name: String): Builder

Sets a name for the Spark application (shown in the web UI)

config

config(conf: SparkConf): Builder
config(key: String, value: Boolean): Builder
config(key: String, value: Double): Builder
config(key: String, value: Long): Builder
config(key: String, value: String): Builder

Sets a configuration option

enableHiveSupport

enableHiveSupport(): Builder

Enables Hive support

getOrCreate

getOrCreate(): SparkSession

Gets the current SparkSession or creates a new one

master

master(master: String): Builder

Sets the Spark master URL to connect to (e.g. local[*] to run locally)

withExtensions

withExtensions(f: SparkSessionExtensions => Unit): Builder

Gives access to the SparkSessionExtensions

Builder is available using the builder method on the SparkSession object.

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
  .appName("My Spark Application")  // optional and will be autogenerated if not specified
  .master("local[*]")               // only for demo and testing purposes, use spark-submit instead
  .enableHiveSupport()              // self-explanatory, isn't it?
  .config("spark.sql.warehouse.dir", "target/spark-warehouse")
  .withExtensions { extensions =>
    extensions.injectResolutionRule { session =>
      ...
    }
    extensions.injectOptimizerRule { session =>
      ...
    }
  }
  .getOrCreate
Note
You can have multiple SparkSessions in a single Spark application for different data catalogs (through relational entities).
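The note above can be sketched with SparkSession.newSession, which creates a second session that shares the underlying SparkContext but keeps its own SQL configuration, temporary views and registered functions (a minimal local-mode sketch; names are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  .appName("Multiple sessions demo")
  .getOrCreate()

// newSession shares the SparkContext (and cluster resources) with the
// original session, but session-scoped state is isolated
val another = spark.newSession()

spark.range(5).createOrReplaceTempView("nums")

assert(spark.sparkContext eq another.sparkContext) // same SparkContext
assert(spark.catalog.tableExists("nums"))          // temp view visible here...
assert(!another.catalog.tableExists("nums"))       // ...but not in the other session
```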
Table 2. Builder’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

extensions

SparkSessionExtensions

Used when…​FIXME

options

Used when…​FIXME

Getting Or Creating SparkSession Instance — getOrCreate Method

getOrCreate(): SparkSession

getOrCreate…​FIXME
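While the internals remain to be described, the observable behavior can be sketched: when a session is already active, a subsequent getOrCreate returns that session rather than creating a new one (a minimal local-mode sketch):

```scala
import org.apache.spark.sql.SparkSession

val s1 = SparkSession.builder
  .master("local[*]")
  .appName("getOrCreate demo")
  .getOrCreate()

// The second getOrCreate finds the active session and returns it
// instead of creating a new one
val s2 = SparkSession.builder.getOrCreate()

assert(s1 eq s2) // the very same SparkSession instance
```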

Enabling Hive Support — enableHiveSupport Method

enableHiveSupport(): Builder

enableHiveSupport enables Hive support, i.e. running structured queries on Hive tables, using a persistent Hive metastore, and support for Hive serdes and Hive user-defined functions.

Note

You do not need any existing Hive installation to use Spark’s Hive support. A SparkSession automatically creates metastore_db in the current directory of the Spark application and creates the directory configured by the spark.sql.warehouse.dir configuration property.

Refer to SharedState.

Internally, enableHiveSupport checks whether the Hive classes are available or not. If so, enableHiveSupport sets spark.sql.catalogImplementation internal configuration property to hive. Otherwise, enableHiveSupport throws an IllegalArgumentException:

Unable to instantiate SparkSession with Hive support because Hive classes are not found.
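The effect described above can be observed directly. Assuming the Hive classes (spark-hive) are on the classpath, the following sketch builds a Hive-enabled session and reads back the catalog implementation property:

```scala
import org.apache.spark.sql.SparkSession

// Assumes spark-hive is on the classpath; otherwise enableHiveSupport
// throws the IllegalArgumentException quoted above
val spark = SparkSession.builder
  .master("local[*]")
  .appName("Hive support demo")
  .enableHiveSupport()
  .getOrCreate()

// enableHiveSupport set the internal catalog implementation to "hive"
assert(spark.conf.get("spark.sql.catalogImplementation") == "hive")
```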

hiveClassesArePresent Method

hiveClassesArePresent: Boolean

hiveClassesArePresent loads and initializes org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes from the current classloader.

hiveClassesArePresent returns true when the initialization succeeded, and false otherwise (due to ClassNotFoundException or NoClassDefFoundError errors).

Note

hiveClassesArePresent is used when Builder is requested to enableHiveSupport.
withExtensions Method

withExtensions(f: SparkSessionExtensions => Unit): Builder

withExtensions simply executes the input f function with the SparkSessionExtensions.
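As a concrete illustration, the sketch below registers a hypothetical no-op optimizer rule (NoopRule is illustrative, not part of Spark) through withExtensions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A hypothetical rule that leaves every logical plan unchanged,
// used here only to show the injection mechanics
case class NoopRule(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

val spark = SparkSession.builder
  .master("local[*]")
  .appName("withExtensions demo")
  .withExtensions { extensions =>
    // injectOptimizerRule takes a SparkSession => Rule[LogicalPlan] builder
    extensions.injectOptimizerRule(session => NoopRule(session))
  }
  .getOrCreate()

assert(!spark.sparkContext.isStopped) // session created with the extensions applied
```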

appName Method

appName(name: String): Builder

appName…​FIXME

config Method

config(conf: SparkConf): Builder
config(key: String, value: Boolean): Builder
config(key: String, value: Double): Builder
config(key: String, value: Long): Builder
config(key: String, value: String): Builder

config…​FIXME

master Method

master(master: String): Builder

master…​FIXME
