Builder — Building SparkSession using Fluent API

Builder is the fluent API to build a fully-configured SparkSession.

Table 1. Builder Methods

Method               Description
getOrCreate          Gets the current SparkSession or creates a new one.
enableHiveSupport    Enables Hive support.

import org.apache.spark.sql.SparkSession
val spark: SparkSession = SparkSession.builder
  .appName("My Spark Application")  // optional and will be autogenerated if not specified
  .master("local[*]")               // hardcoded for the example only; prefer passing the master via spark-submit
  .enableHiveSupport()              // requires the Hive classes on the CLASSPATH
  .getOrCreate()

Builder follows the fluent design pattern: each method sets a property of the SparkSession and returns the Builder itself, so the calls can be chained until getOrCreate opens the session to Spark SQL.
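As a minimal sketch of chaining properties, the following sets two options with the config method before creating the session. The application name and the chosen option values are illustrative assumptions. The snippet is meant to run as a Scala script or in a REPL with Spark SQL on the classpath and no session already running.

```scala
import org.apache.spark.sql.SparkSession

// Each call returns the Builder, so the options chain until getOrCreate().
val spark: SparkSession = SparkSession.builder
  .appName("Builder config sketch")              // illustrative name
  .master("local[*]")                            // local mode, for the example only
  .config("spark.sql.shuffle.partitions", "8")   // String-valued option
  .config("spark.ui.enabled", false)             // Boolean overload of config
  .getOrCreate()

// Options set on the Builder are readable through the session's runtime configuration.
val shufflePartitions: String = spark.conf.get("spark.sql.shuffle.partitions")
println(shufflePartitions)
```

If a SparkSession already exists in the JVM, getOrCreate returns it instead, so options that only matter at startup (such as spark.ui.enabled) may have no effect.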

Note
You can have multiple SparkSessions in a single Spark application, e.g. to work with different data catalogs of relational entities (tables and views).
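One way to see this in practice, as a sketch rather than the only approach, is SparkSession.newSession, which creates a second session on the same SparkContext. The view name numbers and the helper hasTable are hypothetical names introduced for the illustration.

```scala
import org.apache.spark.sql.SparkSession

val first: SparkSession = SparkSession.builder
  .appName("Multiple sessions sketch")   // illustrative name
  .master("local[*]")                    // local mode, for the example only
  .getOrCreate()

// newSession() creates a second session on top of the same SparkContext.
val second: SparkSession = first.newSession()

// Register a temporary view in the first session only.
first.range(3).createOrReplaceTempView("numbers")

// Hypothetical helper: is a table or temporary view visible in the session's catalog?
def hasTable(session: SparkSession, name: String): Boolean =
  session.catalog.listTables().collect().exists(_.name == name)

val inFirst = hasTable(first, "numbers")
val inSecond = hasTable(second, "numbers")
val sameContext = first.sparkContext eq second.sparkContext
println(s"first: $inFirst, second: $inSecond, shared SparkContext: $sameContext")
```

Temporary views are scoped to the session that registered them, which is why the second session's catalog is not expected to list the view.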

getOrCreate Method

Caution
FIXME
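Until this section is filled in, the one-line description in Table 1 (gets the current SparkSession or creates a new one) can be illustrated with a small sketch. It assumes both calls happen on the same thread and that no session existed beforehand. The application name is illustrative.

```scala
import org.apache.spark.sql.SparkSession

// First call: no session exists yet, so one is created.
val created: SparkSession = SparkSession.builder
  .appName("getOrCreate sketch")   // illustrative name
  .master("local[*]")              // local mode, for the example only
  .getOrCreate()

// Second call: the existing session is returned rather than a new one.
val reused: SparkSession = SparkSession.builder.getOrCreate()

val sameSession = created eq reused
println(s"Same session reused: $sameSession")
```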

config Method

Caution
FIXME

Enabling Hive Support — enableHiveSupport Method

When creating a SparkSession, you can optionally enable Hive support using the enableHiveSupport method.

enableHiveSupport(): Builder

enableHiveSupport enables Hive support (with connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions).

Note

You do not need an existing Hive installation to use Spark's Hive support. A Hive-enabled SparkSession automatically creates metastore_db in the current directory of the Spark application, as well as the directory configured by spark.sql.warehouse.dir.

Refer to SharedState.

Internally, enableHiveSupport makes sure that the Hive classes are on the CLASSPATH, namely Spark SQL's org.apache.spark.sql.hive.HiveSessionState and Hive's org.apache.hadoop.hive.conf.HiveConf, and sets the spark.sql.catalogImplementation property to hive.
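The effect on spark.sql.catalogImplementation can be checked with a short sketch. It assumes the Spark Hive module is on the classpath and that no session without Hive support is already running. The warehouse location and the application name are illustrative values, not something the text prescribes.

```scala
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder
  .appName("Hive support sketch")                        // illustrative name
  .master("local[*]")                                    // local mode, for the example only
  .config("spark.sql.warehouse.dir", "spark-warehouse")  // illustrative warehouse location
  .enableHiveSupport()  // throws IllegalArgumentException if the Hive classes are not found
  .getOrCreate()

// With Hive support enabled, the catalog implementation is expected to be "hive".
val catalogImpl: String = spark.conf.get("spark.sql.catalogImplementation")
println(catalogImpl)
```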
