Settings

The following are the settings used to configure Spark SQL applications.

You can set them in a SparkSession upon instantiation using the config method.

import org.apache.spark.sql.SparkSession
val spark: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName("My Spark Application")
  .config("spark.sql.warehouse.dir", "c:/Temp") (1)
  .getOrCreate

1. Sets spark.sql.warehouse.dir for the Spark SQL session
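
Many of the properties below can also be changed on an existing session through its runtime configuration, as in the sketch below (note that static properties such as spark.sql.warehouse.dir can only be set before the session is created):

// Set and read back a runtime-settable Spark SQL property
spark.conf.set("spark.sql.sources.default", "parquet")
spark.conf.get("spark.sql.sources.default") // parquet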

spark.sql.catalogImplementation

spark.sql.catalogImplementation (default: in-memory) is an internal property that selects the active catalog implementation from:

  • in-memory

  • hive

Tip: You can enable Hive support in a SparkSession using the enableHiveSupport builder method.
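
For example, a minimal sketch of a Hive-enabled session (assumes the Hive classes are on the classpath; the master and app name are illustrative):

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
  .master("local[*]")
  .appName("Hive-backed session")
  .enableHiveSupport() // sets spark.sql.catalogImplementation to hive
  .getOrCreate

// Read the active catalog implementation back from the session
spark.conf.get("spark.sql.catalogImplementation") // hive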

spark.sql.sources.default

spark.sql.sources.default (default: parquet) defines the default data source to use for DataFrameReader when no format is specified explicitly, as the sketch below shows.
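
A short sketch of how the default comes into play (the paths are illustrative and spark is an active SparkSession):

// With the default of parquet, these two reads are equivalent
val df1 = spark.read.load("/tmp/data.parquet")
val df2 = spark.read.format("parquet").load("/tmp/data.parquet")

// The default can be changed at runtime, e.g. to json
spark.conf.set("spark.sql.sources.default", "json")
val df3 = spark.read.load("/tmp/data.json")
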
spark.sql.TungstenAggregate.testFallbackStartsAt

spark.sql.TungstenAggregate.testFallbackStartsAt (default: empty) is a comma-separated pair of numbers, e.g. 5,10, that HashAggregateExec uses to inform TungstenAggregationIterator to switch to a sort-based aggregation when the hash-based approach is unable to acquire enough memory.

spark.sql.ui.retainedExecutions

spark.sql.ui.retainedExecutions (default: 1000) is the number of SQLExecutionUIData entries to keep in the failedExecutions and completedExecutions internal registries.

When a query execution finishes, the execution is removed from the internal activeExecutions registry and stored in failedExecutions or completedExecutions depending on the final execution status. It is then that SQLListener makes sure that the number of SQLExecutionUIData entries does not exceed the spark.sql.ui.retainedExecutions Spark property and removes the excess entries.

spark.sql.warehouse.dir

spark.sql.warehouse.dir (default: ${system:user.dir}/spark-warehouse) is the default location of the Hive warehouse directory (using Derby) with managed databases and tables.

See also the official Hive Metastore Administration document.
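
Since spark.sql.warehouse.dir is set at session creation (as in the example at the top of this page), a sketch of reading the effective value back:

// The directory where managed databases and tables are stored
spark.conf.get("spark.sql.warehouse.dir")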

spark.sql.parquet.filterPushdown

spark.sql.parquet.filterPushdown (default: true) is a flag that controls the filter predicate push-down optimization for data sources using the parquet file format.
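
A quick way to see the effect is to compare physical plans with the flag on and off (a sketch; the path is illustrative and spark is an active SparkSession):

// Pushed predicates show up in the parquet scan node of the plan
spark.conf.set("spark.sql.parquet.filterPushdown", "true")
spark.read.parquet("/tmp/data.parquet").where("id > 5").explain()

// With the optimization disabled, the filter is applied by Spark only
spark.conf.set("spark.sql.parquet.filterPushdown", "false")
spark.read.parquet("/tmp/data.parquet").where("id > 5").explain()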

spark.sql.allowMultipleContexts

spark.sql.allowMultipleContexts (default: true) controls whether creating multiple SQLContexts/HiveContexts is allowed.

spark.sql.columnNameOfCorruptRecord

spark.sql.columnNameOfCorruptRecord - FIXME

spark.sql.dialect

spark.sql.dialect - FIXME

spark.sql.streaming.checkpointLocation

spark.sql.streaming.checkpointLocation is the default location for storing checkpoint data for continuously executing queries.
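
A minimal sketch of relying on the session-wide default instead of a per-query checkpointLocation option (the paths and the rate source are illustrative; spark is an active SparkSession):

// Queries started without an explicit checkpointLocation option
// checkpoint under a subdirectory of this location
spark.conf.set("spark.sql.streaming.checkpointLocation", "/tmp/checkpoints")

val query = spark.readStream
  .format("rate")
  .load()
  .writeStream
  .format("console")
  .start()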
