Powered by GitBook

Configuration Properties

This page contains the configuration properties of the Hive data source.

Table 1. Hive-Specific Spark SQL Configuration Properties
Configuration Property
spark.sql.hive.convertMetastoreOrc Controls whether to use the built-in ORC reader and writer for Hive tables with the ORC storage format (instead of Hive SerDe). Default: `true`
spark.sql.hive.convertMetastoreParquet Controls whether to use the built-in Parquet reader and writer for Hive tables with the parquet storage format (instead of Hive SerDe). Default: `true` Internally, this property enables RelationConversions logical rule to convert HiveTableRelations to HadoopFsRelation
spark.sql.hive.convertMetastoreParquet.mergeSchema Enables trying to merge possibly different but compatible Parquet schemas in different Parquet data files. Default: `false` This configuration is only effective when spark.sql.hive.convertMetastoreParquet is enabled.
spark.sql.hive.manageFilesourcePartitions Enables metastore partition management for file source tables (filesource partition management). This includes both datasource and converted Hive tables. Default: `true` When enabled (`true`), datasource tables store partition metadata in the Hive metastore, and use the metastore to prune partitions during query planning. Use SQLConf.manageFilesourcePartitions method to access the current value.
spark.sql.hive.metastore.barrierPrefixes Comma-separated list of class prefixes that should explicitly be reloaded for each version of Hive that Spark SQL is communicating with, e.g. Hive UDFs that are declared in a prefix that typically would be shared (i.e. `org.apache.spark.*`) Default: `(empty)`
spark.sql.hive.metastore.jars Location of the jars that should be used to create a HiveClientImpl. Default: `builtin` Supported locations: `builtin` - the jars that were used to load Spark SQL (aka Spark classes). Valid only when using the execution version of Hive, i.e. spark.sql.hive.metastore.version `maven` - download the Hive jars from Maven repositories Classpath in the standard format for both Hive and Hadoop
spark.sql.hive.metastore.sharedPrefixes Comma-separated list of class prefixes that should be loaded using the classloader that is shared between Spark SQL and a specific version of Hive. Default: `"com.mysql.jdbc", "org.postgresql", "com.microsoft.sqlserver", "oracle.jdbc"` An example of classes that should be shared are: JDBC drivers that are needed to talk to the metastore Other classes that interact with classes that are already shared, e.g. custom appenders that are used by log4j
spark.sql.hive.metastore.version Version of the Hive metastore (and the client classes and jars). Default: 1.2.1 Supported versions range from 0.12.0 up to and including 2.3.3
spark.sql.hive.verifyPartitionPath When enabled (`true`), check all the partition paths under the table’s root directory when reading data stored in HDFS. This configuration will be deprecated in the future releases and replaced by spark.files.ignoreMissingFiles. Default: `false`
spark.sql.hive.metastorePartitionPruning When enabled (`true`), some predicates will be pushed down into the Hive metastore so that unmatching partitions can be eliminated earlier. Default: `true` This only affects Hive tables that are not converted to filesource relations (based on spark.sql.hive.convertMetastoreParquet and spark.sql.hive.convertMetastoreOrc properties). Use SQLConf.metastorePartitionPruning method to access the current value.
spark.sql.hive.filesourcePartitionFileCacheSize
spark.sql.hive.caseSensitiveInferenceMode
spark.sql.hive.convertCTAS
spark.sql.hive.gatherFastStats
spark.sql.hive.advancedPartitionPredicatePushdown.enabled

results matching ""

No results matching ""