HiveUtils

HiveUtils is a utility that creates the HiveClientImpl that HiveExternalCatalog uses to interact with a Hive metastore.

HiveUtils is a Scala object with the private[spark] access modifier, so it is not accessible from user code directly. Define the following helper object inside the org.apache.spark package to access its properties.

// Use :paste -raw to paste the following code in spark-shell
// BEGIN
package org.apache.spark
import org.apache.spark.sql.hive.HiveUtils
object opener {
  def CONVERT_METASTORE_PARQUET = HiveUtils.CONVERT_METASTORE_PARQUET
}
// END

import org.apache.spark.opener
spark.sessionState.conf.getConf(opener.CONVERT_METASTORE_PARQUET)
Tip

Enable the ALL logging level for the org.apache.spark.sql.hive.HiveUtils logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.hive.HiveUtils=ALL

Refer to Logging.

builtinHiveVersion Property

builtinHiveVersion: String = "1.2.1"
Note

builtinHiveVersion is used when:

Creating HiveClientImpl — newClientForMetadata Method

newClientForMetadata(
  conf: SparkConf,
  hadoopConf: Configuration): HiveClient  (1)
newClientForMetadata(
  conf: SparkConf,
  hadoopConf: Configuration,
  configurations: Map[String, String]): HiveClient
  (1) Uses the time configurations (from the input hadoopConf) formatted for the Hive client

Internally, newClientForMetadata creates a new SQLConf with spark.sql properties only (from the input SparkConf).
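The "spark.sql properties only" step can be pictured as a prefix filter over configuration entries. The following is a minimal sketch, not Spark's actual code: a plain Map stands in for SparkConf so that it runs without Spark on the classpath, and the names SqlPropertiesSketch and sqlPropertiesOnly are hypothetical.

```scala
// A sketch only: a plain Map stands in for SparkConf (an assumption made
// so the example runs without Spark). All names here are hypothetical.
object SqlPropertiesSketch {

  // Keeps only the entries whose key starts with "spark.sql"
  def sqlPropertiesOnly(all: Map[String, String]): Map[String, String] =
    all.filter { case (key, _) => key.startsWith("spark.sql") }

  def main(args: Array[String]): Unit = {
    val sparkConfEntries = Map(
      "spark.master" -> "local[*]",
      "spark.app.name" -> "demo",
      "spark.sql.hive.metastore.version" -> "1.2.1",
      "spark.sql.shuffle.partitions" -> "8")

    // Only the two spark.sql entries survive the filter
    sqlPropertiesOnly(sparkConfEntries)
      .toSeq
      .sortBy { case (key, _) => key }
      .foreach { case (key, value) => println(s"$key=$value") }
  }
}
```

Entries such as spark.master are dropped, so the resulting configuration carries only what Spark SQL itself needs.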

newClientForMetadata then creates an IsolatedClientLoader per the input parameters and the following configuration properties:

You should see one of the following INFO messages in the logs:

Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using Spark classes.
Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using maven.
Initializing HiveMetastoreConnection version [hiveMetastoreVersion] using [jars]
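Which of the three messages appears depends on where the Hive metastore classes come from. The sketch below models that choice under the assumption (not spelled out in this section) that it is driven by the spark.sql.hive.metastore.jars property with the values builtin, maven, or a classpath. The object and method names are hypothetical, and the code only builds the message text; it creates no class loader.

```scala
// A sketch only: models which INFO message is produced for a given jars
// setting. MetastoreJarsSketch and initMessage are hypothetical names, and
// the builtin / maven / classpath values are an assumption about the
// spark.sql.hive.metastore.jars property.
object MetastoreJarsSketch {

  def initMessage(hiveMetastoreVersion: String, hiveMetastoreJars: String): String = {
    val source = hiveMetastoreJars match {
      case "builtin" => "Spark classes." // Hive classes bundled with Spark
      case "maven"   => "maven."         // Hive jars downloaded from Maven
      case classpath => classpath        // user-provided classpath
    }
    s"Initializing HiveMetastoreConnection version $hiveMetastoreVersion using $source"
  }

  def main(args: Array[String]): Unit = {
    Seq("builtin", "maven", "/opt/hive/lib/a.jar:/opt/hive/lib/b.jar")
      .map(jars => initMessage("1.2.1", jars))
      .foreach(println)
  }
}
```

The /opt/hive/lib paths are made-up values used only to show the third message format.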

In the end, newClientForMetadata requests the IsolatedClientLoader for a HiveClient.

Note
newClientForMetadata is used when HiveExternalCatalog is requested for a HiveClient.

newClientForExecution Utility

newClientForExecution(
  conf: SparkConf,
  hadoopConf: Configuration): HiveClientImpl

newClientForExecution…​FIXME

Note
newClientForExecution is used for HiveThriftServer2.

inferSchema Method

inferSchema(
  table: CatalogTable): CatalogTable

inferSchema…​FIXME

Note
inferSchema is used when ResolveHiveSerdeTable logical resolution rule is executed.

withHiveExternalCatalog Utility

withHiveExternalCatalog(
  sc: SparkContext): SparkContext

withHiveExternalCatalog simply sets the spark.sql.catalogImplementation configuration property to hive for the input SparkContext.

Note
withHiveExternalCatalog is used when the deprecated HiveContext is created.
