HiveClientImpl — The One and Only HiveClient

HiveClientImpl is the only available HiveClient implementation in Spark SQL. It uses a Hive client (Hive's Hive class, available through the client internal method) to interact with a Hive metastore.

HiveClientImpl is created exclusively when IsolatedClientLoader is requested to create a new Hive client. When created, HiveClientImpl is given the location of the default database for the Hive metastore warehouse (i.e. warehouseDir, which is the value of the hive.metastore.warehouse.dir Hive-specific Hadoop configuration property).

Note
The location of the default database for the Hive metastore warehouse is /user/hive/warehouse by default.
Note
You may be interested in SPARK-19664 put 'hive.metastore.warehouse.dir' in hadoopConf place if you use a Spark version before 2.1 (which you should not really, as it is no longer supported).
Note
The Hadoop configuration is the one HiveExternalCatalog was given when created (i.e. the default Hadoop configuration from Spark Core's SparkContext.hadoopConfiguration with the Spark properties with the spark.hadoop prefix added).
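
The following sketch (an illustration, not from the original text; the warehouse path is a made-up demo value) shows how a Spark property with the spark.hadoop prefix ends up in the Hadoop configuration that HiveClientImpl eventually sees:

import org.apache.spark.sql.SparkSession

// Spark properties with the spark.hadoop. prefix are copied (without the prefix)
// into SparkContext.hadoopConfiguration, the Hadoop configuration that
// HiveExternalCatalog (and so HiveClientImpl) is given when created.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hive-warehouse-dir-demo")
  .config("spark.hadoop.hive.metastore.warehouse.dir", "/tmp/hive/warehouse") // demo value
  .enableHiveSupport()
  .getOrCreate()

// Prints /tmp/hive/warehouse
println(spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir"))
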
Tip

Enable DEBUG logging level for org.apache.spark.sql.hive.client.HiveClientImpl logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.hive.client.HiveClientImpl=DEBUG

Refer to Logging.

renamePartitions Method

renamePartitions(
  db: String,
  table: String,
  specs: Seq[TablePartitionSpec],
  newSpecs: Seq[TablePartitionSpec]): Unit
Note
renamePartitions is part of HiveClient Contract to…​FIXME.

renamePartitions…​FIXME
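
Although the internals are yet to be described, a partition rename issued in SQL is one way to reach renamePartitions. A minimal sketch, assuming a Hive-enabled SparkSession named spark and a hypothetical partitioned table logs:

// The ALTER TABLE ... RENAME TO PARTITION statement goes through the session catalog
// and HiveExternalCatalog down to the Hive client.
spark.sql("CREATE TABLE IF NOT EXISTS logs (event STRING) PARTITIONED BY (ds STRING)")
spark.sql("ALTER TABLE logs ADD IF NOT EXISTS PARTITION (ds = '2019-01-01')")
spark.sql("ALTER TABLE logs PARTITION (ds = '2019-01-01') RENAME TO PARTITION (ds = '2019-01-02')")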

alterPartitions Method

alterPartitions(
  db: String,
  table: String,
  newParts: Seq[CatalogTablePartition]): Unit
Note
alterPartitions is part of HiveClient Contract to…​FIXME.

alterPartitions…​FIXME
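
As another hedged illustration (reusing the hypothetical logs table from the previous sketch), changing a partition's location is one operation that alters the metadata of an existing partition and, with a Hive metastore, is expected to reach alterPartitions:

// ALTER TABLE ... PARTITION ... SET LOCATION changes the metadata of an existing
// partition; with a Hive metastore the change goes through the Hive client.
spark.sql("ALTER TABLE logs PARTITION (ds = '2019-01-02') SET LOCATION '/tmp/hive/warehouse/logs/ds=2019-01-02'")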

client Internal Method

client: Hive

client…​FIXME

Note
client is used…​FIXME

getPartitions Method

getPartitions(
  table: CatalogTable,
  spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition]
Note
getPartitions is part of HiveClient Contract to…​FIXME.

getPartitions…​FIXME
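
As a hedged example (the table name is hypothetical), listing a table's partitions can be done in SQL or through the ExternalCatalog API, both of which rely on the underlying Hive client for Hive tables:

// SQL
spark.sql("SHOW PARTITIONS logs").show(truncate = false)

// ExternalCatalog API
val partitions = spark.sharedState.externalCatalog.listPartitions("default", "logs")
partitions.foreach(p => println(p.spec))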

getPartitionsByFilter Method

getPartitionsByFilter(
  table: CatalogTable,
  predicates: Seq[Expression]): Seq[CatalogTablePartition]
Note
getPartitionsByFilter is part of HiveClient Contract to…​FIXME.

getPartitionsByFilter…​FIXME
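
As a hedged illustration, with metastore partition pruning enabled (the spark.sql.hive.metastorePartitionPruning property), a predicate on a partition column lets Spark ask the metastore only for the matching partitions rather than all of them (the table is hypothetical):

// Only partitions matching the predicate on the partition column (ds) need to be
// fetched from the metastore when metastore partition pruning is enabled.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "true")
spark.table("logs").where("ds = '2019-01-02'").show()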

getPartitionOption Method

getPartitionOption(
  table: CatalogTable,
  spec: TablePartitionSpec): Option[CatalogTablePartition]
Note
getPartitionOption is part of HiveClient Contract to…​FIXME.

getPartitionOption…​FIXME
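
As a hedged sketch (database, table and partition spec are hypothetical), a single partition can be fetched by its exact spec through the ExternalCatalog API:

// None is returned when the partition does not exist.
val partition = spark.sharedState.externalCatalog
  .getPartitionOption("default", "logs", Map("ds" -> "2019-01-02"))
println(partition.map(_.spec))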

Creating HiveClientImpl Instance

HiveClientImpl takes the following when created:

  • HiveVersion

  • Location of the default database for the Hive metastore warehouse if defined (aka warehouseDir)

  • SparkConf

  • Hadoop configuration

  • Extra configuration

  • Initial ClassLoader

  • IsolatedClientLoader

HiveClientImpl initializes the internal registries and counters.

Retrieving Table Metadata If Available — getTableOption Method

def getTableOption(dbName: String, tableName: String): Option[CatalogTable]
Note
getTableOption is part of HiveClient Contract to…​FIXME.

When executed, getTableOption prints out the following DEBUG message to the logs:

Looking up [dbName].[tableName]

getTableOption requests the Hive client to retrieve the metadata of the table and creates a CatalogTable.
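
As a hedged sketch (the table name is hypothetical), fetching a table's metadata through the ExternalCatalog is one way to trigger the lookup and the DEBUG message above:

// With DEBUG logging enabled for HiveClientImpl you should see "Looking up default.logs".
val table = spark.sharedState.externalCatalog.getTable("default", "logs")
println(table.schema.treeString)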

Creating Table Statistics from Hive’s Table or Partition Parameters — readHiveStats Internal Method

readHiveStats(properties: Map[String, String]): Option[CatalogStatistics]

readHiveStats creates a CatalogStatistics from the input Hive table or partition parameters (if available and greater than 0).

Table 1. Table Statistics and Hive Parameters
Hive Parameter | Table Statistics
totalSize      | sizeInBytes
rawDataSize    | sizeInBytes
numRows        | rowCount

Note
The totalSize Hive parameter takes precedence over rawDataSize for the sizeInBytes table statistic.
Note
readHiveStats is used when HiveClientImpl is requested for the metadata of a table or table partition.
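
The following is a simplified sketch (not the actual implementation) of the mapping described above, useful to see the precedence rules in one place:

import org.apache.spark.sql.catalyst.catalog.CatalogStatistics

// totalSize takes precedence over rawDataSize for sizeInBytes, numRows becomes rowCount,
// and only parameters that are present and greater than 0 are taken into account.
def readHiveStatsSketch(properties: Map[String, String]): Option[CatalogStatistics] = {
  def positive(key: String): Option[BigInt] =
    properties.get(key).map(BigInt(_)).filter(_ > 0)

  val sizeInBytes = positive("totalSize").orElse(positive("rawDataSize"))
  val rowCount = positive("numRows")

  sizeInBytes.map(size => CatalogStatistics(sizeInBytes = size, rowCount = rowCount))
}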

Retrieving Table Partition Metadata (Converting Table Partition Metadata from Hive Format to Spark SQL Format) — fromHivePartition Method

fromHivePartition(hp: HivePartition): CatalogTablePartition

fromHivePartition simply creates a CatalogTablePartition with the following:

  • the partition spec

  • the storage format (location, input and output formats, and serde) of the Hive partition

  • the partition parameters

  • the partition statistics (using readHiveStats with the partition parameters)

Note
fromHivePartition is used when HiveClientImpl is requested for getPartitionOption, getPartitions and getPartitionsByFilter.

Converting Native Table Metadata to Hive’s Table — toHiveTable Method

toHiveTable(table: CatalogTable, userName: Option[String] = None): HiveTable

toHiveTable simply creates a new Hive Table and copies the properties from the input CatalogTable.

Note

toHiveTable is used when…FIXME

getSparkSQLDataType Internal Utility

getSparkSQLDataType(hc: FieldSchema): DataType

getSparkSQLDataType…​FIXME

Note
getSparkSQLDataType is used when…​FIXME
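
The general idea is turning a Hive column's type (a FieldSchema) into a Spark SQL DataType. A hedged sketch of such a conversion with the Catalyst parser (whether getSparkSQLDataType does exactly this is left open above):

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.DataType

// Parse a Hive-style type string into a Spark SQL DataType.
val hiveTypeString = "array<struct<id:int,name:string>>"
val dataType: DataType = CatalystSqlParser.parseDataType(hiveTypeString)
println(dataType.catalogString) // array<struct<id:int,name:string>>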
