HiveClientImpl

HiveClientImpl is a HiveClient that uses a Hive metastore client (for metadata and DDL operations using calls to a Hive metastore).

HiveClientImpl is created exclusively when IsolatedClientLoader is requested to create a new Hive client. When created, HiveClientImpl is given the location of the default database for the Hive metastore warehouse (i.e. warehouseDir, the value of the hive.metastore.warehouse.dir Hive-specific Hadoop configuration property).

Note
The location of the default database for the Hive metastore warehouse is /user/hive/warehouse by default.
Note
The Hadoop configuration is what HiveExternalCatalog was given when created (i.e. the default Hadoop configuration from Spark Core’s SparkContext.hadoopConfiguration, amended with the Spark properties with the spark.hadoop prefix).
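For example, since properties with the spark.hadoop prefix are copied into that Hadoop configuration, the warehouse location could be overridden at submit time (a hypothetical invocation; the path is illustrative only):

```shell
# Any --conf spark.hadoop.* property is copied into the Hadoop configuration
# that HiveExternalCatalog (and hence HiveClientImpl) receives.
# The warehouse path below is illustrative only.
./bin/spark-shell \
  --conf spark.hadoop.hive.metastore.warehouse.dir=/tmp/hive/warehouse
```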
Tip

Enable ALL logging level for org.apache.spark.sql.hive.client.HiveClientImpl logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.hive.client.HiveClientImpl=ALL

Refer to Logging.

Creating HiveClientImpl Instance

HiveClientImpl takes the following to be created:

  • HiveVersion

  • Location of the default database for the Hive metastore warehouse if defined (aka warehouseDir)

  • SparkConf

  • Hadoop configuration

  • Extra configuration

  • Initial ClassLoader

  • IsolatedClientLoader

HiveClientImpl initializes the internal properties.

Hive Metastore Client — client Internal Method

client: Hive

client is a Hive metastore client (for metadata and DDL operations using calls to the metastore).

Retrieving Table Metadata From Hive Metastore — getTableOption Method

getTableOption(
  dbName: String,
  tableName: String): Option[CatalogTable]
Note
getTableOption is part of the HiveClient contract.

getTableOption prints out the following DEBUG message to the logs:

Looking up [dbName].[tableName]

getTableOption then requests getRawTableOption for the Hive table metadata and, if available, converts it to Spark’s CatalogTable.
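The lookup-and-convert flow can be sketched as follows (a simplified sketch, not the actual Spark source; the conversion helper name is assumed, not taken from this page):

```scala
// Simplified sketch of getTableOption.
// getRawTableOption and the Hive-to-Spark conversion step are described in
// this page; the helper name `convertHiveTableToCatalogTable` is assumed.
def getTableOption(dbName: String, tableName: String): Option[CatalogTable] = {
  logDebug(s"Looking up $dbName.$tableName")
  getRawTableOption(dbName, tableName)        // Option[Hive Table]
    .map(convertHiveTableToCatalogTable)      // Hive Table => CatalogTable
}
```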

renamePartitions Method

renamePartitions(
  db: String,
  table: String,
  specs: Seq[TablePartitionSpec],
  newSpecs: Seq[TablePartitionSpec]): Unit
Note
renamePartitions is part of HiveClient Contract to…​FIXME.

renamePartitions…​FIXME

alterPartitions Method

alterPartitions(
  db: String,
  table: String,
  newParts: Seq[CatalogTablePartition]): Unit
Note
alterPartitions is part of HiveClient Contract to…​FIXME.

alterPartitions…​FIXME

getPartitions Method

getPartitions(
  table: CatalogTable,
  spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition]
Note
getPartitions is part of HiveClient Contract to…​FIXME.

getPartitions…​FIXME

getPartitionsByFilter Method

getPartitionsByFilter(
  table: CatalogTable,
  predicates: Seq[Expression]): Seq[CatalogTablePartition]
Note
getPartitionsByFilter is part of HiveClient Contract to…​FIXME.

getPartitionsByFilter…​FIXME

getPartitionOption Method

getPartitionOption(
  table: CatalogTable,
  spec: TablePartitionSpec): Option[CatalogTablePartition]
Note
getPartitionOption is part of HiveClient Contract to…​FIXME.

getPartitionOption…​FIXME

Creating Table Statistics from Hive’s Table or Partition Parameters — readHiveStats Internal Method

readHiveStats(properties: Map[String, String]): Option[CatalogStatistics]

readHiveStats creates a CatalogStatistics from the input Hive table or partition parameters (if available and greater than 0).

Table 1. Table Statistics and Hive Parameters

Hive Parameter | Table Statistics
-------------- | ----------------
totalSize      | sizeInBytes
rawDataSize    | sizeInBytes
numRows        | rowCount

Note
The totalSize Hive parameter takes precedence over rawDataSize for the sizeInBytes table statistic.
Note
readHiveStats is used when HiveClientImpl is requested for the metadata of a table or table partition.
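The selection rules above can be sketched as follows (a minimal sketch of the precedence logic described in Table 1, not the actual implementation; the return type is simplified to a pair instead of CatalogStatistics):

```scala
import scala.util.Try

// A parameter contributes a statistic only when it is present, numeric,
// and greater than 0 (as described above).
def positive(params: Map[String, String], key: String): Option[BigInt] =
  params.get(key).flatMap(v => Try(BigInt(v)).toOption).filter(_ > 0)

// Sketch of readHiveStats: totalSize takes precedence over rawDataSize for
// sizeInBytes; numRows maps to rowCount. Returns None when no statistic
// is available (mirroring the Option[CatalogStatistics] result).
def sketchReadHiveStats(
    params: Map[String, String]): Option[(Option[BigInt], Option[BigInt])] = {
  val sizeInBytes = positive(params, "totalSize")
    .orElse(positive(params, "rawDataSize"))
  val rowCount = positive(params, "numRows")
  if (sizeInBytes.isDefined || rowCount.isDefined) Some((sizeInBytes, rowCount))
  else None
}
```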

Retrieving Table Partition Metadata (Converting Table Partition Metadata from Hive Format to Spark SQL Format) — fromHivePartition Method

fromHivePartition(hp: HivePartition): CatalogTablePartition

fromHivePartition simply creates a CatalogTablePartition with the partition spec, the storage metadata and the parameters of the input Hive partition (and the table statistics from readHiveStats).

Note
fromHivePartition is used when HiveClientImpl is requested for getPartitionOption, getPartitions and getPartitionsByFilter.

Converting Native Table Metadata to Hive’s Table — toHiveTable Method

toHiveTable(table: CatalogTable, userName: Option[String] = None): HiveTable

toHiveTable simply creates a new Hive Table and copies the properties from the input CatalogTable.

Note

toHiveTable is used when:

getSparkSQLDataType Internal Utility

getSparkSQLDataType(hc: FieldSchema): DataType

getSparkSQLDataType…​FIXME

Note
getSparkSQLDataType is used when…​FIXME

Converting CatalogTablePartition to Hive Partition — toHivePartition Utility

toHivePartition(
  p: CatalogTablePartition,
  ht: Table): Partition

toHivePartition creates a Hive org.apache.hadoop.hive.ql.metadata.Partition for the input CatalogTablePartition and the Hive org.apache.hadoop.hive.ql.metadata.Table.

Note

toHivePartition is used when:

Creating New HiveClientImpl — newSession Method

newSession(): HiveClientImpl
Note
newSession is part of the HiveClient contract to…​FIXME.

newSession…​FIXME

getRawTableOption Internal Method

getRawTableOption(
  dbName: String,
  tableName: String): Option[Table]

getRawTableOption requests the Hive metastore client for the Hive metadata of the input table.

Note
getRawTableOption is used when HiveClientImpl is requested to tableExists and getTableOption.
