HiveClientImpl

HiveClientImpl is a HiveClient that uses a Hive metastore client (for metadata and DDL operations using calls to a Hive metastore).

HiveClientImpl is created exclusively when IsolatedClientLoader is requested to create a new Hive client. When created, HiveClientImpl is given the location of the default database for the Hive metastore warehouse (i.e. warehouseDir, the value of the hive.metastore.warehouse.dir Hive-specific Hadoop configuration property).

Note
The location of the default database for the Hive metastore warehouse is /user/hive/warehouse by default.
Note
The Hadoop configuration is what HiveExternalCatalog was given when created (i.e. the default Hadoop configuration from Spark Core’s SparkContext.hadoopConfiguration, amended with the Spark properties with the spark.hadoop prefix).
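For example, since properties with the spark.hadoop prefix are copied into that Hadoop configuration, the warehouse location could be overridden at submit time (a hypothetical invocation; the path is illustrative only):

```shell
# Any --conf spark.hadoop.* property is copied into the Hadoop configuration
# that HiveExternalCatalog (and hence HiveClientImpl) receives.
# The warehouse path below is illustrative only.
./bin/spark-shell \
  --conf spark.hadoop.hive.metastore.warehouse.dir=/tmp/hive/warehouse
```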
Tip

Enable ALL logging level for org.apache.spark.sql.hive.client.HiveClientImpl logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.hive.client.HiveClientImpl=ALL

Refer to Logging.

Creating HiveClientImpl Instance

HiveClientImpl takes the following to be created:

  • HiveVersion

  • Location of the default database for the Hive metastore warehouse if defined (aka warehouseDir)

  • SparkConf

  • Hadoop configuration

  • Extra configuration

  • Initial ClassLoader

  • IsolatedClientLoader

HiveClientImpl initializes the internal properties.

Hive Metastore Client — client Internal Method

client: Hive

client is a Hive metastore client (for metadata and DDL operations using calls to the metastore).

Retrieving Table Metadata From Hive Metastore — getTableOption Method

getTableOption(
  dbName: String,
  tableName: String): Option[CatalogTable]
Note
getTableOption is part of the HiveClient contract.

getTableOption prints out the following DEBUG message to the logs:

Looking up [dbName].[tableName]

getTableOption then requests getRawTableOption for the Hive table metadata and, if available, converts it to Spark’s CatalogTable.
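The lookup-and-convert flow can be sketched as follows (a simplified sketch, not the actual Spark source; the conversion helper name is assumed, not taken from this page):

```scala
// Simplified sketch of getTableOption.
// getRawTableOption and the Hive-to-Spark conversion step are described in
// this page; the helper name `convertHiveTableToCatalogTable` is assumed.
def getTableOption(dbName: String, tableName: String): Option[CatalogTable] = {
  logDebug(s"Looking up $dbName.$tableName")
  getRawTableOption(dbName, tableName)        // Option[Hive Table]
    .map(convertHiveTableToCatalogTable)      // Hive Table => CatalogTable
}
```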

renamePartitions Method

renamePartitions(
  db: String,
  table: String,
  specs: Seq[TablePartitionSpec],
  newSpecs: Seq[TablePartitionSpec]): Unit
Note
renamePartitions is part of HiveClient Contract to…​FIXME.

renamePartitions…​FIXME

alterPartitions Method

alterPartitions(
  db: String,
  table: String,
  newParts: Seq[CatalogTablePartition]): Unit
Note
alterPartitions is part of HiveClient Contract to…​FIXME.

alterPartitions…​FIXME

getPartitions Method

getPartitions(
  table: CatalogTable,
  spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition]
Note
getPartitions is part of HiveClient Contract to…​FIXME.

getPartitions…​FIXME

getPartitionsByFilter Method

getPartitionsByFilter(
  table: CatalogTable,
  predicates: Seq[Expression]): Seq[CatalogTablePartition]
Note
getPartitionsByFilter is part of HiveClient Contract to…​FIXME.

getPartitionsByFilter…​FIXME

getPartitionOption Method

getPartitionOption(
  table: CatalogTable,
  spec: TablePartitionSpec): Option[CatalogTablePartition]
Note
getPartitionOption is part of HiveClient Contract to…​FIXME.

getPartitionOption…​FIXME

Creating Table Statistics from Hive’s Table or Partition Parameters — readHiveStats Internal Method

readHiveStats(properties: Map[String, String]): Option[CatalogStatistics]

readHiveStats creates a CatalogStatistics from the input Hive table or partition parameters (if available and greater than 0).

Table 1. Table Statistics and Hive Parameters

Hive Parameter | Table Statistics
-------------- | ----------------
totalSize      | sizeInBytes
rawDataSize    | sizeInBytes
numRows        | rowCount

Note
The totalSize Hive parameter takes precedence over rawDataSize for the sizeInBytes table statistic.
Note
readHiveStats is used when HiveClientImpl is requested for the metadata of a table or table partition.
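The selection rules above can be sketched as follows (a minimal sketch of the precedence logic described in Table 1, not the actual implementation; the return type is simplified to a pair instead of CatalogStatistics):

```scala
import scala.util.Try

// A parameter contributes a statistic only when it is present, numeric,
// and greater than 0 (as described above).
def positive(params: Map[String, String], key: String): Option[BigInt] =
  params.get(key).flatMap(v => Try(BigInt(v)).toOption).filter(_ > 0)

// Sketch of readHiveStats: totalSize takes precedence over rawDataSize for
// sizeInBytes; numRows maps to rowCount. Returns None when no statistic
// is available (mirroring the Option[CatalogStatistics] result).
def sketchReadHiveStats(
    params: Map[String, String]): Option[(Option[BigInt], Option[BigInt])] = {
  val sizeInBytes = positive(params, "totalSize")
    .orElse(positive(params, "rawDataSize"))
  val rowCount = positive(params, "numRows")
  if (sizeInBytes.isDefined || rowCount.isDefined) Some((sizeInBytes, rowCount))
  else None
}
```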

Retrieving Table Partition Metadata (Converting Table Partition Metadata from Hive Format to Spark SQL Format) — fromHivePartition Method

fromHivePartition(hp: HivePartition): CatalogTablePartition

fromHivePartition simply creates a CatalogTablePartition with the partition spec, the storage metadata and the parameters of the input Hive partition (and the table statistics from readHiveStats).

Note
fromHivePartition is used when HiveClientImpl is requested for getPartitionOption, getPartitions and getPartitionsByFilter.

Converting Native Table Metadata to Hive’s Table — toHiveTable Method

toHiveTable(table: CatalogTable, userName: Option[String] = None): HiveTable

toHiveTable simply creates a new Hive Table and copies the properties from the input CatalogTable.

Note

toHiveTable is used when:

getSparkSQLDataType Internal Utility

getSparkSQLDataType(hc: FieldSchema): DataType

getSparkSQLDataType…​FIXME

Note
getSparkSQLDataType is used when…​FIXME

Converting CatalogTablePartition to Hive Partition — toHivePartition Utility

toHivePartition(
  p: CatalogTablePartition,
  ht: Table): Partition

toHivePartition creates a Hive org.apache.hadoop.hive.ql.metadata.Partition for the input CatalogTablePartition and the Hive org.apache.hadoop.hive.ql.metadata.Table.

Note

toHivePartition is used when:

Creating New HiveClientImpl — newSession Method

newSession(): HiveClientImpl
Note
newSession is part of the HiveClient contract to…​FIXME.

newSession…​FIXME

getRawTableOption Internal Method

getRawTableOption(
  dbName: String,
  tableName: String): Option[Table]

getRawTableOption requests the Hive metastore client for the Hive metadata of the input table.

Note
getRawTableOption is used when HiveClientImpl is requested to tableExists and getTableOption.
