SaveAsHiveFile Contract — DataWritingCommands That Write Query Result As Hive Files

SaveAsHiveFile is an extension of the DataWritingCommand contract for commands that can saveAsHiveFile (and getExternalTmpPath).

Note

SaveAsHiveFile supports the viewfs:// URI scheme for new Hive versions.

Read up on ViewFs in the Hadoop official documentation.

Table 1. SaveAsHiveFiles
SaveAsHiveFile Description

InsertIntoHiveDirCommand

InsertIntoHiveTable

saveAsHiveFile Method

saveAsHiveFile(
  sparkSession: SparkSession,
  plan: SparkPlan,
  hadoopConf: Configuration,
  fileSinkConf: FileSinkDesc,
  outputLocation: String,
  customPartitionLocations: Map[TablePartitionSpec, String] = Map.empty,
  partitionAttributes: Seq[Attribute] = Nil): Set[String]

saveAsHiveFile sets Hadoop configuration properties when a compressed file output format is used (based on hive.exec.compress.output configuration property).
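The compression step can be sketched as follows. This is a simplified model rather than Spark's own code: a mutable Map stands in for the Hadoop Configuration, SinkCompression and resolveCompression are hypothetical names, and the mapreduce.output.fileoutputformat.compress* keys are assumed to be the Hadoop properties involved.

```scala
import scala.collection.mutable

object CompressionSketch {
  // Stand-in for org.apache.hadoop.conf.Configuration (assumption: plain string pairs).
  type Conf = mutable.Map[String, String]

  // Hypothetical holder for the compression-related settings of a file sink.
  final case class SinkCompression(
      compressed: Boolean,
      codec: Option[String],
      compressType: Option[String])

  def resolveCompression(conf: Conf): SinkCompression = {
    // hive.exec.compress.output decides whether compressed output is requested.
    val isCompressed = conf.getOrElse("hive.exec.compress.output", "false").toBoolean
    if (isCompressed) {
      // Assumed Hadoop property that switches output compression on.
      conf.update("mapreduce.output.fileoutputformat.compress", "true")
      SinkCompression(
        compressed = true,
        codec = conf.get("mapreduce.output.fileoutputformat.compress.codec"),
        compressType = conf.get("mapreduce.output.fileoutputformat.compress.type"))
    } else {
      SinkCompression(compressed = false, codec = None, compressType = None)
    }
  }
}
```

With hive.exec.compress.output set to true, the sketch flips the compress flag in the configuration and carries over whatever codec and type are already configured; otherwise the configuration is left untouched.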

saveAsHiveFile uses FileCommitProtocol utility to instantiate a committer for the input outputLocation based on the spark.sql.sources.commitProtocolClass configuration property (default: SQLHadoopMapReduceCommitProtocol).

saveAsHiveFile uses FileFormatWriter utility to write the result of executing the input physical query plan (with a HiveFileFormat for the input FileSinkDesc, the new FileCommitProtocol committer, and the input arguments).

Note
BucketSpec is undefined (None).

Note
saveAsHiveFile is used when InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.

getExternalTmpPath Method

getExternalTmpPath(
  sparkSession: SparkSession,
  hadoopConf: Configuration,
  path: Path): Path

getExternalTmpPath finds the Hive version used: it requests the input SparkSession for the ExternalCatalog (that is expected to be a HiveExternalCatalog), requests the catalog for the underlying HiveClient, and in turn requests the client for the Hive version.

getExternalTmpPath splits the supported Hive versions into old versions (0.12.0 to 1.0.0), which use the hive.exec.scratchdir directory, and new versions (1.1.0 to 2.3.3), which use the hive.exec.stagingdir directory.

getExternalTmpPath uses oldVersionExternalTempPath for the old Hive versions and newVersionExternalTempPath for the new Hive versions.

getExternalTmpPath throws an IllegalStateException for an unsupported Hive version:

Unsupported hive version: [hiveVersion]
Note
getExternalTmpPath is used when InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.
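The version-based dispatch in getExternalTmpPath can be illustrated with a small sketch. It is a simplification under stated assumptions: versions are compared as numeric ranges taken from the text (a real implementation may instead keep explicit sets of supported versions), and TempPathStrategy, ScratchDir, StagingDir and strategyFor are hypothetical names.

```scala
object ExternalTmpPathSketch {
  // Hypothetical labels for the two ways of building the temporary path.
  sealed trait TempPathStrategy
  case object ScratchDir extends TempPathStrategy // old versions: hive.exec.scratchdir
  case object StagingDir extends TempPathStrategy // new versions: hive.exec.stagingdir

  private val ord = implicitly[Ordering[(Int, Int, Int)]]

  // Parses "major.minor.patch" into a tuple that can be compared numerically.
  private def parse(version: String): (Int, Int, Int) =
    version.split('.').map(_.toInt) match {
      case Array(major, minor, patch) => (major, minor, patch)
      case _ => throw new IllegalArgumentException(s"Malformed version: $version")
    }

  private def between(v: (Int, Int, Int), lo: (Int, Int, Int), hi: (Int, Int, Int)): Boolean =
    ord.gteq(v, lo) && ord.lteq(v, hi)

  def strategyFor(hiveVersion: String): TempPathStrategy = {
    val v = parse(hiveVersion)
    if (between(v, (0, 12, 0), (1, 0, 0))) ScratchDir
    else if (between(v, (1, 1, 0), (2, 3, 3))) StagingDir
    else throw new IllegalStateException("Unsupported hive version: " + hiveVersion)
  }
}
```

For example, strategyFor("0.13.1") selects the scratch-directory strategy, strategyFor("2.3.3") selects the staging-directory strategy, and strategyFor("3.0.0") throws the IllegalStateException with the message shown above.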

deleteExternalTmpPath Method

deleteExternalTmpPath(
  hadoopConf: Configuration): Unit

deleteExternalTmpPath deletes the createdTempDir staging directory (when defined).

Note
deleteExternalTmpPath is used when…FIXME

oldVersionExternalTempPath Internal Method

oldVersionExternalTempPath(
  path: Path,
  hadoopConf: Configuration,
  scratchDir: String): Path

oldVersionExternalTempPath…FIXME

Note
oldVersionExternalTempPath is used when SaveAsHiveFile is requested to getExternalTmpPath.

newVersionExternalTempPath Internal Method

newVersionExternalTempPath(
  path: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path

newVersionExternalTempPath…FIXME

Note
newVersionExternalTempPath is used when SaveAsHiveFile is requested to getExternalTmpPath.

getExtTmpPathRelTo Internal Method

getExtTmpPathRelTo(
  path: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path

getExtTmpPathRelTo…FIXME

Note
getExtTmpPathRelTo is used when SaveAsHiveFile is requested to newVersionExternalTempPath.

getExternalScratchDir Internal Method

getExternalScratchDir(
  extURI: URI,
  hadoopConf: Configuration,
  stagingDir: String): Path

getExternalScratchDir…FIXME

Note
getExternalScratchDir is used when SaveAsHiveFile is requested to newVersionExternalTempPath.

getStagingDir Internal Method

getStagingDir(
  inputPath: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path

getStagingDir…FIXME

Note
getStagingDir is used when SaveAsHiveFile is requested to getExtTmpPathRelTo and getExternalScratchDir.

executionId Internal Method

executionId: String

executionId…FIXME

Note
executionId is used when…FIXME

createdTempDir Internal Registry

createdTempDir: Option[Path] = None

createdTempDir is a Hadoop Path of a staging directory.

createdTempDir is initialized when SaveAsHiveFile is requested to oldVersionExternalTempPath and getStagingDir.

For the new Hive versions, createdTempDir is derived from the hive.exec.stagingdir configuration property.

createdTempDir is deleted when SaveAsHiveFile is requested to deleteExternalTmpPath and, as a fallback, at the normal termination of the JVM (since deleteOnExit is used).
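The lifecycle of the registry (undefined, set on creation, cleared on deletion, with deleteOnExit as a safety net) can be sketched on the local file system. This is an illustration only: it uses java.nio and java.io in place of the Hadoop FileSystem API, and TempDirSketch, createTempDir, deleteTempDir and current are hypothetical names.

```scala
import java.io.File
import java.nio.file.{Files, Path}

object TempDirSketch {
  // Mirrors the registry in the text: undefined until a directory is created.
  private var createdTempDir: Option[Path] = None

  def current: Option[Path] = createdTempDir

  def createTempDir(prefix: String): Path = {
    val dir = Files.createTempDirectory(prefix)
    // Fallback cleanup at normal JVM termination.
    // Note: java.io.File.deleteOnExit removes a directory only if it is empty.
    dir.toFile.deleteOnExit()
    createdTempDir = Some(dir)
    dir
  }

  // Deletes children before their parent; listFiles returns null for plain files.
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  // Explicit cleanup, analogous in spirit to deleteExternalTmpPath.
  def deleteTempDir(): Unit = {
    createdTempDir.foreach(dir => deleteRecursively(dir.toFile))
    createdTempDir = None
  }
}
```

Keeping the directory in an Option makes the explicit cleanup a no-op when nothing was ever created, while deleteOnExit covers the case where the explicit cleanup is never reached.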
