SaveAsHiveFile Contract: DataWritingCommands That Write Query Result As Hive Files
SaveAsHiveFile is an extension of the DataWritingCommand contract for commands that can saveAsHiveFile (and getExternalTmpPath).
saveAsHiveFile Method
saveAsHiveFile(
sparkSession: SparkSession,
plan: SparkPlan,
hadoopConf: Configuration,
fileSinkConf: FileSinkDesc,
outputLocation: String,
customPartitionLocations: Map[TablePartitionSpec, String] = Map.empty,
partitionAttributes: Seq[Attribute] = Nil): Set[String]
saveAsHiveFile sets Hadoop configuration properties when a compressed file output format is used (based on hive.exec.compress.output configuration property).
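The compression step can be pictured with a minimal sketch. It is not Spark's code: a plain mutable map stands in for the Hadoop Configuration, the "false" default is an assumption, and the property that gets switched on (mapreduce.output.fileoutputformat.compress) is shown only as one plausible target.

```scala
import scala.collection.mutable

object CompressionSketch {
  // Stand-in for Hadoop's Configuration: a plain mutable key/value map (assumption).
  type Conf = mutable.Map[String, String]

  // True when compressed output was requested via hive.exec.compress.output.
  // The "false" default is an assumption made for this sketch.
  def isCompressed(conf: Conf): Boolean =
    conf.getOrElse("hive.exec.compress.output", "false").toBoolean

  // When compression is requested, switch on the Hadoop-side output compression flag.
  // The target property is an illustrative assumption, not a complete list.
  def configureCompression(conf: Conf): Conf = {
    if (isCompressed(conf)) {
      conf("mapreduce.output.fileoutputformat.compress") = "true"
    }
    conf
  }

  def main(args: Array[String]): Unit = {
    val conf: Conf = mutable.Map("hive.exec.compress.output" -> "true")
    println(configureCompression(conf))
  }
}
```

With hive.exec.compress.output unset, the map is returned unchanged.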
saveAsHiveFile uses FileCommitProtocol utility to instantiate a committer for the input outputLocation based on the spark.sql.sources.commitProtocolClass configuration property (default: SQLHadoopMapReduceCommitProtocol).
saveAsHiveFile uses FileFormatWriter utility to write the result of executing the input physical query plan (with a HiveFileFormat for the input FileSinkDesc, the new FileCommitProtocol committer, and the input arguments).
Note: saveAsHiveFile passes no BucketSpec (None) to FileFormatWriter.
Note: saveAsHiveFile is used when InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.
getExternalTmpPath Method
getExternalTmpPath(
sparkSession: SparkSession,
hadoopConf: Configuration,
path: Path): Path
getExternalTmpPath first finds the Hive version in use. It requests the input SparkSession for the ExternalCatalog (which is expected to be a HiveExternalCatalog), requests the catalog for the underlying HiveClient, and requests the client for the Hive version.
getExternalTmpPath splits the supported Hive versions into old versions (0.12.0 to 1.0.0), which use the hive.exec.scratchdir directory, and new versions (1.1.0 to 2.3.3), which use the hive.exec.stagingdir directory.
getExternalTmpPath uses oldVersionExternalTempPath for the old Hive versions and newVersionExternalTempPath for the new Hive versions.
getExternalTmpPath throws an IllegalStateException for an unsupported Hive version:
Unsupported hive version: [hiveVersion]
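The version dispatch can be sketched as follows. This is an illustration only: plain strings stand in for whatever version type Spark uses, each set lists just the end points of the ranges given in the text, and tempDirProperty is a hypothetical helper that reports which configuration property would apply.

```scala
object ExternalTmpPathSketch {
  // Only the end points of each range come from the text; a real implementation
  // would enumerate every supported version (assumption).
  val oldVersions: Set[String] = Set("0.12.0", "1.0.0")
  val newVersions: Set[String] = Set("1.1.0", "2.3.3")

  // Hypothetical helper: returns the configuration property that decides
  // where the temporary directory lives for the given Hive version.
  def tempDirProperty(hiveVersion: String): String =
    if (oldVersions.contains(hiveVersion)) "hive.exec.scratchdir"
    else if (newVersions.contains(hiveVersion)) "hive.exec.stagingdir"
    else throw new IllegalStateException("Unsupported hive version: " + hiveVersion)

  def main(args: Array[String]): Unit = {
    println(tempDirProperty("1.0.0"))
    println(tempDirProperty("2.3.3"))
  }
}
```

Any version outside both sets falls through to the IllegalStateException branch.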
Note: getExternalTmpPath is used when InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.
deleteExternalTmpPath Method
deleteExternalTmpPath(
hadoopConf: Configuration): Unit
deleteExternalTmpPath…FIXME
Note: deleteExternalTmpPath is used when…FIXME
oldVersionExternalTempPath Internal Method
oldVersionExternalTempPath(
path: Path,
hadoopConf: Configuration,
scratchDir: String): Path
oldVersionExternalTempPath…FIXME
Note: oldVersionExternalTempPath is used when SaveAsHiveFile is requested to getExternalTmpPath.
newVersionExternalTempPath Internal Method
newVersionExternalTempPath(
path: Path,
hadoopConf: Configuration,
stagingDir: String): Path
newVersionExternalTempPath…FIXME
Note: newVersionExternalTempPath is used when SaveAsHiveFile is requested to getExternalTmpPath.
getExtTmpPathRelTo Internal Method
getExtTmpPathRelTo(
path: Path,
hadoopConf: Configuration,
stagingDir: String): Path
getExtTmpPathRelTo…FIXME
Note: getExtTmpPathRelTo is used when SaveAsHiveFile is requested to newVersionExternalTempPath.
getExternalScratchDir Internal Method
getExternalScratchDir(
extURI: URI,
hadoopConf: Configuration,
stagingDir: String): Path
getExternalScratchDir…FIXME
Note: getExternalScratchDir is used when SaveAsHiveFile is requested to newVersionExternalTempPath.
getStagingDir Internal Method
getStagingDir(
inputPath: Path,
hadoopConf: Configuration,
stagingDir: String): Path
getStagingDir…FIXME
Note: getStagingDir is used when SaveAsHiveFile is requested to getExtTmpPathRelTo and getExternalScratchDir.
executionId Internal Method
executionId: String
executionId…FIXME
Note: executionId is used when…FIXME
createdTempDir Internal Registry
createdTempDir: Option[Path] = None
createdTempDir is a Hadoop Path of a staging directory. It is undefined (None) by default.
createdTempDir is initialized when SaveAsHiveFile is requested to oldVersionExternalTempPath and getStagingDir.
createdTempDir is derived from the hive.exec.stagingdir configuration property (or from hive.exec.scratchdir for the old Hive versions).
createdTempDir is deleted when SaveAsHiveFile is requested to deleteExternalTmpPath and, as a fallback, when the JVM terminates normally (since deleteOnExit is used).
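The lifecycle of the registry can be sketched in a few lines. This is a rough analogy, not Spark's implementation: it uses java.nio.file.Path on the local file system rather than a Hadoop Path, and createStagingDir and deleteStagingDir are hypothetical names.

```scala
import java.nio.file.{Files, Path}

object TempDirSketch {
  // Mirrors the registry: empty until a staging directory has been created.
  var createdTempDir: Option[Path] = None

  // Hypothetical helper: creates a local temporary directory, records it in the
  // registry, and registers it for removal at normal JVM exit as a fallback.
  def createStagingDir(prefix: String): Path = {
    val dir = Files.createTempDirectory(prefix)
    dir.toFile.deleteOnExit()
    createdTempDir = Some(dir)
    dir
  }

  // Hypothetical helper: explicit cleanup of whatever directory was recorded.
  def deleteStagingDir(): Unit =
    createdTempDir.foreach(dir => Files.deleteIfExists(dir))

  def main(args: Array[String]): Unit = {
    val dir = createStagingDir("hive-staging-sketch")
    println(s"created: $dir")
    deleteStagingDir()
    println(s"still exists: ${Files.exists(dir)}")
  }
}
```

Registering deleteOnExit at creation time means the directory is still cleaned up when the explicit delete is never reached, as long as the JVM exits normally.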