SaveAsHiveFile Contract — DataWritingCommands That Write Query Result As Hive Files
SaveAsHiveFile is an extension of the DataWritingCommand contract for commands that can saveAsHiveFile (and getExternalTmpPath).
Note: Read up on …
SaveAsHiveFile is implemented by the InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands.
saveAsHiveFile Method
saveAsHiveFile(
sparkSession: SparkSession,
plan: SparkPlan,
hadoopConf: Configuration,
fileSinkConf: FileSinkDesc,
outputLocation: String,
customPartitionLocations: Map[TablePartitionSpec, String] = Map.empty,
partitionAttributes: Seq[Attribute] = Nil): Set[String]
saveAsHiveFile sets Hadoop configuration properties for compressed output when the hive.exec.compress.output configuration property is enabled.
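A minimal sketch of the compression step, assuming a mutable Map as a stand-in for Hadoop's Configuration and a hypothetical CompressionSettings case class in place of FileSinkDesc. The MapReduce property names used below (mapreduce.output.fileoutputformat.compress and its .codec and .type variants) reflect what Spark's implementation is understood to set; treat them as an assumption, not a complete list.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the compression settings kept in a FileSinkDesc.
final case class CompressionSettings(
    compressed: Boolean,
    codec: Option[String],
    compressType: Option[String])

object HiveCompressionSketch {
  // Assumption: a mutable Map plays the role of Hadoop's Configuration.
  def configureCompression(hadoopConf: mutable.Map[String, String]): CompressionSettings = {
    val isCompressed = hadoopConf.getOrElse("hive.exec.compress.output", "false").toBoolean
    if (isCompressed) {
      // Turn on compressed output for the MapReduce output format.
      hadoopConf("mapreduce.output.fileoutputformat.compress") = "true"
      // Carry over the codec and compression type, if they were configured.
      CompressionSettings(
        compressed = true,
        codec = hadoopConf.get("mapreduce.output.fileoutputformat.compress.codec"),
        compressType = hadoopConf.get("mapreduce.output.fileoutputformat.compress.type"))
    } else {
      // Compression disabled: the configuration is left untouched.
      CompressionSettings(compressed = false, codec = None, compressType = None)
    }
  }
}
```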
saveAsHiveFile uses the FileCommitProtocol utility to instantiate a committer for the input outputLocation, based on the spark.sql.sources.commitProtocolClass configuration property (default: SQLHadoopMapReduceCommitProtocol).
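The committer lookup can be sketched as a configuration read with a default. CommitProtocolSketch and CommitterRequest are hypothetical names, and the job ID is assumed to be a random UUID. The sketch stops at resolving the class name, whereas FileCommitProtocol then instantiates the resolved class reflectively.

```scala
import java.util.UUID

object CommitProtocolSketch {
  val CommitProtocolClassKey = "spark.sql.sources.commitProtocolClass"
  val DefaultCommitProtocolClass =
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol"

  // Hypothetical: everything needed to instantiate a committer for one write.
  final case class CommitterRequest(className: String, jobId: String, outputPath: String)

  // Resolves the committer class from the (assumed) SQL configuration map.
  def committerFor(sqlConf: Map[String, String], outputLocation: String): CommitterRequest =
    CommitterRequest(
      className = sqlConf.getOrElse(CommitProtocolClassKey, DefaultCommitProtocolClass),
      jobId = UUID.randomUUID().toString, // assumption: a fresh random job ID per write
      outputPath = outputLocation)
}
```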
saveAsHiveFile uses the FileFormatWriter utility to write the result of executing the input physical query plan, with a HiveFileFormat for the input FileSinkDesc, the new FileCommitProtocol committer, and the input arguments.
Note: The BucketSpec given to FileFormatWriter is undefined (None), so the Hive files are written without bucketing.
Note: saveAsHiveFile is used when the InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.
getExternalTmpPath Method
getExternalTmpPath(
sparkSession: SparkSession,
hadoopConf: Configuration,
path: Path): Path
getExternalTmpPath first finds the Hive version in use: it requests the input SparkSession for the ExternalCatalog (expected to be a HiveExternalCatalog), requests that catalog for the underlying HiveClient, and finally requests the HiveClient for the Hive version.
getExternalTmpPath splits the supported Hive versions into two groups: old versions (0.12.0 to 1.0.0), which use the hive.exec.scratchdir directory, and new versions (1.1.0 to 2.3.3), which use the hive.exec.stagingdir directory.
getExternalTmpPath uses oldVersionExternalTempPath for the old Hive versions and newVersionExternalTempPath for the new Hive versions.
getExternalTmpPath throws an IllegalStateException for an unsupported Hive version:
Unsupported hive version: [hiveVersion]
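The version dispatch can be sketched as below. HiveVersion, TmpPathStrategy and HiveTmpPathSketch are hypothetical names. The sketch simplifies the supported versions into inclusive ranges using the bounds given above, whereas Spark checks membership in explicit sets of supported versions, so a version between the bounds that Spark does not actually support would still be accepted here.

```scala
object HiveTmpPathSketch {
  final case class HiveVersion(major: Int, minor: Int, patch: Int) {
    def fullVersion: String = s"$major.$minor.$patch"
  }

  sealed trait TmpPathStrategy
  // Old Hive versions: staging directory under hive.exec.scratchdir.
  case object OldVersionExternalTempPath extends TmpPathStrategy
  // New Hive versions: staging directory based on hive.exec.stagingdir.
  case object NewVersionExternalTempPath extends TmpPathStrategy

  // Compare versions by (major, minor, patch).
  private val versionOrdering: Ordering[HiveVersion] =
    Ordering.by((v: HiveVersion) => (v.major, v.minor, v.patch))

  // Inclusive bounds taken from the version ranges described in the text.
  private val oldVersions = (HiveVersion(0, 12, 0), HiveVersion(1, 0, 0))
  private val newVersions = (HiveVersion(1, 1, 0), HiveVersion(2, 3, 3))

  private def within(v: HiveVersion, bounds: (HiveVersion, HiveVersion)): Boolean =
    versionOrdering.gteq(v, bounds._1) && versionOrdering.lteq(v, bounds._2)

  // Picks the temp-path strategy, failing for versions outside both ranges.
  def strategyFor(v: HiveVersion): TmpPathStrategy =
    if (within(v, oldVersions)) OldVersionExternalTempPath
    else if (within(v, newVersions)) NewVersionExternalTempPath
    else throw new IllegalStateException("Unsupported hive version: " + v.fullVersion)
}
```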
Note: getExternalTmpPath is used when the InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.
deleteExternalTmpPath Method
deleteExternalTmpPath(
hadoopConf: Configuration): Unit
deleteExternalTmpPath …FIXME
Note: deleteExternalTmpPath is used when…FIXME
oldVersionExternalTempPath Internal Method
oldVersionExternalTempPath(
path: Path,
hadoopConf: Configuration,
scratchDir: String): Path
oldVersionExternalTempPath …FIXME
Note: oldVersionExternalTempPath is used when SaveAsHiveFile is requested to getExternalTmpPath.
newVersionExternalTempPath Internal Method
newVersionExternalTempPath(
path: Path,
hadoopConf: Configuration,
stagingDir: String): Path
newVersionExternalTempPath …FIXME
Note: newVersionExternalTempPath is used when SaveAsHiveFile is requested to getExternalTmpPath.
getExtTmpPathRelTo Internal Method
getExtTmpPathRelTo(
path: Path,
hadoopConf: Configuration,
stagingDir: String): Path
getExtTmpPathRelTo …FIXME
Note: getExtTmpPathRelTo is used when SaveAsHiveFile is requested to newVersionExternalTempPath.
getExternalScratchDir Internal Method
getExternalScratchDir(
extURI: URI,
hadoopConf: Configuration,
stagingDir: String): Path
getExternalScratchDir …FIXME
Note: getExternalScratchDir is used when SaveAsHiveFile is requested to newVersionExternalTempPath.
getStagingDir Internal Method
getStagingDir(
inputPath: Path,
hadoopConf: Configuration,
stagingDir: String): Path
getStagingDir …FIXME
Note: getStagingDir is used when SaveAsHiveFile is requested to getExtTmpPathRelTo and getExternalScratchDir.
executionId Internal Method
executionId: String
executionId …FIXME
Note: executionId is used when…FIXME
createdTempDir Internal Registry
createdTempDir: Option[Path] = None
createdTempDir is the Hadoop Path of a staging directory.
createdTempDir is initialized when SaveAsHiveFile is requested to oldVersionExternalTempPath and getStagingDir.
createdTempDir is based on the hive.exec.stagingdir configuration property for the new Hive versions and on hive.exec.scratchdir for the old ones.
createdTempDir is deleted when SaveAsHiveFile is requested to deleteExternalTmpPath and, as a fallback, at the normal termination of the JVM (since deleteOnExit is used).
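The lifecycle of createdTempDir can be modeled on the local filesystem with java.nio and java.io. StagingDirTracker and its methods are hypothetical; SaveAsHiveFile itself works with a Hadoop FileSystem rather than java.io.File, so this only illustrates the pattern: remember the directory, register deleteOnExit as a fallback, and delete it eagerly when asked. In this sketch a failed eager delete is reported rather than rethrown.

```scala
import java.io.{File, IOException}
import java.nio.file.{Files, Path}
import scala.util.control.NonFatal

// Hypothetical local-filesystem model of the createdTempDir bookkeeping.
class StagingDirTracker {
  // Like createdTempDir: None until a staging directory has been created.
  var createdTempDir: Option[Path] = None

  // Creates a staging directory under `parent` and remembers it.
  def createStagingDir(parent: Path, prefix: String): Path = {
    val dir = Files.createTempDirectory(parent, prefix)
    createdTempDir = Some(dir)
    // Fallback: ask the JVM to remove the directory at normal termination.
    // Note: java.io.File.deleteOnExit only removes a directory that is empty by then.
    dir.toFile.deleteOnExit()
    dir
  }

  // Eager cleanup in the spirit of deleteExternalTmpPath.
  def deleteStagingDir(): Unit =
    createdTempDir.foreach { dir =>
      try deleteRecursively(dir.toFile)
      catch {
        case NonFatal(e) => System.err.println(s"Unable to delete staging directory: $dir ($e)")
      }
    }

  // Deletes children first, then the file or directory itself.
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    if (!f.delete()) throw new IOException(s"Cannot delete $f")
  }
}
```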