```scala
run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]
```
InsertIntoHadoopFsRelationCommand Logical Command
InsertIntoHadoopFsRelationCommand is a logical command that writes the result of executing a query to an output path in the given FileFormat (and other properties).
InsertIntoHadoopFsRelationCommand is created when:

- DataSource is requested to plan for writing to a FileFormat-based data source for the following:
  - CreateDataSourceTableAsSelectCommand logical command
  - InsertIntoDataSourceDirCommand logical command
  - DataFrameWriter.save operator with DataSource V1 data sources
- DataSourceAnalysis post-hoc logical resolution rule is executed (and resolves an InsertIntoTable logical operator with a HadoopFsRelation)
InsertIntoHadoopFsRelationCommand uses the partitionOverwriteMode option, which overrides the spark.sql.sources.partitionOverwriteMode configuration property for dynamic partition inserts.
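As a sketch of that precedence (the output path is illustrative), the per-write partitionOverwriteMode option wins over the session-wide configuration property:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Session-wide default set via the configuration property
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "static")

val df = spark.range(10).toDF("id").withColumn("p", $"id" % 2)

// The write-level option takes precedence over the session property, so this
// insert runs in dynamic mode and overwrites only the partitions present in df
df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .partitionBy("p")
  .parquet("/tmp/insert_demo")
```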
Creating InsertIntoHadoopFsRelationCommand Instance
InsertIntoHadoopFsRelationCommand takes the following to be created:
- Output Hadoop's Path
Note: staticPartitions may hold zero or more partitions. With that, staticPartitions are simply the partitions of an InsertIntoTable logical operator.
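For illustration (the table name and schema are hypothetical), an INSERT statement with an explicit PARTITION spec is what carries the static partitions through InsertIntoTable to this command:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .getOrCreate()

// Hypothetical partitioned table; "demo" and its columns are made up here
spark.sql("CREATE TABLE demo (id BIGINT, p INT) USING parquet PARTITIONED BY (p)")

// The explicit PARTITION (p = 0) spec below is a static partition;
// it ends up in the staticPartitions of the resulting command
spark.sql("INSERT INTO demo PARTITION (p = 0) VALUES (1), (2)")

// No PARTITION spec (or PARTITION (p)) would leave staticPartitions empty,
// i.e. a dynamic partition insert
spark.sql("INSERT INTO demo SELECT id, CAST(id % 2 AS INT) FROM RANGE(5)")
```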
Executing Data-Writing Logical Command — run Method
Note: run is part of the DataWritingCommand contract.
run uses the spark.sql.hive.manageFilesourcePartitions configuration property to…FIXME
Caution: FIXME When is the catalogTable defined?

Caution: FIXME When is tracksPartitionsInCatalog of CatalogTable enabled?
run gets the partitionOverwriteMode option…FIXME
run uses the FileCommitProtocol utility to instantiate a committer based on the spark.sql.sources.commitProtocolClass configuration property (default: SQLHadoopMapReduceCommitProtocol), the outputPath, the dynamicPartitionOverwrite flag, and a random job ID.
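A minimal sketch of that instantiation, assuming Spark's internal FileCommitProtocol.instantiate utility (the output path below is illustrative; the class name is the default committer):

```scala
import java.util.UUID
import org.apache.spark.internal.io.FileCommitProtocol

// Instantiate a committer the way the command does: committer class from
// spark.sql.sources.commitProtocolClass, a random job ID, the output path,
// and whether this is a dynamic partition overwrite
val committer = FileCommitProtocol.instantiate(
  className =
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
  jobId = UUID.randomUUID().toString, // the "random jobId" mentioned above
  outputPath = "/tmp/output",         // illustrative path
  dynamicPartitionOverwrite = false)
```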
For insertion, run simply uses the FileFormatWriter utility to write and then…FIXME (does some table-specific "tasks").
Otherwise (for the non-insertion case), run simply prints out the following INFO message to the logs and finishes:
Skipping insertion into a relation that already exists.
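One way the non-insertion case can arise (my reading, not stated in the source) is a write with SaveMode.Ignore against an output path that already exists; the path below is illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .getOrCreate()

// First write creates the relation
spark.range(5).write.parquet("/tmp/ignore_demo")

// With SaveMode.Ignore and an existing path, the insertion is skipped,
// which is when the INFO message above would be logged
spark.range(5).write.mode("ignore").parquet("/tmp/ignore_demo")
```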
run uses SchemaUtils to make sure that there are no duplicates in the outputColumnNames.