```scala
run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]
```
InsertIntoHadoopFsRelationCommand Logical Command
InsertIntoHadoopFsRelationCommand is a logical command that writes the result of executing a query to an output path in the given FileFormat (and other properties).
InsertIntoHadoopFsRelationCommand is created when:

- DataSource is requested to plan writing to a FileFormat-based data source, for the following:
  - CreateDataSourceTableAsSelectCommand logical command
  - InsertIntoDataSourceDirCommand logical command
  - DataFrameWriter.save operator with DataSource V1 data sources
- DataSourceAnalysis post-hoc logical resolution rule is executed (and resolves an InsertIntoTable logical operator with a HadoopFsRelation)
InsertIntoHadoopFsRelationCommand uses the partitionOverwriteMode option that, when specified, overrides the spark.sql.sources.partitionOverwriteMode configuration property for dynamic partition inserts.
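The precedence rule can be sketched as follows. Note that resolveMode and its arguments are illustrative names for this sketch, not Spark API:

```scala
// Sketch of the precedence rule: a per-write partitionOverwriteMode option,
// when present, wins over the session-wide configuration property.
// All names here are illustrative, not Spark API.
sealed trait PartitionOverwriteMode
case object StaticMode extends PartitionOverwriteMode
case object DynamicMode extends PartitionOverwriteMode

def resolveMode(writeOption: Option[String], sessionConf: String): PartitionOverwriteMode =
  writeOption.getOrElse(sessionConf).toUpperCase match {
    case "DYNAMIC" => DynamicMode
    case _         => StaticMode
  }
```

With this sketch, `resolveMode(Some("dynamic"), "static")` yields `DynamicMode` even though the session property says static.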
Creating InsertIntoHadoopFsRelationCommand Instance
InsertIntoHadoopFsRelationCommand takes the following to be created:

- Output Hadoop's Path
Note
|
staticPartitions may hold zero or more partitions. With that, staticPartitions are simply the partitions of an InsertIntoTable logical operator.
|
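For intuition, a statement like `INSERT INTO t PARTITION (country = 'US', date) SELECT ...` fixes `country` statically while `date` stays dynamic. The corresponding map could look like the following (an illustrative sketch, not verbatim Spark internals):

```scala
// Illustrative: staticPartitions maps a partition column fixed in the
// INSERT statement to its literal value; dynamic partition columns
// (here `date`) are simply absent from the map.
val staticPartitions: Map[String, String] = Map("country" -> "US")
```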
Executing Data-Writing Logical Command — run Method
Note
|
run is part of the DataWritingCommand contract.
|
run uses the spark.sql.hive.manageFilesourcePartitions configuration property to…FIXME
Caution
|
FIXME When is the catalogTable defined?
|

Caution
|
FIXME When is tracksPartitionsInCatalog of CatalogTable enabled?
|
run gets the partitionOverwriteMode option…FIXME
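One way to express the decision run arrives at for dynamic partition overwrite (reconstructed from the behavior described above; the names and the exact condition are assumptions, not verbatim Spark code):

```scala
// Sketch: dynamic partition overwrite kicks in only when the resolved mode
// is DYNAMIC, the save mode is Overwrite, and at least one partition column
// is not fixed statically. Names are illustrative.
def dynamicPartitionOverwrite(
    modeIsDynamic: Boolean,
    saveModeIsOverwrite: Boolean,
    staticPartitionCount: Int,
    partitionColumnCount: Int): Boolean =
  modeIsDynamic && saveModeIsOverwrite &&
    staticPartitionCount < partitionColumnCount
```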
run uses the FileCommitProtocol utility to instantiate a committer based on the spark.sql.sources.commitProtocolClass configuration property (default: SQLHadoopMapReduceCommitProtocol), with the outputPath, the dynamicPartitionOverwrite flag, and a random jobId.
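The instantiate-by-class-name pattern can be sketched with a stand-in class. Here java.lang.StringBuilder stands in for the configured committer class, and instantiate is an illustrative helper, not Spark's FileCommitProtocol API:

```scala
// Sketch of instantiating a class from a configuration value: load the
// configured class by name and call a matching constructor reflectively.
// java.lang.StringBuilder stands in for the committer class; a real
// committer would receive a jobId and an output path.
def instantiate[T](className: String, arg: String): T = {
  val ctor = Class.forName(className).getConstructor(classOf[String])
  ctor.newInstance(arg).asInstanceOf[T]
}

val committerStandIn =
  instantiate[java.lang.StringBuilder]("java.lang.StringBuilder", "job-42")
```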
For insertion, run simply uses the FileFormatWriter utility to write out the query result and then…FIXME (does some table-specific "tasks").
Otherwise (in the non-insertion case), run simply prints out the following INFO message to the logs and finishes:

Skipping insertion into a relation that already exists.
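The insert-or-skip decision can be sketched like this (reconstructed from the described behavior; the exact SaveMode handling is an assumption):

```scala
// Sketch: whether run proceeds with the write, given the save mode and
// whether the output path already exists. Names are illustrative.
def shouldInsert(saveMode: String, pathExists: Boolean): Boolean =
  (saveMode, pathExists) match {
    case ("ErrorIfExists", true) => sys.error("path already exists")
    case ("Ignore", true)        => false // logs "Skipping insertion..." and finishes
    case _                       => true  // Append / Overwrite, or a fresh path
  }
```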
run uses the SchemaUtils utility to make sure that there are no duplicates in the outputColumnNames.
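The duplicate check amounts to something like the following simplified sketch (Spark's SchemaUtils also honors the session's case-sensitivity setting; this version is always case-insensitive):

```scala
// Simplified sketch of a duplicate-column-name check: normalize names
// (case-insensitively here) and fail if any name occurs more than once.
def assertNoDuplicates(outputColumnNames: Seq[String]): Unit = {
  val duplicates = outputColumnNames
    .map(_.toLowerCase)
    .groupBy(identity)
    .collect { case (name, occurrences) if occurrences.size > 1 => name }
  require(duplicates.isEmpty,
    s"Found duplicate column(s): ${duplicates.mkString(", ")}")
}
```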