```scala
run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]
```
InsertIntoHadoopFsRelationCommand Logical Command
InsertIntoHadoopFsRelationCommand is a logical command that writes the result of executing a query to an output path in the given FileFormat (and other properties).
InsertIntoHadoopFsRelationCommand is created when:

- DataSource is requested to plan for writing to a FileFormat-based data source for the following:
  - CreateDataSourceTableAsSelectCommand logical command
  - InsertIntoDataSourceDirCommand logical command
  - DataFrameWriter.save operator with DataSource V1 data sources
- DataSourceAnalysis post-hoc logical resolution rule is executed (and resolves an InsertIntoTable logical operator with a HadoopFsRelation)
InsertIntoHadoopFsRelationCommand uses the partitionOverwriteMode option, which overrides the spark.sql.sources.partitionOverwriteMode configuration property for dynamic partition inserts.
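As a sketch of that precedence (the output path is illustrative), the per-write partitionOverwriteMode option wins over the session-wide configuration property:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Session-wide default set via the configuration property
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "static")

val df = spark.range(10).toDF("id").withColumn("p", $"id" % 2)

// The write-level option takes precedence over the session property, so this
// insert runs in dynamic mode and overwrites only the partitions present in df
df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .partitionBy("p")
  .parquet("/tmp/insert_demo")
```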
Creating InsertIntoHadoopFsRelationCommand Instance
InsertIntoHadoopFsRelationCommand takes the following to be created:
- Output Hadoop's Path
Note: staticPartitions may hold zero or more partitions. With that, staticPartitions are simply the partitions of an InsertIntoTable logical operator.
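For illustration (the table name and schema are hypothetical), an INSERT statement with an explicit PARTITION spec is what carries the static partitions through InsertIntoTable to this command:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .getOrCreate()

// Hypothetical partitioned table; "demo" and its columns are made up here
spark.sql("CREATE TABLE demo (id BIGINT, p INT) USING parquet PARTITIONED BY (p)")

// The explicit PARTITION (p = 0) spec below is a static partition;
// it ends up in the staticPartitions of the resulting command
spark.sql("INSERT INTO demo PARTITION (p = 0) VALUES (1), (2)")

// No PARTITION spec (or PARTITION (p)) would leave staticPartitions empty,
// i.e. a dynamic partition insert
spark.sql("INSERT INTO demo SELECT id, CAST(id % 2 AS INT) FROM RANGE(5)")
```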
Executing Data-Writing Logical Command — run Method
Note: run is part of the DataWritingCommand contract.
run uses the spark.sql.hive.manageFilesourcePartitions configuration property to…FIXME
Caution: FIXME When is the catalogTable defined?

Caution: FIXME When is tracksPartitionsInCatalog of CatalogTable enabled?
run gets the partitionOverwriteMode option…FIXME
run uses the FileCommitProtocol utility to instantiate a committer based on the spark.sql.sources.commitProtocolClass configuration property (default: SQLHadoopMapReduceCommitProtocol), the outputPath, the dynamicPartitionOverwrite flag, and a random job ID.
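A minimal sketch of that instantiation, assuming Spark's internal FileCommitProtocol.instantiate utility (the output path below is illustrative; the class name is the default committer):

```scala
import java.util.UUID
import org.apache.spark.internal.io.FileCommitProtocol

// Instantiate a committer the way the command does: committer class from
// spark.sql.sources.commitProtocolClass, a random job ID, the output path,
// and whether this is a dynamic partition overwrite
val committer = FileCommitProtocol.instantiate(
  className =
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
  jobId = UUID.randomUUID().toString, // the "random jobId" mentioned above
  outputPath = "/tmp/output",         // illustrative path
  dynamicPartitionOverwrite = false)
```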
For insertion, run simply uses the FileFormatWriter utility to write and then…FIXME (does some table-specific "tasks").
Otherwise (for the non-insertion case), run simply prints out the following INFO message to the logs and finishes:
Skipping insertion into a relation that already exists.
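One way the non-insertion case can arise (my reading, not stated in the source) is a write with SaveMode.Ignore against an output path that already exists; the path below is illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .getOrCreate()

// First write creates the relation
spark.range(5).write.parquet("/tmp/ignore_demo")

// With SaveMode.Ignore and an existing path, the insertion is skipped,
// which is when the INFO message above would be logged
spark.range(5).write.mode("ignore").parquet("/tmp/ignore_demo")
```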
run uses SchemaUtils to make sure that there are no duplicates in the outputColumnNames.