InsertIntoHadoopFsRelationCommand Logical Command

InsertIntoHadoopFsRelationCommand is a logical command that writes the result of executing a query to an output path in the given FileFormat (and other properties).

InsertIntoHadoopFsRelationCommand is created when:

InsertIntoHadoopFsRelationCommand uses partitionOverwriteMode option that overrides spark.sql.sources.partitionOverwriteMode property for dynamic partition inserts.

Creating InsertIntoHadoopFsRelationCommand Instance

InsertIntoHadoopFsRelationCommand takes the following to be created:

  • Output Hadoop’s Path

  • Static table partitions (Map[String, String])

  • ifPartitionNotExists flag

  • Partition columns (Seq[Attribute])

  • BucketSpec

  • FileFormat

  • Options (Map[String, String])

  • Logical plan

  • SaveMode

  • CatalogTable

  • FileIndex

  • Output column names


staticPartitions may hold zero or more partitions as follows:

With that, staticPartitions are simply the partitions of an InsertIntoTable logical operator.

Executing Data-Writing Logical Command — run Method

  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]
run is part of DataWritingCommand contract.

run uses the spark.sql.hive.manageFilesourcePartitions configuration property to…​FIXME

FIXME When is the catalogTable defined?
FIXME When is tracksPartitionsInCatalog of CatalogTable enabled?

run gets the partitionOverwriteMode option…​FIXME

run uses FileCommitProtocol utility to instantiate a committer based on the spark.sql.sources.commitProtocolClass (default: SQLHadoopMapReduceCommitProtocol) and the outputPath, the dynamicPartitionOverwrite, and random jobId.

For insertion, run simply uses the FileFormatWriter utility to write and then…​FIXME (does some table-specific "tasks").

Otherwise (for non-insertion case), run simply prints out the following INFO message to the logs and finishes.

Skipping insertion into a relation that already exists.

results matching ""

    No results matching ""