WriteToDataSourceV2Exec Physical Operator

WriteToDataSourceV2Exec is a physical operator that represents an AppendData logical operator (and the deprecated WriteToDataSourceV2 logical operator) at execution time.

WriteToDataSourceV2Exec is created exclusively when the DataSourceV2Strategy execution planning strategy is requested to plan an AppendData logical operator (or the deprecated WriteToDataSourceV2 logical operator).

Note
Although the WriteToDataSourceV2 logical operator has been deprecated since Spark SQL 2.4.0 (in favor of the AppendData logical operator), AppendData is currently used in tests only, so WriteToDataSourceV2 remains relevant.

WriteToDataSourceV2Exec takes the following to be created:

  • DataSourceWriter

  • Child physical plan

When requested for the child operators, WriteToDataSourceV2Exec gives the one child physical plan.

When requested for the output attributes, WriteToDataSourceV2Exec gives no attributes (an empty collection).
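
The two properties above can be pictured with a minimal, Spark-free sketch. PlanNode, LeafScan, WriteNode and PlanSketch are hypothetical stand-ins invented for illustration (they are not Spark classes), and attributes are modeled as plain strings rather than Spark's attribute type.

```scala
// Hypothetical, simplified stand-in for SparkPlan (not the Spark API).
trait PlanNode {
  def children: Seq[PlanNode]
  def output: Seq[String] // attribute names only, for illustration
}

// A leaf operator: no children, produces the given columns.
case class LeafScan(output: Seq[String]) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}

// Models the write operator: exactly one child and no output attributes.
case class WriteNode(query: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(query)
  val output: Seq[String] = Nil
}

object PlanSketch {
  val scan: PlanNode = LeafScan(Seq("id", "name"))
  val write: PlanNode = WriteNode(scan)
}
```

In this model, PlanSketch.write.children holds only the scan node, and PlanSketch.write.output is empty even though the child itself exposes two columns. That matches the idea that a write operator consumes rows and produces none.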

Tip

Enable the INFO logging level for the org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec=INFO

Refer to Logging.

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

doExecute(): RDD[InternalRow]

Note
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute requests the DataSourceWriter to create a DataWriterFactory for the writing task.

doExecute requests the child physical plan to execute (which triggers physical query planning and, in the end, generates an RDD of internal binary rows).

doExecute prints out the following INFO message to the logs:

Start processing data source writer: [writer]. The input RDD has [length] partitions.

doExecute requests the SparkContext to run a Spark job with the following:

  • The RDD[InternalRow] of the child physical plan

  • A partition processing function that requests the DataWritingSparkTask object to run the writing task (of the DataSourceWriter) with or without a commit coordinator

  • A result handler function that records the result WriterCommitMessage from a successful data writer and requests the DataSourceWriter to handle the commit message (which does nothing by default)

doExecute prints out the following INFO message to the logs:

Data source writer [writer] is committing.

doExecute requests the DataSourceWriter to commit (passing on the commit messages).

In the end, doExecute prints out the following INFO message to the logs:

Data source writer [writer] committed.
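
The successful path described above (run a job per partition, record each commit message, then commit once) can be sketched without Spark. Everything in this block is a hypothetical model: CommitMessage, LoggingWriter and WriteFlowSketch are made-up names, runJob is a sequential stand-in for the SparkContext job, partitions are plain Scala sequences instead of an RDD, and the per-task hook records events only to make the order visible. The commit coordinator and the error path are not modeled.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a writer commit message (not the Spark API).
final case class CommitMessage(partitionId: Int, rowCount: Int)

// Hypothetical stand-in for a data source writer; records the calls it receives.
class LoggingWriter {
  val events = ArrayBuffer.empty[String]

  // Called once for every partition that was written successfully.
  def onDataWriterCommit(message: CommitMessage): Unit =
    events += s"task-commit:${message.partitionId}"

  // Called once with all commit messages after every partition has finished.
  def commit(messages: Array[CommitMessage]): Unit =
    events += s"job-commit:${messages.map(_.rowCount).sum}"
}

object WriteFlowSketch {
  // Sequential stand-in for running a job: applies `func` to each partition
  // and passes the result to `resultHandler` with the partition index.
  def runJob[T, U](
      partitions: Seq[Seq[T]],
      func: (Int, Iterator[T]) => U,
      resultHandler: (Int, U) => Unit): Unit =
    partitions.zipWithIndex.foreach { case (rows, index) =>
      resultHandler(index, func(index, rows.iterator))
    }

  def execute(writer: LoggingWriter, partitions: Seq[Seq[String]]): Array[CommitMessage] = {
    // One slot per partition for the commit message of that partition.
    val messages = new Array[CommitMessage](partitions.length)
    println(s"Start processing data source writer: $writer. " +
      s"The input RDD has ${messages.length} partitions.")
    runJob[String, CommitMessage](
      partitions,
      // Partition processing function: "writes" the rows by counting them.
      (index, rows) => CommitMessage(index, rows.size),
      // Result handler: record the message, then notify the writer.
      (index, message) => {
        messages(index) = message
        writer.onDataWriterCommit(message)
      })
    println(s"Data source writer $writer is committing.")
    writer.commit(messages)
    println(s"Data source writer $writer committed.")
    messages
  }
}
```

Calling WriteFlowSketch.execute(new LoggingWriter, Seq(Seq("a", "b"), Seq("c"))) yields one commit message per partition, and the writer sees the per-partition notifications before the single job-level commit. In a real cluster the partitions run in parallel on executors and only the result handler and the final commit run on the driver, so the sequential loop here is a simplification.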

In case of any error (Throwable), doExecute…​FIXME
