WriteToDataSourceV2Exec Physical Operator
WriteToDataSourceV2Exec is a physical operator that represents an AppendData logical operator (and a deprecated WriteToDataSourceV2 logical operator) at execution time.
WriteToDataSourceV2Exec is created exclusively when the DataSourceV2Strategy execution planning strategy is requested to plan an AppendData logical operator (and a deprecated WriteToDataSourceV2).
Note
|
Although the WriteToDataSourceV2 logical operator has been deprecated since Spark SQL 2.4.0 (in favor of the AppendData logical operator), the AppendData logical operator is currently used in tests only. That makes the WriteToDataSourceV2 logical operator still relevant.
|
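For illustration, the planning could look roughly like the following sketch of the relevant DataSourceV2Strategy cases. This is an approximation written for this page, not verbatim Spark code; in particular the object name, the exact patterns, and the newWriter() call are assumptions.

```scala
import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.{AppendData, LogicalPlan}
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Relation, WriteToDataSourceV2, WriteToDataSourceV2Exec}

// Sketch of the two planning cases (illustration only, not Spark's actual DataSourceV2Strategy)
object SketchedDataSourceV2WriteStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    // the deprecated logical operator already carries a DataSourceWriter
    case WriteToDataSourceV2(writer, query) =>
      WriteToDataSourceV2Exec(writer, planLater(query)) :: Nil
    // for AppendData, a DataSourceWriter is created from the DataSourceV2 relation first
    // (newWriter() is assumed here as the way to obtain it)
    case AppendData(r: DataSourceV2Relation, query, _) =>
      WriteToDataSourceV2Exec(r.newWriter(), planLater(query)) :: Nil
    case _ => Nil
  }
}
```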
WriteToDataSourceV2Exec takes the following to be created:
- DataSourceWriter
- Child physical plan (SparkPlan)
When requested for the child operators, WriteToDataSourceV2Exec gives the one child physical plan.
When requested for the output attributes, WriteToDataSourceV2Exec gives no attributes (an empty collection).
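For illustration, a minimal sketch of the operator's shape as described above (re-created for this page, not Spark's actual source; the Spark 2.4 internal APIs are assumed on the classpath):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter

// Illustrative re-creation of the operator (not Spark's actual source)
case class SketchedWriteToDataSourceV2Exec(
    writer: DataSourceWriter, // the DataSourceWriter to write to
    query: SparkPlan)         // the one child physical plan
  extends SparkPlan {

  // the child operators: just the one child physical plan
  override def children: Seq[SparkPlan] = Seq(query)

  // the output attributes: none (an empty collection)
  override def output: Seq[Attribute] = Nil

  // the actual write happens here (walked through in the rest of this page)
  override protected def doExecute(): RDD[InternalRow] = ???
}
```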
Tip
|
Enable INFO logging level for org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec=INFO
Refer to Logging. |
Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method
doExecute(): RDD[InternalRow]
Note
|
doExecute is part of the SparkPlan Contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
|
doExecute requests the DataSourceWriter to create a DataWriterFactory for the writing task.
doExecute requests the DataSourceWriter whether or not to use a commit coordinator for the writing job.
doExecute requests the child physical plan to execute (that triggers physical query planning and in the end generates an RDD of internal binary rows).
doExecute prints out the following INFO message to the logs:
Start processing data source writer: [writer]. The input RDD has [length] partitions.
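Continuing the sketch from above, these first steps of doExecute could look roughly as follows (an approximation of the described behavior, not verbatim Spark code):

```scala
// Inside the sketched doExecute (illustration only; also requires
// import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage)
val writeTask = writer.createWriterFactory()            // DataWriterFactory for the writing tasks
val useCommitCoordinator = writer.useCommitCoordinator  // should the tasks use a commit coordinator?
val rdd = query.execute()                               // RDD[InternalRow] of the child physical plan
val messages = new Array[WriterCommitMessage](rdd.partitions.length)

logInfo(s"Start processing data source writer: $writer. " +
  s"The input RDD has ${rdd.partitions.length} partitions.")
```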
doExecute requests the SparkContext to run a Spark job with the following (see the sketch after this list):
- The RDD[InternalRow] of the child physical plan
- A partition processing function that requests the DataWritingSparkTask object to run the writing task (of the DataSourceWriter) with or with no commit coordinator
- A result handler function that records the result WriterCommitMessage from a successful data writer and requests the DataSourceWriter to handle the commit message (which does nothing by default)
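Continuing the sketch, the Spark job submission could then look roughly as follows (DataWritingSparkTask is a Spark-internal object and the exact signature of its run method is an assumption here):

```scala
// Continuing the sketched doExecute (illustration only; also requires
// import org.apache.spark.TaskContext)
sparkContext.runJob(
  rdd,
  // partition processing function: run the writing task on every partition,
  // with or with no commit coordinator
  (context: TaskContext, iter: Iterator[InternalRow]) =>
    DataWritingSparkTask.run(writeTask, context, iter, useCommitCoordinator),
  rdd.partitions.indices,
  // result handler: record the WriterCommitMessage of a successful data writer
  // and request the DataSourceWriter to handle it (a no-op by default)
  (index: Int, message: WriterCommitMessage) => {
    messages(index) = message
    writer.onDataWriterCommit(message)
  })
```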
doExecute prints out the following INFO message to the logs:
Data source writer [writer] is committing.
doExecute requests the DataSourceWriter to commit (passing on the commit messages).
In the end, doExecute prints out the following INFO message to the logs:
Data source writer [writer] committed.
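The commit step at the end of the sketch (again an approximation of the behavior described above):

```scala
// Continuing the sketched doExecute (illustration only)
logInfo(s"Data source writer $writer is committing.")
writer.commit(messages)   // pass on all the collected WriterCommitMessages
logInfo(s"Data source writer $writer committed.")
```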
In case of any error (Throwable), doExecute …FIXME