import org.apache.spark.sql.execution.SQLExecution
scala> println(SQLExecution.EXECUTION_ID_KEY)
spark.sql.execution.id
SQLExecution Helper Object
SQLExecution defines spark.sql.execution.id Spark property that is used to track multiple Spark jobs that should all together constitute a single structured query execution (that could be easily reported as a single execution unit).
Actions of a structured query are executed using SQLExecution.withNewExecutionId static method that sets spark.sql.execution.id as Spark Core’s local property and "stitches" different Spark jobs as parts of one structured query action (that you can then see in web UI’s SQL tab).
|
Tip
|
Use SparkListener to listen to SparkListenerSQLExecutionStart events and know the execution ids of structured queries that have been executed in a Spark SQL application.
|
|
Note
|
Jobs without spark.sql.execution.id key are not considered to belong to SQL query executions. |
SQLExecution keeps track of all execution ids and their QueryExecutions in executionIdToQueryExecution internal registry.
|
Tip
|
Use SQLExecution.getQueryExecution to find the QueryExecution for an execution id. |
Executing Dataset Action (with Zero or More Spark Jobs) Under New Execution Id — withNewExecutionId Method
withNewExecutionId[T](
sparkSession: SparkSession,
queryExecution: QueryExecution)(body: => T): T
withNewExecutionId executes body query action with a new execution id (given as the input executionId or auto-generated) so that all Spark jobs that have been scheduled by the query action could be marked as parts of the same Dataset action execution.
withNewExecutionId allows for collecting all the Spark jobs (even executed on separate threads) together under a single SQL query execution for reporting purposes, e.g. to reporting them as one single structured query in web UI.
|
Note
|
If there is another execution id already set, it is replaced for the course of the current action. |
In addition, the QueryExecution variant posts SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events (to LiveListenerBus event bus) before and after executing the body action, respectively. It is used to inform SQLListener when a SQL query execution starts and ends.
|
Note
|
Nested execution ids are not supported in the QueryExecution variant.
|
|
Note
|
|
Finding QueryExecution for Execution ID — getQueryExecution Method
getQueryExecution(executionId: Long): QueryExecution
getQueryExecution simply gives the QueryExecution for the executionId or null if not found.
Executing Action (with Zero or More Spark Jobs) Tracked Under Given Execution Id — withExecutionId Method
withExecutionId[T](
sc: SparkContext,
executionId: String)(body: => T): T
withExecutionId executes the body action as part of executing multiple Spark jobs under executionId execution identifier.
def body = println("Hello World")
scala> SQLExecution.withExecutionId(sc = spark.sparkContext, executionId = "Custom Name")(body)
Hello World
|
Note
|
|
checkSQLExecutionId Method
checkSQLExecutionId(sparkSession: SparkSession): Unit
checkSQLExecutionId…FIXME
|
Note
|
checkSQLExecutionId is used exclusively when FileFormatWriter is requested to write the result of a structured query.
|
withSQLConfPropagated Method
withSQLConfPropagated[T](sparkSession: SparkSession)(body: => T): T
withSQLConfPropagated…FIXME
|
Note
|
|