SQLExecution Helper Object

SQLExecution defines spark.sql.execution.id key that is used to track multiple jobs that constitute a single SQL query execution. Whenever a SQL query is to be executed, withNewExecutionId static method is used that sets the key.

Jobs without spark.sql.execution.id key are not considered to belong to SQL query executions.

spark.sql.execution.id EXECUTION_ID_KEY Key

val EXECUTION_ID_KEY = "spark.sql.execution.id"

Tracking Multi-Job SQL Query Executions (withNewExecutionId methods)

  sc: SparkContext,
  executionId: String)(body: => T): T  (1)

  sparkSession: SparkSession,
  queryExecution: QueryExecution)(body: => T): T  (2)
  1. With explicit execution identifier

  2. QueryExecution variant with an auto-generated execution identifier

SQLExecution.withNewExecutionId allow executing the input body query action with the execution id local property set (as executionId or auto-generated). The execution identifier is set as spark.sql.execution.id local property (using SparkContext.setLocalProperty).

The use case is to track Spark jobs (e.g. when running in separate threads) that belong to a single SQL query execution.

It is used in Dataset.withNewExecutionId.
FIXME Where is the proxy-like method used? How important is it?

If there is another execution local property set (as spark.sql.execution.id), it is replaced for the course of the current action.

In addition, the QueryExecution variant posts SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events (to LiveListenerBus event bus) before and after executing the body action, respectively. It is used to inform SQLListener when a SQL query execution starts and ends.

Nested execution ids are not supported in the QueryExecution variant.

results matching ""

    No results matching ""