Debugging Query Execution

The debug package object contains tools for debugging query execution, i.e. for a full analysis of structured queries (as Datasets).

Table 1. Debugging Query Execution Tools (debug Methods)

Method       | Description
debug        | Debugs a structured query. Signature: debug(): Unit
debugCodegen | Displays the Java source code generated for a structured query in whole-stage code generation (i.e. the output of each WholeStageCodegen subtree in a query plan). Signature: debugCodegen(): Unit

The debug package object belongs to the org.apache.spark.sql.execution.debug package, which you have to import before you can use the debug and debugCodegen methods.

// Import the package object
import org.apache.spark.sql.execution.debug._

// Every Dataset (incl. DataFrame) now has the debug and debugCodegen methods
val q: DataFrame = ...
q.debug
q.debugCodegen
Tip
Read up on Package Objects in the Scala programming language.

Internally, the debug package object uses the DebugQuery implicit class, which "extends" the Dataset[_] Scala type with the debug methods.

implicit class DebugQuery(query: Dataset[_]) {
  def debug(): Unit = ...
  def debugCodegen(): Unit = ...
}
Tip
Read up on Implicit Classes in the official documentation of the Scala programming language.
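
The mechanism behind DebugQuery can be tried without Spark. The following is only a minimal sketch of the implicit-class pattern: ToyDataset, ToyDebug, ToyDebugQuery and debugMessage are hypothetical names made up for illustration and are not part of Spark, and an ordinary object stands in for the package object.

```scala
// Hypothetical stand-in for Dataset[_]; NOT a Spark class.
final case class ToyDataset(name: String, rows: Seq[Long])

// Plays the role of the debug package object (an ordinary object here).
object ToyDebug {
  // Same shape as DebugQuery: wrapping a value "adds" methods to it.
  implicit class ToyDebugQuery(query: ToyDataset) {
    // Returns the message instead of printing it, so it is easy to check.
    def debugMessage: String = s"Results returned: ${query.rows.size}"
    def debug(): Unit = println(debugMessage)
  }
}

object ImplicitClassDemo {
  // Without this import, ToyDataset has no debug method and q.debug() does not compile.
  import ToyDebug._

  def main(args: Array[String]): Unit = {
    val q = ToyDataset("q", Seq(4L))
    // Resolved by the compiler as ToyDebugQuery(q).debug()
    q.debug()
  }
}
```

The import is what brings the implicit class into scope, which is why the debug methods only appear on a Dataset after importing org.apache.spark.sql.execution.debug._.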

Debugging Dataset — debug Method

debug(): Unit

debug requests the QueryExecution (of the structured query) for the optimized physical query plan.

debug transforms the optimized physical query plan to add a new DebugExec physical operator for every physical operator.

debug requests the query plan to execute and then counts the number of rows in the result. It prints out the following message:

Results returned: [count]

In the end, debug requests every DebugExec physical operator (in the query plan) to dumpStats.

val q = spark.range(10).where('id === 4)

scala> :type q
org.apache.spark.sql.Dataset[Long]

// Extend Dataset[Long] with debug and debugCodegen methods
import org.apache.spark.sql.execution.debug._

scala> q.debug
Results returned: 1
== WholeStageCodegen ==
Tuples output: 1
 id LongType: {java.lang.Long}
== Filter (id#0L = 4) ==
Tuples output: 0
 id LongType: {}
== Range (0, 10, step=1, splits=8) ==
Tuples output: 0
 id LongType: {}
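
The steps that debug performs (wrap every operator, execute, count the rows, dump the statistics) can be modelled with a toy plan tree. This is a sketch under stated assumptions and not Spark's code: Node, RangeOp, FilterOp, Debug and ToyDebugger are hypothetical stand-ins for SparkPlan, the physical operators and DebugExec. The toy model has no whole-stage code generation, so its per-operator counts are not meant to match the session above.

```scala
// Hypothetical toy model; these are NOT Spark's SparkPlan or DebugExec.
sealed trait Node {
  def name: String
  def children: Seq[Node]
  def execute(): Seq[Long]
  // Visits this node and then all of its descendants (pre-order).
  def foreach(f: Node => Unit): Unit = { f(this); children.foreach(_.foreach(f)) }
}

// Leaf operator that produces the numbers in [start, end).
final case class RangeOp(start: Long, end: Long) extends Node {
  val name = "Range"
  val children: Seq[Node] = Nil
  def execute(): Seq[Long] = (start until end).toList
}

// Keeps only the rows of the child that satisfy the predicate.
final case class FilterOp(predicate: Long => Boolean, child: Node) extends Node {
  val name = "Filter"
  val children: Seq[Node] = Seq(child)
  def execute(): Seq[Long] = child.execute().filter(predicate)
}

// Wraps one operator and counts the tuples that flow out of it.
final case class Debug(child: Node) extends Node {
  var tupleCount = 0L
  val name = "Debug"
  val children: Seq[Node] = Seq(child)
  def execute(): Seq[Long] = { val out = child.execute(); tupleCount += out.size; out }
  def dumpStats: String = s"== ${child.name} == Tuples output: $tupleCount"
}

object ToyDebugger {
  // Rewrites the plan so that every operator sits under its own Debug node.
  def instrument(plan: Node): Node = plan match {
    case r: RangeOp     => Debug(r)
    case FilterOp(p, c) => Debug(FilterOp(p, instrument(c)))
    case d: Debug       => d
  }

  // Executes the instrumented plan, then collects the row count and per-operator stats.
  def debug(plan: Node): List[String] = {
    val debugPlan = instrument(plan)
    val lines = scala.collection.mutable.ArrayBuffer(s"Results returned: ${debugPlan.execute().size}")
    debugPlan.foreach {
      case d: Debug => lines += d.dumpStats
      case _        =>
    }
    lines.toList
  }
}
```

Calling ToyDebugger.debug(FilterOp(_ == 4L, RangeOp(0L, 10L))) returns the "Results returned" line followed by one statistics line per wrapped operator.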

Displaying Java Source Code Generated for Structured Query in Whole-Stage Code Generation ("Debugging" Codegen) — debugCodegen Method

debugCodegen(): Unit

debugCodegen requests the QueryExecution (of the structured query) for the optimized physical query plan.

In the end, debugCodegen simply passes the query plan to codegenString and prints the result out to the standard output.

import org.apache.spark.sql.execution.debug._

scala> spark.range(10).where('id === 4).debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#29L = 4)
+- *Range (0, 10, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
...
Note

debugCodegen is equivalent to using the debug interface of the QueryExecution.

val q = spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6)
scala> q.queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [(id#3L + 6) AS (((id + 1) + 2) + 3)#6L, (id#3L + 15) AS (((id + 4) + 5) + 6)#7L]
+- *Range (1, 1000, step=1, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
...

codegenToSeq Method

codegenToSeq(): Seq[(String, String)]

codegenToSeq…​FIXME

Note
codegenToSeq is used when…​FIXME

codegenString Method

codegenString(plan: SparkPlan): String

codegenString…​FIXME

Note
codegenString is used when…​FIXME
