TreeNode — Node in Catalyst Tree

TreeNode is the contract of nodes in Catalyst tree with name and zero or more children.

package org.apache.spark.sql.catalyst.trees

abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  self: BaseType =>

  // only required properties (vals and methods) that have no implementation
  // the others follow
  def children: Seq[BaseType]
  def verboseString: String
}

TreeNode is a recursive data structure that can have one or many children that are again TreeNodes.

Tip
Read up on <: type operator in Scala in Upper Type Bounds.

Scala-specific, TreeNode is an abstract class that is the base class of Catalyst Expression and QueryPlan abstract classes.

TreeNode therefore allows for building entire trees of TreeNodes, e.g. generic query plans with concrete logical and physical operators that both use Catalyst expressions (which are TreeNodes again).

Note
Spark SQL uses TreeNode for query plans and Catalyst expressions that can further be used together to build more advanced trees, e.g. Catalyst expressions can have query plans as subquery expressions.

TreeNode can itself be a node in a tree or a collection of nodes, i.e. itself and the children nodes. Not only does TreeNode come with the methods that you may have used in Scala Collection API (e.g. map, flatMap, collect, collectFirst, foreach), but also specialized ones for more advanced tree manipulation, e.g. mapChildren, transform, transformDown, transformUp, foreachUp, numberedTreeString, p, asCode, prettyJson.

Table 1. TreeNode API (Public Methods)
Method Description

apply

apply(number: Int): TreeNode[_]

argString

argString: String

asCode

asCode: String

collect

collect[B](pf: PartialFunction[BaseType, B]): Seq[B]

collectFirst

collectFirst[B](pf: PartialFunction[BaseType, B]): Option[B]

collectLeaves

collectLeaves(): Seq[BaseType]

fastEquals

fastEquals(other: TreeNode[_]): Boolean

find

find(f: BaseType => Boolean): Option[BaseType]

flatMap

flatMap[A](f: BaseType => TraversableOnce[A]): Seq[A]

foreach

foreach(f: BaseType => Unit): Unit

foreachUp

foreachUp(f: BaseType => Unit): Unit

generateTreeString

generateTreeString(
  depth: Int,
  lastChildren: Seq[Boolean],
  builder: StringBuilder,
  verbose: Boolean,
  prefix: String = "",
  addSuffix: Boolean = false): StringBuilder

map

map[A](f: BaseType => A): Seq[A]

mapChildren

mapChildren(f: BaseType => BaseType): BaseType

nodeName

nodeName: String

numberedTreeString

numberedTreeString: String

p

p(number: Int): BaseType

prettyJson

prettyJson: String

simpleString

simpleString: String

toJSON

toJSON: String

transform

transform(rule: PartialFunction[BaseType, BaseType]): BaseType

transformDown

transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType

transformUp

transformUp(rule: PartialFunction[BaseType, BaseType]): BaseType

treeString

treeString: String
treeString(verbose: Boolean, addSuffix: Boolean = false): String

verboseString

verboseString: String

verboseStringWithSuffix

verboseStringWithSuffix: String

withNewChildren

withNewChildren(newChildren: Seq[BaseType]): BaseType
Table 2. (Subset of) TreeNode Contract
Method Description

children

Child nodes

verboseString

One-line verbose description

Used when TreeNode is requested for generateTreeString (with verbose flag enabled) and verboseStringWithSuffix

Table 3. TreeNodes
TreeNode Description

Expression

QueryPlan

Tip

TreeNode abstract type is a fairly advanced Scala type definition (at least comparing to the other Scala types in Spark) so understanding its behaviour even outside Spark might be worthwhile by itself.

abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  self: BaseType =>

  // ...
}

withNewChildren Method

withNewChildren(newChildren: Seq[BaseType]): BaseType

withNewChildren…​FIXME

Note
withNewChildren is used when…​FIXME

Simple Node Description — simpleString Method

simpleString: String

simpleString gives a simple one-line description of a TreeNode.

Internally, simpleString is the nodeName followed by argString separated by a single white space.

Note
simpleString is used when TreeNode is requested for argString (of child nodes) and tree text representation (with verbose flag off).

Numbered Text Representation — numberedTreeString Method

numberedTreeString: String

numberedTreeString adds numbers to the text representation of all the nodes.

Note
numberedTreeString is used primarily for interactive debugging using apply and p methods.

Getting n-th TreeNode in Tree (for Interactive Debugging) — apply Method

apply(number: Int): TreeNode[_]

apply gives number-th tree node in a tree.

Note
apply can be used for interactive debugging.

Internally, apply gets the node at number position or null.

Getting n-th BaseType in Tree (for Interactive Debugging) — p Method

p(number: Int): BaseType

p gives number-th tree node in a tree as BaseType for interactive debugging.

Note
p can be used for interactive debugging.
Note

BaseType is the base type of a tree and in Spark SQL can be:

Text Representation — toString Method

toString: String
Note
toString is part of Java’s Object Contract for the string representation of an object, e.g. TreeNode.

toString simply returns the text representation of all nodes in the tree.

Text Representation of All Nodes in Tree — treeString Method

treeString: String  (1)
treeString(verbose: Boolean, addSuffix: Boolean = false): String
  1. Turns verbose flag on

treeString gives the string representation of all the nodes in the TreeNode.

import org.apache.spark.sql.{functions => f}
val q = spark.range(10).withColumn("rand", f.rand())
val executedPlan = q.queryExecution.executedPlan

val output = executedPlan.treeString(verbose = true)

scala> println(output)
*(1) Project [id#0L, rand(6790207094253656854) AS rand#2]
+- *(1) Range (0, 10, step=1, splits=8)
Note

treeString is used when:

Verbose Description with Suffix — verboseStringWithSuffix Method

verboseStringWithSuffix: String

verboseStringWithSuffix simply returns verbose description.

Note
verboseStringWithSuffix is used exclusively when TreeNode is requested to generateTreeString (with verbose and addSuffix flags enabled).

Generating Text Representation of Inner and Regular Child Nodes — generateTreeString Method

generateTreeString(
  depth: Int,
  lastChildren: Seq[Boolean],
  builder: StringBuilder,
  verbose: Boolean,
  prefix: String = "",
  addSuffix: Boolean = false): StringBuilder

Internally, generateTreeString appends the following node descriptions per the verbose and addSuffix flags:

In the end, generateTreeString calls itself recursively for the innerChildren and the child nodes.

Note
generateTreeString is used exclusively when TreeNode is requested for text representation of all nodes in the tree.

Inner Child Nodes — innerChildren Method

innerChildren: Seq[TreeNode[_]]

innerChildren returns the inner nodes that should be shown as an inner nested tree of this node.

innerChildren simply returns an empty collection of TreeNodes.

Note
innerChildren is used when TreeNode is requested to generate the text representation of inner and regular child nodes, allChildren and getNodeNumbered.

allChildren Property

allChildren: Set[TreeNode[_]]
Note
allChildren is a Scala lazy value which is computed once when accessed and cached afterwards.

allChildren…​FIXME

Note
allChildren is used when…​FIXME

getNodeNumbered Internal Method

getNodeNumbered(number: MutableInt): Option[TreeNode[_]]

getNodeNumbered…​FIXME

Note
getNodeNumbered is used when…​FIXME

foreach Method

foreach(f: BaseType => Unit): Unit

foreach applies the input function f to itself (this) first and then (recursively) to the children.

collect Method

collect[B](pf: PartialFunction[BaseType, B]): Seq[B]

collect…​FIXME

collectFirst Method

collectFirst[B](pf: PartialFunction[BaseType, B]): Option[B]

collectFirst…​FIXME

collectLeaves Method

collectLeaves(): Seq[BaseType]

collectLeaves…​FIXME

find Method

find(f: BaseType => Boolean): Option[BaseType]

find…​FIXME

flatMap Method

flatMap[A](f: BaseType => TraversableOnce[A]): Seq[A]

flatMap…​FIXME

foreachUp Method

foreachUp(f: BaseType => Unit): Unit

foreachUp…​FIXME

map Method

map[A](f: BaseType => A): Seq[A]

map…​FIXME

mapChildren Method

mapChildren(f: BaseType => BaseType): BaseType

mapChildren…​FIXME

transform Method

transform(rule: PartialFunction[BaseType, BaseType]): BaseType

transform…​FIXME

Transforming Nodes Downwards — transformDown Method

transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType

transformDown…​FIXME

transformUp Method

transformUp(rule: PartialFunction[BaseType, BaseType]): BaseType

transformUp…​FIXME

asCode Method

asCode: String

asCode…​FIXME

prettyJson Method

prettyJson: String

prettyJson…​FIXME

Note
prettyJson is used when…​FIXME

toJSON Method

toJSON: String

toJSON…​FIXME

Note
toJSON is used when…​FIXME

argString Method

argString: String

argString…​FIXME

Note
argString is used when…​FIXME

nodeName Method

nodeName: String

nodeName returns the name of the class with Exec suffix removed (that is used as a naming convention for the class name of physical operators).

Note
nodeName is used when TreeNode is requested for simpleString and asCode.

fastEquals Method

fastEquals(other: TreeNode[_]): Boolean

fastEquals…​FIXME

Note
fastEquals is used when…​FIXME

shouldConvertToJson Internal Method

shouldConvertToJson(
  product: Product): Boolean

shouldConvertToJson…​FIXME

Note
shouldConvertToJson is used when…​FIXME

results matching ""

    No results matching ""