Catalyst is an implementation-agnostic framework for representing and manipulating a dataflow graph, i.e. trees of relational operators and expressions.
The Catalyst framework was first introduced in SPARK-1251 (Support for optimizing and executing structured queries) and became part of Apache Spark on 20/Mar/14 19:12.
Spark 2.0 uses the Catalyst tree manipulation library to build an extensible query plan optimizer with a number of query optimizations.
Catalyst supports both rule-based and cost-based optimization.
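The idea of rule-based optimization over a tree can be illustrated with a minimal sketch. It is a toy model, not Catalyst's actual API: Catalyst itself is written in Scala, and every name below (`Literal`, `Attribute`, `Add`, `transform_up`, `fold_constants`, `remove_add_zero`, `optimize`) is hypothetical and chosen only for illustration. The sketch assumes an expression tree of immutable nodes, rules that are plain functions from a node to a node, and an optimizer that reapplies the rules until the tree stops changing.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Immutable tree nodes (hypothetical stand-ins for expression nodes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]
Rule = Callable[[Expr], Expr]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Rewrite the children first, then apply the rule to the node itself."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def fold_constants(expr: Expr) -> Expr:
    """Rule: replace Add(Literal, Literal) with a single Literal."""
    if (
        isinstance(expr, Add)
        and isinstance(expr.left, Literal)
        and isinstance(expr.right, Literal)
    ):
        return Literal(expr.left.value + expr.right.value)
    return expr


def remove_add_zero(expr: Expr) -> Expr:
    """Rule: simplify x + 0 and 0 + x to x."""
    if isinstance(expr, Add):
        if expr.left == Literal(0):
            return expr.right
        if expr.right == Literal(0):
            return expr.left
    return expr


def optimize(expr: Expr, rules: List[Rule], max_iterations: int = 100) -> Expr:
    """Apply all rules repeatedly until no rule changes the tree (fixed point)."""
    for _ in range(max_iterations):
        new_expr = expr
        for rule in rules:
            new_expr = transform_up(new_expr, rule)
        if new_expr == expr:
            break
        expr = new_expr
    return expr


# x + (1 + -1): folding the constants exposes an "add zero" to simplify.
plan = Add(Attribute("x"), Add(Literal(1), Literal(-1)))
optimized = optimize(plan, [fold_constants, remove_add_zero])
```

Each rule stays small and knows nothing about the others; the fixed-point loop lets one rewrite expose an opportunity for the next. A cost-based optimizer would additionally need statistics to choose between alternative plans, which this sketch does not model.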