AggUtils Helper Object

AggUtils is a Scala object that defines the methods used exclusively when Aggregation execution planning strategy is executed.

planAggregateWithOneDistinct Method

planAggregateWithOneDistinct(
  groupingExpressions: Seq[NamedExpression],
  functionsWithDistinct: Seq[AggregateExpression],
  functionsWithoutDistinct: Seq[AggregateExpression],
  resultExpressions: Seq[NamedExpression],
  child: SparkPlan): Seq[SparkPlan]

planAggregateWithOneDistinct…​FIXME

Note
planAggregateWithOneDistinct is used exclusively when Aggregation execution planning strategy is executed.

Creating Physical Plan with Two Aggregate Physical Operators for Partial and Final Aggregations — planAggregateWithoutDistinct Method

planAggregateWithoutDistinct(
  groupingExpressions: Seq[NamedExpression],
  aggregateExpressions: Seq[AggregateExpression],
  resultExpressions: Seq[NamedExpression],
  child: SparkPlan): Seq[SparkPlan]

planAggregateWithoutDistinct is a two-step physical operator generator.

planAggregateWithoutDistinct first creates an aggregate physical operator with aggregateExpressions in Partial mode (for partial aggregations).

Note
requiredChildDistributionExpressions for the aggregate physical operator for partial aggregation "stage" is empty.

In the end, planAggregateWithoutDistinct creates another aggregate physical operator (of the same type as before), but aggregateExpressions are now in Final mode (for final aggregations). The aggregate physical operator becomes the parent of the first aggregate operator.

Note
requiredChildDistributionExpressions for the parent aggregate physical operator for final aggregation "stage" are the attributes of groupingExpressions.
Note
planAggregateWithoutDistinct is used exclusively when Aggregation execution planning strategy is executed (with no AggregateExpressions being distinct).

Creating Aggregate Physical Operator — createAggregate Internal Method

createAggregate(
  requiredChildDistributionExpressions: Option[Seq[Expression]] = None,
  groupingExpressions: Seq[NamedExpression] = Nil,
  aggregateExpressions: Seq[AggregateExpression] = Nil,
  aggregateAttributes: Seq[Attribute] = Nil,
  initialInputBufferOffset: Int = 0,
  resultExpressions: Seq[NamedExpression] = Nil,
  child: SparkPlan): SparkPlan

createAggregate creates a physical operator given the input aggregateExpressions aggregate expressions.

Table 1. createAggregate’s Aggregate Physical Operator Selection Criteria (in execution order)
Aggregate Physical Operator Selection Criteria

HashAggregateExec

HashAggregateExec supports all aggBufferAttributes of the input aggregateExpressions aggregate expressions.

ObjectHashAggregateExec

  1. spark.sql.execution.useObjectHashAggregateExec internal flag enabled (it is by default)

  2. ObjectHashAggregateExec supports the input aggregateExpressions aggregate expressions.

SortAggregateExec

When all the above requirements could not be met.

Note
createAggregate is used when AggUtils is requested to planAggregateWithoutDistinct, planAggregateWithOneDistinct (and planStreamingAggregation for Spark Structured Streaming)

results matching ""

    No results matching ""