Evaluator — ML Pipeline Component for Model Scoring

Evaluator is the contract in Spark MLlib for ML Pipeline components that can evaluate models for given parameters.

An ML Pipeline evaluator takes a DataFrame (usually a model's output with predictions) and computes a single Double metric indicating how good the model is.

evaluator: DataFrame =[evaluate]=> Double

Evaluator is used to evaluate models and is usually (if not always) used by CrossValidator and TrainValidationSplit to select the best model.

Evaluator uses the isLargerBetter method to indicate whether the resulting Double metric should be maximized (true) or minimized (false). By default, a larger value is considered better (true).
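As a plain-Scala sketch (with a hypothetical bestIndex helper, not a Spark API), this is how a tuning component can honor isLargerBetter when picking the best of several metric values:

```scala
// Hypothetical helper sketching best-model selection over metric values,
// honoring the isLargerBetter flag of the evaluator that produced them.
def bestIndex(metrics: Seq[Double], isLargerBetter: Boolean): Int =
  if (isLargerBetter) metrics.indexOf(metrics.max)
  else metrics.indexOf(metrics.min)

val aucs  = Seq(0.71, 0.84, 0.79)  // larger is better (e.g. areaUnderROC)
val rmses = Seq(1.2, 0.7, 0.9)     // smaller is better (e.g. rmse)

val bestByAuc  = bestIndex(aucs, isLargerBetter = true)   // index 1
val bestByRmse = bestIndex(rmses, isLargerBetter = false) // index 1
```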

Table 1. Evaluator’s Known Implementations

| Name | Description |
| --- | --- |
| BinaryClassificationEvaluator | Evaluator of binary classification models |
| ClusteringEvaluator | Evaluator of clustering models |
| MulticlassClassificationEvaluator | Evaluator of multiclass classification models |
| RegressionEvaluator | Evaluator of regression models |
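For a concrete sense of what such an evaluator computes, here is a plain-Scala sketch of rmse, the default metric of RegressionEvaluator (the real evaluator works on the prediction and label columns of a DataFrame; this is only an illustration of the arithmetic):

```scala
import math.sqrt

// Plain-Scala sketch of the rmse metric over (prediction, label) pairs:
// the square root of the mean squared error.
def rmse(predsAndLabels: Seq[(Double, Double)]): Double = {
  val mse = predsAndLabels
    .map { case (p, l) => (p - l) * (p - l) }
    .sum / predsAndLabels.size
  sqrt(mse)
}

// squared errors: 1.0 and 4.0, mean 2.5, rmse = sqrt(2.5)
val metric = rmse(Seq((2.0, 1.0), (3.0, 5.0)))
```

Since rmse measures error, RegressionEvaluator reports isLargerBetter as false for it.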

Evaluating Model Output with Extra Parameters — evaluate Method

evaluate(dataset: Dataset[_], paramMap: ParamMap): Double

evaluate makes a copy of this Evaluator with the extra paramMap applied and then evaluates the model output using the one-argument evaluate.
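The copy-then-evaluate pattern can be sketched in plain Scala with a toy stand-in (ToyEvaluator and its metric parameter are hypothetical names, not Spark types):

```scala
// Toy stand-in for Evaluator: extra parameters are merged into a copy,
// and the copy computes the metric.
case class ToyEvaluator(metric: String = "mean") {
  // counterpart of the one-argument evaluate(dataset)
  def evaluate(data: Seq[Double]): Double = metric match {
    case "max" => data.max
    case _     => data.sum / data.size
  }
  // counterpart of evaluate(dataset, paramMap):
  // copy with the extra parameters applied, then evaluate
  def evaluate(data: Seq[Double], extra: Map[String, String]): Double =
    copy(metric = extra.getOrElse("metric", metric)).evaluate(data)
}

val e = ToyEvaluator()
val xs = Seq(1.0, 2.0, 3.0)
val meanMetric = e.evaluate(xs)                      // default metric
val maxMetric  = e.evaluate(xs, Map("metric" -> "max")) // extra params win
```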

Note
evaluate is used…​FIXME

Evaluator Contract

package org.apache.spark.ml.evaluation

abstract class Evaluator {
  def evaluate(dataset: Dataset[_]): Double
  def copy(extra: ParamMap): Evaluator
  def isLargerBetter: Boolean = true
}
Table 2. Evaluator Contract

| Method | Description |
| --- | --- |
| copy | Creates a copy of this Evaluator with extra parameters applied. Used when evaluate is executed with an extra ParamMap |
| evaluate | Computes the metric (a Double) for a model output (a dataset with predictions). Used mainly for best model selection by CrossValidator and TrainValidationSplit |
| isLargerBetter | Indicates whether the metric returned by evaluate should be maximized (true) or minimized (false). Gives true by default |
