ClusteringEvaluator — Evaluator of Clustering Models

ClusteringEvaluator is an Evaluator of clustering models (e.g. KMeans, GaussianMixture).

Note
ClusteringEvaluator is available since Spark 2.3.0.

With ClusteringEvaluator, the best model is the one that maximizes the evaluation metric (i.e. isLargerBetter is always true).

import org.apache.spark.ml.evaluation.ClusteringEvaluator
val cluEval = new ClusteringEvaluator().
  setPredictionCol("prediction").
  setFeaturesCol("features").
  setMetricName("silhouette")

scala> cluEval.isLargerBetter
res0: Boolean = true

scala> println(cluEval.explainParams)
featuresCol: features column name (default: features, current: features)
metricName: metric name in evaluation (silhouette) (default: silhouette, current: silhouette)
predictionCol: prediction column name (default: prediction, current: prediction)
Table 1. ClusteringEvaluator's Parameters

| Parameter | Default Value | Description |
|---|---|---|
| featuresCol | features | Name of the column with features (of type VectorUDT) |
| metricName | silhouette | Name of the clustering metric for evaluation. Note: metricName can only be silhouette. |
| predictionCol | prediction | Name of the column with prediction (of type NumericType) |
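
For intuition about the silhouette metric and why a larger value is better, the following is a minimal plain-Scala sketch of the mean silhouette coefficient. It is not Spark's implementation: sqDist and silhouette are hypothetical helper names, the computation is a naive O(n^2) loop over local collections, and squared Euclidean distance is assumed as the distance measure.

```scala
// Squared Euclidean distance between two points of equal dimension.
def sqDist(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

// Mean silhouette coefficient over all points (naive O(n^2) illustration).
def silhouette(points: Seq[Array[Double]], labels: Seq[Int]): Double = {
  val labelled = points.zip(labels).zipWithIndex
  val scores = labelled.map { case ((p, label), i) =>
    // Mean distance from p to the other points, grouped by cluster label.
    val meanDist: Map[Int, Double] = labelled
      .collect { case ((q, l), j) if j != i => l -> sqDist(p, q) }
      .groupBy(_._1)
      .map { case (l, ds) => l -> ds.map(_._2).sum / ds.size }
    val others = meanDist - label
    (meanDist.get(label), others.isEmpty) match {
      case (Some(a), false) =>
        val b = others.values.min // mean distance to the nearest other cluster
        if (a == b) 0.0 else (b - a) / math.max(a, b)
      case _ => 0.0 // singleton cluster, or only one cluster overall
    }
  }
  scores.sum / scores.size
}

// Two tight, well-separated groups of points.
val pts = Seq(Array(0.0, 0.0), Array(0.0, 1.0), Array(10.0, 10.0), Array(10.0, 11.0))
val good = silhouette(pts, Seq(0, 0, 1, 1)) // labels follow the groups: close to 1
val bad  = silhouette(pts, Seq(0, 1, 0, 1)) // labels cut across the groups: negative
println(s"good = $good, bad = $bad")
```

The score always lies between -1 and 1, and a labelling that matches the structure of the data scores higher, which is consistent with isLargerBetter being true.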

Evaluating Model Output — evaluate Method

evaluate(dataset: Dataset[_]): Double
Note
evaluate is part of the Evaluator Contract.

evaluate…​FIXME
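
As a usage illustration, the sketch below calls evaluate on the output of a KMeans model. The four sample points, the choice of k = 2, the seed and the application name are assumptions made up for this example. The code is meant for spark-shell or a similar environment with Spark 2.3.0 or later on the classpath.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell, or starts a local one otherwise.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ClusteringEvaluatorDemo") // hypothetical application name
  .getOrCreate()

// Made-up sample data: two small groups of points far apart from each other.
val data = Seq(
  Vectors.dense(0.0, 0.0),
  Vectors.dense(0.1, 0.1),
  Vectors.dense(9.0, 9.0),
  Vectors.dense(9.1, 9.1))
val points = spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")

// Fit a clustering model and add the "prediction" column to the dataset.
val model = new KMeans().setK(2).setSeed(1L).fit(points)
val predictions = model.transform(points)

// The evaluator reads the default "features" and "prediction" columns.
val silhouette = new ClusteringEvaluator().evaluate(predictions)
println(s"Silhouette = $silhouette")
```

Because KMeans writes its cluster assignments to the prediction column by default, the evaluator works here without any setters. When a model uses different column names, setFeaturesCol and setPredictionCol have to be set to match.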
