train(dataset: DataFrame): M
The train method is supposed to ease dealing with schema validation and copying parameters to the trained
PredictionModel. The parent of the model is also set to the Predictor itself.
A Predictor is basically a function that maps a DataFrame onto a PredictionModel:
predictor: DataFrame =[train]=> PredictionModel
It implements the abstract fit(dataset: DataFrame) method of the Estimator abstract class. The implementation
validates and transforms the schema of the dataset (using a custom transformSchema of PipelineStage), and
then calls the abstract train method.
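The flow described above (validate the schema, call train, set the parent, copy the parameters) can be sketched without Spark on the classpath. This is only a simplified model of the idea, not Spark's code: Frame, SketchModel, SketchPredictor, MeanModel and MeanPredictor are hypothetical stand-ins, and a schema is reduced to a map from column name to a type name.

```scala
// Simplified stand-ins for Spark's types. Every name below is hypothetical.
final case class Frame(schema: Map[String, String], labels: Seq[Double])

abstract class SketchModel {
  var parent: Option[AnyRef] = None
  var params: Map[String, Any] = Map.empty
}

abstract class SketchPredictor[M <: SketchModel] {
  var params: Map[String, Any] = Map.empty

  // The only method a concrete predictor has to implement.
  protected def train(dataset: Frame): M

  // Checks the input columns and returns the schema with a prediction column added.
  def transformSchema(schema: Map[String, String]): Map[String, String] = {
    require(schema.get("features").contains("vector"), "features column is missing or not a vector")
    require(schema.contains("label"), "label column is missing")
    schema + ("prediction" -> "double")
  }

  // Template method: validate first, then train, then wire up the model.
  final def fit(dataset: Frame): M = {
    transformSchema(dataset.schema)
    val model = train(dataset)
    model.parent = Some(this)
    model.params = params
    model
  }
}

final class MeanModel(val mean: Double) extends SketchModel

// A toy predictor whose "training" is just averaging the labels.
final class MeanPredictor extends SketchPredictor[MeanModel] {
  protected def train(dataset: Frame): MeanModel =
    new MeanModel(dataset.labels.sum / dataset.labels.size)
}

val predictor = new MeanPredictor
predictor.params = Map("maxIter" -> 10)

val frame = Frame(Map("features" -> "vector", "label" -> "double"), Seq(1.0, 2.0, 3.0))
val model = predictor.fit(frame)
```

The point of the design is that MeanPredictor contains nothing but the training step. Everything else happens once, in fit.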
Validation and transformation of a schema (using transformSchema) make sure that:
features column exists and is of the correct type (defaults to Vector).
label column exists and is of Double type.
As the last step, it adds the prediction column of Double type.
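The schema checks can be tried out with any concrete Predictor. The sketch below uses LinearRegression and assumes Spark 2.x or later (SparkSession, org.apache.spark.ml.linalg, and fit accepting a Dataset rather than only a DataFrame). The sample data and the application name are made up for illustration. It should work when pasted into spark-shell, where getOrCreate simply returns the existing session.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DoubleType

// Assumption: a local Spark 2.x+ runtime is available.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("predictor-schema-sketch")
  .getOrCreate()
import spark.implicits._

// A tiny training set that uses the default column names.
val training = Seq(
  (1.0, Vectors.dense(1.0)),
  (2.0, Vectors.dense(2.0)),
  (3.0, Vectors.dense(3.0))
).toDF("label", "features")

val lr = new LinearRegression()

// transformSchema validates the input schema and returns it
// with the prediction column appended.
val outputSchema = lr.transformSchema(training.schema)
val predictionField = outputSchema("prediction")

// A features column of the wrong type (Double instead of Vector)
// is rejected before any training happens.
val wrongType = Seq((1.0, 2.0)).toDF("label", "features")
val rejected =
  try { lr.transformSchema(wrongType.schema); false }
  catch { case _: IllegalArgumentException => true }

// fit runs the same validation and then trains the model.
val model = lr.fit(training)
```

Calling model.transform(training).printSchema afterwards shows the label, features and prediction columns side by side.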