Alternating Least Squares (ALS) Matrix Factorization for Recommender Systems

Alternating Least Squares (ALS) Matrix Factorization is a recommendation algorithm…​FIXME

Tip
Read the original paper Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights by Robert M. Bell and Yehuda Koren.

Recommender systems based on collaborative filtering predict user preferences for products or services by learning past user-item relationships. A predominant approach to collaborative filtering is neighborhood based ("k-nearest neighbors"), where a user-item preference rating is interpolated from ratings of similar items and/or users.

Our method is very fast in practice, generating a prediction in about 0.2 milliseconds. Importantly, it does not require training many parameters or a lengthy preprocessing, making it very practical for large scale applications. Finally, we show how to apply these methods to the perceivably much slower user-oriented approach. To this end, we suggest a novel scheme for low dimensional embedding of the users. We evaluate these methods on the Netflix dataset, where they deliver significantly better results than the commercial Netflix Cinematch recommender system.

Tip
Read the follow-up paper Collaborative Filtering for Implicit Feedback Datasets by Yifan Hu, Yehuda Koren and Chris Volinsky.
ALS Example
// Based on JavaALSExample from the official Spark examples
// https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java

// 1. Save the code to als.scala
// 2. Run `spark-shell -i als.scala`

import spark.implicits._

import org.apache.spark.ml.recommendation.ALS
val als = new ALS().
  setMaxIter(5).
  setRegParam(0.01).
  setUserCol("userId").
  setItemCol("movieId").
  setRatingCol("rating")

import org.apache.spark.ml.recommendation.ALS.Rating
// FIXME Use a much richer dataset, i.e. Spark's data/mllib/als/sample_movielens_ratings.txt
// FIXME Load it using spark.read
val ratings = Seq(
  Rating(0, 2, 3),
  Rating(0, 3, 1),
  Rating(0, 5, 2),
  Rating(1, 2, 2)).toDF("userId", "movieId", "rating")
val Array(training, testing) = ratings.randomSplit(Array(0.8, 0.2))

// Make sure that the RDDs have at least one record
assert(training.count > 0)
assert(testing.count > 0)

import org.apache.spark.ml.recommendation.ALSModel
val model = als.fit(training)

// drop NaNs
model.setColdStartStrategy("drop")
val predictions = model.transform(testing)

import org.apache.spark.ml.evaluation.RegressionEvaluator
val evaluator = new RegressionEvaluator().
  setMetricName("rmse").  // root mean squared error
  setLabelCol("rating").
  setPredictionCol("prediction")
val rmse = evaluator.evaluate(predictions)
println(s"Root-mean-square error = $rmse")

// Model is ready for recommendations

// Generate top 10 movie recommendations for each user
val userRecs = model.recommendForAllUsers(10)
userRecs.show(truncate = false)

// Generate top 10 user recommendations for each movie
val movieRecs = model.recommendForAllItems(10)
movieRecs.show(truncate = false)

// Generate top 10 movie recommendations for a specified set of users
// Use a trick to make sure we work with the known users from the input
val users = ratings.select(als.getUserCol).distinct.limit(3)
val userSubsetRecs = model.recommendForUserSubset(users, 10)
userSubsetRecs.show(truncate = false)

// Generate top 10 user recommendations for a specified set of movies
val movies = ratings.select(als.getItemCol).distinct.limit(3)
val movieSubSetRecs = model.recommendForItemSubset(movies, 10)
movieSubSetRecs.show(truncate = false)

System.exit(0)

results matching ""

    No results matching ""