RecSys23-Demo

In this demo, we show how easily LensKit-Auto selects the best-suited model in different scenarios. A LensKit-Auto developer needs only a single function call to select, tune, and ensemble LensKit algorithms.

The demo is divided into two parts:

  1. In the first part, we select and tune a Top-N recommender from all of LensKit’s algorithms on the MovieLens 100k dataset.

  2. In the second part, we tune and ensemble the BiasedMatrixFactorization predictor on the MovieLens 100k dataset.

Loading Data

First, we load the MovieLens 100k dataset into a pandas dataframe.

[2]:
from lenskit.datasets import ML100K
# read in the Movielens 100k dataset as a pandas dataframe
ml100k = ML100K('../data/ml-100k')
ratings = ml100k.ratings
ratings.head()
[2]:
   user  item  rating  timestamp
0   196   242     3.0  881250949
1   186   302     3.0  891717742
2    22   377     1.0  878887116
3   244    51     2.0  880606923
4   166   346     1.0  886397596

Splitting Data

For this demo, we use a row-based holdout split: 25% of the dataset rows go into the test set and the remaining 75% into the train set. A holdout split is not ideal, and in a real experiment we would rather use a cross-fold split. For the sake of this demo, however, a holdout split keeps the code simple, and we do not have to calculate the mean error over all folds.

[3]:
# perform holdout validation split
test = ratings.sample(frac=0.25, random_state=42)
train = ratings.drop(test.index)
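
For comparison, a cross-fold split could look roughly like the sketch below. It uses only the Python standard library and works on row positions; `kfold_indices` and `mean_over_folds` are hypothetical helper names chosen for illustration and are not part of LensKit or LensKit-Auto. With a pandas dataframe, the returned positions could be passed to `ratings.iloc[...]` to obtain the train and test rows of each fold.

```python
import random
from statistics import mean


def kfold_indices(n_rows, k=5, seed=42):
    """Partition the row positions 0..n_rows-1 into k disjoint test folds.

    Hypothetical helper for illustration only.
    Returns a list of (train_positions, test_positions) tuples.
    """
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)
    # fold i takes every k-th shuffled position, starting at offset i
    folds = [positions[i::k] for i in range(k)]
    splits = []
    for i, test_pos in enumerate(folds):
        train_pos = [p for j, fold in enumerate(folds) if j != i for p in fold]
        splits.append((train_pos, test_pos))
    return splits


def mean_over_folds(splits, evaluate):
    """Average a per-fold score; evaluate(train_pos, test_pos) returns a float."""
    return mean(evaluate(train_pos, test_pos) for train_pos, test_pos in splits)


splits = kfold_indices(n_rows=100, k=5)
# placeholder "metric": the share of rows that land in the test fold
test_share = mean_over_folds(splits, lambda tr, te: len(te) / (len(tr) + len(te)))
```

In an actual experiment, the placeholder function would fit a model on the train rows of a fold and return its error on the test rows.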

1. Select and Tune a Recommender Model From LensKit

In the first part of our experiment, we want to find the best-performing recommender model on the MovieLens 100k dataset. This model should be selected from all of LensKit’s algorithms and tuned on the NDCG@10 metric.

A LensKit developer simply calls the get_best_recommender_model() function to select and optimize a LensKit model.

Note: To keep the demo easily executable, we reduced the search time from one hour to two minutes, which is enough to demonstrate how LensKit-Auto works. In a real use case, we would give the optimization process more time.

[4]:
from lkauto.lkauto import get_best_recommender_model

# call get_best_recommender_model to automatically select and tune the best-performing LensKit algorithm
optimized_model, configuration = get_best_recommender_model(train=train, time_limit_in_sec=120)
2023-02-22 09:10:41,441 INFO ---Starting LensKit-Auto---
2023-02-22 09:10:41,442 INFO     optimization_time:              120 seconds
2023-02-22 09:10:41,442 INFO     num_evaluations:                        500
2023-02-22 09:10:41,442 INFO     optimization_metric:            ndcg@10
2023-02-22 09:10:41,442 INFO     optimization_strategie:         bayesian
2023-02-22 09:10:41,443 INFO --Start Preprocessing--
2023-02-22 09:10:41,445 INFO --End Preprocessing--
2023-02-22 09:10:41,445 INFO --Start Bayesian Optimization--
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Numba is using threading layer omp - consider TBB
BLAS using multiple threads - can cause oversubscription
found 2 potential runtime problems - see https://boi.st/lkpy-perf
2023-02-22 09:10:52,835 INFO Run ID: 1 | ItemItem | ndcg@10: 0.17453138349348526
2023-02-22 09:10:56,469 INFO Run ID: 2 | UserUser | ndcg@10: 0.1644463000108411
2023-02-22 09:11:18,025 INFO Run ID: 3 | FunkSVD | ndcg@10: 0.00011664492811850631
2023-02-22 09:11:21,827 INFO Run ID: 4 | BiasedSVD | ndcg@10: 0.04715050990079625
2023-02-22 09:11:28,736 INFO Run ID: 5 | ALSBiasedMF | ndcg@10: 0.04182150616315787
2023-02-22 09:11:29,933 INFO Run ID: 6 | Bias | ndcg@10: 0.04589156840426328
2023-02-22 09:11:32,253 INFO Run ID: 7 | FunkSVD | ndcg@10: 0.012443371569372778
2023-02-22 09:11:37,465 INFO Run ID: 8 | ItemItem | ndcg@10: 0.1743543283803561
2023-02-22 09:11:42,885 INFO Run ID: 9 | ItemItem | ndcg@10: nan
Target Algorithm returned NaN or inf as quality. Algorithm run is treated as CRASHED, cost is set to 2147483647.0 for quality scenarios. (Change value through "cost_for_crash"-option.)
2023-02-22 09:11:48,673 INFO Run ID: 10 | ItemItem | ndcg@10: 0.17132813190244398
2023-02-22 09:11:54,037 INFO Run ID: 11 | ItemItem | ndcg@10: 0.17453138349348526
2023-02-22 09:11:59,507 INFO Run ID: 12 | ItemItem | ndcg@10: 0.17453138349348526
2023-02-22 09:12:04,935 INFO Run ID: 13 | ItemItem | ndcg@10: 0.17453138349348526
2023-02-22 09:12:10,385 INFO Run ID: 14 | ItemItem | ndcg@10: 0.17437296326506613
2023-02-22 09:12:14,329 INFO Run ID: 15 | BiasedSVD | ndcg@10: 0.0005927618599612518
2023-02-22 09:12:19,833 INFO Run ID: 16 | ItemItem | ndcg@10: 0.17453138349348526
2023-02-22 09:12:21,384 INFO Run ID: 17 | Bias | ndcg@10: 0.042598050044249776
2023-02-22 09:12:26,883 INFO Run ID: 18 | ItemItem | ndcg@10: 0.17551463536785386
2023-02-22 09:12:31,520 INFO Run ID: 19 | ItemItem | ndcg@10: 0.17499128397177383
2023-02-22 09:12:34,813 INFO Run ID: 20 | UserUser | ndcg@10: nan
Target Algorithm returned NaN or inf as quality. Algorithm run is treated as CRASHED, cost is set to 2147483647.0 for quality scenarios. (Change value through "cost_for_crash"-option.)
2023-02-22 09:12:40,488 INFO Run ID: 21 | ItemItem | ndcg@10: 0.17453138349348526
2023-02-22 09:12:46,453 INFO Run ID: 22 | ItemItem | ndcg@10: nan
Target Algorithm returned NaN or inf as quality. Algorithm run is treated as CRASHED, cost is set to 2147483647.0 for quality scenarios. (Change value through "cost_for_crash"-option.)
2023-02-22 09:12:46,461 INFO --End Bayesian Optimization--
2023-02-22 09:12:46,461 INFO --Start Postrprocessing--
2023-02-22 09:12:46,462 INFO --Best Model--
2023-02-22 09:12:46,462 INFO {'algo': 'ItemItem', 'ItemItem:min_nbrs': 10, 'ItemItem:min_sim': 0.0016910967954253439, 'ItemItem:nnbrs': 9043}
2023-02-22 09:12:46,462 INFO ---LensKit-Auto finished---
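
To get an overview of such a log, the per-run scores can be grouped by algorithm. The sketch below is not part of LensKit-Auto: `best_per_algorithm` is a hypothetical helper, and the regular expression assumes the `Run ID: <n> | <algorithm> | <metric>: <value>` layout shown above. Runs that report `nan` are skipped, since the optimizer treats them as crashed.

```python
import math
import re

# assumed layout: "Run ID: <n> | <algorithm> | <metric>: <value>"
RUN_PATTERN = re.compile(r"Run ID: (\d+) \| (\w+) \| ([\w@]+): (\S+)")


def best_per_algorithm(log_lines, higher_is_better=True):
    """Return the best non-NaN score per algorithm found in the log lines."""
    pick = max if higher_is_better else min
    best = {}
    for line in log_lines:
        match = RUN_PATTERN.search(line)
        if match is None:
            continue  # not a run line (e.g. a runtime warning)
        algo, score = match.group(2), float(match.group(4))
        if math.isnan(score):
            continue  # crashed run, nothing to compare
        best[algo] = pick(best[algo], score) if algo in best else score
    return best


# a few lines taken from the log above
log_lines = [
    "OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.",
    "2023-02-22 09:10:52,835 INFO Run ID: 1 | ItemItem | ndcg@10: 0.17453138349348526",
    "2023-02-22 09:10:56,469 INFO Run ID: 2 | UserUser | ndcg@10: 0.1644463000108411",
    "2023-02-22 09:11:42,885 INFO Run ID: 9 | ItemItem | ndcg@10: nan",
    "2023-02-22 09:12:26,883 INFO Run ID: 18 | ItemItem | ndcg@10: 0.17551463536785386",
    "2023-02-22 09:12:34,813 INFO Run ID: 20 | UserUser | ndcg@10: nan",
]
best = best_per_algorithm(log_lines)
```

For an error metric such as RMSE, `higher_is_better=False` would select the lowest score instead.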

After this step, the LensKit developer uses the optimized model like any other LensKit model. The following lines are copied from the “Running the Evaluation” part of the LensKit Getting Started chapter.

First, we wrap our optimized model in a LensKit Recommender. Then, we fit it on the train set and generate recommendations for the users in the test set.

[5]:
from lenskit import batch, topn, util
from lenskit.algorithms import Recommender

# initialize LensKit Recommender object
fittable = Recommender.adapt(optimized_model)
# fit the optimized model
fittable.fit(train)
users = test.user.unique()
# now we run the recommender
recs = batch.recommend(fittable, users, 10)

In the last step of the evaluation, we use the LensKit Top-N RecListAnalysis object to compute the NDCG@10 metric for every user.

[6]:
# initialize RecListAnalysis object for computing the NDCG@10 value
rla = topn.RecListAnalysis()
# add ndcg metric to the RecListAnalysis object
rla.add_metric(topn.ndcg)
# compute ndcg@10 values
results = rla.compute(recs, test)
# show the ndcg scores per user
results.head()
[6]:
      nrecs      ndcg
user
877      10  0.147429
815      10  0.117370
94       10  0.220307
416      10  0.238576
500      10  0.177129
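
To make the metric concrete, the sketch below computes NDCG@k for a single recommendation list in its textbook form with binary relevance. It is written for illustration with the standard library only; `dcg` and `ndcg_at_k` are hypothetical helpers, and LensKit's `topn.ndcg` may differ in details such as the discount function or the use of rating values as gains.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(recommended, relevant, k=10):
    """NDCG@k with binary relevance for one user.

    recommended: item ids ordered by rank (best first)
    relevant:    set of item ids the user interacted with in the test set
    """
    gains = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    # ideal list: all relevant items ranked first, truncated at k
    ideal_dcg = dcg([1.0] * min(len(relevant), k))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


perfect = ndcg_at_k(["a", "b", "c"], {"a", "b"}, k=3)  # both hits at the top
partial = ndcg_at_k(["x", "a", "b"], {"a", "b"}, k=3)  # hits pushed down one rank
no_hits = ndcg_at_k(["x", "y", "z"], {"a", "b"}, k=3)  # no relevant item recommended
```

Averaging the per-user values, for example with `results['ndcg'].mean()` on the dataframe above, yields a single score for the model.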

2. Tune and Ensemble a Predictor Model From LensKit

In the second part of this demo, we tune and ensemble the BiasedMatrixFactorization algorithm with LensKit-Auto. In contrast to the first part of the demo, we do not select an algorithm from all LensKit algorithms but tune a single predictor algorithm on the RMSE metric. Furthermore, we want to ensemble the best-performing models to gain a performance boost.

In this part of the demo, we need to create a configuration space that contains only the BiasedMatrixFactorization algorithm.

Note: To keep the demo easily executable, we reduced the search time from one hour to two minutes, which is enough to demonstrate how LensKit-Auto works. In a real use case, we would give the optimization process more time.

[4]:
from ConfigSpace import Constant
from lkauto.algorithms.als import BiasedMF

# initialize BiasedMF ConfigurationSpace
cs = BiasedMF.get_default_configspace()
# declare that BiasedMF is the only algorithm contained in the configuration space
cs.add_hyperparameters([Constant("algo", "ALSBiasedMF")])
# set a random seed for reproducible results
cs.seed(42)

After creating the configuration space for the BiasedMatrixFactorization algorithm, we call get_best_prediction_model() to automatically tune and ensemble BiasedMatrixFactorization models.

[7]:
from lkauto.lkauto import get_best_prediction_model
# provide the BiasedMF ConfigurationSpace to the get_best_prediction_model function
optimized_model, configuration = get_best_prediction_model(train=train, cs=cs, time_limit_in_sec=120)
2023-02-22 09:02:05,537 INFO ---Starting LensKit-Auto---
2023-02-22 09:02:05,539 INFO     optimization_time:              120 seconds
2023-02-22 09:02:05,541 INFO     num_evaluations:                        500
2023-02-22 09:02:05,541 INFO     optimization_metric:            rmse
2023-02-22 09:02:05,542 INFO     optimization_strategie:         bayesian
2023-02-22 09:02:05,543 INFO --Start Preprocessing--
2023-02-22 09:02:05,550 INFO --End Preprocessing--
2023-02-22 09:02:05,551 INFO --Start Bayesian Optimization--
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Numba is using threading layer omp - consider TBB
BLAS using multiple threads - can cause oversubscription
found 2 potential runtime problems - see https://boi.st/lkpy-perf
2023-02-22 09:02:12,521 INFO Run ID: 1 | ALSBiasedMF | rmse: 0.9393121280796051
2023-02-22 09:02:19,757 INFO Run ID: 2 | ALSBiasedMF | rmse: 0.9631079867170781
2023-02-22 09:02:24,235 INFO Run ID: 3 | ALSBiasedMF | rmse: 0.951654284405548
2023-02-22 09:02:25,692 INFO Run ID: 4 | ALSBiasedMF | rmse: 0.9666164944266947
2023-02-22 09:02:28,658 INFO Run ID: 5 | ALSBiasedMF | rmse: 0.9394765406698431
2023-02-22 09:02:40,558 INFO Run ID: 6 | ALSBiasedMF | rmse: 0.9482395058989359
2023-02-22 09:02:44,966 INFO Run ID: 7 | ALSBiasedMF | rmse: 0.9393212729519723
2023-02-22 09:02:48,386 INFO Run ID: 8 | ALSBiasedMF | rmse: 0.9393622572445268
2023-02-22 09:02:49,495 INFO Run ID: 9 | ALSBiasedMF | rmse: 0.9695151963076567
2023-02-22 09:02:51,329 INFO Run ID: 10 | ALSBiasedMF | rmse: 0.9605555908108012
2023-02-22 09:02:57,446 INFO Run ID: 11 | ALSBiasedMF | rmse: 0.939296574209202
2023-02-22 09:03:00,865 INFO Run ID: 12 | ALSBiasedMF | rmse: 0.9477068515411664
2023-02-22 09:03:08,486 INFO Run ID: 13 | ALSBiasedMF | rmse: 0.9393445899560989
2023-02-22 09:03:10,267 INFO Run ID: 14 | ALSBiasedMF | rmse: 0.9538055651010438
2023-02-22 09:03:11,675 INFO Run ID: 15 | ALSBiasedMF | rmse: 1.0025167074259034
2023-02-22 09:03:16,042 INFO Run ID: 16 | ALSBiasedMF | rmse: 0.939324998919378
2023-02-22 09:03:20,541 INFO Run ID: 17 | ALSBiasedMF | rmse: 0.9393076995768875
2023-02-22 09:03:24,700 INFO Run ID: 18 | ALSBiasedMF | rmse: 0.9480222221845809
2023-02-22 09:03:29,203 INFO Run ID: 19 | ALSBiasedMF | rmse: 0.9393125578143455
2023-02-22 09:03:33,582 INFO Run ID: 20 | ALSBiasedMF | rmse: 0.9393075106088427
2023-02-22 09:03:37,978 INFO Run ID: 21 | ALSBiasedMF | rmse: 0.9410654295477805
2023-02-22 09:03:42,375 INFO Run ID: 22 | ALSBiasedMF | rmse: 0.9392939607845919
2023-02-22 09:03:46,197 INFO Run ID: 23 | ALSBiasedMF | rmse: 0.9393204026298504
2023-02-22 09:03:50,599 INFO Run ID: 24 | ALSBiasedMF | rmse: 0.9393363169889227
2023-02-22 09:03:56,905 INFO Run ID: 25 | ALSBiasedMF | rmse: 0.9392963259189987
2023-02-22 09:03:58,425 INFO Run ID: 26 | ALSBiasedMF | rmse: 1.0525937791525835
2023-02-22 09:03:59,903 INFO Run ID: 27 | ALSBiasedMF | rmse: 1.00205812315906
2023-02-22 09:04:04,284 INFO Run ID: 28 | ALSBiasedMF | rmse: 0.9393266925132977
2023-02-22 09:04:08,701 INFO Run ID: 29 | ALSBiasedMF | rmse: 0.9392937216538252
2023-02-22 09:04:08,706 INFO --End Bayesian Optimization--
2023-02-22 09:04:08,712 INFO --Start Postrprocessing--
2023-02-22 09:04:09,328 INFO --Best Model--
2023-02-22 09:04:09,328 INFO GES Ensemble Model
2023-02-22 09:04:09,328 INFO ---LensKit-Auto finished---
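
The log names the result a "GES Ensemble Model". Assuming GES refers to greedy ensemble selection in the style of Caruana et al., the idea can be sketched as follows: starting from an empty ensemble, repeatedly add (with replacement) the model whose inclusion gives the lowest validation RMSE of the averaged predictions. The code below illustrates that general technique on made-up toy numbers; it is not LensKit-Auto's implementation, all names are hypothetical, and this variant stops as soon as no candidate improves the score.

```python
import math


def rmse(preds, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth))


def greedy_ensemble_selection(model_preds, truth, n_rounds=5):
    """Pick models greedily, with replacement, to minimize the RMSE of the mean prediction.

    model_preds: dict mapping a model name to its validation predictions
    Returns the selected model names (repeats act as weights) and the final RMSE.
    """
    selected = []
    running_sum = [0.0] * len(truth)
    best_score = math.inf
    for _ in range(n_rounds):
        size = len(selected) + 1
        candidate_scores = {}
        for name, preds in model_preds.items():
            averaged = [(s + p) / size for s, p in zip(running_sum, preds)]
            candidate_scores[name] = rmse(averaged, truth)
        name = min(candidate_scores, key=candidate_scores.get)
        if candidate_scores[name] >= best_score:
            break  # no candidate improves the ensemble any further
        best_score = candidate_scores[name]
        selected.append(name)
        running_sum = [s + p for s, p in zip(running_sum, model_preds[name])]
    return selected, best_score


# toy validation data: model "a" overshoots, model "b" undershoots
truth = [3.0, 4.0, 2.0, 5.0]
model_preds = {
    "a": [3.5, 4.5, 2.5, 5.5],
    "b": [2.5, 3.5, 1.5, 4.5],
    "c": [4.0, 3.0, 3.0, 4.0],
}
selected, ensemble_rmse = greedy_ensemble_selection(model_preds, truth)
```

In the toy data, the errors of the two selected models cancel out, so their average scores better than either model alone. This is the kind of performance boost an ensemble aims for.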

Once we have the optimized and ensembled BiasedMatrixFactorization models, we can use the ensemble like any other LensKit predictor model to get predictions.

[9]:
# fit the optimized model
optimized_model.fit(train)
# predict using the optimized model returned by LensKit-Auto
preds = optimized_model.predict(test)

In the last step of this demo, we calculate the RMSE value for our optimized model.

[10]:
from lenskit.metrics.predict import rmse

# print the RMSE value
print("RMSE: {}".format(rmse(predictions=preds, truth=test['rating'])))

RMSE: 0.9077549336005521
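
For reference, RMSE is the square root of the mean squared difference between predicted and true ratings. The sketch below spells this out with the standard library; `rmse_by_hand` is a hypothetical name, and unlike a library implementation it performs no index alignment or missing-value handling.

```python
import math


def rmse_by_hand(predictions, truth):
    """Root mean squared error of two equally long sequences of numbers."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# one of three predictions is off by two rating points
example = rmse_by_hand([3.0, 4.0, 5.0], [3.0, 2.0, 5.0])
```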