lkauto package

lkauto.lkauto.get_best_prediction_model()

Returns the best Predictor found within the defined search time.

The find_best_explicit_configuration method searches the ConfigurationSpace for the best Predictor model configuration. Depending on the ConfigurationSpace parameter provided by the developer, it performs one of three use cases:

1. combined algorithm selection and hyperparameter configuration
2. combined algorithm selection and hyperparameter configuration for a specific subset of algorithms and/or different parameter ranges
3. hyperparameter selection for a specific algorithm

The hyperparameter and/or model selection process stops after time_limit_in_sec or (if set) after n_trials, whichever is reached first.
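The stopping rule described above (time limit or trial count, whichever is reached first) can be sketched in plain Python. This is only an illustration and not the LensKit-Auto implementation: random_search, evaluate and sample are hypothetical names, and the toy objective stands in for training and scoring a model.

```python
import random
import time


def random_search(evaluate, sample, time_limit_in_sec, n_trials=None, seed=0):
    """Hypothetical helper: keep the lowest-error configuration seen so far."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit_in_sec
    best_config, best_error, trials = None, float("inf"), 0
    while True:
        # Stop at whichever budget is exhausted first.
        if time.monotonic() >= deadline:
            break
        if n_trials is not None and trials >= n_trials:
            break
        config = sample(rng)
        error = evaluate(config)
        trials += 1
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error, trials


# Toy objective (assumption): the error is smallest when x is close to 0.3.
best_config, best_error, trials = random_search(
    evaluate=lambda cfg: (cfg["x"] - 0.3) ** 2,
    sample=lambda rng: {"x": rng.random()},
    time_limit_in_sec=5,
    n_trials=50,
)
```

Here the trial budget is reached long before the time limit, so the loop ends on the trial count. With a time limit of 0 the loop would end before the first trial.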

Parameters
  • train (pd.DataFrame) – Pandas DataFrame train split.

  • validation (pd.DataFrame) – Pandas DataFrame validation split. If a validation split is provided, split_folds, split_strategie and split_frac will be ignored.

  • cs (ConfigurationSpace) – ConfigurationSpace with all algorithms and parameter ranges defined.

  • optimization_metric (function) – LensKit prediction accuracy metric to optimize for (either rmse or mae)

  • optimization_strategie (str) – optimization strategy to use. Either bayesian or random_search.

  • time_limit_in_sec (int) – optimization search time limit in seconds.

  • num_evaluations (int) – number of samples to be used by the optimization strategy. Value cannot be smaller than 6 if no initial configuration is provided.

  • random_state – The random number generator or seed (see lenskit.util.rng()).

  • split_folds (int) – number of folds of the inner split

  • split_frac (float) – fraction of the inner split. If split_folds is set to a value > 2, split_frac will be ignored. Value must be between 0 and 1.

  • split_strategie (str) – split strategy to use. Either ‘user_based’ or ‘item_based’.

  • ensemble_size (int) – number of models to be used for ensemble building.

  • minimize_error_metric_val (bool) – if True, the optimization will try to minimize the error metric value. If False, the optimization will try to maximize the error metric value.

  • min_number_of_ratings (int) – minimum number of ratings a user must have to be considered in the train dataset.

  • max_number_of_ratings (int) – maximum number of ratings a user can have to be considered in the train dataset.

  • drop_duplicates (bool) – if True, all duplicate rows will be dropped from the train dataset.

  • drop_na_values (bool) – if True, all rows with NaN values will be dropped from the train dataset.

  • user_column (str) – name of the user column in the train dataset.

  • item_column (str) – name of the item column in the train dataset.

  • rating_column (str) – name of the rating column in the train dataset.

  • timestamp_col (str) – Name of the timestamp column

  • include_timestamp (bool = True) – If True, the timestamp column will be included in the dataset

  • log_level (str) – log level to use.

  • filer (Filer) – filer to manage LensKit-Auto output

Returns

  • model (Predictor) – the best-suited (untrained) predictor for the train dataset and the cs parameters.

  • incumbent (dict) – a dictionary containing the algorithm name and hyperparameter configuration of the returned model
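The dataset filters listed in the parameters (min_number_of_ratings, max_number_of_ratings, drop_duplicates, drop_na_values) can be pictured with a small pure-Python sketch. The preprocess helper is hypothetical, it works on (user, item, rating) tuples instead of a DataFrame, and the order in which the steps run is an assumption that the documentation does not state.

```python
import math
from collections import Counter


def preprocess(rows, min_number_of_ratings=None, max_number_of_ratings=None,
               drop_duplicates=True, drop_na_values=True):
    """Filter (user, item, rating) tuples; the order of the steps is an assumption."""
    if drop_na_values:
        # Drop rows that contain None or a float NaN in any field.
        rows = [r for r in rows
                if not any(v is None or (isinstance(v, float) and math.isnan(v))
                           for v in r)]
    if drop_duplicates:
        # dict.fromkeys keeps the first occurrence and preserves order.
        rows = list(dict.fromkeys(rows))
    counts = Counter(user for user, _, _ in rows)
    lo = min_number_of_ratings if min_number_of_ratings is not None else 0
    hi = max_number_of_ratings if max_number_of_ratings is not None else math.inf
    # Keep only users whose rating count lies within [lo, hi].
    return [r for r in rows if lo <= counts[r[0]] <= hi]


rows = [
    ("u1", "i1", 4.0), ("u1", "i2", 3.5), ("u1", "i2", 3.5),  # duplicate row
    ("u1", "i3", float("nan")),                               # missing rating
    ("u2", "i1", 5.0),                                        # too few ratings
    ("u3", "i1", 2.0), ("u3", "i2", 1.0), ("u3", "i3", 4.5),  # too many ratings
]
clean = preprocess(rows, min_number_of_ratings=2, max_number_of_ratings=2)
```

In this toy data only user u1 survives: after the NaN row and the duplicate are removed, u1 has exactly two ratings, while u2 has one and u3 has three.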

lkauto.lkauto.get_best_recommender_model()

Returns the best Recommender found within the defined search time.

The find_best_implicit_configuration method searches the ConfigurationSpace for the best Recommender model configuration. Depending on the ConfigurationSpace parameter provided by the developer, it performs one of three use cases:

1. combined algorithm selection and hyperparameter configuration
2. combined algorithm selection and hyperparameter configuration for a specific subset of algorithms and/or different parameter ranges
3. hyperparameter selection for a specific algorithm

The hyperparameter and/or model selection process stops after time_limit_in_sec or (if set) after n_trials, whichever is reached first.
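The three use cases differ only in how much of the search space is handed to the optimizer. The sketch below illustrates that idea with plain dictionaries. It does not use the ConfigurationSpace API, and the algorithm names, hyperparameter names and ranges are assumptions chosen for illustration.

```python
import random

# Hypothetical search space: algorithm name -> {hyperparameter: (low, high)}.
FULL_SPACE = {
    "ItemItem": {"nnbrs": (1, 100)},
    "UserUser": {"nnbrs": (1, 100)},
    "ImplicitMF": {"features": (5, 200)},
}

# Use case 1: every algorithm with its full ranges.
case_1 = FULL_SPACE

# Use case 2: a subset of algorithms and/or narrower ranges.
case_2 = {
    "ItemItem": {"nnbrs": (10, 50)},
    "ImplicitMF": FULL_SPACE["ImplicitMF"],
}

# Use case 3: hyperparameter search for one algorithm only.
case_3 = {"ItemItem": FULL_SPACE["ItemItem"]}


def sample_configuration(space, rng):
    """Pick an algorithm, then draw each of its hyperparameters uniformly."""
    algo = rng.choice(sorted(space))
    params = {name: rng.randint(low, high)
              for name, (low, high) in space[algo].items()}
    return {"algo": algo, **params}


rng = random.Random(42)
samples = [sample_configuration(case_2, rng) for _ in range(20)]
```

Every sample drawn from case_2 names one of its two algorithms and stays inside the narrowed ranges, while case_3 can only ever yield ItemItem configurations.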

Parameters
  • train (pd.DataFrame) – Pandas DataFrame train split.

  • validation (pd.DataFrame) – Pandas DataFrame validation split. If a validation split is provided, split_folds, split_strategie and split_frac will be ignored.

  • cs (ConfigurationSpace) – ConfigurationSpace with all algorithms and parameter ranges defined.

  • optimization_strategie (str) – optimization strategy to use. Either bayesian or random_search.

  • optimization_metric (function) – LensKit recommender metric to optimize for

  • time_limit_in_sec (int) – optimization search time limit in seconds.

  • num_evaluations (int) – number of samples to be used by the optimization strategy. Value cannot be smaller than 6.

  • random_state – The random number generator or seed (see lenskit.util.rng()).

  • split_folds (int) – number of folds of the inner split

  • split_frac (float) – fraction of the inner split. If split_folds is not None, split_frac will be ignored. Value must be between 0 and 1.

  • split_strategie (str) – split strategy to use. Either ‘user_based’ or ‘item_based’.

  • minimize_error_metric_val (bool) – if True, the optimization_metric value will be minimized. If False, the optimization_metric value will be maximized.

  • num_recommendations (int) – number of recommendations to be evaluated per user. Value must be greater than 0.

  • min_interactions_per_user (int) – minimum number of ratings a user must have to be considered in the train dataset.

  • max_interactions_per_user (int) – maximum number of ratings a user can have to be considered in the train dataset.

  • drop_duplicates (bool) – if True, all duplicate rows will be dropped from the train dataset.

  • drop_na_values (bool) – if True, all rows with NaN values will be dropped from the train dataset.

  • user_column (str) – name of the user column in the train dataset.

  • item_column (str) – name of the item column in the train dataset.

  • rating_column (str) – name of the rating column in the train dataset.

  • timestamp_col (str) – Name of the timestamp column

  • include_timestamp (bool = True) – If True, the timestamp column will be included in the dataset

  • log_level (str) – log level to use.

  • filer (Filer) – filer to manage LensKit-Auto output

Returns

  • model (Predictor) – the best-suited (untrained) model for the train dataset and the cs parameters.

  • incumbent (dict) – a dictionary containing the algorithm name and hyperparameter configuration of the returned model
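To make the role of num_recommendations concrete, the sketch below scores top-n lists with precision at n, averaged over users. The metric actually optimized is whichever LensKit recommender metric is passed as optimization_metric, so precision is only an assumed stand-in, and precision_at_n and mean_precision are hypothetical helper names.

```python
def precision_at_n(recommended, relevant, num_recommendations):
    """Fraction of the first num_recommendations items that are relevant."""
    if num_recommendations <= 0:
        raise ValueError("num_recommendations must be greater than 0")
    top_n = recommended[:num_recommendations]
    hits = sum(1 for item in top_n if item in relevant)
    return hits / num_recommendations


def mean_precision(rec_lists, truth, num_recommendations):
    """Average precision@n over all users that have a recommendation list."""
    scores = [precision_at_n(recs, truth.get(user, set()), num_recommendations)
              for user, recs in rec_lists.items()]
    return sum(scores) / len(scores)


# Toy data (assumption): ranked recommendations and held-out relevant items.
rec_lists = {"u1": ["i1", "i2", "i3", "i4"], "u2": ["i5", "i6", "i7", "i8"]}
truth = {"u1": {"i1", "i3"}, "u2": {"i8"}}
score = mean_precision(rec_lists, truth, num_recommendations=2)
```

Because a higher precision is better, a metric of this kind would be maximized, which corresponds to setting minimize_error_metric_val to False.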