lkauto package
-
lkauto.lkauto.get_best_prediction_model()
Returns the best Predictor found within the defined search time.
This function searches the ConfigurationSpace for the best Predictor model configuration. Depending on the ConfigurationSpace provided by the developer, it covers three use cases:
1. combined algorithm selection and hyperparameter configuration;
2. combined algorithm selection and hyperparameter configuration for a specific subset of algorithms and/or different parameter ranges;
3. hyperparameter selection for a specific algorithm.
The hyperparameter and/or model selection process stops after time_limit_in_sec or, if set, after num_evaluations evaluations, whichever is reached first.
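The stopping rule above (time budget or evaluation budget, whichever runs out first) can be sketched with a plain random search loop. This is a minimal, dependency-free illustration, not LensKit-Auto's implementation: the names budgeted_random_search, sample_config and evaluate are hypothetical, and a bayesian strategy would replace the uniform sampler with a model-guided one. The minimize_error_metric_val flag is mirrored by flipping the sign of the score.

```python
import random
import time
from typing import Any, Callable, Dict, Optional, Tuple

Config = Dict[str, Any]


def budgeted_random_search(
    sample_config: Callable[[random.Random], Config],
    evaluate: Callable[[Config], float],
    time_limit_in_sec: float,
    num_evaluations: Optional[int] = None,
    minimize_error_metric_val: bool = True,
    seed: int = 42,
) -> Tuple[Optional[Config], float, int]:
    """Hypothetical sketch: stop at whichever budget is exhausted first."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit_in_sec
    # Flip the sign when maximizing so the loop can always minimize.
    sign = 1.0 if minimize_error_metric_val else -1.0
    incumbent: Optional[Config] = None
    best = float("inf")
    n_done = 0
    while time.monotonic() < deadline:  # time budget
        if num_evaluations is not None and n_done >= num_evaluations:
            break  # evaluation budget reached before the time limit
        config = sample_config(rng)
        score = sign * evaluate(config)
        n_done += 1
        if score < best:
            incumbent, best = config, score
    # Undo the sign flip so the caller sees the raw metric value.
    return incumbent, sign * best, n_done


if __name__ == "__main__":
    # Toy objective (made up): the "error" is smallest at nnbrs == 20.
    def sample(rng: random.Random) -> Config:
        return {"nnbrs": rng.randint(1, 50)}

    def toy_error(config: Config) -> float:
        return abs(config["nnbrs"] - 20)

    incumbent, error, n = budgeted_random_search(
        sample, toy_error, time_limit_in_sec=5, num_evaluations=30
    )
    print(n, incumbent, error)  # n is 30: the evaluation budget ran out first
```

With a generous time limit the evaluation budget ends the search; with time_limit_in_sec=0 no configuration is evaluated at all.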
- Parameters
train (pd.DataFrame) – Pandas DataFrame containing the train split.
validation (pd.DataFrame) – Pandas DataFrame containing the validation split. If a validation split is provided, split_folds, split_strategie and split_frac will be ignored.
cs (ConfigurationSpace) – ConfigurationSpace with all algorithms and parameter ranges defined.
optimization_metric (function) – LensKit prediction accuracy metric to optimize for (either rmse or mae)
optimization_strategie (str) – optimization strategy to use: either ‘bayesian’ or ‘random_search’.
time_limit_in_sec (int) – optimization search time limit in seconds.
num_evaluations (int) – number of samples (evaluations) the optimization strategy may use. The value cannot be smaller than 6 if no initial configuration is provided.
random_state – the random number generator or seed (see lenskit.util.rng()).
split_folds (int) – number of folds of the inner split.
split_frac (float) – fraction of the inner split. If split_folds is set to a value > 2, split_frac will be ignored. Value must be between 0 and 1.
split_strategie (str) – split strategy to use: either ‘user_based’ or ‘item_based’.
ensemble_size (int) – number of models to be used for ensemble building.
minimize_error_metric_val (bool) – if True, the optimization will try to minimize the error metric value. If False, the optimization will try to maximize the error metric value.
min_number_of_ratings (int) – minimum number of ratings a user must have to be considered in the train dataset.
max_number_of_ratings (int) – maximum number of ratings a user can have to be considered in the train dataset.
drop_duplicates (bool) – if True, all duplicate rows will be dropped from the train dataset.
drop_na_values (bool) – if True, all rows with NaN values will be dropped from the train dataset.
user_column (str) – name of the user column in the train dataset.
item_column (str) – name of the item column in the train dataset.
rating_column (str) – name of the rating column in the train dataset.
timestamp_col (str) – Name of the timestamp column
include_timestamp (bool = True) – If True, the timestamp column will be included in the dataset
log_level (str) – log level to use.
filer (Filer) – filer to manage LensKit-Auto output
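The data-cleaning parameters (drop_na_values, drop_duplicates, min_number_of_ratings, max_number_of_ratings) can be illustrated with the following sketch. It works on plain (user, item, rating) tuples instead of a pandas DataFrame to stay dependency-free; the function name preprocess_ratings and the order of the steps are assumptions, not taken from LensKit-Auto.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Row = Tuple[str, str, Optional[float]]  # (user, item, rating)


def _is_missing(value: object) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def preprocess_ratings(
    rows: List[Row],
    min_number_of_ratings: Optional[int] = None,
    max_number_of_ratings: Optional[int] = None,
    drop_duplicates: bool = True,
    drop_na_values: bool = True,
) -> List[Row]:
    """Hypothetical sketch of the train-set filters described above."""
    if drop_na_values:
        # Drop every row that has a missing value in any column.
        rows = [r for r in rows if not any(_is_missing(v) for v in r)]
    if drop_duplicates:
        # dict.fromkeys keeps the first occurrence and preserves order.
        rows = list(dict.fromkeys(rows))
    # Count ratings per user after cleaning, then apply the bounds.
    counts = Counter(user for user, _, _ in rows)

    def keep(user: str) -> bool:
        n = counts[user]
        if min_number_of_ratings is not None and n < min_number_of_ratings:
            return False
        if max_number_of_ratings is not None and n > max_number_of_ratings:
            return False
        return True

    return [r for r in rows if keep(r[0])]


rows = [
    ("u1", "i1", 4.0), ("u1", "i1", 4.0), ("u1", "i2", None), ("u1", "i3", 3.0),
    ("u2", "i1", 5.0),
    ("u3", "i1", 2.0), ("u3", "i2", 3.0), ("u3", "i3", 1.0), ("u3", "i4", 2.0),
]
print(preprocess_ratings(rows, min_number_of_ratings=2, max_number_of_ratings=3))
# [('u1', 'i1', 4.0), ('u1', 'i3', 3.0)]
```

Here u2 falls below the minimum with one rating and u3 exceeds the maximum with four, so only u1's cleaned ratings remain.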
- Returns
model (Predictor) – the best-suited (untrained) predictor for the train dataset and cs parameters.
incumbent (dict) – a dictionary containing the algorithm name and hyperparameter configuration of the returned model
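One plausible reading of split_frac and split_strategie, for the case where no validation split is passed and a single holdout is drawn, is sketched below: each user's (or, for ‘item_based’, each item's) ratings are shuffled and a split_frac share is held out. This interpretation, the function name holdout_split and the floor rounding are assumptions; LensKit-Auto's actual splitting logic may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]  # (user, item, rating)


def holdout_split(
    rows: List[Row],
    split_frac: float = 0.25,
    split_strategie: str = "user_based",
    seed: int = 42,
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical sketch: hold out split_frac of each group's rows."""
    if not 0.0 < split_frac < 1.0:
        raise ValueError("split_frac must be between 0 and 1")
    if split_strategie not in ("user_based", "item_based"):
        raise ValueError("split_strategie must be 'user_based' or 'item_based'")
    key = 0 if split_strategie == "user_based" else 1  # group by user or item
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    rng = random.Random(seed)
    train: List[Row] = []
    validation: List[Row] = []
    for group_rows in groups.values():
        shuffled = list(group_rows)
        rng.shuffle(shuffled)
        # Floor rounding: groups with very few rows stay entirely in train.
        n_val = int(len(shuffled) * split_frac)
        validation.extend(shuffled[:n_val])
        train.extend(shuffled[n_val:])
    return train, validation


rows = [("u1", f"i{i}", 3.0) for i in range(4)] + [("u2", f"i{i}", 4.0) for i in range(8)]
train, validation = holdout_split(rows, split_frac=0.25)
print(len(train), len(validation))  # 9 3
```

Grouping before sampling keeps every user (or item) represented in the train split, which a global random sample would not guarantee.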
-
lkauto.lkauto.get_best_recommender_model()
Returns the best Recommender found within the defined search time.
This function searches the ConfigurationSpace for the best Recommender model configuration. Depending on the ConfigurationSpace provided by the developer, it covers three use cases:
1. combined algorithm selection and hyperparameter configuration;
2. combined algorithm selection and hyperparameter configuration for a specific subset of algorithms and/or different parameter ranges;
3. hyperparameter selection for a specific algorithm.
The hyperparameter and/or model selection process stops after time_limit_in_sec or, if set, after num_evaluations evaluations, whichever is reached first.
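The three use cases differ only in which search space is passed as cs. The sketch below uses a plain dictionary as a hypothetical stand-in for a ConfigSpace ConfigurationSpace, so that one sampler covers all three cases. The algorithm names echo LensKit algorithms, but the hyperparameter ranges and the sample_configuration helper are illustrative assumptions, not LensKit-Auto's default configuration spaces.

```python
import random
from typing import Any, Dict, Tuple

# Hypothetical stand-in for a ConfigurationSpace:
# algorithm name -> {hyperparameter name: (low, high) integer range}
SearchSpace = Dict[str, Dict[str, Tuple[int, int]]]

# Use case 1: algorithm selection and hyperparameter configuration combined.
FULL_SPACE: SearchSpace = {
    "ItemItem": {"nnbrs": (1, 500)},
    "UserUser": {"nnbrs": (1, 500)},
    "ImplicitMF": {"features": (2, 200)},
}

# Use case 2: a subset of algorithms and/or narrower parameter ranges.
SUBSET_SPACE: SearchSpace = {
    "ItemItem": {"nnbrs": (10, 50)},
    "UserUser": {"nnbrs": (10, 50)},
}

# Use case 3: hyperparameter selection for a single algorithm.
SINGLE_ALGORITHM_SPACE: SearchSpace = {"ImplicitMF": {"features": (10, 100)}}


def sample_configuration(space: SearchSpace, rng: random.Random) -> Dict[str, Any]:
    """Pick an algorithm, then draw each of its hyperparameters uniformly."""
    algo = rng.choice(sorted(space))
    config: Dict[str, Any] = {"algo": algo}
    for name, (low, high) in space[algo].items():
        config[name] = rng.randint(low, high)  # inclusive bounds
    return config


if __name__ == "__main__":
    rng = random.Random(0)
    for space in (FULL_SPACE, SUBSET_SPACE, SINGLE_ALGORITHM_SPACE):
        print(sample_configuration(space, rng))
```

With a single algorithm in the space, the algorithm choice becomes trivial and only hyperparameters are searched, which is exactly use case 3.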
- Parameters
train (pd.DataFrame) – Pandas DataFrame containing the train split.
validation (pd.DataFrame) – Pandas DataFrame containing the validation split. If a validation split is provided, split_folds, split_strategie and split_frac will be ignored.
cs (ConfigurationSpace) – ConfigurationSpace with all algorithms and parameter ranges defined.
optimization_strategie (str) – optimization strategy to use: either ‘bayesian’ or ‘random_search’.
optimization_metric (function) – LensKit recommender metric to optimize for.
time_limit_in_sec (int) – search time limit in seconds.
num_evaluations (int) – number of samples (evaluations) the optimization strategy may use. The value cannot be smaller than 6.
random_state – the random number generator or seed (see lenskit.util.rng()).
split_folds (int) – number of folds of the inner split.
split_frac (float) – fraction of the inner split. If split_folds is not None, split_frac will be ignored. Value must be between 0 and 1.
split_strategie (str) – split strategy to use: either ‘user_based’ or ‘item_based’.
minimize_error_metric_val (bool) – if True, the optimization_metric value will be minimized. If False, it will be maximized.
num_recommendations (int) – number of recommendations to be evaluated per user. Value must be greater than 0.
min_interactions_per_user (int) – minimum number of interactions a user must have to be considered in the train dataset.
max_interactions_per_user (int) – maximum number of interactions a user can have to be considered in the train dataset.
drop_duplicates (bool) – if True, all duplicate rows will be dropped from the train dataset.
drop_na_values (bool) – if True, all rows with NaN values will be dropped from the train dataset.
user_column (str) – name of the user column in the train dataset.
item_column (str) – name of the item column in the train dataset.
rating_column (str) – name of the rating column in the train dataset.
timestamp_col (str) – Name of the timestamp column
include_timestamp (bool = True) – If True, the timestamp column will be included in the dataset
log_level (str) – log level to use.
filer (Filer) – filer to manage LensKit-Auto output
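To make num_recommendations concrete, the sketch below computes a mean precision over each user's top num_recommendations items. It is a hypothetical, dependency-free stand-in for a LensKit top-N metric (mean_precision_at_k is not a LensKit function), and LensKit's own metrics may normalize differently. Because precision is higher-is-better, a metric like this would be paired with minimize_error_metric_val=False.

```python
from typing import Dict, List, Set


def mean_precision_at_k(
    recommendations: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    num_recommendations: int,
) -> float:
    """Average, over users, of the share of the top-k list that is relevant."""
    if num_recommendations <= 0:
        raise ValueError("num_recommendations must be greater than 0")
    scores = []
    for user, ranked_items in recommendations.items():
        top_k = ranked_items[:num_recommendations]
        hits = sum(1 for item in top_k if item in relevant.get(user, set()))
        # Divide by k, so lists shorter than k are penalized for empty slots.
        scores.append(hits / num_recommendations)
    return sum(scores) / len(scores) if scores else 0.0


recommendations = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]}
relevant = {"u1": {"a", "c"}, "u2": {"q"}}
print(mean_precision_at_k(recommendations, relevant, num_recommendations=2))  # 0.25
```

For u1 one of the top two items is relevant (0.5) and for u2 none is (0.0), giving a mean of 0.25.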
- Returns
model (Recommender) – the best-suited (untrained) recommender for the train dataset and cs parameters.
incumbent (dict) – a dictionary containing the algorithm name and hyperparameter configuration of the returned model