lkauto.optimization_strategies package
Submodules
lkauto.optimization_strategies.bayesian_optimization module
lkauto.optimization_strategies.bayesian_optimization.bayesian_optimization()
Returns the best configuration found by Bayesian optimization.
The bayesian_optimization method uses SMAC3 to find the best-performing configuration for the given train split. The ConfigurationSpace can consist of hyperparameters for a single algorithm or a combination of algorithms.
- Parameters
train (pd.DataFrame) – pandas DataFrame containing the outer train split.
validation (pd.DataFrame) – pandas DataFrame containing the validation split.
cs (ConfigurationSpace) – ConfigurationSpace with all algorithms and hyperparameter ranges defined.
time_limit_in_sec (int) – time limit in seconds for the optimization process
num_evaluations (int) – number of samples to be drawn from the ConfigurationSpace
optimization_metric (function) – LensKit prediction accuracy metric to optimize for.
minimize_error_metric_val (bool) – whether the value of the optimization metric should be minimized (True) or maximized (False).
user_feedback (str) – whether the dataset contains explicit or implicit feedback.
random_state (int) –
split_folds (int) – number of folds for cross-validation
split_strategie (str) – cross-validation strategy (user_based or item_based)
split_frac (float) – fraction of the dataset to be used for the validation split. If split_folds > 1, the fraction value will be ignored.
ensemble_size (int) – number of models to be used in the ensemble of rating prediction tasks. This value will be ignored for recommender tasks.
num_recommendations (int) – number of recommendations to be made for each user. This value will be ignored for rating prediction.
filer (Filer) – filer to manage LensKit-Auto output
- Returns
incumbent (Configuration) – best configuration found by Bayesian optimization
top_n_runs (pd.DataFrame) – top n runs found by Bayesian optimization
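The rule that split_frac is ignored once more than one fold is requested can be illustrated with a small standalone sketch. It is not LensKit-Auto's implementation: the helper validation_splits is hypothetical, it works on a plain list of row indices instead of a DataFrame, and it splits row-wise only, without any user- or item-aware strategy.

```python
import random


def validation_splits(rows, split_folds=1, split_frac=0.25, random_state=42):
    """Hypothetical helper: return a list of (train_rows, validation_rows) pairs."""
    rng = random.Random(random_state)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    if split_folds > 1:
        # Cross-validation: every fold is the validation part exactly once.
        # split_frac is ignored in this branch.
        folds = [shuffled[i::split_folds] for i in range(split_folds)]
        splits = []
        for i, validation in enumerate(folds):
            train = [row for j, fold in enumerate(folds) if j != i for row in fold]
            splits.append((train, validation))
        return splits

    # Single holdout split: split_frac of the rows become the validation part.
    n_validation = int(len(shuffled) * split_frac)
    return [(shuffled[n_validation:], shuffled[:n_validation])]


rows = list(range(20))
# Five folds: split_frac=0.9 has no effect on the result.
cv_splits = validation_splits(rows, split_folds=5, split_frac=0.9)
# One fold: a single 75/25 holdout split.
holdout = validation_splits(rows, split_folds=1, split_frac=0.25)
```

With 20 rows, the five-fold call yields five pairs with 4 validation rows each, while the single-fold call yields one pair with 5 validation rows.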
lkauto.optimization_strategies.random_search module
lkauto.optimization_strategies.random_search.random_search()
Returns the best configuration found by random search.
The random_search method randomly samples the ConfigurationSpace to find the best-performing configuration for the given train split. The ConfigurationSpace can consist of hyperparameters for a single algorithm or a combination of algorithms.
- Parameters
train (pd.DataFrame) – pandas DataFrame containing the train split.
cs (ConfigurationSpace) – ConfigurationSpace with all algorithms and hyperparameter ranges defined.
num_evaluations (int) – number of samples to be randomly drawn from the ConfigurationSpace
optimization_metric (function) – LensKit prediction accuracy metric to optimize for (either rmse or mae)
minimize_error_metric_val (bool) – whether the value of the optimization metric should be minimized (True) or maximized (False).
user_feedback (str) – whether the dataset contains explicit or implicit feedback.
random_state (int) –
filer (Filer) – filer to manage LensKit-Auto output
validation (pd.DataFrame) – pandas DataFrame containing the validation split.
time_limit_in_sec (int) – time limit in seconds for the optimization process
split_folds (int) – number of folds for cross-validation
split_strategie (str) – cross-validation strategy (user_based or row_based)
split_frac (float) – fraction of the dataset to be used for the validation split. If split_folds > 1, the fraction value will be ignored.
ensemble_size (int) – number of models to be used in the ensemble of rating prediction tasks. This value will be ignored for recommender tasks.
num_recommendations (int) – number of recommendations to be made for each user. This value will be ignored for rating prediction.
- Returns
best_configuration (Configuration) – the best-suited (algorithm and) hyperparameter configuration for the train dataset.
top_n_runs (pd.DataFrame) – pandas DataFrame containing the top n runs of the random search.
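To make the roles of num_evaluations, time_limit_in_sec, minimize_error_metric_val and the two return values more concrete, here is a minimal, self-contained sketch of random search in general. It does not call lkauto or ConfigSpace: random_search_sketch, the dictionary-based search space, the top_n argument and toy_error are all hypothetical stand-ins, and the real function evaluates LensKit models rather than a toy formula.

```python
import random
import time


def random_search_sketch(space, evaluate, num_evaluations, minimize=True,
                         time_limit_in_sec=None, random_state=42, top_n=3):
    """Toy random search: sample configurations, score them, keep the best."""
    rng = random.Random(random_state)
    start = time.monotonic()
    runs = []
    for _ in range(num_evaluations):
        # Stop early once the (optional) time budget is used up.
        if time_limit_in_sec is not None and time.monotonic() - start > time_limit_in_sec:
            break
        # Draw every hyperparameter uniformly from its (low, high) range.
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        runs.append((evaluate(config), config))
    if not runs:
        raise ValueError("no configuration was evaluated")
    # Best run first: ascending scores when minimizing, descending otherwise.
    runs.sort(key=lambda run: run[0], reverse=not minimize)
    best_configuration = runs[0][1]
    return best_configuration, runs[:top_n]


def toy_error(config):
    # Hypothetical error surface with its minimum at reg=0.3, damping=5.0.
    return (config["reg"] - 0.3) ** 2 + (config["damping"] - 5.0) ** 2 / 100


space = {"reg": (0.0, 1.0), "damping": (0.0, 10.0)}
best_config, top_runs = random_search_sketch(space, toy_error, num_evaluations=50)
worst_config, top_max = random_search_sketch(space, toy_error, num_evaluations=50,
                                             minimize=False)
```

Because both calls use the same seed, they score the same 50 samples; only the sort direction differs, which mirrors what the minimize flag controls.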