lkauto.utils package
Submodules
lkauto.utils.filer module
-
class lkauto.utils.filer.Filer(output_directory_path='output/')
Bases: object
Filer to handle the LensKit-Auto output.
This filer structures the LensKit-Auto output in the file system.
-
output_directory_path
path that leads to the output directory of LensKit-Auto
- Type
str
-
get_output_directory_path() → str
-
get_smac_output_directory_path() → str
-
set_output_directory_path(output_directory_path: str) → None
-
save_dataframe_as_csv(dataframe: pandas.core.frame.DataFrame, output_path: str, name: str) → None
-
save_dictionary_to_json(dictionary: dict, output_path: str, name: str) → None
-
save_metric_scores_to_txt(metric_scores: numpy.array, output_path: str, name: str) → None
-
get_dataframe_from_csv(path_to_file: str, index_column=None) → pandas.core.frame.DataFrame
-
get_series_from_csv(path_to_file: str, index_column=None) → pandas.core.series.Series
-
get_dict_from_json_file(path_to_file: str) → dict
-
get_numpy_array_from_txt_file(path_to_file: str) → numpy.array
-
append_dataframe_to_csv(dataframe: pandas.core.frame.DataFrame, output_path: str, name: str) → None
-
save_validataion_data(config_space: ConfigSpace.configuration_space.ConfigurationSpace, predictions: pandas.core.frame.DataFrame, metric_scores: numpy.array, output_path: str, run_id: int) → None
Method to simplify the validation data output.
The validation data output is mainly used to build ensembles of the best models. Therefore, the predictions, error scores and configuration space of each run need to be stored in the file system.
- Parameters
config_space (ConfigurationSpace) – configuration space of the run
predictions (pd.DataFrame) – dataframe containing raw predictions
metric_scores (np.array) – numpy array containing metric values. Depending on the metric, different kinds of metric values can be stored in the metric scores array.
output_path (str) – path to the output folder
run_id (int) – ID of the SMAC search iteration
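The description above says that the configuration, predictions and metric scores of each run are written to the file system. A minimal sketch of that idea using only the standard library is shown here. The function name `save_run_sketch`, the per-run subdirectory, the file names and the example values are all hypothetical; they are not taken from the Filer implementation, which works with `ConfigurationSpace`, `pd.DataFrame` and `np.array` objects.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence


def save_run_sketch(config: Dict[str, object], predictions: List[Dict[str, object]],
                    metric_scores: Sequence[float], output_path: str, run_id: int) -> Path:
    """Illustrative per-run persistence; file names and layout are hypothetical."""
    run_dir = Path(output_path) / str(run_id)
    run_dir.mkdir(parents=True, exist_ok=True)

    # Configuration of the run as JSON.
    (run_dir / "config_space.json").write_text(json.dumps(config, indent=2))

    # Raw predictions as CSV, one row per prediction.
    with open(run_dir / "predictions.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(predictions[0]))
        writer.writeheader()
        writer.writerows(predictions)

    # Metric values as plain text, one value per line.
    (run_dir / "metric_scores.txt").write_text("\n".join(str(s) for s in metric_scores))
    return run_dir


# Example values below are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = save_run_sketch(
        config={"algo": "example_algo", "neighbours": 20},
        predictions=[{"user": 1, "item": 10, "prediction": 3.7}],
        metric_scores=[0.91, 0.88],
        output_path=tmp,
        run_id=3,
    )
    saved_files = sorted(p.name for p in run_dir.iterdir())
    saved_config = json.loads((run_dir / "config_space.json").read_text())
```

Keeping one directory per run ID is one way to let an ensemble step later pick up the artifacts of the best runs without parsing a shared file.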
-
lkauto.utils.get_default_configuration_space module
-
lkauto.utils.get_default_configuration_space.get_default_configuration_space(data: pandas.core.frame.DataFrame, val_fold_indices, feedback: str, validation: pandas.core.frame.DataFrame = None, random_state=42) → ConfigSpace.configuration_space.ConfigurationSpace
Returns the default configuration space for all included rating prediction algorithms.
lkauto.utils.get_default_configurations module
-
lkauto.utils.get_default_configurations.get_default_configurations()
Returns a list of default configurations for all algorithms in the configuration space.
- Parameters
config_space (ConfigurationSpace) – configuration space to use
- Returns
list of default configurations for all algorithms in the configuration space
- Return type
List[Configuration]
lkauto.utils.get_model_from_cs module
-
lkauto.utils.get_model_from_cs.get_model_from_cs(cs: ConfigSpace.configuration_space.ConfigurationSpace, feedback: str, fallback_model=<lenskit.algorithms.bias.Bias object>, random_state: int = 42) → Union[lenskit.algorithms.Recommender, lenskit.algorithms.Predictor]
Builds a Predictor model defined in the ConfigurationSpace.
- Parameters
- Returns
fallback_algo – Predictor built with the config_space information
- Return type
Predictor
lkauto.utils.logging module
-
lkauto.utils.logging.get_logger(name: str = 'lenskit-auto', level: str = 20) → logging.Logger
Returns a logger with the given name and level.
- Parameters
- Returns
logger with the given name and level
- Return type
logging.Logger
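The default level of 20 is the numeric value of `logging.INFO` in Python's standard library. The sketch below shows how a helper with this signature could be built on `logging.getLogger`. The name `get_logger_sketch`, the handler and the message format are assumptions for illustration; the actual lkauto implementation may configure the logger differently.

```python
import logging


def get_logger_sketch(name: str = "lenskit-auto", level: int = logging.INFO) -> logging.Logger:
    """Hypothetical stand-in for get_logger; not the lkauto implementation."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a stream handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = get_logger_sketch()
logger.info("search started")      # emitted: INFO (20) meets the threshold
logger.debug("candidate config")   # suppressed: DEBUG (10) is below the threshold
```

Because `logging.getLogger` returns the same object for the same name, calling the helper again from another module yields the already configured logger.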
lkauto.utils.update_top_n_runs module
-
lkauto.utils.update_top_n_runs.update_top_n_runs(num_models: int, top_n_runs: pandas.core.frame.DataFrame, run_id: int, config_space: ConfigSpace.configuration_space.ConfigurationSpace, errors)
Updates the top n runs dataframe with the new run.
- Parameters
num_models (int) – number of models to keep track of
top_n_runs (pd.DataFrame) – pandas dataframe containing the top n runs of the optimization method.
run_id (int) – run id of the new run
config_space (ConfigurationSpace) – configuration space of the new run
errors (np.ndarray) – errors of the new run
- Returns
top_n_runs – pandas dataframe containing the top n runs of the optimization method.
- Return type
pd.DataFrame
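The bookkeeping described above can be sketched in plain Python as follows. This is an illustration only: it uses a list of dictionaries instead of a `pd.DataFrame`, and it assumes that runs are ranked by their mean error with lower values being better, which the documentation does not state. The name `update_top_n_sketch` and the example configurations are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List, Sequence


def update_top_n_sketch(num_models: int, top_n_runs: List[Dict[str, Any]],
                        run_id: int, config: Dict[str, Any],
                        errors: Sequence[float]) -> List[Dict[str, Any]]:
    """Illustrative top-n bookkeeping; not the lkauto implementation."""
    candidate = {"run_id": run_id, "config": config, "error": mean(errors)}
    # Rank by mean error (lower is better, by assumption) and keep the best num_models.
    ranked = sorted(top_n_runs + [candidate], key=lambda run: run["error"])
    return ranked[:num_models]


top: List[Dict[str, Any]] = []
top = update_top_n_sketch(2, top, run_id=0, config={"algo": "A"}, errors=[0.9, 1.1])
top = update_top_n_sketch(2, top, run_id=1, config={"algo": "B"}, errors=[0.7, 0.9])
top = update_top_n_sketch(2, top, run_id=2, config={"algo": "C"}, errors=[0.8, 1.0])
kept_run_ids = [run["run_id"] for run in top]
```

After the third call, run 0 has the highest mean error of the three and drops out, so only runs 1 and 2 remain.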
lkauto.utils.validation_split module
-
lkauto.utils.validation_split.validation_split(data: pandas.core.frame.DataFrame, strategie: str = 'user_based', num_folds: int = 1, frac: float = 0.25, random_state=42) → dict
Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:
    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }
- Parameters
data (pd.DataFrame) – pandas DataFrame with the data to be split.
strategie (str) – cross-validation strategy (user_based or row_based)
num_folds (int) – number of folds for cross-validation
frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.
random_state (int) – random state for the validation split
- Returns
dictionary with the indices of the train and validation split for the given data.
- Return type
dict
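A short sketch of how a caller might consume a dictionary of this shape is given here. The fold contents and the `ratings` mapping are made-up stand-ins; with a real pandas DataFrame, selecting rows by the index lists (for example with `data.loc[...]`) would take the place of the dictionary lookups.

```python
# Hypothetical fold dictionary in the documented shape (values are row indices).
folds = {
    0: {"train": [1, 2, 3], "validation": [4, 5]},
    1: {"train": [4, 5, 3], "validation": [1, 2]},
}

# Stand-in for a DataFrame: row index -> rating.
ratings = {1: 4.0, 2: 3.5, 3: 5.0, 4: 2.0, 5: 4.5}

fold_sizes = {}
for fold, split in folds.items():
    train_rows = [ratings[i] for i in split["train"]]
    validation_rows = [ratings[i] for i in split["validation"]]
    # Train and validation indices of one fold should never overlap.
    assert not set(split["train"]) & set(split["validation"])
    fold_sizes[fold] = (len(train_rows), len(validation_rows))
```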
-
lkauto.utils.validation_split.row_based_validation_split(data: pandas.core.frame.DataFrame, num_folds: int = 1, frac: float = 0.25, random_state=42) → dict
Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:
    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }
- Parameters
data (pd.DataFrame) – pandas DataFrame with the data to be split.
num_folds (int) – number of folds for cross-validation
frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.
random_state (int) – random state for the validation split
- Returns
dictionary with the indices of the train and validation split for the given data.
- Return type
dict
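One way a row-based split with these parameters could work is sketched below, using only the standard library. It is an assumption-laden illustration, not the lkauto implementation: the name `row_split_sketch` is hypothetical, rows are represented by their indices, a single fold holds out `frac` of the shuffled rows, and several folds are formed by dealing the shuffled rows round-robin so that `frac` is ignored, as the parameter description states.

```python
import random
from typing import Dict, List, Sequence


def row_split_sketch(indices: Sequence[int], num_folds: int = 1, frac: float = 0.25,
                     random_state: int = 42) -> Dict[int, Dict[str, List[int]]]:
    """Illustrative row-based split; not the lkauto implementation."""
    shuffled = list(indices)
    random.Random(random_state).shuffle(shuffled)
    if num_folds == 1:
        # Single holdout: the first `frac` share of shuffled rows becomes validation.
        cut = int(len(shuffled) * frac)
        parts = [shuffled[:cut]]
    else:
        # k-fold: deal rows round-robin into num_folds parts; frac is ignored.
        parts = [shuffled[k::num_folds] for k in range(num_folds)]
    folds = {}
    for fold, validation in enumerate(parts):
        held_out = set(validation)
        folds[fold] = {
            "train": [i for i in shuffled if i not in held_out],
            "validation": validation,
        }
    return folds


holdout = row_split_sketch(range(20))
two_fold = row_split_sketch(range(20), num_folds=2)
```

With 20 rows and the default `frac` of 0.25, the single-fold call holds out 5 rows; the two-fold call uses every row for validation exactly once.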
-
lkauto.utils.validation_split.user_based_validation_split(data: pandas.core.frame.DataFrame, num_folds: int = 1, frac: float = 0.25, random_state=42) → dict
Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:
    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }
- Parameters
data (pd.DataFrame) – pandas DataFrame with the data to be split.
num_folds (int) – number of folds for cross-validation
frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.
random_state (int) – random state for the validation split
- Returns
dictionary with the indices of the train and validation split for the given data.
- Return type
dict
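The documentation does not spell out what distinguishes the user-based strategy. The sketch below assumes it means that each user's rows are split separately, so that every user with enough rows appears in both the train and the validation set. The function name `user_holdout_sketch`, the `(row_index, user_id)` input and the rounding rule are all hypothetical, and only the single-fold case is shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def user_holdout_sketch(rows: List[Tuple[int, str]], frac: float = 0.25,
                        random_state: int = 42) -> Dict[int, Dict[str, List[int]]]:
    """Illustrative single-fold user-based split; not the lkauto implementation.

    rows holds (row_index, user_id) pairs, a stand-in for a DataFrame's
    index and user column.
    """
    by_user = defaultdict(list)
    for index, user in rows:
        by_user[user].append(index)

    rng = random.Random(random_state)
    train, validation = [], []
    for user in sorted(by_user):
        user_rows = by_user[user][:]
        rng.shuffle(user_rows)
        # Hold out `frac` of this user's rows, rounded down.
        cut = int(len(user_rows) * frac)
        validation.extend(user_rows[:cut])
        train.extend(user_rows[cut:])
    return {0: {"train": train, "validation": validation}}


rows = [(i, "u1" if i < 8 else "u2") for i in range(12)]  # u1: 8 rows, u2: 4 rows
split = user_holdout_sketch(rows)
```

Compared with a row-based split, this keeps the held-out share roughly constant per user instead of per dataset, at the cost of holding out nothing for users whose row count times `frac` rounds down to zero.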