lkauto.utils package¶

Submodules¶

lkauto.utils.filer module¶

class lkauto.utils.filer.Filer(output_directory_path='output/')¶

Bases: object

Filer to handle the LensKit-Auto output

This filer structures the LensKit-Auto output in the file system.

output_directory_path¶

path that leads to the output directory of LensKit-Auto

Type: str

get_output_directory_path() → str¶

get_smac_output_directory_path() → str¶

set_output_directory_path(output_directory_path: str) → None¶

save_dataframe_as_csv(dataframe: pd.DataFrame, output_path: str, name: str) → None¶

save_dictionary_to_json(dictionary: dict, output_path: str, name: str) → None¶

save_metric_scores_to_txt(metric_scores: np.array, output_path: str, name: str) → None¶

get_dataframe_from_csv(path_to_file: str, index_column=None) → pd.DataFrame¶

get_series_from_csv(path_to_file: str, index_column=None) → pd.Series¶

get_dict_from_json_file(path_to_file: str) → dict¶

get_numpy_array_from_txt_file(path_to_file: str) → np.array¶

append_dataframe_to_csv(dataframe: pd.DataFrame, output_path: str, name: str) → None¶

save_validataion_data(config_space: ConfigurationSpace, predictions: pd.DataFrame, metric_scores: np.array, output_path: str, run_id: int) → None¶

get_output_directory_path() → str

get_smac_output_directory_path() → str

set_output_directory_path(output_directory_path: str) → None

save_dataframe_as_csv(dataframe: pandas.core.frame.DataFrame, output_path: str, name: str) → None

save_dictionary_to_json(dictionary: dict, output_path: str, name: str) → None

save_metric_scores_to_txt(metric_scores: numpy.array, output_path: str, name: str) → None

get_dataframe_from_csv(path_to_file: str, index_column=None) → pandas.core.frame.DataFrame

get_series_from_csv(path_to_file: str, index_column=None) → pandas.core.series.Series

get_dict_from_json_file(path_to_file: str) → dict

get_numpy_array_from_txt_file(path_to_file: str) → numpy.array

append_dataframe_to_csv(dataframe: pandas.core.frame.DataFrame, output_path: str, name: str) → None

save_validataion_data(config_space: ConfigSpace.configuration_space.ConfigurationSpace, predictions: pandas.core.frame.DataFrame, metric_scores: numpy.array, output_path: str, run_id: int) → None¶

Method to simplify the validation data output.

The validation data output is mainly used to build ensembles of the best models. Therefore, the predictions, error scores, and configuration space of each run need to be stored in the file system.

Parameters

config_space (ConfigurationSpace) – configuration space of the run
predictions (pd.DataFrame) – dataframe containing raw predictions
metric_scores (np.array) – numpy array containing metric values. Depending on the metric, different kinds of metric values can be stored in this array.
output_path (str) – path to the output folder
run_id (int) – ID of the SMAC search iteration
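
As a rough illustration of what storing one run's validation data involves, the sketch below writes a configuration, predictions, and metric scores into a per-run folder using only the standard library. The helper name `save_run`, the folder layout, the file names, and the sample values are assumptions made for this example; they are not taken from the LensKit-Auto source.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_run(config, predictions, metric_scores, output_path, run_id):
    """Hypothetical helper: store one run's artifacts under <output_path>/<run_id>/."""
    run_dir = Path(output_path) / str(run_id)
    run_dir.mkdir(parents=True, exist_ok=True)

    # Configuration as JSON (assumed layout, not the library's).
    (run_dir / "config_space.json").write_text(json.dumps(config))

    # Predictions as CSV with a header row.
    with open(run_dir / "predictions.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(predictions[0]))
        writer.writeheader()
        writer.writerows(predictions)

    # One metric value per line.
    (run_dir / "metric_scores.txt").write_text(
        "\n".join(str(score) for score in metric_scores)
    )
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    run_dir = save_run(
        config={"algo": "ItemItem", "nnbrs": 20},  # illustrative values
        predictions=[{"user": 1, "item": 10, "prediction": 3.5}],
        metric_scores=[0.91, 0.88],
        output_path=tmp,
        run_id=7,
    )
    saved_files = sorted(path.name for path in run_dir.iterdir())
    saved_config = json.loads((run_dir / "config_space.json").read_text())
```

Keeping every run in its own folder keyed by run_id means a later ensembling step can load any subset of runs without parsing a shared file.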

lkauto.utils.get_default_configuration_space module¶

lkauto.utils.get_default_configuration_space.get_default_configuration_space(data: pandas.core.frame.DataFrame, val_fold_indices, feedback: str, validation: pandas.core.frame.DataFrame = None, random_state=42) → ConfigSpace.configuration_space.ConfigurationSpace¶

returns the default configuration space for all included rating prediction algorithms

Parameters

data (pd.DataFrame) – data to use
val_fold_indices – validation fold indices
feedback (str) – feedback type, either ‘explicit’ or ‘implicit’
validation (pd.DataFrame) – validation data (provided by user)
random_state (int) – random state to use

lkauto.utils.get_default_configurations module¶

lkauto.utils.get_default_configurations.get_default_configurations()¶

returns a list of default configurations for all algorithms in the configuration space

Parameters: config_space (ConfigurationSpace) – configuration space to use
Returns
Return type: List[Configuration]

lkauto.utils.get_model_from_cs module¶

lkauto.utils.get_model_from_cs.get_model_from_cs(cs: ConfigSpace.configuration_space.ConfigurationSpace, feedback: str, fallback_model=<lenskit.algorithms.bias.Bias object>, random_state: int = 42) → Union[lenskit.algorithms.Recommender, lenskit.algorithms.Predictor]¶

builds a Predictor model defined in ConfigurationSpace

Parameters

cs (ConfigurationSpace) – configuration space containing information to build a model
feedback (str) – feedback type, either ‘explicit’ or ‘implicit’
fallback_model (Predictor) – fallback algorithm to use in case of missing values
random_state (int) – random state to use

Returns

fallback_algo – Predictor built with the config_space information

Return type

Predictor

lkauto.utils.logging module¶

lkauto.utils.logging.get_logger(name: str = 'lenskit-auto', level: str = 20) → logging.Logger¶

Returns a logger with the given name and level.

Parameters

name (str) – name of the logger
level (int) – level of the logger; the default value 20 corresponds to logging.INFO

Returns

logger with the given name and level

Return type

logging.Logger
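
The numeric default is easier to read next to the standard library's named levels. The sketch below uses a hypothetical stand-in, `make_logger`, rather than the LensKit-Auto function itself, and shows that level 20 is `logging.INFO`.

```python
import logging


def make_logger(name="lenskit-auto", level=logging.INFO):
    """Hypothetical stand-in that returns a named logger set to `level`."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


logger = make_logger("lenskit-auto-demo")
level_name = logging.getLevelName(logger.level)
```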

lkauto.utils.update_top_n_runs module¶

lkauto.utils.update_top_n_runs.update_top_n_runs(num_models: int, top_n_runs: pandas.core.frame.DataFrame, run_id: int, config_space: ConfigSpace.configuration_space.ConfigurationSpace, errors)¶

updates the top n runs dataframe with the new run

Parameters

num_models (int) – number of models to keep track of
top_n_runs (pd.DataFrame) – pandas dataframe containing the top n runs of the optimization method.
run_id (int) – run id of the new run
config_space (ConfigurationSpace) – configuration space of the new run
errors (np.ndarray) – errors of the new run

Returns

top_n_runs – pandas dataframe containing the top n runs of the optimization method.

Return type

pd.DataFrame
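
The bookkeeping described above can be sketched with a plain list of dictionaries instead of a DataFrame. This is a minimal sketch under two assumptions that the documentation does not state: runs are ranked by the mean of their errors, and lower is better. The name `update_top_n` and the record fields are hypothetical.

```python
from statistics import mean


def update_top_n(num_models, top_runs, run_id, config, errors):
    """Hypothetical sketch: keep the num_models runs with the lowest mean error."""
    candidate = {"run_id": run_id, "config": config, "error": mean(errors)}
    # Sort ascending by error (assumption: lower is better) and truncate.
    ranked = sorted(top_runs + [candidate], key=lambda run: run["error"])
    return ranked[:num_models]


top_runs = []
top_runs = update_top_n(2, top_runs, 0, {"algo": "A"}, [0.9, 1.1])  # mean 1.0
top_runs = update_top_n(2, top_runs, 1, {"algo": "B"}, [0.7, 0.9])  # mean 0.8
top_runs = update_top_n(2, top_runs, 2, {"algo": "C"}, [1.2, 1.4])  # mean 1.3, dropped
kept_ids = [run["run_id"] for run in top_runs]
```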

lkauto.utils.validation_split module¶

lkauto.utils.validation_split.validation_split(data: pandas.core.frame.DataFrame, strategie: str = 'user_based', num_folds: int = 1, frac: float = 0.25, random_state=42) → dict¶

Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:

    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }

Parameters

data (pd.DataFrame) – pandas DataFrame with the data to be split.
strategie (str) – cross-validation strategy (user_based or row_based)
num_folds (int) – number of folds to use for cross-validation
frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.
random_state (int) – random state for the validation split

Returns

dictionary with the indices of the train and validation split for the given data.

Return type

dict
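
To make the returned structure concrete, the sketch below walks a hand-written fold dictionary in the documented shape and looks up the train and validation rows for each fold. The indices and the `rows` mapping (a stand-in for a DataFrame) are illustrative only.

```python
# Hand-written fold dictionary in the documented shape (illustrative indices).
folds = {
    0: {"train": [1, 2, 3, 4], "validation": [5, 6]},
    1: {"train": [3, 4, 5, 6], "validation": [1, 2]},
}

rows = {index: f"row-{index}" for index in range(1, 7)}  # stand-in for a DataFrame

fold_sizes = {}
for fold, indices in folds.items():
    train_rows = [rows[i] for i in indices["train"]]
    validation_rows = [rows[i] for i in indices["validation"]]
    # No row may appear on both sides of a fold.
    assert not set(indices["train"]) & set(indices["validation"])
    fold_sizes[fold] = (len(train_rows), len(validation_rows))
```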

lkauto.utils.validation_split.row_based_validation_split(data: pandas.core.frame.DataFrame, num_folds: int = 1, frac: float = 0.25, random_state=42) → dict¶

Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:

    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }

Parameters

data (pd.DataFrame) – pandas DataFrame with the data to be split.
num_folds (int) – number of folds to use for cross-validation
frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.
random_state (int) – random state for the validation split

Returns

dictionary with the indices of the train and validation split for the given data.

Return type

dict
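
A row-based split can be sketched in pure Python as follows. This is not the LensKit-Auto implementation: the name `row_split` and the chunking scheme are assumptions. It only mirrors the documented behaviour that frac applies to a single holdout and is ignored when num_folds > 1.

```python
import random


def row_split(indices, num_folds=1, frac=0.25, random_state=42):
    """Hypothetical row-based split returning {fold: {"train": [...], "validation": [...]}}."""
    shuffled = list(indices)
    random.Random(random_state).shuffle(shuffled)

    if num_folds == 1:
        # Single holdout: the first `frac` share of shuffled rows is validation.
        cut = int(len(shuffled) * frac)
        chunks = [shuffled[:cut]]
    else:
        # k-fold: `frac` is ignored and rows are dealt into num_folds chunks.
        chunks = [shuffled[k::num_folds] for k in range(num_folds)]

    folds = {}
    for fold, validation in enumerate(chunks):
        held_out = set(validation)
        train = [i for i in shuffled if i not in held_out]
        folds[fold] = {"train": sorted(train), "validation": sorted(validation)}
    return folds


holdout = row_split(range(20), num_folds=1, frac=0.25)
kfold = row_split(range(20), num_folds=4)
```

With num_folds > 1, every row lands in exactly one validation chunk, so each fold validates on roughly 1/num_folds of the data regardless of frac.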

lkauto.utils.validation_split.user_based_validation_split(data: pandas.core.frame.DataFrame, num_folds: int = 1, frac: float = 0.25, random_state=42) → dict¶

Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:

    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }

Parameters

data (pd.DataFrame) – pandas DataFrame with the data to be split.
num_folds (int) – number of folds to use for cross-validation
frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.
random_state (int) – random state for the validation split

Returns

dictionary with the indices of the train and validation split for the given data.

Return type

dict
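
The documentation does not spell out how the user-based strategy selects rows. One plausible reading, sketched below purely as an assumption, is that each user contributes about frac of their own rows to the validation side, so every user with enough data appears in both parts. The name `user_holdout` and the (row_index, user) input format are hypothetical.

```python
import random
from collections import defaultdict


def user_holdout(rows, frac=0.25, random_state=42):
    """Hypothetical user-based holdout over (row_index, user) pairs.

    Assumption: each user contributes about `frac` of their rows (at least one
    when they have two or more) to the validation side.
    """
    rng = random.Random(random_state)
    by_user = defaultdict(list)
    for index, user in rows:
        by_user[user].append(index)

    train, validation = [], []
    for user in sorted(by_user):
        indices = by_user[user][:]
        rng.shuffle(indices)
        cut = max(1, int(len(indices) * frac)) if len(indices) > 1 else 0
        validation.extend(indices[:cut])
        train.extend(indices[cut:])
    return {0: {"train": sorted(train), "validation": sorted(validation)}}


# Eight rows for user "a", four for user "b".
rows = [(i, "a") for i in range(8)] + [(i, "b") for i in range(8, 12)]
split = user_holdout(rows, frac=0.25)
users_in_validation = {user for index, user in rows if index in split[0]["validation"]}
```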

Module contents¶