lkauto.utils package

Submodules

lkauto.utils.filer module

class lkauto.utils.filer.Filer(output_directory_path='output/')

Bases: object

Filer to handle the LensKit-Auto output

This filer helps structure the LensKit-Auto output in the file system.

output_directory_path

path that leads to the output directory of LensKit-Auto

Type

str

get_output_directory_path() → str
get_smac_output_directory_path() → str
set_output_directory_path(output_directory_path: str) → None
save_dataframe_as_csv(self, dataframe: pd.DataFrame, output_path: str, name: str) → None
save_dictionary_to_json(self, dictionary: dict, output_path: str, name: str) → None
save_metric_scores_to_txt(self, metric_scores: np.array, output_path: str, name: str) → None
get_dataframe_from_csv(self, path_to_file: str, index_column=None) → pd.DataFrame
get_series_from_csv(self, path_to_file: str, index_column=None) → pd.Series
get_dict_from_json_file(self, path_to_file: str) → dict
get_numpy_array_from_txt_file(self, path_to_file: str) → np.array
append_dataframe_to_csv(self, dataframe: pd.DataFrame, output_path: str, name: str) → None
save_validataion_data(self, config_space: ConfigurationSpace, predictions: pd.DataFrame, metric_scores: np.array, output_path: str, run_id: int) → None

get_output_directory_path() → str
get_smac_output_directory_path() → str
set_output_directory_path(output_directory_path: str) → None
save_dataframe_as_csv(dataframe: pandas.core.frame.DataFrame, output_path: str, name: str) → None
save_dictionary_to_json(dictionary: dict, output_path: str, name: str) → None
save_metric_scores_to_txt(metric_scores: numpy.array, output_path: str, name: str) → None
get_dataframe_from_csv(path_to_file: str, index_column=None) → pandas.core.frame.DataFrame
get_series_from_csv(path_to_file: str, index_column=None) → pandas.core.series.Series
get_dict_from_json_file(path_to_file: str) → dict
get_numpy_array_from_txt_file(path_to_file: str) → numpy.array
append_dataframe_to_csv(dataframe: pandas.core.frame.DataFrame, output_path: str, name: str) → None
save_validataion_data(config_space: ConfigSpace.configuration_space.ConfigurationSpace, predictions: pandas.core.frame.DataFrame, metric_scores: numpy.array, output_path: str, run_id: int) → None

method to simplify the validation data output

The validation data output is mainly used to build ensembles of the best models. Therefore, the predictions, error scores and configuration space of each run need to be stored in the file system.

Parameters
  • config_space (ConfigurationSpace) – configuration space of the run

  • predictions (pd.DataFrame) – dataframe containing raw predictions

  • metric_scores (np.array) – numpy array containing metric values. Depending on the metric, different kinds of metric values can be stored in the metric scores array

  • output_path (str) – path to output folder

  • run_id (int) – ID of the SMAC search iteration
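To illustrate the idea of storing the configuration, predictions and metric scores of each run, here is a minimal standard-library sketch. It does not use the Filer class; the helper name save_run_artifacts, the per-run folder layout and the file names are all assumptions made for this example, and plain dictionaries and lists stand in for the ConfigurationSpace, DataFrame and numpy array.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_run_artifacts(output_dir, run_id, config, predictions, metric_scores):
    """Hypothetical helper: store one run's artifacts in its own folder."""
    run_dir = Path(output_dir) / str(run_id)
    run_dir.mkdir(parents=True, exist_ok=True)

    # Configuration of the run as JSON (file name is an assumption).
    (run_dir / "config_space.json").write_text(json.dumps(config, indent=2))

    # Raw predictions as CSV; all rows are assumed to share the same keys.
    with open(run_dir / "predictions.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(predictions[0].keys()))
        writer.writeheader()
        writer.writerows(predictions)

    # One metric value per line.
    (run_dir / "metric_scores.txt").write_text(
        "\n".join(str(score) for score in metric_scores) + "\n"
    )
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    run_dir = save_run_artifacts(
        tmp,
        run_id=3,
        config={"algo": "Bias", "damping": 5},
        predictions=[{"user": 1, "item": 10, "prediction": 3.5}],
        metric_scores=[0.91, 0.88],
    )
    saved_files = sorted(path.name for path in run_dir.iterdir())
```

Keeping one folder per run_id means an ensemble step can later load any run's artifacts without parsing a shared file.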

lkauto.utils.get_default_configuration_space module

lkauto.utils.get_default_configuration_space.get_default_configuration_space(data: pandas.core.frame.DataFrame, val_fold_indices, feedback: str, validation: pandas.core.frame.DataFrame = None, random_state=42) → ConfigSpace.configuration_space.ConfigurationSpace

returns the default configuration space for all included rating prediction algorithms

Parameters
  • data (pd.DataFrame) – data to use

  • val_fold_indices – validation fold indices

  • validation (pd.DataFrame) – validation data (provided by user)

  • feedback (str) – feedback type, either ‘explicit’ or ‘implicit’

  • random_state (int) – random state to use

lkauto.utils.get_default_configurations module

lkauto.utils.get_default_configurations.get_default_configurations()

returns a list of default configurations for all algorithms in the configuration space

Parameters

config_space (ConfigurationSpace) – configuration space to use

Returns

Return type

List[Configuration]

lkauto.utils.get_model_from_cs module

lkauto.utils.get_model_from_cs.get_model_from_cs(cs: ConfigSpace.configuration_space.ConfigurationSpace, feedback: str, fallback_model=<lenskit.algorithms.bias.Bias object>, random_state: int = 42) → Union[lenskit.algorithms.Recommender, lenskit.algorithms.Predictor]

builds a Predictor model defined in ConfigurationSpace

Parameters
  • cs (ConfigurationSpace) – configuration space containing information to build a model

  • feedback (str) – feedback type, either ‘explicit’ or ‘implicit’

  • fallback_model (Predictor) – fallback algorithm to use in case of missing values

  • random_state (int) – random state to use

Returns

fallback_algo – Predictor built with the config_space information

Return type

Predictor

lkauto.utils.logging module

lkauto.utils.logging.get_logger(name: str = 'lenskit-auto', level: str = 20) → logging.Logger

Returns a logger with the given name and level.

Parameters
  • name (str) – name of the logger

  • level (int) – level of the logger

Returns

logger with the given name and level

Return type

logging.Logger
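The default level of 20 corresponds to logging.INFO in Python's standard logging module. The following sketch shows how a helper of this kind could be written; make_logger is a hypothetical stand-in, not the library's implementation, and the handler and format string are assumptions.

```python
import logging


def make_logger(name="lenskit-auto", level=logging.INFO):
    """Hypothetical stand-in for a get_logger-style helper."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = make_logger()
logger.info("search started")
```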

lkauto.utils.update_top_n_runs module

lkauto.utils.update_top_n_runs.update_top_n_runs(num_models: int, top_n_runs: pandas.core.frame.DataFrame, run_id: int, config_space: ConfigSpace.configuration_space.ConfigurationSpace, errors)

updates the top n runs dataframe with the new run

Parameters
  • num_models (int) – number of models to keep track of

  • top_n_runs (pd.DataFrame) – pandas dataframe containing the top n runs of the optimization method.

  • run_id (int) – run id of the new run

  • config_space (ConfigurationSpace) – configuration space of the new run

  • errors (np.ndarray) – errors of the new run

Returns

top_n_runs – pandas dataframe containing the top n runs of the optimization method.

Return type

pd.DataFrame
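The bookkeeping idea behind a top-n table can be sketched with plain Python lists. This is an illustration only: the function update_top_n, the use of the mean error as ranking criterion and the assumption that lower errors are better are not taken from the library.

```python
from statistics import mean


def update_top_n(num_models, top_n_runs, run_id, config, errors):
    """Hypothetical sketch: keep the num_models runs with the lowest mean error."""
    candidate = {"run_id": run_id, "config": config, "error": mean(errors)}
    # Rank the previous runs together with the new one, best (lowest) first.
    ranked = sorted(top_n_runs + [candidate], key=lambda run: run["error"])
    return ranked[:num_models]


top = []
top = update_top_n(2, top, run_id=0, config={"algo": "Bias"}, errors=[0.95, 0.97])
top = update_top_n(2, top, run_id=1, config={"algo": "ItemItem"}, errors=[0.90, 0.92])
top = update_top_n(2, top, run_id=2, config={"algo": "FunkSVD"}, errors=[0.93, 0.93])
kept_ids = [run["run_id"] for run in top]
```

With num_models set to 2, the run with the highest mean error (run 0) drops out once a third run arrives.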

lkauto.utils.validation_split module

lkauto.utils.validation_split.validation_split(data: pandas.core.frame.DataFrame, strategie: str = 'user_based', num_folds: int = 1, frac: float = 0.25, random_state=42) → dict

Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:

    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }

Parameters
  • data (pd.DataFrame) – pandas DataFrame with the data to be split.

  • strategie (str) – cross-validation strategy, either ‘user_based’ or ‘row_based’

  • num_folds (int) – number of folds used for cross-validation

  • frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.

  • random_state (int) – random state for the validation split

Returns

dictionary with the indices of the train and validation split for the given data.

Return type

dict
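A dictionary of this shape can be consumed fold by fold. The sketch below reuses the example structure from the description above; the rows mapping is a hypothetical placeholder for the rows of the DataFrame, so that the example runs without pandas.

```python
# Example fold dictionary, copied from the structure described above.
folds = {
    0: {
        "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    },
    1: {
        "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
        "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    },
}

# Hypothetical stand-in for the data: one entry per row index.
rows = {index: f"row-{index}" for index in range(1, 21)}

fold_sizes = {}
for fold, split in folds.items():
    # Select the rows belonging to each part of the split.
    train_rows = [rows[i] for i in split["train"]]
    validation_rows = [rows[i] for i in split["validation"]]
    fold_sizes[fold] = (len(train_rows), len(validation_rows))
```

With a real DataFrame, the same index lists would typically be passed to an indexer such as data.loc[...].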

lkauto.utils.validation_split.row_based_validation_split(data: pandas.core.frame.DataFrame, num_folds: int = 1, frac: float = 0.25, random_state=42) → dict

Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:

    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }

Parameters
  • data (pd.DataFrame) – pandas DataFrame with the data to be split.

  • num_folds (int) – number of folds used for cross-validation

  • frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.

  • random_state (int) – random state for the validation split

Returns

dictionary with the indices of the train and validation split for the given data.

Return type

dict
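The interplay of num_folds and frac can be sketched as follows. This is not the library's implementation: row_split_sketch is a hypothetical function that shuffles row indices with the standard library and assumes a single hold-out split for num_folds = 1 and equally dealt folds otherwise, in which case frac is ignored as the parameter description states.

```python
import random


def row_split_sketch(indices, num_folds=1, frac=0.25, random_state=42):
    """Hypothetical row-based split returning {fold: {"train": [...], "validation": [...]}}."""
    shuffled = list(indices)
    random.Random(random_state).shuffle(shuffled)

    if num_folds == 1:
        # Single hold-out: a fraction frac of the rows goes to validation.
        cut = int(len(shuffled) * frac)
        parts = [shuffled[:cut]]
    else:
        # Cross-validation: frac is ignored, rows are dealt into num_folds parts.
        parts = [shuffled[k::num_folds] for k in range(num_folds)]

    folds = {}
    for fold, validation in enumerate(parts):
        held_out = set(validation)
        folds[fold] = {
            "train": [i for i in shuffled if i not in held_out],
            "validation": validation,
        }
    return folds


holdout = row_split_sketch(range(20))
cross = row_split_sketch(range(20), num_folds=4)
```

In the cross-validation case every row index appears in exactly one validation list, so the folds together cover the whole dataset.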

lkauto.utils.validation_split.user_based_validation_split(data: pandas.core.frame.DataFrame, num_folds: int = 1, frac: float = 0.25, random_state=42) → dict

Returns a dictionary with the indices of the train and validation split for the given data. The dictionary has the following structure:

    {
        0: {  # fold 0
            "train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            "validation": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
        },
        1: {  # fold 1
            "train": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
            "validation": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        }
    }

Parameters
  • data (pd.DataFrame) – pandas DataFrame with the data to be split.

  • num_folds (int) – number of folds used for cross-validation

  • frac (float) – fraction of the dataset to be used for the validation split. If num_folds > 1, the fraction value will be ignored.

  • random_state (int) – random state for the validation split

Returns

dictionary with the indices of the train and validation split for the given data.

Return type

dict

Module contents