lkauto.preprocessing package

Submodules

lkauto.preprocessing.preprocessing module

lkauto.preprocessing.preprocessing.preprocess_data(data: pandas.core.frame.DataFrame, user_col: str, item_col: str, rating_col: str = None, timestamp_col: str = None, include_timestamp: bool = True, drop_na_values: bool = True, drop_duplicates: bool = True, min_interactions_per_user: int = None, max_interactions_per_user: int = None) → pandas.core.frame.DataFrame

Preprocess data for LensKit. Depending on the arguments, this function can perform the following steps:

  1. Rename columns to “user”, “item”, “rating”, “timestamp”
  2. Drop all rows with NaN values
  3. Drop all duplicate rows
  4. Drop all users with fewer than min_interactions_per_user interactions
  5. Drop all users with more than max_interactions_per_user interactions

Parameters
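  • data (pd.DataFrame) – Input dataframe containing the columns named by user_col, item_col and, if given, rating_col and timestamp_col; these are renamed to “user”, “item”, “rating”, “timestamp”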

  • user_col (str) – Name of the user column

  • item_col (str) – Name of the item column

  • rating_col (str = None) – Name of the rating column

  • timestamp_col (str = None) – Name of the timestamp column

  • include_timestamp (bool = True) – If True, the timestamp column will be included in the dataset

  • drop_na_values (bool = True) – If True, all rows with NaN values will be dropped

  • drop_duplicates (bool = True) – If True, all duplicate rows will be dropped

  • min_interactions_per_user (int = None) – If not None, all users with fewer than this number of interactions will be dropped

  • max_interactions_per_user (int = None) – If not None, all users with more than this number of interactions will be dropped

Returns

Dataframe with columns “user”, “item”, “rating” and, if the timestamp is included, “timestamp”

Return type

pd.DataFrame
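
The five steps listed for preprocess_data can be sketched with plain pandas. This is a minimal illustration of the documented behaviour, not lkauto's actual implementation: the helper name preprocess_sketch, the sample column names (uid, iid, stars), and the order in which the steps run are all assumptions.

```python
import pandas as pd


def preprocess_sketch(data, user_col, item_col, rating_col=None,
                      timestamp_col=None, include_timestamp=True,
                      drop_na_values=True, drop_duplicates=True,
                      min_interactions_per_user=None,
                      max_interactions_per_user=None):
    """Hypothetical re-creation of the documented steps (not lkauto's code)."""
    # Step 1: select the caller's columns and rename them to the standard names.
    mapping = {user_col: "user", item_col: "item"}
    if rating_col is not None:
        mapping[rating_col] = "rating"
    if timestamp_col is not None and include_timestamp:
        mapping[timestamp_col] = "timestamp"
    out = data[list(mapping)].rename(columns=mapping)

    # Steps 2 and 3: drop rows with NaN values, then exact duplicate rows.
    if drop_na_values:
        out = out.dropna()
    if drop_duplicates:
        out = out.drop_duplicates()

    # Steps 4 and 5: filter users by their number of remaining interactions.
    per_row_count = out["user"].map(out["user"].value_counts())
    keep = pd.Series(True, index=out.index)
    if min_interactions_per_user is not None:
        keep &= per_row_count >= min_interactions_per_user
    if max_interactions_per_user is not None:
        keep &= per_row_count <= max_interactions_per_user
    return out[keep.to_numpy()]


# Made-up sample data: one duplicate row for user 1, one NaN item for user 3.
raw = pd.DataFrame({
    "uid": [1, 1, 1, 2, 2, 3, 3],
    "iid": [10, 11, 11, 10, 12, 10, None],
    "stars": [4.0, 5.0, 5.0, 3.0, 2.0, 1.0, 4.0],
})
clean = preprocess_sketch(raw, "uid", "iid", rating_col="stars",
                          min_interactions_per_user=2)
```

In this sample, user 3 is left with a single interaction once the NaN row is removed, so the minimum of 2 drops that user and clean contains only users 1 and 2.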

lkauto.preprocessing.pruning module

lkauto.preprocessing.pruning.min_ratings_per_user(df: pandas.core.frame.DataFrame, num_ratings: int, count_duplicates: bool = False)

Prune users with fewer than num_ratings ratings

Parameters
  • df (pd.DataFrame) – Dataframe with columns “user”, “item”, “rating”

  • num_ratings (int) – Minimum number of ratings per user

  • count_duplicates (bool = False) – If True, all ratings are counted, otherwise only unique ratings are counted

Returns

Dataframe with columns “user”, “item”, “rating”

Return type

pd.DataFrame
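
The effect of count_duplicates can be illustrated with a small pandas sketch. It assumes that "unique ratings" means distinct (user, item) pairs and that duplicate rows of surviving users are kept in the result; neither is confirmed by the documentation above. The helper min_ratings_sketch is a hypothetical stand-in, not the library function.

```python
import pandas as pd


def min_ratings_sketch(df, num_ratings, count_duplicates=False):
    """Hypothetical stand-in for min_ratings_per_user (not lkauto's code)."""
    # Assumption: "unique" ratings are distinct (user, item) pairs.
    counted = df if count_duplicates else df.drop_duplicates(subset=["user", "item"])
    per_user = counted["user"].value_counts()
    # Keep users whose count reaches the minimum.
    keep = per_user[per_user >= num_ratings].index
    return df[df["user"].isin(keep)]


# Made-up data: user 1 rated two distinct items, user 2 rated one item twice.
ratings = pd.DataFrame({
    "user": [1, 1, 1, 2, 2],
    "item": [10, 10, 11, 10, 10],
    "rating": [4.0, 4.0, 5.0, 3.0, 3.0],
})
unique_only = min_ratings_sketch(ratings, 2)
with_dupes = min_ratings_sketch(ratings, 2, count_duplicates=True)
```

Under these assumptions, user 2 is pruned in unique_only (one distinct item) but survives in with_dupes (two rows counted).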

lkauto.preprocessing.pruning.max_ratings_per_user(df: pandas.core.frame.DataFrame, num_ratings: int, count_duplicates: bool = False)

Prune users with more than num_ratings ratings

Parameters
  • df (pd.DataFrame) – Dataframe with columns “user”, “item”, “rating”

  • num_ratings (int) – Maximum number of ratings per user

  • count_duplicates (bool = False) – If True, all ratings are counted, otherwise only unique ratings are counted

Returns

Dataframe with columns “user”, “item”, “rating”

Return type

pd.DataFrame
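
The upper bound can be sketched the same way. The example below assumes the limit is inclusive, so a user with exactly num_ratings ratings is kept, which follows from the description "more than num_ratings" being pruned. The helper max_ratings_sketch and the sample data are hypothetical, and "unique ratings" is again assumed to mean distinct (user, item) pairs.

```python
import pandas as pd


def max_ratings_sketch(df, num_ratings, count_duplicates=False):
    """Hypothetical stand-in for max_ratings_per_user (not lkauto's code)."""
    # Assumption: "unique" ratings are distinct (user, item) pairs.
    counted = df if count_duplicates else df.drop_duplicates(subset=["user", "item"])
    per_user = counted["user"].value_counts()
    # Keep users whose count does not exceed the maximum.
    keep = per_user[per_user <= num_ratings].index
    return df[df["user"].isin(keep)]


# Made-up data: users 1, 2 and 3 have 3, 2 and 1 ratings respectively.
ratings = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 3],
    "item": [10, 11, 12, 10, 11, 10],
    "rating": [4.0, 5.0, 3.0, 2.0, 1.0, 5.0],
})
capped = max_ratings_sketch(ratings, 2)
```

With a limit of 2, user 1 is removed while user 2, who sits exactly on the limit, remains.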

Module contents