lkauto.preprocessing package
Submodules
lkauto.preprocessing.preprocessing module
lkauto.preprocessing.preprocessing.preprocess_data(data: pandas.core.frame.DataFrame, user_col: str, item_col: str, rating_col: str = None, timestamp_col: str = None, include_timestamp: bool = True, drop_na_values: bool = True, drop_duplicates: bool = True, min_interactions_per_user: int = None, max_interactions_per_user: int = None) → pandas.core.frame.DataFrame
Preprocess data for LensKit.
Depending on the arguments passed, this function can perform the following steps:
1. Rename columns to “user”, “item”, “rating”, “timestamp”
2. Drop all rows with NaN values
3. Drop all duplicate rows
4. Drop all users with fewer than min_interactions_per_user interactions
5. Drop all users with more than max_interactions_per_user interactions
- Parameters
data (pd.DataFrame) – Input dataframe containing the columns named by user_col, item_col and, if given, rating_col and timestamp_col
user_col (str) – Name of the user column
item_col (str) – Name of the item column
rating_col (str = None) – Name of the rating column, if the data has one
timestamp_col (str = None) – Name of the timestamp column, if the data has one
include_timestamp (bool = True) – If True, the timestamp column will be included in the dataset
drop_na_values (bool = True) – If True, all rows with NaN values will be dropped
drop_duplicates (bool = True) – If True, all duplicate rows will be dropped
min_interactions_per_user (int = None) – If not None, all users with fewer than this number of interactions will be dropped
max_interactions_per_user (int = None) – If not None, all users with more than this number of interactions will be dropped
- Returns
Dataframe with columns “user”, “item”, “rating”, plus “timestamp” when a timestamp column is included
- Return type
pd.DataFrame
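The five steps listed for preprocess_data can be approximated with plain pandas. The sketch below is not lkauto's implementation: preprocess_sketch, the sample column names (uid, iid, stars) and the order in which the steps run are all assumptions made for illustration.

```python
import pandas as pd


def preprocess_sketch(data, user_col, item_col, rating_col=None, timestamp_col=None,
                      min_interactions_per_user=None, max_interactions_per_user=None):
    """Hypothetical stand-in for the documented steps; not lkauto's code."""
    # Step 1: rename the caller's columns to "user", "item", "rating", "timestamp".
    mapping = {user_col: "user", item_col: "item"}
    if rating_col is not None:
        mapping[rating_col] = "rating"
    if timestamp_col is not None:
        mapping[timestamp_col] = "timestamp"
    out = data.rename(columns=mapping)[list(mapping.values())]

    # Steps 2 and 3: drop rows with NaN values, then exact duplicate rows.
    out = out.dropna().drop_duplicates()

    # Steps 4 and 5: keep only users whose interaction count is within bounds.
    counts = out["user"].map(out["user"].value_counts())
    keep = pd.Series(True, index=out.index)
    if min_interactions_per_user is not None:
        keep &= counts >= min_interactions_per_user
    if max_interactions_per_user is not None:
        keep &= counts <= max_interactions_per_user
    return out[keep].reset_index(drop=True)


raw = pd.DataFrame({
    "uid": [1, 1, 1, 2, 3, 3],
    "iid": [10, 11, 11, 10, 12, None],   # one missing item id
    "stars": [4.0, 5.0, 5.0, 3.0, 2.0, 1.0],
})
clean = preprocess_sketch(raw, "uid", "iid", rating_col="stars",
                          min_interactions_per_user=2)
print(clean)
```

In this sample, the row with the missing item id and the repeated (1, 11, 5.0) row are removed first, so only user 1 still has at least two interactions.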
lkauto.preprocessing.pruning module¶
lkauto.preprocessing.pruning.min_ratings_per_user(df: pandas.core.frame.DataFrame, num_ratings: int, count_duplicates: bool = False)
Prune users with fewer than num_ratings ratings.
- Parameters
df (pd.DataFrame) – Dataframe with columns “user”, “item”, “rating”
num_ratings (int) – Minimum number of ratings per user
count_duplicates (bool = False) – If True, all ratings are counted, otherwise only unique ratings are counted
- Returns
Dataframe with columns “user”, “item”, “rating”
- Return type
pd.DataFrame
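As a rough illustration of how count_duplicates changes the outcome, the following pandas sketch prunes users below a minimum. It is not the library's code: min_ratings_sketch is a hypothetical name, and reading "unique ratings" as distinct (user, item) pairs is an assumption.

```python
import pandas as pd


def min_ratings_sketch(df, num_ratings, count_duplicates=False):
    """Hypothetical illustration; not lkauto's implementation."""
    # Assumption: "unique ratings" means distinct (user, item) pairs.
    basis = df if count_duplicates else df.drop_duplicates(subset=["user", "item"])
    counts = basis["user"].value_counts()
    keep_users = counts[counts >= num_ratings].index
    # Rows of the original frame are returned untouched for the kept users.
    return df[df["user"].isin(keep_users)]


ratings = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "item": [1, 1, 2, 1, 2],          # user "a" rated item 1 twice
    "rating": [5.0, 5.0, 3.0, 4.0, 2.0],
})
strict = min_ratings_sketch(ratings, num_ratings=3)
loose = min_ratings_sketch(ratings, num_ratings=3, count_duplicates=True)
```

Under these assumptions, strict is empty because neither user has three distinct items, while loose keeps user "a", whose repeated rating counts toward the threshold.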
lkauto.preprocessing.pruning.max_ratings_per_user(df: pandas.core.frame.DataFrame, num_ratings: int, count_duplicates: bool = False)
Prune users with more than num_ratings ratings.
- Parameters
df (pd.DataFrame) – Dataframe with columns “user”, “item”, “rating”
num_ratings (int) – Maximum number of ratings per user
count_duplicates (bool = False) – If True, all ratings are counted, otherwise only unique ratings are counted
- Returns
Dataframe with columns “user”, “item”, “rating”
- Return type
pd.DataFrame
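Pruning against an upper bound can be sketched the same way in pandas. Again, this is only an approximation under stated assumptions: max_ratings_sketch is a hypothetical name, and "unique ratings" is assumed to mean distinct (user, item) pairs.

```python
import pandas as pd


def max_ratings_sketch(df, num_ratings, count_duplicates=False):
    """Hypothetical illustration; not lkauto's implementation."""
    # Assumption: "unique ratings" means distinct (user, item) pairs.
    basis = df if count_duplicates else df.drop_duplicates(subset=["user", "item"])
    counts = basis.groupby("user").size()
    heavy_users = counts[counts > num_ratings].index
    # Drop every row that belongs to a user above the limit.
    return df[~df["user"].isin(heavy_users)]


log = pd.DataFrame({
    "user": ["bot", "bot", "bot", "bot", "u1", "u1"],
    "item": [1, 2, 3, 4, 1, 1],       # user "u1" rated item 1 twice
    "rating": [1.0, 1.0, 1.0, 1.0, 4.0, 4.0],
})
unique_only = max_ratings_sketch(log, num_ratings=1)
with_dups = max_ratings_sketch(log, num_ratings=1, count_duplicates=True)
```

Here unique_only keeps both rows of "u1", since that user has a single distinct item, whereas with_dups is empty because the repeated rating pushes "u1" over the limit as well.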