Building a Recommendation Engine in Python With Collaborative Filtering (2024)

A step-by-step tutorial

In this tutorial, we will build a recommendation algorithm in Python using collaborative filtering, a machine learning technique. Recommendation algorithms predict what a user might like based on their past interactions or behavior. They are commonly used in a variety of applications, such as online retail stores, social media platforms, and streaming services.

There are several different types of recommendation algorithms, including collaborative filtering, content-based filtering, and hybrid approaches that combine both methods. In this tutorial, we will be focusing on collaborative filtering, which involves predicting a user’s preferences based on the preferences of similar users.

Collaborative filtering algorithms identify patterns in user behavior and use those patterns to make personalized recommendations to individual users. Personalized recommendations can improve the user experience and drive engagement, which in turn can increase revenue and customer loyalty.

Before we can get started with coding our recommendation algorithm, we will need to install the necessary libraries. The most important library we will need is scikit-learn, which provides tools for data preprocessing, model training, and evaluation. We will also need pandas for loading and manipulating the data, and numpy for numerical computations.

To install these libraries, open up a terminal and run the following commands:

pip install scikit-learn
pip install pandas
pip install numpy

Next, we will need to load and preprocess the data that we will be using to train our recommendation model. This typically involves collecting user interactions with items (e.g. purchases, clicks, likes), and creating a matrix of users by items to represent these interactions.

For example, let’s say we want to build a recommendation algorithm for an online retail store. We might gather data on what products each user has purchased, and create a matrix where the rows represent users and the columns represent products. The cells in the matrix would contain a 1 if the user has purchased the product, and a 0 if they have not.
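
If the raw data is a purchase log with one row per purchase rather than a ready-made matrix, pandas can pivot it into the binary user-item matrix described above. The sketch below is a minimal illustration; the column names `user_id` and `product` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical purchase log: one row per (user, product) purchase
purchases = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "product": ["book", "lamp", "book", "lamp", "mug", "book"],
})

# Count purchases per (user, product) pair; rows = users, columns = products
counts = pd.crosstab(purchases["user_id"], purchases["product"])

# Convert counts to 1 (purchased at least once) / 0 (never purchased)
matrix_df = (counts > 0).astype(int)
print(matrix_df)
```

`pd.crosstab` sorts both users and products, so row and column order is deterministic regardless of the order of the log.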

Before we can begin training our recommendation model, we will need to perform some preprocessing on this data. This may include filling in missing values, filtering out users or items with very little interaction data, and scaling the data. If the model is non-negative matrix factorization, as in this tutorial, any scaling must keep every value non-negative.

To load and preprocess the data, we can use the following code:

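import pandas as pd
import numpy as np

# Load the data into a pandas DataFrame
# (assumes the first column holds user IDs and the other columns are items)
df = pd.read_csv("data.csv", index_col=0)

# Replace missing values with 0 (no interaction)
df = df.fillna(0)

# Filter out users (rows) and items (columns) with very little interaction data
min_interactions = 3
df = df[df.sum(axis=1) >= min_interactions]
df = df.loc[:, df.sum(axis=0) >= min_interactions]

# Scale the data to [0, 1]; unlike z-score normalization, this keeps every
# value non-negative, which NMF requires
df = df / df.to_numpy().max()

# Create the user-item matrix as a NumPy array
matrix = df.to_numpy()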
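
To see the fill-and-filter steps on data small enough to check by hand, here they are applied to a hypothetical in-memory DataFrame (the user IDs, item names, and the threshold of 2 are made up for illustration). Columns are filtered with `.loc[:, mask]`; plain `df[mask]` with a Boolean Series selects rows, so it cannot be used with a mask computed over columns.

```python
import pandas as pd

# Hypothetical interaction data: rows = users, columns = items, None = no record
toy = pd.DataFrame(
    {"item_a": [1, 1, 1, None], "item_b": [1, None, 1, 1], "item_c": [None, None, 1, None]},
    index=["u1", "u2", "u3", "u4"],
)

# Missing values become 0 (no interaction)
toy = toy.fillna(0)

min_interactions = 2
# Keep users (rows) with at least min_interactions interactions
toy = toy[toy.sum(axis=1) >= min_interactions]
# Keep items (columns) with at least min_interactions interactions among the remaining users
toy = toy.loc[:, toy.sum(axis=0) >= min_interactions]
print(toy)
```

Users `u2` and `u4` are dropped first (one interaction each); `item_c` is then dropped because only one remaining user interacted with it.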

Now that we have our data preprocessed and ready to go, we can begin training our recommendation model. One popular method for training collaborative filtering models is matrix factorization, which involves decomposing the user-item matrix into two lower-dimensional matrices: a user matrix and an item matrix. These matrices represent the latent features of the users and items, respectively.
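
As a small illustration of the idea, the hypothetical factors below describe 3 users and 4 items with 2 latent features each. Multiplying the user matrix by the item matrix rebuilds a full user-item matrix, and each entry is the dot product of one user's row with one item's column. The numbers are hand-picked for readability, not learned from data.

```python
import numpy as np

# Hypothetical user matrix W: 3 users x 2 latent features
W = np.array([[1.0, 0.0],    # user 0 is driven entirely by feature 0
              [0.0, 1.0],    # user 1 is driven entirely by feature 1
              [0.5, 0.5]])   # user 2 is a mix of both

# Hypothetical item matrix H: 2 latent features x 4 items
H = np.array([[1.0, 1.0, 0.0, 0.0],   # items 0 and 1 load on feature 0
              [0.0, 0.0, 1.0, 1.0]])  # items 2 and 3 load on feature 1

# Entry (u, i) of R_hat is the dot product of W[u] and H[:, i]
R_hat = W @ H
print(R_hat)
```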

To train the model using matrix factorization, we can use the NMF (Non-Negative Matrix Factorization) class from scikit-learn. NMF constrains both factor matrices to be non-negative and fits them with an iterative solver (coordinate descent by default, or multiplicative updates) that minimizes the difference between the reconstructed matrix (the product of the user and item matrices) and the actual values in the user-item matrix. Note that NMF treats every entry, including the zeros for items a user has not interacted with, as an observed value.
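
Ignoring the optional regularization terms, the objective that scikit-learn's NMF minimizes with its default Frobenius loss can be written as follows, where R is the m × n user-item matrix, W the m × k user matrix, H the k × n item matrix, and k the number of latent features (the symbols are ours, not the library's):

```latex
\min_{W \ge 0,\; H \ge 0} \; \frac{1}{2} \, \lVert R - W H \rVert_F^2,
\qquad R \in \mathbb{R}^{m \times n},\quad W \in \mathbb{R}^{m \times k},\quad H \in \mathbb{R}^{k \times n}
```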

Here is some example code for training a recommendation model using NMF:

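from sklearn.decomposition import NMF

# Number of latent features (a hyperparameter; 10 is only an example value)
num_latent_features = 10

# Initialize the NMF model ('cd' = coordinate descent, scikit-learn's default
# solver; 'mu' = multiplicative update is the other option)
model = NMF(n_components=num_latent_features, solver='cd', max_iter=500, random_state=0)

# Fit the model to the data and get the user matrix (one row per user)
user_matrix = model.fit_transform(matrix)

# Get the item matrix (one column per item)
item_matrix = model.components_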

Note that num_latent_features is a hyperparameter that represents the number of latent features we want to use in our model. This is typically a small number (e.g. 10-20), and it will depend on the size and complexity of the data.

The NMF class has several other hyperparameters that can be adjusted to improve the model’s performance. For example, alpha_W and alpha_H (a single alpha parameter in older scikit-learn releases) set the overall strength of regularization, and l1_ratio determines the balance between the L1 and L2 penalties; a larger L1 share pushes more entries of the factor matrices to exactly zero, making them sparser.

It is recommended to experiment with different values for these hyperparameters and evaluate the model’s performance on a validation set to determine the optimal settings for your data.
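
One way to run such a comparison is to hide one known interaction per user, fit on the remaining data, and check how often the hidden item appears in that user's top-k recommendations (a hit rate). The sketch below applies this to several values of `n_components`; the random matrix, the candidate values, and `k=10` are hypothetical, so the printed numbers only illustrate the procedure. The same loop can be extended to `alpha_W` or `l1_ratio`.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical random binary user-item matrix (100 users x 40 items)
rng = np.random.default_rng(42)
R = (rng.random((100, 40)) < 0.15).astype(float)

# Validation split: hide one observed interaction per user (users with >= 2)
train = R.copy()
held_out = {}
for u in range(R.shape[0]):
    items = np.flatnonzero(R[u])
    if len(items) >= 2:
        i = rng.choice(items)
        train[u, i] = 0.0
        held_out[u] = i

def hit_rate_at_k(n_components, k=10):
    """Share of users whose hidden item appears in their top-k unseen items."""
    model = NMF(n_components=n_components, max_iter=500, random_state=0)
    W = model.fit_transform(train)
    scores = W @ model.components_
    scores[train > 0] = -np.inf  # never recommend items still in the training data
    hits = 0
    for u, i in held_out.items():
        top_k = np.argsort(scores[u])[::-1][:k]
        hits += int(i in top_k)
    return hits / len(held_out)

candidates = [5, 10, 20]
results = {n: hit_rate_at_k(n) for n in candidates}
best = max(results, key=results.get)
print(results, "-> best n_components:", best)
```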

Now that we have trained our recommendation model, we can use it to make recommendations for users. To do this, we can simply compute the dot product of the user’s latent feature vector and the latent feature vectors of the items in our catalog. This will give us a predicted rating for each item, which we can use to rank the items and recommend the highest-rated items to the user.

# Hypothetical example values: recommend N = 5 items for the first user
user_id = df.index[0]
N = 5

# Get the row index of the user we want to make recommendations for
user_idx = df.index.get_loc(user_id)

# Get the user's latent feature vector, shape (num_latent_features,)
user_vector = user_matrix[user_idx]

# Predicted scores: dot product of the user vector with every item vector
# (item_matrix has shape (num_latent_features, n_items))
scores = user_vector @ item_matrix

# Sort the items by score (highest first) and keep the top N
top_items = np.argsort(scores)[::-1][:N]

# Get the names of the recommended items
recommended_items = [df.columns[i] for i in top_items]

print("Recommendations for user {}: {}".format(user_id, recommended_items))
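
A common refinement, not part of the code above, is to skip items the user has already interacted with, since recommending a product someone just bought is rarely useful. The sketch below wraps the scoring in a hypothetical `recommend()` helper that does this and demonstrates it on a toy 4 × 5 matrix with made-up user and item names.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import NMF

def recommend(df, user_matrix, item_matrix, user_id, n=5):
    """Top-n item names for user_id, skipping items the user already has."""
    user_idx = df.index.get_loc(user_id)
    scores = user_matrix[user_idx] @ item_matrix  # one score per item
    # Give already-seen items a score of -inf so they sort last
    scores = np.where(df.iloc[user_idx].to_numpy() > 0, -np.inf, scores)
    top = np.argsort(scores)[::-1][:n]
    # Drop any slots that fell on already-seen items
    return [df.columns[i] for i in top if np.isfinite(scores[i])]

# Hypothetical toy data: 4 users x 5 items
toy = pd.DataFrame(
    [[1, 1, 0, 0, 1], [1, 0, 0, 1, 1], [0, 1, 1, 0, 0], [0, 0, 1, 1, 0]],
    index=["u1", "u2", "u3", "u4"],
    columns=["book", "lamp", "mug", "pen", "rug"],
    dtype=float,
)
model = NMF(n_components=2, max_iter=1000, random_state=0)
user_matrix = model.fit_transform(toy.to_numpy())
item_matrix = model.components_

print(recommend(toy, user_matrix, item_matrix, "u1", n=2))
```

Because seen items are pushed to the end rather than removed, the helper may return fewer than `n` names for users who have interacted with most of the catalog.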

Finally, it is important to evaluate the performance of our recommendation model to ensure that it is making accurate and useful recommendations. There are several different metrics that can be used to evaluate recommendation algorithms, such as precision, recall, and mean squared error.

To evaluate our model, we can split our data into a training set and a test set, and use the training set to fit the model. We can then use the test set to evaluate the model’s performance on unseen data.

Here is an example of how to evaluate a recommendation model using scikit-learn:

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Split the users (rows) into a training set and a test set
X_train, X_test = train_test_split(matrix, test_size=0.2, random_state=0)

# Fit the model on the training data
model.fit(X_train)

# NMF has no predict() method: project the test users onto the learned
# item factors, then reconstruct their rows from those factors
W_test = model.transform(X_test)
predictions = W_test @ model.components_

# Mean squared error between the actual and reconstructed test rows
mse = mean_squared_error(X_test, predictions)

print("Mean Squared Error: {}".format(mse))
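
The mean squared error measures how closely the model reconstructs the test rows, but it says little about the quality of the ranked list a user actually sees. The precision and recall metrics mentioned earlier are usually computed over the top k recommendations; a minimal sketch of both, on hypothetical item names, follows.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / len(relevant)

# Hypothetical example: our ranked list vs. the held-out items the user bought
recommended = ["lamp", "mug", "book", "pen"]
relevant = {"mug", "rug"}
print(precision_at_k(recommended, relevant, k=3))  # 1 hit out of 3 slots
print(recall_at_k(recommended, relevant, k=3))     # 1 of 2 relevant items found
```

In practice these values are averaged over all test users, with `relevant` holding each user's hidden interactions.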

In this tutorial, we covered step-by-step instructions for training a model and coding a recommendation algorithm using collaborative filtering in Python. We discussed the importance of gathering and preprocessing data, training a recommendation model using matrix factorization, making recommendations to users, and evaluating the model’s performance. While there are many other approaches and techniques that can be used in recommendation algorithms, this tutorial should provide a good foundation for those looking to get started with building their own recommendation systems.

Author: Francesca Jacobs Ret
