Practical implementation of Collaborative filtering — Recommender system

Published in Geek Culture · 5 min read · Jun 27, 2021

This type of recommendation system is used in many places, such as Netflix, YouTube, Amazon, Google, Spotify, Google News, Twitter, Facebook, LinkedIn and many more.

What is this recommender system that we see everywhere?

How many types of recommender systems are there?

What are the differences between them?

The answers to these questions can be found in my previous article on recommender systems.

Collaborative filtering has two approaches, and either one can be implemented in a memory-based or a model-based way. The two approaches are:

User-based collaborative filtering: In this approach, recommendations are made based on the interests and tastes of similar users.
Item-based collaborative filtering: In this approach, recommendations are made based on the similarity among items.
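
To make the difference between the two approaches concrete, here is a minimal sketch on a made-up ratings table. The user names, item names and ratings are all hypothetical, and cosine similarity is used only because it is short to write; the walkthrough later in this article uses Pearson correlation instead.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "B": 2, "C": 5},
}

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

users = sorted(ratings)
items = sorted({item for rated in ratings.values() for item in rated})

def user_vector(user):
    """One row of the table: all ratings given by a single user."""
    return [ratings[user][item] for item in items]

def item_vector(item):
    """One column of the table: all ratings received by a single item."""
    return [ratings[user][item] for user in users]

# User-based view: compare rows (users) with each other
user_sim_ann_bob = cosine(user_vector("ann"), user_vector("bob"))
user_sim_ann_cat = cosine(user_vector("ann"), user_vector("cat"))

# Item-based view: compare columns (items) with each other
item_sim_A_B = cosine(item_vector("A"), item_vector("B"))
item_sim_A_C = cosine(item_vector("A"), item_vector("C"))
```

In this toy table, ann's tastes are closer to bob's than to cat's, and item A is rated more like item B than like item C. User-based filtering would use the first fact, item-based filtering the second.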

Representation of collaborative filtering

The main idea behind collaborative filtering is that it finds users similar to the input user and recommends to the input user the items, products or services that those similar users liked. In the illustration above, one baby likes three things: balloons, music (piano, drums or any other instrument) and birds (entertaining toys or real ones). When another baby likes two of those three things, the third thing the first baby likes is an obvious recommendation for the second baby.
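
The same reasoning can be written as a few lines of set logic. This is only an illustrative sketch: the names `baby_1`, `baby_2`, the `recommend` helper and the `min_shared` threshold are made up for this example.

```python
# Hypothetical likes, mirroring the illustration
likes = {
    "baby_1": {"balloons", "music", "birds"},
    "baby_2": {"balloons", "music"},
}

def recommend(target, likes, min_shared=2):
    """Suggest items liked by users who share at least `min_shared` likes with `target`."""
    suggestions = set()
    for other, other_likes in likes.items():
        if other == target:
            continue
        shared = likes[target] & other_likes
        if len(shared) >= min_shared:
            # Recommend whatever the similar user likes that the target does not have yet
            suggestions |= other_likes - likes[target]
    return suggestions

result = recommend("baby_2", likes)
```

Here `result` holds the one item the first baby likes that the second baby has not picked up yet.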

“Collaborative filtering recommends products based on similar users' interests, preferences or choices, tailored to the individual affinities and shopping patterns of the user.”

There are many advantages to this type of filtering. Many industries and companies use it to recommend their products and services, which helps the business grow while offering a good user experience and security.

Example 1. Memory-based implementation of user-user collaborative filtering.

Here I’m going to use the same movies and ratings datasets to make predictions. The steps are:

1. Collect the input movies with the ratings given by the input user, then add the matching movie id from the movies dataset to each input movie.
2. Find the users who have watched and rated the same movies as the input movies.
3. Create a grouped data frame in which the users similar to the input user are indexed by user id.
4. Sort these user groups so that the users who have the most movies in common with the input user come first, since they give better recommendations.
5. Use the Pearson correlation to measure the similarity between users.
6. Compute the Pearson correlation coefficient between each user and the input user.
7. Store the correlations of all users with the input user in a data frame keyed by user id.
8. Take the ratings of all movies given by the selected (top) users, i.e. the users most similar to the input user.
9. Compute the weighted rating by multiplying the similarity index by the ratings the selected users gave to each movie.
10. Sum the similarity index and weighted rating columns per movie id, i.e. obtain the sum of each feature for every movie across the selected users.
11. Finally, obtain the weighted average score by dividing the sum of the weighted ratings by the sum of the similarity indices. Sort these values to find the top movies to recommend to the input user.
12. Extract the other details of the selected movies from the movies dataset, so the input user has enough detail to choose what to watch. This final data frame is shown below the code.
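
For reference, the quantities used in steps 5 to 11 can be written out as follows. The notation is my own choice for this note: x_i and y_i are the ratings the input user and another user gave to their n common movies, r_u is the similarity index of selected user u, R_{u,m} is that user's rating of movie m, and U_m is the set of selected users who rated m. The S terms correspond to the `Sxx`, `Syy` and `Sxy` variables in the code.

```latex
\begin{align*}
S_{xx} &= \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}, \\
S_{xy} &= \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}, \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}, \\
\text{score}(m) &= \frac{\sum_{u \in U_m} r_u \, R_{u,m}}{\sum_{u \in U_m} r_u}.
\end{align*}
```

When either S_xx or S_yy is zero (a user gave the same rating to every common movie), the correlation is undefined, and the code sets it to 0.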

from math import sqrt
import pandas as pd

# Assumes ratings_df, movies_df and inputMovies (from step 1) are already loaded.

# 2. Find the users who rated the same movies as the input user
userSubset = ratings_df[ratings_df['movieId'].isin(inputMovies['movieId'].tolist())]

# 3. Use groupby to index the data by user id
userSubsetGroup = userSubset.groupby('userId')
# Example: look at one such user (assumes user 37 is in the subset)
userSubsetGroup.get_group(37)

# 4. Sort so that users with more movies in common with the input user get priority
userSubsetGroup = sorted(userSubsetGroup, key=lambda x: len(x[1]), reverse=True)

# 6. Find the Pearson correlation between each user and the input user
# Store the Pearson correlation in a dictionary: key = user id, value = coefficient
pearsonCorrelationDict = {}
# Sort the input once so its ratings line up with each group's ratings
inputMovies = inputMovies.sort_values(by='movieId')
# For every user group in our subset
for name, group in userSubsetGroup:
    # Sort the current user group so the values aren't mixed up later on
    group = group.sort_values(by='movieId')
    # Get the N for the formula
    nRatings = len(group)
    # Get the input user's ratings for the movies that both users have in common
    temp_df = inputMovies[inputMovies['movieId'].isin(group['movieId'].tolist())]
    # Store them in a list to simplify the calculations below
    tempRatingList = temp_df['rating'].tolist()
    # Put the current user's ratings in a list as well
    tempGroupList = group['rating'].tolist()
    # Calculate the Pearson correlation between the two users, called x and y
    Sxx = sum(i**2 for i in tempRatingList) - pow(sum(tempRatingList), 2) / float(nRatings)
    Syy = sum(i**2 for i in tempGroupList) - pow(sum(tempGroupList), 2) / float(nRatings)
    Sxy = sum(i * j for i, j in zip(tempRatingList, tempGroupList)) - sum(tempRatingList) * sum(tempGroupList) / float(nRatings)

    # If the denominator is positive, divide; otherwise use 0 correlation
    # (> 0 rather than != 0 guards against tiny negative rounding errors)
    if Sxx > 0 and Syy > 0:
        pearsonCorrelationDict[name] = Sxy / sqrt(Sxx * Syy)
    else:
        pearsonCorrelationDict[name] = 0

# To check the output of the correlation
pearsonCorrelationDict.items()

# 7. Store the correlations in a data frame keyed by user id
# (this step was missing from the listing; the top-50 cut-off is an assumption)
pearsonDF = pd.DataFrame(list(pearsonCorrelationDict.items()), columns=['userId', 'similarityIndex'])
topUsers = pearsonDF.sort_values(by='similarityIndex', ascending=False).head(50)

# 8. Ratings of all movies by the selected users
topUsersRating = topUsers.merge(ratings_df, on='userId', how='inner')
topUsersRating.head()

# 9. Weighted rating = similarity index * rating (also missing from the listing)
topUsersRating['weightedRating'] = topUsersRating['similarityIndex'] * topUsersRating['rating']

# 10. Sum the similarity index and weighted rating after grouping by movieId
tempTopUsersRating = topUsersRating.groupby('movieId')[['similarityIndex', 'weightedRating']].sum()
tempTopUsersRating.columns = ['sum_similarityIndex', 'sum_weightedRating']
tempTopUsersRating.head()

# 11. Now we have the weighted average, sorted so the best scores come first
recommendation_df = pd.DataFrame()
recommendation_df['weighted average recommendation score'] = tempTopUsersRating['sum_weightedRating'] / tempTopUsersRating['sum_similarityIndex']
recommendation_df['movieId'] = tempTopUsersRating.index
recommendation_df = recommendation_df.sort_values(by='weighted average recommendation score', ascending=False)
recommendation_df.head()

# 12. Final data frame with the details of the movies recommended from the selected users
movies_df.loc[movies_df['movieId'].isin(recommendation_df.head(10)['movieId'].tolist())]

Final recommended movies from user-based filtering by memory-based approach
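
To check the logic on data small enough to verify by hand, the same pipeline can be sketched in plain Python without pandas. Everything in this block is hypothetical: the toy ratings, the user ids `u1` and `u2`, the function names, and the choice to skip movies whose summed similarity is not positive.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation using the same Sxx / Syy / Sxy form as the walkthrough."""
    n = len(x)
    sxx = sum(i * i for i in x) - sum(x) ** 2 / n
    syy = sum(j * j for j in y) - sum(y) ** 2 / n
    sxy = sum(i * j for i, j in zip(x, y)) - sum(x) * sum(y) / n
    if sxx > 0 and syy > 0:
        return sxy / sqrt(sxx * syy)
    return 0.0  # undefined correlation is treated as 0

def recommend(input_ratings, all_ratings, top_n=50):
    """Return (movie, weighted average score) pairs, best score first."""
    # Steps 2-7: similarity of every user who shares at least one movie
    sims = {}
    for user, rated in all_ratings.items():
        common = sorted(set(input_ratings) & set(rated))
        if common:
            sims[user] = pearson([input_ratings[m] for m in common],
                                 [rated[m] for m in common])
    top_users = sorted(sims, key=sims.get, reverse=True)[:top_n]

    # Steps 8-10: per-movie sums of similarity and of similarity * rating
    sum_sim, sum_weighted = {}, {}
    for user in top_users:
        for movie, rating in all_ratings[user].items():
            sum_sim[movie] = sum_sim.get(movie, 0.0) + sims[user]
            sum_weighted[movie] = sum_weighted.get(movie, 0.0) + sims[user] * rating

    # Step 11: weighted average, skipping movies with a non-positive similarity sum
    scores = {m: sum_weighted[m] / sum_sim[m] for m in sum_sim if sum_sim[m] > 0}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: movie id -> rating
input_ratings = {1: 5.0, 2: 1.0, 3: 4.0}
all_ratings = {
    "u1": {1: 5.0, 2: 1.0, 3: 4.0, 4: 5.0, 5: 1.0},  # rates exactly like the input user
    "u2": {1: 4.0, 2: 2.0, 3: 5.0, 4: 2.0, 5: 4.0},  # similar, but less so
}

ranking = recommend(input_ratings, all_ratings)
scores = dict(ranking)
```

Because `u1` agrees with the input user more closely than `u2` does, `u1`'s high rating for movie 4 outweighs `u2`'s low one, so movie 4 scores above movie 5. Like the walkthrough, this sketch also scores movies the input user has already rated; filtering those out is a separate step.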

Example 2. Model-based implementation of item-based collaborative filtering.

This approach uses the items themselves, together with machine learning models, as the basis for generating recommendations for the user.

Here I used a KNN (k-nearest neighbours) model to make recommendations for a user who has watched a certain kind of movie with certain genres.

The recommendations for the above look like this:

The final recommendations made for users who watched movies with certain genres
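
The notebook code for this example is not reproduced in this article, so the following is only a rough sketch of the idea behind a k-nearest-neighbours lookup over items, written as a brute-force search in plain Python. The movie titles, the ratings and the use of cosine distance are all assumptions made for illustration. Each movie is represented by the ratings four hypothetical users gave it (0 means not rated); the same lookup would work on any other item vectors, such as genre flags.

```python
from math import sqrt

# Hypothetical item vectors: movie -> ratings from four users (0 = not rated)
movies = {
    "Space Raiders": [5, 4, 0, 1],
    "Galaxy Clash": [4, 5, 0, 0],
    "Laugh Lines": [0, 1, 5, 4],
    "Date Night": [1, 0, 4, 5],
    "Robot Heart": [3, 3, 2, 1],
}

def cosine_distance(u, v):
    """1 - cosine similarity; smaller means the two items are rated more alike."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def nearest_movies(title, k=2):
    """Return the k movies whose vectors are closest to that of `title`."""
    target = movies[title]
    distances = [(cosine_distance(target, vec), other)
                 for other, vec in movies.items() if other != title]
    distances.sort()
    return [other for _, other in distances[:k]]

neighbours = nearest_movies("Space Raiders", k=2)
```

A user who watched "Space Raiders" would then be shown the movies in `neighbours`. On a real dataset, a library implementation of nearest-neighbour search would normally replace the brute-force loop.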

For more detailed code and predictions based on the movies, see the Jupyter notebook in the GitHub repo.

Every struggle in your life has shaped you into the person you are today. Be thankful for today and ever ready for tomorrow. The only impossible journey is the one you didn’t begin. So just do it and make it.

If you have learnt something, let me know by showing some support, and share it with someone who may find it useful. If you have any queries, do let me know in the comment box. Bring some light to the world. Have a nice day. 🥰🕊🤍

Written by Yamini