Recommendation Systems
Build systems that recommend items to users.
Suggest items users will love.
What are Recommendation Systems?
Recommendation systems suggest products, movies, or other content that a user is likely to enjoy.
**Examples**: Netflix shows, Amazon products, Spotify songs
Types of Recommendations
**1. Content-Based**: Based on item features. "You liked Action movies, here are more Action movies."
**2. Collaborative Filtering**: Based on user behavior. "Users like you also liked these movies."
**3. Hybrid**: Combine both approaches
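One simple way to combine both approaches is a weighted blend of the two scores. The sketch below is only an illustration under stated assumptions: the helper `hybrid_scores`, the weight `alpha`, and the example scores are all hypothetical, and both recommenders are assumed to produce scores on the same 0 to 1 scale.

```python
def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Blend two score dictionaries: alpha weights content, (1 - alpha) collaborative.

    Items missing from one source are treated as scoring 0 there.
    """
    items = set(content_scores) | set(collab_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
              + (1 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }

# Hypothetical scores in [0, 1] from each recommender
content = {'Movie B': 0.2, 'Movie C': 0.9}
collab = {'Movie B': 0.8, 'Movie C': 0.5, 'Movie D': 0.6}

blended = hybrid_scores(content, collab, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

Raising `alpha` leans on item features, which can help when behavior data is sparse; lowering it leans on user behavior.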
Content-Based Filtering
```python
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Movie features: [action, comedy, drama]
movies = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'action': [1, 0, 1],
    'comedy': [0, 1, 0],
    'drama': [0, 1, 1]
})
features = movies[['action', 'comedy', 'drama']]

# User watched Movie A
user_liked = features[movies['title'] == 'Movie A']

# Find similar movies
similarities = cosine_similarity(user_liked, features)[0]

# Recommend top 2 (excluding Movie A itself), most similar first
ranked = movies.assign(similarity=similarities).sort_values('similarity', ascending=False)
recommendations = ranked[ranked['title'] != 'Movie A'].head(2)
print(recommendations['title'])
```
Collaborative Filtering - User-Based
```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# User-item ratings matrix
ratings = np.array([
    [5, 3, 0, 1],  # User 1
    [4, 0, 0, 1],  # User 2
    [1, 1, 0, 5],  # User 3
    [1, 0, 0, 4],  # User 4
])

# Find similar users to User 1
user_similarity = cosine_similarity(ratings)

# Get User 1's similar users
similar_users = user_similarity[0]

# Predict ratings for unwatched items
# (Weighted average of similar users' ratings)
```
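The prediction step left as a comment above can be sketched as a similarity-weighted average. This is a minimal illustration, not a definitive implementation: it assumes a rating of 0 means "not rated" and that every user has at least one rating, computes cosine similarity with plain NumPy, and uses hypothetical helpers (`cosine_sim_matrix`, `predict_rating`) that are not part of any library.

```python
import numpy as np

ratings = np.array([
    [5, 3, 0, 1],  # User 1
    [4, 0, 0, 1],  # User 2
    [1, 1, 0, 5],  # User 3
    [1, 0, 0, 4],  # User 4
], dtype=float)

def cosine_sim_matrix(m):
    """Pairwise cosine similarity between the rows of m (rows must be non-zero)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / norms
    return unit @ unit.T

def predict_rating(ratings, user, item):
    """Hypothetical helper: similarity-weighted average of other users' ratings.

    Assumes 0 means "not rated". Returns NaN when no other user rated the item.
    """
    sims = cosine_sim_matrix(ratings)[user]
    rated = ratings[:, item] > 0      # users who actually rated this item
    rated[user] = False               # never use the target user's own row
    weights = sims[rated]
    if weights.size == 0 or weights.sum() == 0:
        return float('nan')
    return float(weights @ ratings[rated, item] / weights.sum())

# User 2 (row 1) has not rated the second item (column 1)
pred_user2_item2 = predict_rating(ratings, user=1, item=1)
print(round(pred_user2_item2, 2))
```

For User 1, the only unrated item (the third column) has no ratings from anyone, so this sketch returns NaN there; collaborative filtering alone cannot score an item nobody has rated.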
Matrix Factorization
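Matrix factorization is a more advanced form of collaborative filtering. It decomposes the ratings matrix into low-dimensional user and item factors whose product approximates the known ratings and fills in the missing ones: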
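```python
import numpy as np
from scipy.sparse.linalg import svds

# Decompose ratings matrix (svds needs a float matrix; `ratings` is the
# user-item matrix from the user-based example)
U, sigma, Vt = svds(ratings.astype(float), k=2)

# Reconstruct with predictions
predicted_ratings = np.dot(np.dot(U, np.diag(sigma)), Vt)

# Recommend highest predicted ratings
```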
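The last step, recommending the highest predicted ratings, might look like the sketch below. It is an illustration under assumptions: 0 is taken to mean "not rated", `np.linalg.svd` truncated to rank 2 stands in for `svds` so the example needs only NumPy, and `recommend_top_n` is a hypothetical helper.

```python
import numpy as np

ratings = np.array([
    [5, 3, 0, 1],  # User 1
    [4, 0, 0, 1],  # User 2
    [1, 1, 0, 5],  # User 3
    [1, 0, 0, 4],  # User 4
], dtype=float)

# Rank-2 approximation (np.linalg.svd returns singular values in descending order)
U, sigma, Vt = np.linalg.svd(ratings, full_matrices=False)
k = 2
predicted_ratings = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

def recommend_top_n(ratings, predicted, user, n=2):
    """Hypothetical helper: column indices of the n unrated items with the highest predicted rating."""
    scores = predicted[user].copy()
    scores[ratings[user] > 0] = -np.inf   # exclude items the user already rated
    order = np.argsort(scores)[::-1]      # best first
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]

# User 2 (row 1) has not rated the items in columns 1 and 2
top_items_user2 = recommend_top_n(ratings, predicted_ratings, user=1)
print(top_items_user2)
```

Masking already-rated items matters: without it, the top of the list is usually items the user has already seen. Note that treating unrated entries as 0 pulls predictions toward zero, which is one reason libraries such as Surprise fit only on observed ratings.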
Using Surprise Library
```python
from surprise import SVD, Dataset, Reader
from surprise.model_selection import cross_validate

# Load data (df: a pandas DataFrame with user_id, item_id, and rating columns)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Evaluate SVD with 5-fold cross-validation
model = SVD()
cross_validate(model, data, measures=['RMSE', 'MAE'], cv=5)

# Train on the full dataset before predicting
model.fit(data.build_full_trainset())

# Get prediction (predict() takes uid and iid)
prediction = model.predict(uid='Tom', iid='Movie123')
print(f"Predicted rating: {prediction.est}")
```
Deep Learning Approach
```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense, Concatenate

# Number of distinct users and items (placeholder values; set these from your data)
n_users, n_items = 1000, 500

# User and item inputs
user_input = Input(shape=(1,))
item_input = Input(shape=(1,))

# Embeddings
user_embedding = Embedding(n_users, 50)(user_input)
item_embedding = Embedding(n_items, 50)(item_input)

user_vec = Flatten()(user_embedding)
item_vec = Flatten()(item_embedding)

# Combine
concat = Concatenate()([user_vec, item_vec])
dense = Dense(128, activation='relu')(concat)
output = Dense(1)(dense)

model = Model([user_input, item_input], output)
model.compile(optimizer='adam', loss='mse')
```
Cold Start Problem
**New User**: No history to base recommendations on. **Solution**: Ask for preferences, use popular items.
**New Item**: No ratings yet. **Solution**: Use content features, show to diverse users.
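The "use popular items" fallback for new users can be sketched as follows. Everything here is illustrative: the interaction log is toy data, and `popular_items`, `recommend`, and `dummy_model` are hypothetical names, with `dummy_model` standing in for whatever trained recommender is available.

```python
from collections import Counter

# Toy interaction log of (user_id, item_id) pairs; hypothetical data
interactions = [
    ('u1', 'A'), ('u1', 'B'),
    ('u2', 'A'), ('u2', 'C'),
    ('u3', 'A'), ('u3', 'B'),
]

def popular_items(interactions, n=2):
    """Most frequently interacted-with items, most popular first."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(n)]

def recommend(user_id, interactions, personalized, n=2):
    """Use personalized results for known users, popularity for new ones.

    `personalized` is a hypothetical callable standing in for any trained model.
    """
    known_users = {user for user, _ in interactions}
    if user_id in known_users:
        return personalized(user_id, n)
    return popular_items(interactions, n)

# Stand-in for a real model: always returns the same items
def dummy_model(user_id, n):
    return ['C', 'B'][:n]

new_user_recs = recommend('u99', interactions, dummy_model)
known_user_recs = recommend('u1', interactions, dummy_model)
print(new_user_recs, known_user_recs)
```

The same routing idea applies to new items: when an item has no ratings, score it from its content features instead of from user behavior.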
Evaluation
```python
from sklearn.metrics import mean_absolute_error

# Predict ratings (model: a fitted rating predictor;
# X_test, y_test: held-out inputs and their true ratings)
predictions = model.predict(X_test)

# Calculate error
mae = mean_absolute_error(y_test, predictions)
print(f"MAE: {mae}")
```
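To make the two error measures used in this guide concrete (MAE here, RMSE and MAE in the Surprise example), the sketch below computes both by hand. The ratings and predictions are made-up values chosen only for illustration.

```python
import math

# Hypothetical held-out ratings and model predictions
y_true = [4.0, 3.0, 5.0, 2.0]
y_pred = [3.5, 3.0, 4.0, 3.0]

errors = [p - t for p, t in zip(y_pred, y_true)]

# MAE: average absolute error, in rating points
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the mean squared error; penalizes large misses more
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE: {mae}, RMSE: {rmse}")
```

RMSE is never smaller than MAE on the same data, and the gap grows when a few predictions are far off.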
Remember
- Collaborative filtering often best
- Handle cold start problem
- A/B test recommendations
- Balance accuracy with diversity