
Content Recommendation Systems

How can Computers Recommend Content?

yash101

Published 4/9/2025

Updated 4/3/2025


A Note from the Author

I kicked off this article as a set of notes while tinkering with components for a startup I’m trying to get off the ground. As I worked through the prototypes, I naturally gravitated towards the algorithms described here. They just evolved as the logical next step.

My original idea (still evident in the demos and examples) was to break down content into a few key numbers that capture its characteristics. I figured that if different pieces of content shared similar numbers, they’d likely be alike, and I could use that property to recommend content. It turns out that my initial concept is strikingly similar to how embeddings function in today’s state-of-the-art systems!

Since these algorithms emerged organically during my prototyping, there may exist a few inconsistencies in the nomenclature and variable names throughout the article.

AI-generated image of a kid asking for a toy


Recommendation Systems - What & Why?

Ever been swamped with a massive library of content and wondered how to dish out the right stuff to the right folks? Think about a streaming service with heaps of movies and TV shows, or a social media app trying to keep users glued to their screens. That’s where recommendation systems come into play.

These systems aim to match users with the content they’re likely to love, but it isn’t a walk in the park. Every person is different, so we need to find a way to get to know them.

So this leads to recommendation systems having two important jobs:

  1. Recommend content: Serve up content that a user will resonate with
  2. Learn and adapt: Use feedback to fine-tune and improve future recommendations

And let’s be honest, a little boost to maximize shareholder value doesn’t hurt either.

So, how can we build a recommendation system? Let’s dive in!


Vectorization and Categorization

AI generated mess of a ton of toys

Imagine that you are the owner of a toy store. Your store sells all sorts of toys - dinosaurs, puppets, toy trucks, puzzles and so much more! How can you recommend the right toy to the right child?

A simple approach is to think about the different characteristics of each toy and note them down, say, in a spreadsheet. We can then group the toys based on these characteristics.

Now, say we thought of 8 different characteristics and put them in a list. We can rate each and every toy in our toy store by these characteristics numerically. Putting together all 8 values for a single toy, we have vectorized the toy. For example:

  • plush-toy
  • animal
  • mental
  • red
  • blue
  • green
  • gender-bias
  • educational

We can then take each toy in the box and rate it by the characteristics above.

               plush-toy animal mental red blue green gender-bias educational
dinosaur       0.9       0.9    0.2    0.1 0.1  0.8   0.2         0.2
puppet         0.7       0.7    0.4    0.3 0.3  0.3   0.0         0.8
puzzle         0.0       0.1    0.9    0.3 0.3  0.3   0.0         0.8
truck          0.0       0.0    0.7    0.8 0.0  0.0   0.7         0.0

In the table above, each row represents a vector, and each vector describes an object.
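To make this concrete, here is a minimal sketch (in Python with NumPy, which is simply my choice of illustration, not anything the toy store example requires) of the table above represented as vectors:

```python
import numpy as np

# The 8 characteristics, in the same order as the table above
FEATURES = ["plush-toy", "animal", "mental", "red",
            "blue", "green", "gender-bias", "educational"]

# Each toy becomes an 8-dimensional vector (one row of the table above)
toys = {
    "dinosaur": np.array([0.9, 0.9, 0.2, 0.1, 0.1, 0.8, 0.2, 0.2]),
    "puppet":   np.array([0.7, 0.7, 0.4, 0.3, 0.3, 0.3, 0.0, 0.8]),
    "puzzle":   np.array([0.0, 0.1, 0.9, 0.3, 0.3, 0.3, 0.0, 0.8]),
    "truck":    np.array([0.0, 0.0, 0.7, 0.8, 0.0, 0.0, 0.7, 0.0]),
}
```

Each entry of `toys` is one row of the table; adding a new toy is just a matter of rating it on the same 8 characteristics.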


Using Vectors for Content Recommendations

AI-generated choosing a toy

To find the best toy recommendations, we just need to compare vectors and look for other vectors which are similar!

But, we aren’t trying to just compare toys. We are trying to make recommendations for a user, Bob. But no worries, we can still use a similar approach by creating a vector for Bob himself - a vector which would represent his ideal toy.

Hi, I’m Bob!
My favorite toy is Clifford, a big red dog plushie I have at home. I can’t sleep without it! Find me a new toy so he has a friend too!

We can create an initial vector for what we know about Bob, and it would look like:

               plush-toy animal mental red blue green gender-bias educational
bob            0.9       0.9    0.0    0.9 0.0  0.0   0.0         0.0

Since all we know about Bob is that he likes his “big red dog plushie”, we can create an initial embedding to match that.


Embeddings

Embeddings are a way to represent data (such as words, sentences, images, literally anything) as vectors in continuous space. Embeddings attempt to capture the meaning and context behind data, making it easy to compare two different pieces of data.

Since embeddings are vectors, they have the following characteristics:

  • They represent a point in high-dimensional space
  • Two similar vectors are “nearby”, and we can measure the distance between them
  • Vectors have a magnitude $\|V\|$ and a direction, $\hat{V} = \frac{V}{\|V\|}$

High quality embedding vectors have a few extra characteristics on top of being vectors:

  • Two embeddings with a similar direction (they point in the same direction) have a similar meaning
  • Two embeddings with very different directions have a very different meaning
  • Magnitude can represent intensity or confidence. However, most embedding-based systems tend to ignore it by normalizing vectors
  • Embedding vectors can be sparse or dense. This is data science speak for how data is represented in vectors. Most embeddings fall somewhere between the two 🙂
    • Dense vectors are compact and almost all dimensions have non-zero values
    • Sparse vectors are extremely large vectors in which most dimensions are zero, except for one or a few.

In the example we have used, the vectors we created above were, in fact, simple embeddings with just 8 elements. These fixed-size vectors captured enough information about each toy to allow us to compare two toys against each other numerically and find Bob his perfect toy. In typical machine learning applications, the vectors we use tend to be much bigger, easily anywhere from 128 dimensions to millions!
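Since most embedding-based systems care about direction more than magnitude, here is a tiny sketch of turning a vector into its unit vector; the `normalize` helper is a hypothetical name of mine, not from any particular library:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Return the unit vector v / ||v|| (keeps direction, drops magnitude)."""
    magnitude = np.linalg.norm(v)
    return v / magnitude if magnitude > 0 else v

dinosaur = np.array([0.9, 0.9, 0.2, 0.1, 0.1, 0.8, 0.2, 0.2])
print(np.linalg.norm(dinosaur))  # magnitude ||V||
print(normalize(dinosaur))       # direction, V-hat
```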


Nearest Neighbor (NN)

Similar embeddings convey a similar meaning, and we know that since embeddings are vectors, two similar embeddings will be located close to each other. Thus, we can use nearest neighbor algorithms to recommend content with a similar meaning. In the visualization below, 20 vectors have been plotted as points. The red point is our point of interest. The orange circle encloses the two nearest neighbors - the closest points to our point of interest.

How can we find the nearest neighbors?

Given our vector of interest, $\hat{V_0}$, and a list of vectors $\hat{V_i}$, we can calculate the distance between $\hat{V_0}$ and each $\hat{V_i}$ and sort for the vectors with the lowest Euclidean distance.

$$\text{euclidean-distance}(\hat{V_0}, \hat{V_i}) = \sqrt{\sum_j \left(V_{0,j} - V_{i,j}\right)^2}$$

$$\text{nearest-neighbor}(\hat{V_0}, \hat{V}) = \operatorname{argmin}_i\left(\text{euclidean-distance}(\hat{V_0}, \hat{V_i})\right)$$

Where j is the index of an element in the vectors.
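As a rough sketch, the same search can be written in a few lines of Python/NumPy. The helper names are my own, and the `toys` dictionary is the one from the vectorization sketch earlier:

```python
import numpy as np

def euclidean_distance(v0: np.ndarray, v1: np.ndarray) -> float:
    # Square root of the sum of squared element-wise differences
    return float(np.sqrt(np.sum((v0 - v1) ** 2)))

def nearest_neighbors(query: np.ndarray, vectors: dict, k: int = 2):
    """Return the k (name, distance) pairs closest to the query vector."""
    distances = {name: euclidean_distance(query, vec) for name, vec in vectors.items()}
    return sorted(distances.items(), key=lambda item: item[1])[:k]

# Bob's vector from earlier; `toys` is the dictionary from the vectorization sketch
bob = np.array([0.9, 0.9, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0])
print(nearest_neighbors(bob, toys, k=2))
```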


Cosine Similarity

Since two embedding vectors which point in a similar direction have similar meaning, how can we know if two vectors are pointing in the same direction? We can use cosine similarity which inherently normalizes both vectors under comparison and provides a score to test if both vectors point in the same direction.

Cosine Similarity
A measure of whether two vectors point in the same direction, ranging over $[-1, 1]$.
$\text{cosine-similarity}(A, B) = -1$ represents two vectors with opposite directions, or embeddings with opposite meanings.
$\text{cosine-similarity}(A, B) = 1$ represents two vectors with the same direction, or embeddings with the exact same meaning.

The Mathematics

Our goal is to create a score that determines if two vectors are pointing in the same direction. How can we achieve that?

In linear algebra, we can take the dot product (also known as the inner product) of the two vectors $A$ and $B$ we want to compare. If both vectors point in the same direction, the dot product will be large and positive. If the vectors point 90° from each other (orthogonal), the dot product is 0. If the vectors point in opposite directions, the dot product will be large and negative. How can we use this to our advantage?

Let’s investigate three ways we can calculate the dot product:

$$A \cdot B = A_1 B_1 + A_2 B_2 + \dots + A_N B_N$$

OR

$$A \cdot B = \sum_{i=1}^{N} A_i B_i$$

OR

$$A \cdot B = \|A\| \, \|B\| \cos\theta$$

Where

  • $A \cdot B$ is the dot product of vectors $A$ and $B$
  • $A_i$ and $B_i$ are the $i$-th elements of their respective vectors
  • $\|A\|$ and $\|B\|$ are the magnitudes of vectors $A$ and $B$ respectively
  • $\theta$ is the angle between the two vectors

$\cos\theta$

Given all of this, how can we create a score which rates how similar the directions of two vectors are? In the third formula for the dot product, we have $\cos\theta$. For two vectors which are 0°, 90°, or 180° apart, what is the value of $\cos\theta$?

$$\cos 0^\circ = \cos(0\ \text{radians}) = 1$$
$$\cos 90^\circ = \cos(\tfrac{\pi}{2}\ \text{radians}) = 0$$
$$\cos 180^\circ = \cos(\pi\ \text{radians}) = -1$$

In layman’s terms, when both vectors point in the same direction, $\cos\theta = 1$, and when they point in opposite directions, $\cos\theta = -1$. And for those who may be a bit confused, degrees ($n^\circ$) and radians ($n$ radians) are different measures of angles, just like how Fahrenheit and Celsius are different measures of temperature.

We call the $\cos\theta$ term the cosine similarity of the two vectors.

$$\text{cosine-similarity} = \cos\theta$$

And to calculate it, we can isolate it:

$$\text{cosine-similarity}(A, B) = \cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Or numerically (harder to read, easier to implement in code):

$$\text{cosine-similarity}(A, B) = \cos\theta = \frac{\sum_{i=1}^{N} A_i B_i}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{N} A_i B_i}{\sqrt{\sum_{i=1}^{N} A_i^2}\ \sqrt{\sum_{i=1}^{N} B_i^2}}$$
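Here is a small Python/NumPy sketch of that last formula, using Bob's vector and two of the toy vectors from earlier (the helper name is mine, not from any library):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||), always in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

bob      = np.array([0.9, 0.9, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0])
dinosaur = np.array([0.9, 0.9, 0.2, 0.1, 0.1, 0.8, 0.2, 0.2])
truck    = np.array([0.0, 0.0, 0.7, 0.8, 0.0, 0.0, 0.7, 0.0])

print(cosine_similarity(bob, dinosaur))  # ~0.71: fairly similar direction
print(cosine_similarity(bob, truck))     # ~0.36: much less similar
```

The score for the dinosaur comes out noticeably higher than the score for the truck, which matches our intuition about Bob's taste.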

Learning from Feedback

AI-generated image of robot on laptop with a thought on its mind

To be powerful, a content recommendation system needs to be able to take feedback and update its own knowledge. How can we achieve that? We can observe the user’s behavior and update our knowledge base using those observations.

Let’s start by giving an initial recommendation:

Hi Bob, would you like this blue monkey plushie?

               plush-toy animal mental red blue green gender-bias educational
blue-monkey    0.9       0.9    0.0    0.0 0.9  0.0   0.0         0.0

We can then get Bob’s feedback.

Bob: ew, that thing? It’s ugly! Am I 5? No thanks!

Now, using Bob’s feedback, we can update his embedding vector.

$$V_{\text{bob}} = V_{\text{bob}} + \alpha \cdot F(\text{bob}, \text{blue-monkey}) \cdot V_{\text{blue-monkey}}$$

Where

  • $V_{\text{bob}}$ is Bob’s current embedding vector
  • $V_{\text{blue-monkey}}$ is the blue monkey’s current embedding vector
  • $F(\text{bob}, \text{blue-monkey})$ is our observation of Bob’s feedback. A negative number implies a poor fit and a positive number implies a good fit. In this case, since Bob did not like the blue monkey plushie, we can apply a feedback coefficient of $-1$.
  • $\alpha$ is a learning rate - a hyperparameter which describes how fast we should update Bob’s embedding vector.

We can also update the toy’s embedding vector:

$$V_{\text{blue-monkey}} = V_{\text{blue-monkey}} + \beta \cdot F(\text{bob}, \text{blue-monkey}) \cdot V_{\text{bob}}$$

Where

  • $V_{\text{bob}}$ and $V_{\text{blue-monkey}}$ are swapped compared to the update of Bob’s embedding vector
  • $\beta$ is the learning rate for the toy’s embedding vector.

Let’s calculate, using $\alpha = 0.1$, $\beta = 0.1$, and $F(\text{bob}, \text{blue-monkey}) = -1$:

                plush-toy animal mental red   blue  green gender-bias educational
bob (before)    0.9       0.9    0.0    0.9   0.0   0.0   0.0         0.0
blue-monkey (b) 0.9       0.9    0.0    0.0   0.9   0.0   0.0         0.0

bob (after)     0.81      0.81   0.0    0.9   -0.09 0.0   0.0         0.0
blue-monkey (a) 0.81      0.81   0.0    -0.09 0.9   0.0   0.0         0.0

Now, we have updated Bob’s embedding vector with his feedback, along with the toy’s embedding vector using his feedback!
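Putting the two update rules together, here is a minimal sketch of the feedback step in Python/NumPy. The `ALPHA` and `BETA` names echo the variables mentioned in the demo tasks below, but this is an illustrative reimplementation, not the demo's actual code:

```python
import numpy as np

ALPHA = 0.1  # learning rate for the user's vector
BETA  = 0.1  # learning rate for the toy's vector

def apply_feedback(user_vec, item_vec, feedback, alpha=ALPHA, beta=BETA):
    """Nudge both vectors; feedback < 0 means a poor fit, > 0 a good fit."""
    new_user = user_vec + alpha * feedback * item_vec
    new_item = item_vec + beta * feedback * user_vec  # uses the original user vector
    return new_user, new_item

bob         = np.array([0.9, 0.9, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0])
blue_monkey = np.array([0.9, 0.9, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0])

# Bob disliked the blue monkey, so F(bob, blue-monkey) = -1
bob, blue_monkey = apply_feedback(bob, blue_monkey, feedback=-1.0)
print(bob)          # [ 0.81  0.81  0.    0.9  -0.09  0.    0.    0.  ]
print(blue_monkey)  # [ 0.81  0.81  0.   -0.09  0.9   0.    0.    0.  ]
```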


Vector / Embedding Initialization

AI-generated image of guy confused why his car isn't working

Now that we know how to use embeddings to recommend content, how do we actually compute our embeddings in the first place?

  1. When it comes to text data, we can use pretrained embeddings such as Word2vec, GloVe or FastText
  2. We can try frequency-based initialization (read my article on TF-IDF)
  3. SVD (singular value decomposition) or PCA (principal component analysis)
    SVD digests information into simpler parts to uncover hidden relationships
    PCA finds the directions of greatest variance and tosses out the rest, leaving only the most important data
  4. Transformers (such as BERT)

In fact, we can even use an LLM such as ChatGPT or Gemini to generate initial embeddings - that’s how I generated default embeddings for Æsop’s fables for the demo at the end!


Embedding Initialization Doesn’t Need to Be Perfect

Remember how our system can take feedback and learn? This means that if our vectors aren’t perfect, they will gradually improve over time! Thus, why even overengineer our vector initialization?

We can do something much simpler and initialize our embedding vectors (for the data AND for the user) with one of these approaches:

  • Randomly initialize each component
  • Use the average vector in your entire dataset as the starting vector
  • Initialize each component as 0 (this does have some issues though)

Next, we can apply the feedback process to gradually improve the accuracy of each embedding vector.

This isn’t perfect, since the system’s recommendations will be of low quality until it has had enough time and feedback to improve.
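Here is a small sketch of those three initialization strategies in Python/NumPy; the helper names and the random-noise scale are arbitrary choices of mine:

```python
import numpy as np

DIMENSIONS = 8  # matches our toy example; real systems use far more

def init_random(dims=DIMENSIONS):
    # Small random values; the feedback loop refines them over time
    return np.random.normal(loc=0.0, scale=0.1, size=dims)

def init_dataset_mean(vectors):
    # Start a new user at the "average" item in the catalog,
    # e.g. init_dataset_mean(toys.values()) from the earlier sketch
    return np.mean(np.stack(list(vectors)), axis=0)

def init_zeros(dims=DIMENSIONS):
    # All zeros: simple, but similarity scores are meaningless until updated
    return np.zeros(dims)
```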


Demo - Recommender System for Æsop’s fables


Further Learning Tasks

  1. Read the code, understand how it works
  2. Update the α and β (ALPHA and BETA) variables and see how the learning changes. Try α=0.5 and β=0.3, for example.
  3. Add and remove feedback function calls
  4. Identify how the user’s vector changes, as well as how a story’s vector changes.

At Scale

AI-generated recommendations factory

To keep the above demo simple, we used a linear search, computing the cosine similarity between the user’s vector and each and every one of Æsop’s fables. This does not scale, since it requires O(n) time, where n is the number of stories (or items) in the recommender system.

Instead, we can use vector databases - databases which are specifically tuned for approximate nearest neighbor (ANN) search. Vector databases such as Pinecone, and libraries such as FAISS, efficiently store and retrieve vectors using approximate nearest neighbor methods. We will touch more on this later on.
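As a taste of what that looks like, here is a small FAISS sketch (assuming the `faiss-cpu` package; the index type, dataset size, and random data are arbitrary choices for illustration):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 8  # dimensionality of our toy embeddings
item_vectors = np.random.rand(10_000, d).astype("float32")

# Normalize rows so that inner product == cosine similarity
faiss.normalize_L2(item_vectors)

index = faiss.IndexFlatIP(d)  # exact inner-product search; ANN index types scale further
index.add(item_vectors)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 most similar items
print(ids[0], scores[0])
```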


Ethics and Bias

The goal of this article was to describe how a simple recommendation system based on embeddings works. The methods described in this article are fully unsupervised: they don’t rely on labeled data, but instead try to cluster the data and find some meaning in it.

As machine learning controls more and more of our lives, we need to be mindful of ethics and of the harm it can bring along with its benefits. It’s important to pay attention to how our models develop unintended biases, make unfair recommendations, or even learn from bad behavior (which the internet is full of).
