Content Recommendation Systems
Further Improvements & Research
yash101
Published 4/6/2025
Updated 4/6/2025
Recap & Goals
In the previous page, we learned how to vectorize content into embeddings: a way to quantify details about content. We also saw how those embeddings can be used to measure how similar two pieces of content are, and to find content similar to a query embedding vector.
My primary goal in this page is to expand on the implementation details of a more powerful content recommendation system. This page will be a bit of a hodge-podge of thoughts on improving the algorithm I showcased in the previous page.
Learning Rate - Constant or Not?
In the previous page and demo, I specified the learning rate as a constant hyperparameter.
When a new embedding spawns in the content recommendation system, the system does not know much about it yet, so it needs to be able to shift its knowledge of that embedding faster. But how can we achieve that? We can view an embedding vector as a statistical representation of what we think the content means.
Standard Error of the Mean
What if we use the standard error of the mean (SEM) to control our learning rate? If the standard error is high, we can use a high learning rate; if the standard error is low, we can use a lower one.
$$\mathrm{SEM} = \frac{\sigma}{\sqrt{n}}$$

where:

- $n$ is the number of experiments
- $\sigma$ is the standard deviation of all of our observations, which we will replace with a hyperparameter
When an embedding is initially onboarded, there are a total of $n = 0$ experiments:

$$\mathrm{SEM} = \frac{\sigma}{\sqrt{0}}$$

But wait, that’s undefined since we are dividing by zero! But looking at its limit,

$$\lim_{n \to 0^{+}} \frac{\sigma}{\sqrt{n}} = \infty$$
Which makes sense. We have no clue what the correct value of the vector is, so we would need to cover an infinite number of standard deviations to capture the entire range of potential values. We can fix this by adding a positive bias to the fraction:

$$\mathrm{SEM} = \frac{\sigma}{\sqrt{n + \varepsilon}}$$

where $\varepsilon$ is a small positive hyperparameter that keeps the denominator nonzero.
But how can we use this to control our learning rate? We know that as $n \to \infty$, $\mathrm{SEM} \to 0$, so we can let the SEM drive the learning rate and clamp it at a floor:

$$\eta(n) = \max\!\left(\alpha_{\min},\; \frac{\alpha}{\sqrt{n + \varepsilon}}\right)$$

Here $\alpha$ stands in for the standard-deviation hyperparameter, and $\alpha_{\min}$ ensures we never stop learning entirely.
Let’s see how this behaves!
[Plot: SEM-based learning rate vs. number of experiments, with alpha = 1.0, alpha_min = 0.2, epsilon = 2]
Notice how we start with a high learning rate that decays toward $\alpha_{\min}$ as the embedding accumulates experiments.
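To make this concrete, here’s a minimal Python sketch of the schedule. The function name and exact formula are my reconstruction from the hyperparameters in the plot above, so treat it as a sketch rather than the demo’s actual code:

```python
import numpy as np

def sem_learning_rate(n: int, alpha: float = 1.0,
                      alpha_min: float = 0.2, epsilon: float = 2.0) -> float:
    """SEM-inspired learning rate: high while an embedding has seen few
    experiments (small n), decaying toward alpha_min as n grows."""
    return max(alpha_min, alpha / np.sqrt(n + epsilon))

# Decays smoothly from ~0.71 at n=0 down to the 0.2 floor.
for n in (0, 1, 5, 25, 100):
    print(n, round(sem_learning_rate(n), 3))
```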
Magnitude-Based Learning Rate
The magnitude of an embedding vector tends to indicate the strength of the embedding. A higher magnitude means the embedding has fit strongly to its meaning, whereas an embedding with a low magnitude (close to zero) has only a weak fit to its meaning.
We can first start by calculating the direction and magnitude of each vector:

$$\|v\| = \sqrt{\sum_i v_i^2}, \qquad \hat{v} = \frac{v}{\|v\|}$$
We can use a logarithm to keep our system more sensitive to low magnitudes, and less sensitive as the magnitudes approach infinity:

$$m = \log(\|v\|)$$

This will fail if $\|v\| = 0$, since $\log(0)$ is undefined, so we add a positive bias inside the logarithm. And we want to maintain a minimum learning rate so we keep learning, even if our embedding has seen a ton of experiments:

$$\eta(v) = \max\!\left(\alpha_{\min},\; \frac{\alpha}{1 + \gamma \log\left(1 + \|v\|\right)}\right)$$
Let’s see how this behaves:
[Plot: magnitude-based learning rate vs. $\|v\|$, with alpha=1.0, alpha_min=0.2, gamma=2.0]
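Here’s the same idea as a Python sketch; again, the exact functional form is my reconstruction from the plot, not the demo’s verbatim code:

```python
import numpy as np

def magnitude_learning_rate(v: np.ndarray, alpha: float = 1.0,
                            alpha_min: float = 0.2, gamma: float = 2.0) -> float:
    """Magnitude-based learning rate: weak (low-magnitude) embeddings learn
    fast; strong ones settle toward alpha_min. np.log1p(x) = log(1 + x)
    keeps the schedule defined at ||v|| = 0 and most sensitive at low
    magnitudes."""
    magnitude = np.linalg.norm(v)
    return max(alpha_min, alpha / (1.0 + gamma * np.log1p(magnitude)))
```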
SEM and Magnitude-Based Boosting
In both functions, I used a floor of $\alpha_{\min}$ inside a $\max$ so the learning rate can decay but never reach zero.
Using both Standard Error (SEM) and Magnitude
We can combine both approaches. By doing so, we essentially boost the learning rate whenever our embeddings are weak:

$$\eta(n, v) = \max\!\left(\alpha_{\min},\; \frac{\alpha}{\sqrt{n + \varepsilon}} + \frac{\beta}{1 + \gamma \log\left(1 + \|v\|\right)}\right)$$

With this function, if the number of experiments ($n$) is small or the magnitude ($\|v\|$) is low, the learning rate stays high; as the embedding accumulates evidence, it decays toward $\alpha_{\min}$.
[Plot: combined SEM + magnitude learning rate, with alpha = 1.0, beta = 0.5, alpha_min = 0.2, epsilon = 4.0, gamma = 4.0]
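And a sketch of the combined schedule, with the plotted hyperparameters as defaults (same caveat: a reconstruction, not the demo’s exact code):

```python
import numpy as np

def boosted_learning_rate(n: int, v: np.ndarray,
                          alpha: float = 1.0, beta: float = 0.5,
                          alpha_min: float = 0.2,
                          epsilon: float = 4.0, gamma: float = 4.0) -> float:
    """Boost the learning rate when an embedding is weak, either because it
    has few experiments (SEM term) or a small magnitude (magnitude term)."""
    sem_term = alpha / np.sqrt(n + epsilon)
    magnitude_term = beta / (1.0 + gamma * np.log1p(np.linalg.norm(v)))
    return max(alpha_min, sem_term + magnitude_term)
```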
Keeping Recommendations Interesting
Serving Different Content
Assuming the content enrolled in the system stays constant and consistent, querying similarity for a user multiple times will return the exact same results. No one wants to see the same content and recommendations over and over again. Additionally, we want to create opportunities for the algorithm to recommend something different from the user’s typical choices, leading to higher-value learning! One simple way to do this is to perturb the query vector with random noise:

$$q' = q + \nu$$

where:

- $\nu$ is a noise vector
- each element $\nu_i$ is sampled independently from a small zero-mean distribution (for example, $\nu_i \sim \mathcal{N}(0, \sigma^2)$)
The goal of the random noise is to add enough perturbation to keep the algorithm’s recommendations fresh yet relevant. This means we need to be careful about how much noise we add.
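A minimal sketch of this idea, assuming Gaussian noise (the distribution is not pinned down above); `perturb_query` and `noise_scale` are hypothetical names:

```python
import numpy as np

def perturb_query(query: np.ndarray, noise_scale: float = 0.05) -> np.ndarray:
    """Add small zero-mean Gaussian noise to a query embedding so repeated
    queries surface different-but-relevant recommendations."""
    noise = np.random.default_rng().normal(0.0, noise_scale, size=query.shape)
    noisy = query + noise
    # Renormalize so cosine-similarity rankings stay on a consistent scale.
    return noisy / np.linalg.norm(noisy)

# The perturbed query should stay close to the original direction.
q = np.random.default_rng(42).normal(size=128)
q /= np.linalg.norm(q)
print(float(q @ perturb_query(q)))  # cosine similarity, close to 1.0
```

Renormalizing after the perturbation is a design choice: it keeps the noisy query on the unit sphere, so `noise_scale` directly trades off freshness against relevance.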
Some Possible Issues
People’s Interests are not Simple
People’s interests are multimodal. When it comes to movies, sometimes I enjoy watching comedies; other times, I like watching dramas. This algorithm currently has no effective way to represent that.
I’m not sure if this can be alleviated by using a larger embedding.
Biases
If this technique is used fully unsupervised, it is possible for biases to develop, since it is hard to curate human feedback without manual effort.
(Human) Feedback
Collecting human feedback is a hard problem. How do you know if feedback is negative, positive, or neutral? What if it’s somewhere in the middle? A lot of study into HCI (human-computer interaction) would be necessary to optimize feedback methods that appropriately understand users.
As an example, perhaps a user may “like” or “dislike” content. That’s direct feedback that could be used. But there are also implicit signals, such as the amount of time the user viewed the content or whether they interacted with it.
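As a purely illustrative sketch (none of these signal names or weights come from the original), here is one hypothetical way to fold explicit and implicit signals into a single scalar that could scale an embedding update:

```python
from typing import Optional

def feedback_score(liked: Optional[bool], view_seconds: float,
                   expected_seconds: float, clicked: bool) -> float:
    """Blend explicit and implicit feedback into one scalar in [-1, 1].
    All thresholds and weights here are illustrative assumptions."""
    # Explicit feedback dominates when present.
    if liked is not None:
        return 1.0 if liked else -1.0
    # Otherwise infer from dwell time: a view much shorter than expected
    # reads as negative, a near-complete view as positive.
    dwell = min(view_seconds / max(expected_seconds, 1e-9), 1.0)
    score = 0.8 * (2.0 * dwell - 1.0)
    if clicked:
        score += 0.2
    return max(-1.0, min(1.0, score))
```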