Streaming platforms like Netflix seem to know what you should be watching before you do. To better understand how and why, film buff and The Royals’ data scientist Dr Paul Vella built his own recommendation engine.
Netflix devotes a staggering amount of time, money and computational power to keep me happy, content and watching. But why do they think they know me so well? Every time their algorithm makes a recommendation, there’s a risk I might not like it and will consider switching to Stan (psych!).
But according to a 2017 article published in Wired, more than 80 per cent of the TV shows people watch on Netflix are discovered through the platform’s recommendation system. And Netflix are definitely not alone in being a recommendation-obsessed content provider.
Formulas have been implemented across Spotify, Amazon, YouTube and other platforms to recommend anything and everything. You could say they’re as common as Game of Thrones spoiler alerts on social media.
So why do these companies think I would like the songs, books and films they recommend? How did they reach those conclusions about me?
To satisfy my curiosity, I decided to try my hand at building a film recommendation system and see for myself how content providers arrive at their conclusions. The point was not to build a proper model per se, but to understand the inner logic of these systems and their potential use.
There are many different techniques for building recommendation systems, with approaches involving NLP (natural language processing), matrix factorisation, nearest-neighbour clustering and similarity indices.
Stay with me.
Because if you take a step back from the ‘technique’ and think about the methodology (or purpose) of these approaches, all of them are trying to do one of two things:
- Recommend items that people who are similar to you like (called collaborative filtering)
- Recommend items that have similar attributes to others you like (called content-based filtering)
A third, hybrid filtering approach combines these two, then applies weighting to reach a recommendation. And the logic behind each can be set out in a relatively straightforward way:
Collaborative filtering:
- Aaron and Bob both like Jurassic Park (1993)
- Aaron also likes Ready Player One (2018)
- Bob hasn’t seen Ready Player One
- Recommend Ready Player One to Bob

Content-based filtering:
- Aaron likes Jurassic Park (1993)
- Jurassic Park is an action movie and so is The Meg (2018)
- Bob hasn’t seen The Meg
- Recommend The Meg to Bob

Hybrid filtering:
- Aaron and Bob both like Jurassic Park (1993)
- Aaron also likes Ready Player One (2018)
- Aaron is 14 and Bob is 36
- 30-somethings aren’t into Ready Player One (2018), they like The Commuter (2018)
- The Commuter and Jurassic Park are both action movies
- Recommend The Commuter to Bob
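The first two of those rule sets can be sketched in a few lines of Python. The names, likes and genre tags below are the toy examples from the lists above, not real data:

```python
# Toy data taken straight from the Aaron/Bob examples – not a real dataset.
likes = {
    "Aaron": {"Jurassic Park", "Ready Player One"},
    "Bob": {"Jurassic Park"},
}
genres = {
    "Jurassic Park": {"action"},
    "Ready Player One": {"action", "sci-fi"},
    "The Meg": {"action"},
}

def collaborative(user):
    """Recommend films liked by users who share at least one like with `user`."""
    recs = set()
    for other, their_likes in likes.items():
        if other != user and likes[user] & their_likes:
            recs |= their_likes - likes[user]
    return recs

def content_based(user):
    """Recommend unseen films sharing a genre with a film `user` already likes."""
    liked_genres = set().union(*(genres[film] for film in likes[user]))
    return {film for film, tags in genres.items()
            if film not in likes[user] and tags & liked_genres}
```

Here `collaborative("Bob")` surfaces Ready Player One (Aaron, who shares a like with Bob, likes it), while `content_based("Bob")` also surfaces The Meg (a shared ‘action’ tag).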
To give an example of how this works on a larger scale, let’s look at Spotify. Their algorithm is pretty complex, and takes in data about what you’ve listened to and how long for, what you’ve liked or added to playlists, and more granular elements of the songs themselves like genre, tempo and duration. It also pays attention to what others who have similar preferences to you have listened to or liked.
The model I built in a Google Sheet is based on a much simpler collection of information. It recommends films from a list and tracks just two variables: when I last watched a film, and how much I like the genres the film fits into.
The logic behind tracking the date I last watched a film is pretty simple:
- Films that I’ve watched most recently shouldn’t be highly recommended.
- Films that I haven’t watched should be highly recommended.
- The longer it has been since I last watched a film, the more highly it should be recommended.
The viewership score is therefore just a count of the number of days since I’ve last seen the film. This puts less importance on films I’ve seen recently and more on those I haven’t seen for a while.
To get a viewership score for films I haven’t seen, I simply take the maximum number of days from the films I have seen. This means films I haven’t seen in a long time and films I haven’t seen at all are equally weighted.
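A minimal sketch of that viewership score, using hypothetical last-watched dates (a `None` date means I’ve never seen the film, and such films inherit the maximum observed day count):

```python
from datetime import date

# Hypothetical last-watched dates; None means I've never seen the film.
last_watched = {
    "Jurassic Park": date(2019, 3, 1),
    "Eat, Pray, Love": date(2018, 9, 15),
    "Hot Fuzz": None,
}

def viewership_scores(last_watched, today):
    """Days since last watch; unseen films inherit the maximum day count."""
    days = {film: (today - seen).days
            for film, seen in last_watched.items() if seen is not None}
    max_days = max(days.values())
    return {film: days.get(film, max_days) for film in last_watched}

scores = viewership_scores(last_watched, today=date(2019, 4, 1))
# Jurassic Park → 31; Eat, Pray, Love → 198; Hot Fuzz (never seen) → 198
```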
I also kept the logic behind the genre preference score simple:
- Films can be classified in many categories. Avatar (2009), for example, contains elements of science fiction, futuristic, fantasy and adventure films.
- Giving a film a rating (out of five stars) counts equally across all genres (attributes) of the film.
- The genre preference score is therefore the sum of ratings given to all films in that genre.
This simple calculation reveals I prefer sci-fi and action films over drama, which is true.
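That sum can be sketched directly, with invented star ratings and genre tags chosen to mirror the sci-fi-over-drama result:

```python
# Hypothetical star ratings (out of five) and genre tags.
ratings = {"Avatar": 4, "Jurassic Park": 5, "Eat, Pray, Love": 1}
film_genres = {
    "Avatar": {"sci-fi", "fantasy", "adventure"},
    "Jurassic Park": {"sci-fi", "action", "adventure"},
    "Eat, Pray, Love": {"drama"},
}

def genre_preferences(ratings, film_genres):
    """Each film's rating counts equally toward every genre it belongs to."""
    prefs = {}
    for film, stars in ratings.items():
        for genre in film_genres[film]:
            prefs[genre] = prefs.get(genre, 0) + stars
    return prefs

prefs = genre_preferences(ratings, film_genres)
# sci-fi → 9 (4 + 5), action → 5, drama → 1
```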
Getting to a Recommendation Score
Since both variables are integers and there’s no logically necessary reason to place more importance on one or the other, I simply add the scores together to arrive at a recommendation rating (the higher the score, the higher the recommendation).
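The final step is just an element-wise sum. Sketched with made-up per-film scores from the two components described above:

```python
# Hypothetical scores from the two components described above.
viewership = {"Jurassic Park": 31, "Hot Fuzz": 198}   # days since last watched
genre_score = {"Jurassic Park": 19, "Hot Fuzz": 12}   # summed genre preferences

# Recommendation rating: the plain sum of the two scores per film.
recommendation = {film: viewership[film] + genre_score[film] for film in viewership}
best = max(recommendation, key=recommendation.get)    # film to watch next
```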
Now you know the mechanics behind a relatively simple content recommendation system, let’s see how good it’s been at improving my movie nights.
I have 1,016 films in my database. I’ve given a rating to 712 of these, and I’ve watched 165. Given I can watch one film a night – well, two, if the first one was terrible – it took roughly six months of data collection before the system was recommending films I’d actually consider watching. This is evidenced by how strongly it kept recommending Eat, Pray, Love (2010). Ugh.
If I arrange my film ratings by date from Jan 1, 2018 to Apr 1, 2019, a simple linear regression reveals a slight positive trend in my ratings (it is a five-point scale after all, so any positive trend has to be small). So, there’s some evidence the films I’m watching more recently are getting better ratings – and therefore my movie nights are more enjoyable.
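That trend check is an ordinary least-squares slope, which needs no libraries at all. The (day, rating) pairs below are invented stand-ins for my actual viewing history:

```python
# Invented (days since Jan 1 2018, star rating) pairs, in watch order.
points = [(0, 3), (30, 2), (90, 4), (150, 3), (210, 4), (270, 5)]

def ols_slope(points):
    """Ordinary least-squares slope of rating on time."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

trend = ols_slope(points)  # a small positive slope → recent films rate better
```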
So what did I learn about recommendation engines?
- I can trust my spreadsheet’s recommendation more than a friend’s opinion
Anyone can build a recommendation model, and it will probably improve your choices. The system I designed doesn’t include any Python code or API calls, just a few fancy spreadsheet formulas and some stats know-how.
An element of DIY is probably better, anyway, because I can classify films the way I like. For example, I can break down ‘sci-fi’ into 10 micro-classifications (futuristic, time travel, zombies, etc) I am interested in, giving more accurate recommendations than just using ‘sci-fi’ on its own.
The more you can describe the elements in a set of choices, the better the model can be at recommending things you might like. Harvard cognitive psychologist George Miller famously published research back in the 50s showing we can only hold about seven items in our short-term memory (or, in this case, make a choice from around seven films).
And how many attributes of those seven films can we really compare? A recommendation model, by contrast, can make suggestions based on hundreds, thousands or millions of elements.
- You can uncover patterns in your decision making you didn’t even know you were making
Since I was tracking the order I watch films and their genres, it was possible to build a database of which genres I would tend to watch next by finding patterns in my preferences.
For example, if I watch a crime film, there’s a moderate association (0.29) that the next film will be a fantasy film. And if I watch an action film, there is a negative association (-0.15) that the next film will be a superhero film. That’s probably because my wife will want to watch something else!
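An association score like this can be computed several ways; one simple option (my assumption – the spreadsheet may use a different measure) is the phi coefficient between ‘this film is genre A’ and ‘the next film is genre B’. A sketch on an invented viewing sequence:

```python
# Invented viewing sequence of primary genre labels, in watch order.
sequence = ["crime", "fantasy", "action", "crime", "fantasy",
            "drama", "crime", "drama", "action", "drama"]

def phi(sequence, genre_a, genre_b):
    """Phi coefficient between 'film i is genre_a' and 'film i+1 is genre_b'."""
    pairs = list(zip(sequence, sequence[1:]))
    n = len(pairs)
    p_a = sum(1 for x, _ in pairs if x == genre_a) / n
    p_b = sum(1 for _, y in pairs if y == genre_b) / n
    p_ab = sum(1 for x, y in pairs if x == genre_a and y == genre_b) / n
    return (p_ab - p_a * p_b) / (p_a * (1 - p_a) * p_b * (1 - p_b)) ** 0.5

assoc = phi(sequence, "crime", "fantasy")  # positive → crime tends to precede fantasy
```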
- My feelings still play a part, they’re just quantified
It may come as a surprise, but recommendation engines are entirely reliant on the way a person feels. All the data and analysis in my film recommendation engine comes from two variables: my ratings of the films (how much I liked them), and when I last watched the films (whether I was interested enough to act).
Netflix does the same thing, just in a more complex way. Its recommendation algorithm considers what you’ve watched, when and how long for, the order you watch films or series, your ratings, and the ratings given by other members who are similar to you.
The more descriptive these algorithms get, the better their recommendations are – to the point of factoring in ‘hyper-specific micro genres’ I’ve proved to be at least curious about. Even the artwork of their content is displayed based on what I’ve engaged with in the past.
- You can flip the system to make predictions
Probably the most interesting take-away from building a recommendation engine is the possibility of extracting the importance scores or average ratings to make a prediction of how much I might like movies that aren’t yet released.
There are 13 films in my database that fall into the space, action and adventure genres, and they have an average rating of 3.15 stars. Does this mean I’d give Star Wars: The Rise of Skywalker three stars when it comes out at the end of the year? Will I be disappointed?
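That kind of prediction is just a filtered average. A minimal sketch, with a made-up database in place of my real one:

```python
# Made-up database: each film's genre tags and the star rating I gave it.
db = [
    ({"space", "action", "adventure"}, 4),
    ({"space", "action", "adventure"}, 3),
    ({"space", "action"}, 5),
    ({"drama"}, 2),
]

def predict(target_genres, db):
    """Predicted rating = mean rating of films carrying all the target genres."""
    matches = [stars for tags, stars in db if target_genres <= tags]
    return sum(matches) / len(matches)

predicted = predict({"space", "action", "adventure"}, db)  # mean of 4 and 3 → 3.5
```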
I’d probably have to use something a lot more advanced to get an accurate prediction – something that could work out the part-worth (via choice-based conjoint analysis) or standardised beta coefficients (via stepwise linear regression) of individual aspects of films (actors, directors, release year, genres, etc), which could be used as inputs in a model of my film ratings.
I could then use this model on a list of films being released over the next year or so to filter them down to those I’m most likely to give five stars, all without the need to rely on other people’s opinions.
But first, I’m off to watch Hot Fuzz (2007), because an algorithm told me I’d like it.
– Dr Paul Vella