Video Game Recommendation Engine – This is how we do it
These are data science panels, and we started this one with a video game recommendation engine. I had Stephen fill out a survey before the panel, and from his results I built a recommendation model. The goal was to select games he has not played (he’s played a lot of games, so not an easy task) and would rate above average.
How are we going to build this recommendation? Through Propensity scoring!
A propensity score is the estimated probability that a data point has the outcome we are predicting.
- One of our panelists completed a survey and had to rank video games they have played
- Their responses were linked to our ancillary data (critics score, user score, and genres)
- Our model produced a score between 0 and 1. The closer the score is to 1, the more likely the panelist would enjoy the game.
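The post does not say which model produced these propensity scores. Logistic regression is a common choice, so here is a minimal plain-Python sketch under that assumption. The features (critics score rescaled to 0-1, user score rescaled to 0-1, and a 1/0 Action Adventure flag), the training rows, and the names `fit_logistic` and `propensity` are all hypothetical.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=3000):
    """Fit logistic-regression weights with batch gradient descent.

    rows:   feature lists on comparable scales
    labels: 1 if the panelist rated the game above average, else 0
    """
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this game: predicted probability minus actual label
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def propensity(x, weights, bias):
    """Score one game: the closer to 1, the more likely the panelist enjoys it."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical training data: [critics score / 100, user score / 10, is Action Adventure]
played = [[0.90, 0.9, 1], [0.85, 0.8, 1], [0.80, 0.9, 0],
          [0.90, 0.4, 0], [0.70, 0.5, 0], [0.60, 0.3, 1]]
liked = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(played, liked)
# Score two hypothetical unplayed games
score_a = propensity([0.80, 0.9, 1], weights, bias)  # high user score, preferred genre
score_b = propensity([0.90, 0.3, 0], weights, bias)  # critics' darling, low user score
```

Because this hypothetical panelist's "liked" games are separated mainly by user score, the model ranks the user favorite above the critics' pick, mirroring the preference the survey revealed.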
Video Game Recommendation Engine – The Output
For this panelist, the survey told us this about their gaming preferences:
They value the user score more than the critics score.
Their preferred genre is Action Adventure.
Their preferred platform is the PS2.
Video Game Debate: Overview
On the screen will be a video game, with some profiling data.
Panelists will debate the impact, perceived value, and replay value of the featured game.
The crowd will decide who made the better argument.
This is the meat of the panel. The screen also showed the IGN review headline and rating, and Stephen and I took turns arguing whether the game deserved its ranking.
Stephen went first and argued that GoldenEye does not deserve such a high rating; his key point was replay value. I argued that it should be valued at the time of its release. The crowd sided with Stephen.
Pokémon Gold & Silver
I went first this round and argued for the rating; this was a very pro-Pokémon crowd. Stephen brought up good points about where he thinks the series should go, arguing that adding another region is not the answer. The crowd sided with me.
Ultimate Marvel vs. Capcom 3
Stephen chose to argue for this game, and I wanted to throw a curveball into the debate. Choosing Marvel vs. Capcom 2 would have been too obvious, too easy. I argued that this game wasn’t even the best in the series, and that the best in the series is actually X-Men vs. Street Fighter.
Halo: Combat Evolved
Stephen was on Team Halo for this one. I love Halo as well, but the crowd did not. That was a shock to us; maybe Halo doesn’t have replay value, or everyone is getting tired of the series.
Battle Dome: Overview
Two games go in… only one comes out
Each panelist argues for one game; they cannot both argue for the same game
The crowd decides who had the best argument
This was a fun and challenging section of our panel. I won’t go into detail on this section, but I do want to try something out. As a test to see who is interacting with my page by reading the data stories, I have a special giveaway.
Here are the rules: you must have an Instagram account, and you must be following my Instagram account, @pancake_analytics.
To enter, read through the Battle Dome section, screenshot your favorite match-up, and post it to Instagram.
In the post, tag @pancake_analytics and caption it “Who do you have in this Battle Dome match-up?”
This giveaway will end on December 31st, 2019, and the winner will receive a GameStop gift card from me to use on their next video game purchase in the new year!
Here’s the disclaimer I have to post:
Per Instagram rules, we must mention this is in no way sponsored, administered, or associated with Instagram, Inc. By entering, entrants confirm they are 13+ years of age, release Instagram of responsibility, and agree to Instagram’s terms of use. Good luck!!!!!
Here are the Battle Dome match-ups:
I want to personally thank everyone who attended the panel in Tampa, at the Tampa Comic Convention. I look forward to meeting again in 2020.
This Panel was held on:
Friday, August 2, 2019 at 7:30 PM – 8:30 PM
During the Tampa Bay Comic Convention 2019, held at the Tampa Convention Center.
The Panelists were:
Tom Ferrara (@pancake_analytics), Kalyn Hundley (@kehundley08), Andy Polak (@polak_andy)
I want to take a quick moment to discuss the panelists. I love bringing as many different points of view as possible to these data science panels. Without this variety of viewpoints, it’s more of a lecture and less of a discussion. This mix of panelists gave the audience the data science view, the tech industry view, and the biological sciences view. The best part is that Smash Bros. brought us all together.
Changing the Tier Conversation
One of the main objectives of this panel was to start a discussion on tier selection in Smash: how we can base tier selection on data science, and how we can validate our findings against one of the best players in the game.
K-means clustering uncovers trends within our Smash Brothers data, revealing the similarities and differences between characters on key in-game attributes.
Up to a point, the more clusters we use, the clearer our picture becomes and the deeper we can understand the pros and cons of each main selection.
A brief overview of k-means clustering:
- Standardize your variables
- Analyze your elbow curve
- Validate your clusters
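The first two steps can be sketched in plain Python. This is a minimal illustration, not the code behind the panel: it assumes each character is a list of numeric attributes (such as run speed and weight), standardizes them, runs Lloyd's k-means algorithm from a seeded random start, and reports inertia (the within-cluster sum of squared distances) for each k so the elbow curve can be plotted. Validating the clusters, for example by profiling each cluster's average attributes, is left out.

```python
import math
import random

def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column would have std 0; fall back to 1 to avoid dividing by zero
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to this point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move each centroid to the mean of its members; keep it if the cluster emptied
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

def elbow_curve(points, max_k):
    """Inertia for k = 1..max_k; pick k where the curve bends and gains flatten."""
    return [kmeans(points, k)[2] for k in range(1, max_k + 1)]
```

Standardizing first matters because weight and run speed live on very different scales; without it, the larger-valued attribute would dominate every distance.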
Treat each game release as a new product launch or a change in the market.
Re-score your data to understand the current market; you can then see which characters migrate between clusters and how the meta-game has changed.
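One way to track that migration is a cross-tabulation of each character's old cluster against its new one. This is a minimal sketch under stated assumptions: cluster labels are plain strings, and `migration_table` and `movers` are hypothetical helper names. Because k-means numbers its clusters arbitrarily on every run, the clusters from the two games would first need to be matched by profile (for example, by giving them descriptive names) before the labels can be compared.

```python
from collections import Counter

def migration_table(old_clusters, new_clusters):
    """Count characters moving from each old cluster to each new cluster.

    old_clusters / new_clusters: dicts mapping character name -> cluster label
    (e.g. Wii U scoring vs. Ultimate re-scoring). Only characters present in
    both games are compared.
    """
    shared = old_clusters.keys() & new_clusters.keys()
    return Counter((old_clusters[name], new_clusters[name]) for name in shared)

def movers(old_clusters, new_clusters):
    """Characters whose cluster changed between releases, sorted by name."""
    return sorted(name for name in old_clusters.keys() & new_clusters.keys()
                  if old_clusters[name] != new_clusters[name])
```

Diagonal cells of the table (same label before and after) are characters whose profile held steady; off-diagonal cells show where the meta-game shifted.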
We end up with five unique clusters:
This group is the slowest by run speed and lightest by weight.
Jack of All Trades:
They sit in the middle on everything; there is no distinct trend.
Like the Jack of All Trades group but faster.
Fast in aerial attacks and the heaviest of the characters.
This group is the fastest and the lightest.
A propensity model is a statistical scorecard that is used to predict the behavior of your customer or prospect base. Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.
So who should be your main? In this segment I rely on industry knowledge as well, using ZeRo’s tiers as the dependent variable. I’ll build a propensity score with the following independent variables:
- Change in air acceleration
- Base air acceleration
- Base speed in the air
- Base Run Speed
- Character Weight
- Ultimate Smash Bros. Cluster
- Wii-U Smash Bros. Cluster
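To feed these variables into a propensity model, the two cluster memberships have to become numeric columns. The sketch below assumes each character is a dict of stats, that the target is a binary flag for whether ZeRo placed the character in one of a chosen set of top tiers (the post does not say how the tiers were encoded), and that one category per cluster column is dropped as a reference level to avoid perfect collinearity with the model's intercept. All key names and tier labels are hypothetical.

```python
def one_hot(value, categories):
    """0/1 indicator list with one slot per listed category."""
    return [1 if value == c else 0 for c in categories]

def build_design_matrix(characters, numeric_keys, cluster_keys, top_tiers, tier_key="tier"):
    """Turn character records into (X, y) for a propensity model.

    characters:   list of dicts holding numeric stats, cluster labels and a tier
    numeric_keys: numeric columns, e.g. "weight" or "run_speed" (hypothetical names)
    cluster_keys: categorical cluster columns, e.g. "ultimate_cluster"
    top_tiers:    tier labels counted as the positive class (assumed binary target)
    """
    # Sorted category list per cluster column; the first one becomes the reference level
    categories = {k: sorted({c[k] for c in characters}) for k in cluster_keys}
    X, y = [], []
    for c in characters:
        row = [float(c[k]) for k in numeric_keys]
        for k in cluster_keys:
            row += one_hot(c[k], categories[k][1:])  # drop the reference category
        X.append(row)
        y.append(1 if c[tier_key] in top_tiers else 0)
    return X, y, categories
```

The resulting `X` and `y` can go into any logistic-regression routine; the fitted probability for each character is its propensity to land in a top tier.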
What makes these three stand above the crowd?
They are middle ground on weight and fast air accelerators.
What are the differences between the three?
Wario has a slow run speed.
Palutena is the lightest.
Yoshi is the middle ground of this group.
The Curious Case of Ganondorf
Ganondorf has more in common with Jigglypuff than he does with Bowser.
The reason is that he’s quicker than Bowser and adapts better in aerial attacks and when falling.
On the flip side, Bowser in Super Smash Bros. Ultimate more accurately represents how he’s viewed in the Super Mario franchise.
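Saying one character "has more in common" with another is a statement about distance in the standardized attribute space the clusters were built on. A minimal sketch of that comparison, assuming `stats` maps each character name to a list of numeric attributes (hypothetical; no real Smash values are included here) and using Euclidean distance on z-scores:

```python
import math

def zscores(stats):
    """Standardize each attribute across all characters so no unit dominates."""
    names = list(stats)
    cols = list(zip(*(stats[n] for n in names)))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return {n: [(v - m) / s for v, m, s in zip(stats[n], means, stds)] for n in names}

def closest(name, stats):
    """Return the other character nearest to `name` in standardized attribute space."""
    z = zscores(stats)
    return min((n for n in z if n != name), key=lambda n: math.dist(z[name], z[n]))
```

A call such as `closest("Ganondorf", stats)` would return whichever character sits nearest, given the attribute values supplied.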
Game Time: Name that segment: Overview
I personally feel one of the best ways to reinforce learning is through a game. For this panel, I wanted to reinforce the k-means segmentation, so I asked volunteers to guess which segment the 3 characters on the screen fall into.
Here was the overview:
On the screen will be 3 characters
All 3 characters belong to the same segment
Volunteers will do their best to convince the panel of which segment the characters fall into:
- Jack of All Trades
- Air Tanks
Participating volunteers receive a fabulous prize.
For this particular game the prize was an amiibo of their choice that works with Smash Ultimate for the Nintendo Switch.
“When you play the game of thrones, you win or you die.” — Cersei
Let’s bring this quote to life in what I like to call a survival tree of the fittest. This week’s analysis will focus on character survival in Game of Thrones. Chow down and enjoy!
Winter is coming and you’d like to know your chances of survival in the Game of Thrones universe.
Let’s learn from those who have survived to this point and those who have met their unkind fate.
To do this, I’ll build a classification tree with the event set to whether the character is alive (1 for yes, 0 for no). Depending on the algorithm, a tree chooses each split either with a purity measure or with a statistical test against the null hypothesis of no association. When we reach my tree visualization, I’ll color leaves red where a character’s death is highly probable. Green leaves will indicate that a character is highly likely to survive… as long as all of the criteria along that path are met.
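The post does not name the software or splitting rule behind the tree. As an illustration only, here is a minimal CART-style sketch that picks each split by the largest drop in Gini impurity, using hypothetical 0/1 features such as `popular` and `male` and a label of 1 for alive and 0 for dead. Each node stores the share of survivors, which is the kind of percentage the tree walk-through quotes.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0 means a pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, features):
    """Binary feature whose split most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        left = [y for r, y in zip(rows, labels) if r[f]]
        right = [y for r, y in zip(rows, labels) if not r[f]]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best, best_score = f, score
    return best

def grow_tree(rows, labels, features, depth=0, max_depth=3, min_size=2):
    """Recursively split; every node records its survival rate (share of 1s)."""
    rate = sum(labels) / len(labels)
    f = None
    if depth < max_depth and len(labels) >= min_size:
        f = best_split(rows, labels, features)
    if f is None:
        return {"survival_rate": rate, "n": len(labels)}
    yes = [(r, y) for r, y in zip(rows, labels) if r[f]]
    no = [(r, y) for r, y in zip(rows, labels) if not r[f]]
    remaining = [g for g in features if g != f]
    grow = lambda part: grow_tree([r for r, _ in part], [y for _, y in part],
                                  remaining, depth + 1, max_depth, min_size)
    return {"feature": f, "survival_rate": rate, "yes": grow(yes), "no": grow(no)}

def predict(tree, row):
    """Follow the splits for one character and return the leaf's survival rate."""
    while "feature" in tree:
        tree = tree["yes"] if row[tree["feature"]] else tree["no"]
    return tree["survival_rate"]
```

Coloring a leaf red or green then amounts to comparing its `survival_rate` against a chosen cutoff.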
Think of this tree as a really morbid family tree, but since the data is Game of Thrones, it fits right into place.
The variables I have readily available (hopefully they carry importance) are as follows:
- House Affiliation
- Member of nobility
- Marital Status
- Family history of deaths
From the initial read, whether a character is popular among fans and whether they are male hold the highest importance in determining survival.
Also, the model built on the variables I have available classifies 75% of characters correctly (a 25% misclassification rate).
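Accuracy and the misclassification rate are two sides of the same count: accuracy is the share of characters whose predicted status matches their actual status, and the misclassification rate is the remainder. A minimal sketch with hypothetical 0/1 labels (1 = alive):

```python
def confusion_counts(actual, predicted):
    """Tally (true pos, false pos, true neg, false neg) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

def misclassification_rate(actual, predicted):
    """Share of predictions that disagree with the actual outcome (1 - accuracy)."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return (fp + fn) / (tp + fp + tn + fn)
```

For example, four characters with one wrong prediction give a 25% misclassification rate and 75% accuracy.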
Let’s say you moved to Westeros: out of the gate, you have a 25.4% chance of meeting your end. At those odds I’m taking my chances, but I should stay under the radar as much as I can, because the data warrants it.
If you become a popular character or are an integral part of the story, your death becomes more meaningful and your probability of survival is worse than a coin flip.
So let’s say you’re a likable character (you can’t help it): not all is lost, as long as you’re female. The highest survival rate belongs to the popular female character group. This is a classic tale of high risk, high reward.
A classification tree is a great way to visualize your data, and now I’ll walk us through this Game of Thrones survival tree.
Let’s start at the very top: the tree assumes everyone has a 75% chance of survival. As the tree splits, the interesting part begins and our data story unfolds.
If you are a popular character, you flow to the left side of the tree, and your survival rate drops from 75% to 48%.
Staying on the left side of the tree, there is another important split: are you male or female? Female characters have a higher probability of surviving (87% if you’re popular and 76% if you’re under the radar).
If you’re a popular male, you have a 42% chance of survival (we’re looking at you, Peter Dinklage).
Now here’s the largest caveat with this classification tree: I’m assuming it will no longer be relevant after the final season. Winter is coming, and most likely our characters will meet their end at the hands of the White Walkers.
What have we learned from diving into the Game of Thrones Data?
Everyone starts off at a 75% survival rate, and if you become popular, your survival rate drops by 27 percentage points (to 48%). If you’re also male, it falls to 42%, 33 points below where you started. A popular female character’s survival rate is 45 points higher than her male counterparts’ (87% versus 42%).
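The survival figures quoted in this section (75%, 48%, 42%, and 87%) can be compared in two ways, and it helps to keep them apart: differences between rates are percentage points, while a relative comparison divides by the starting rate. This sketch only re-derives numbers stated in the post; the function names are illustrative.

```python
# Survival rates (in %) quoted from the survival tree
rates = {"everyone": 75, "popular": 48, "popular_male": 42, "popular_female": 87}

def point_change(before, after):
    """Difference between two rates, in percentage points."""
    return after - before

def relative_change(before, after):
    """Change as a fraction of the starting rate."""
    return (after - before) / before

popularity_drop = point_change(rates["everyone"], rates["popular"])          # -27 points
male_gap = point_change(rates["everyone"], rates["popular_male"])             # -33 points
gender_gap = point_change(rates["popular_male"], rates["popular_female"])     # +45 points
popularity_relative = relative_change(rates["everyone"], rates["popular"])    # -0.36
```

In relative terms, becoming popular cuts survival by 36%, and since 87 / 42 ≈ 2.07, a popular female character is roughly twice as likely to survive as a popular male one.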
An interesting tidbit: if you become popular and you are female (hopefully the mother of dragons), you boast the highest survival rate of anyone in this universe, 87%.
After you have consumed this meal, I hope you take these findings with you and enjoy your episode of Game of Thrones. Also, as always, enjoy the featured pancake recipe below!
It all started with a mouse. Mickey Mouse turns 90 this year, and he has made his impact on society. To celebrate, what better meal to cook up this week than Walt Disney World data? I’ll be challenging myself to identify influencers on the Parks and Resorts Division’s yearly revenue.
My first approach was to identify what happens during the year the revenue occurs:
The number of animated movies released by Disney
The number of animated movies featuring Disney Princesses
The number of attractions added at all four main theme parks, then parsed out by individual park
The first run was not an effective model: most of the variability in the data was not accounted for, and there were no independent variables of significance.
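The post does not name the model type; since it talks about explained variability and significant independent variables, a linear regression is a reasonable assumption. "Variability accounted for" is usually measured by R², one minus the ratio of the residual sum of squares to the total sum of squares. A minimal single-predictor sketch (the function names are hypothetical):

```python
def fit_simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(actual, fitted):
    """Share of the variability in `actual` explained by the fitted values."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_resid = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return 1 - ss_resid / ss_total
```

An R² near 0 means the predictors leave most of the year-to-year revenue movement unexplained, which is what the first run showed.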
So my next question was: how do I capture word of mouth on movies and attractions? And how do I account for when Disney starts charging admission for children (currently, children 2 and younger enter the parks for free)?
To kill two birds with one stone, I settled on testing a rolling 3-year average of all behaviors. The results were very favorable: 67% of the variability is explained, and I have interesting independent variables of significance to make a telling data story.
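A rolling 3-year average replaces each year's value with the mean of a three-year window, which smooths out single-year spikes and lets the effect of earlier releases carry forward. This sketch assumes a trailing window (the current year plus the two before it); the post does not say whether the current year was included, and `rolling_mean` is a hypothetical name.

```python
def rolling_mean(values, window=3):
    """Trailing average over the current year and the (window - 1) years before it.

    Years without a full window of history get None instead of a partial average.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out
```

For example, yearly attraction counts of 3, 0, 3, 6 become None, None, 2.0, 3.0.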
If you’re a subscriber to this blog and enjoy the Stacks of Stats, you’ll recognize my preference for Q-Q plots.
There are some curls at the tails, but most of the data fits well, so there won’t be a need to run a more complex model.
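Assuming these are Q-Q (quantile-quantile) plots of the model's residuals, the points of such a plot can be computed with Python's standard library. The `qq_points` helper is hypothetical, and the (i − 0.5)/n plotting positions are one common convention among several.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair each sorted, standardized residual with its theoretical normal quantile.

    Points near the 45-degree line suggest roughly normal residuals; curls at the
    ends indicate heavier or lighter tails than a normal distribution.
    """
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    observed = sorted((r - m) / s for r in residuals)
    # (i - 0.5) / n plotting positions keep every probability strictly inside (0, 1)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))
```

Plotting the returned pairs as x/y coordinates gives the Q-Q plot.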
Let’s take a bite into the initial read before assessing the financial impact of all these fun Disney variables.
I’ll caveat this: significance is in the eye of the beholder and is up to the interpretation of the storyteller and data scientist. The first read shows the 3-year average of total park attractions having the strongest relationship to revenue, while the number of attractions opened at EPCOT is significant but has a negative impact on yearly revenue.
I’ll dive deeper into the individual impacts later, but first I want to make use of my upper and lower bounds.
The output of this model shows the impact in millions USD. Analyzing the cone, this is where our fairy tale begins to take shape.
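The post does not say how its upper and lower bounds were built. A common approach is a confidence interval around each regression coefficient: the estimate plus or minus a critical value times its standard error. This sketch assumes a normal approximation and a 95% level, and the function name is hypothetical. It also shows how a coefficient with a positive estimate can still have a negative lower bound when its standard error is large, which is one way a possible loss can sit inside the bounds.

```python
from statistics import NormalDist

def coefficient_bounds(estimate, std_error, level=0.95):
    """Lower and upper bounds for a regression coefficient (normal approximation).

    With only a handful of yearly observations, a t critical value would be more
    appropriate; the normal z value keeps this sketch dependency-free.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return estimate - z * std_error, estimate + z * std_error
```

For instance, an estimate of 5 with a standard error of 10 yields bounds of roughly -14.6 to 24.6, so the interval spans both a gain and a loss.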
The average number of attractions introduced across all four major parks can potentially drive in $1.6 million USD.
With the Magic Kingdom driving most of this impact:
New attractions added at the Magic Kingdom can drive in $4.5 million USD.
The average number of Disney Princess movies has more of an impact than counting any animated movie release as the only criterion. What’s intriguing is the spread between our upper and lower bounds: there is a possibility of a loss of $50.6M.
What could be driving the inverse effect? Multiple reasons:
1. The quality of the movie releases
2. The presence, or in this case non-presence, of a meet and greet at the theme park
3. The global economic climate (less international travel impacts this!)
What have we learned from diving into the Walt Disney World data?
There’s a reason WDW is investing in new IP-based rides at Epcot and Hollywood Studios: the rides they have been launching there feel outdated to their audience, and they currently drive the lowest impact on yearly revenue. I anticipate Epcot will see steady growth in impact once the Guardians of the Galaxy and Ratatouille attractions open and a few years have passed.
Finally, a Princess animated movie drives in $1 million USD more than a regular animated movie release.
What could be the reasoning? I’d guesstimate that rides introduced at the Magic Kingdom (which drive in +$4.5M USD) are having a downstream effect on the Princess impact. Most Princess interactions take place at the Magic Kingdom.
After you have consumed this meal, I hope you take these findings and wish Mickey Mouse a happy 90th birthday. Also, as always, enjoy the featured pancake recipe below!