In this recipe I’d like you to chow down on an analytical approach to selecting your main character in Smash Brothers. The approach I’m going to introduce puts an emphasis on what makes a character unique.
Before I start diving into the Smash Brothers data, let’s discuss the k-means clustering approach. K-means helps paint a clear picture of our data; in this case it will group Smash Brothers characters by their attributes to create a picture of who your main should be. Our characters will be assigned to segments (tiers… everyone loves to put tiers around Smash characters, but they’re based solely on opinion and player preference) based on trends in our data and how close a character is to the center of a group.
Take the above picture. Without applying this approach we are in the top left quadrant, with only a faint idea of who should be our main. As we apply more segments and more trends in the data, we eventually end up in the bottom left quadrant: a clear picture of who our main should be.
Now I keep mentioning trends in our data. How do we find trends in data where the attributes sit on completely different scales? Take for instance a character’s weight: as a whole number it will always be larger than a character’s acceleration rate in the air (aerial attacks).
We can surface these trends by standardizing our variables, rescaling every variable to have a mean of zero and a standard deviation of one. In doing so the analysis focuses strictly on the trends in our data, and we can have a pretty interesting discussion: e.g. Yoshi is more similar to Kirby than he is to Pac-Man.
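Here is a minimal sketch of that standardization step, using only the Python standard library. The character names and attribute values are made up for illustration (they are not the real game stats), and `standardize` is a helper name I’m assuming here.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical attribute values, NOT real game data: (weight, air acceleration)
raw = {
    "Character A": (104.0, 0.05),
    "Character B": (79.0, 0.08),
    "Character C": (95.0, 0.09),
}

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std dev."""
    columns = list(zip(*rows.values()))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return {
        name: tuple((v - m) / s for v, m, s in zip(values, mus, sigmas))
        for name, values in rows.items()
    }

scaled = standardize(raw)

# Before scaling, weight swamps the distance; after scaling both attributes count.
raw_gap = dist(raw["Character A"], raw["Character B"])
scaled_gap = dist(scaled["Character A"], scaled["Character B"])
print(round(raw_gap, 2), round(scaled_gap, 2))
```

On the raw numbers the distance between two characters is almost entirely their weight difference. After scaling, air acceleration gets an equal say, which is what lets a statement like "more similar to" mean something.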
In preparation for this data story I came across the following article, on Business Insider: “These are the 11 best ‘Super Smash Bros. Ultimate’ characters, according to the world’s number-one ranked player”
Here’s an excerpt from the article:
And here is ZeRo being named the best overall player:
This triggered a thought. I haven’t done this on the Pancakes Analytics page yet, but typically you would put a k-means clustering into production and re-score your segments on an agreed-upon cadence. In this case I’ll treat the release of a new game as the cadence.
I’ll run a k-means clustering on the character attributes in the Wii-U version and then a k-means clustering on the same character attributes for the Switch version.
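To make the re-scoring idea concrete, here is a rough sketch of a plain k-means (Lloyd’s algorithm) run once per release. Everything in it is an assumption for illustration: the standardized attribute values are invented, the character names are placeholders, and the `kmeans` helper seeds its centroids from the first k points so the result is repeatable. It is not the exact tooling behind the charts in this post.

```python
from math import dist

def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical standardized attributes (two per character), one list per release.
names = ["Character A", "Character B", "Character C", "Character D"]
wii_u = [(-1.2, -1.0), (1.1, 0.9), (-1.0, -1.1), (1.0, 1.2)]
switch = [(-1.1, -0.9), (1.2, 1.0), (0.9, 1.1), (1.1, 0.8)]

wii_u_labels, _ = kmeans(wii_u, k=2)
switch_labels, _ = kmeans(switch, k=2)

# Cluster ids are arbitrary between runs; here both runs are seeded by the same
# two characters, so the ids line up and a changed id means a changed segment.
moved = [n for n, old, new in zip(names, wii_u_labels, switch_labels) if old != new]
print(moved)
```

In practice you would match segments across runs by their centroids rather than trusting the ids, since a different starting point can shuffle the numbering.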
While going through this process I’ll only include characters who were in both games and whose data is clean, i.e. every character has a weight and available acceleration data. Sorry Inkling, you’re not in this segmentation.
Above are both segmentation cadences; characters are split into these segment tiers:
- Floaters (Far right circle)
- Jack of all Trades (Smack in the middle)
- Dashers (Faster than your Jack of all Trades segment but not fast enough to be elite in that attribute)
- Air Tanks (The bottom left circle)
- Speedsters (Top left circle)
These aren’t ranked by which tier is best, but we can make some assumptions. In the Jack of All Trades segment, you most likely won’t be winning matches often, but you’ll be competitive.
Smash Brothers is a unique fighting game in that characters have a weight to them. Being lightweight does have its advantages, but playing as a Speedster comes with a learning curve that might be too high risk, high reward for you.
With the Floaters, if you select someone with a weight advantage in this group, you’re likely to win your match, but you have to master the move set (your smash move).
Air Tanks are a no-brainer, I think, for any skill level. If you want a high likelihood of lasting until time runs out, be an Air Tank (this won’t guarantee a win; that really depends on your competition).
I’m hoping this stood out to you visually: Ganondorf made a large leap from the Air Tanks to the Floaters. This doesn’t only speak to Ganondorf; it also tells you something about Bowser.
When I explain this to clients and those wanting to learn about a particular data set, this is how it translates:
Ganondorf has more in common with Jigglypuff than he does with Bowser. The reason is that he’s quicker and adapts better in aerial attacks and in falling than Bowser does.
On the flip side, I can also say that in Super Smash Bros. Ultimate, Bowser more accurately represents how he’s viewed in the Super Mario franchise.
Neither of these characters was “nerfed”, only re-calibrated so there’s a distinct difference between the two.
What do you do with this information? If your main is a Floater, Ganondorf would be a good transitional character if you’re looking to play as a character with more weight. Or say you always play as an Air Tank because you assume anyone who mains Kirby shouldn’t be playing Smash Bros.; then Ganondorf is a good transitional main for when you eventually give in and select Kirby, “by accident”.
Below are the segments and a brief overview of the characters within each segment:
This segment has high variability, and you can see this from the oblong shape of the circle. Ganondorf and Jigglypuff are driving this shape: although they are in the same segment and are more similar to each other than they are to characters in other segments, they are the furthest apart within this segment.
Now hold up… wait a second. Didn’t I just try to prove a point about how similar they are? Yes, but only in relation to who’s more similar to Ganondorf: Jigglypuff or Bowser. If I posed the question of who is more similar to Ganondorf, Jigglypuff or Kirby… the answer is Kirby.
This group is on average the slowest by run speed and the lightest by weight… they Float.
This segment is the middle of everything. There’s no uniquely distinct trend in their data. Playing as Pikachu vs. Mega Man would have some game-play differences, but statistically speaking you are starting with the same underlying stats.
If you’re new to the series this is a good group to start with… they’re a Jack of All Trades.
The Dasher segment is very similar to the Jack of All Trades segment, only slightly faster. If you’re selecting this group because you want to stay middle ground, you could potentially do more harm than good. You could… Dash yourself off the arena.
Air Tanks are fast in aerial attacks… and the heaviest? I’m anticipating this group will be re-calibrated by the next release. In other words… Bowser has no business being as effective in the air as he is, given how much he weighs; normally these two variables don’t correlate. I guess all the time battling a plumber who can flip and jump is finally paying off.
This is your high risk, high reward group. Characters in this segment are the fastest and the lightest. I personally am awful playing as Sonic; he’s too fast for my playing level, but a seasoned player could probably mop the floor with Sonic.
So who should be your main? In this section I rely on industry knowledge as well (ZeRo’s tiers as the dependent variable). I’ll build a propensity score with the following independent variables:
- Change in air acceleration
- Base air acceleration
- Base speed in the air
- Base Run Speed
- Character Weight
- Ultimate Smash Bros. Cluster
- Wii-U Smash Bros. Cluster
The output will give me the likelihood ZeRo would rank the character as a top-tier character. The highest influencer on predictability was:
- Change in air acceleration
The lowest influencers were:
- Base air acceleration
- Ultimate Smash Bros. Cluster (this highlights the bias towards the Wii-U stats influencing ZeRo’s rankings)
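For readers who want to see what a propensity score looks like mechanically, here is a small sketch of a logistic regression fitted by gradient descent. It assumes two standardized inputs (change in air acceleration and weight) and an invented top-tier flag; none of the numbers come from the real data, and `fit_logistic` and `propensity` are names I made up for this example.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; returns [intercept, *coefficients]."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            predicted = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            error = predicted - y
            gradient[0] += error
            for i, v in enumerate(x):
                gradient[i + 1] += error * v
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights

def propensity(weights, x):
    """Estimated likelihood of the positive class for one character."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

# Hypothetical standardized inputs: (change in air acceleration, weight).
rows = [(1.2, 0.3), (0.8, -0.5), (0.5, 1.0), (-0.2, 0.1), (-0.9, -0.4),
        (-1.1, 0.8), (0.3, -1.2), (-0.6, -0.1), (0.4, 0.2), (0.4, 0.3)]
# Hypothetical flag: 1 if the character is ranked top tier, else 0.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]

weights = fit_logistic(rows, labels)
scores = [propensity(weights, x) for x in rows]
print([round(w, 2) for w in weights])
```

Because the inputs are standardized, comparing the absolute size of the fitted coefficients gives a rough read on which variables carry the most influence.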
Drum roll please….
You should have your main be one of the above three. This is the data solution to selecting your main.
Really looking forward to the comments section on this one 🙂
How crazy would it be if I told you Howard the Duck and Old Man Logan are closer to each other in skill sets than they are to any other Marvel characters? Or that Thor and Dr. Octopus are lookalikes as well? Let’s answer these questions together by wrangling some readily available data.
If I’ve learned anything from my career in data science it’s this: 80% of the work is data gathering and ETL work, and 20% is analysis.
Nothing proves this statement truer than finding data on Marvel characters’ skill sets on a normalized scale. In this data story I’ll be using data from Marvel Contest of Champions (power index levels, health and attack) and the Marvel Battle Royale (a Twitter fan poll of the greatest superheroes).
A few more variables I’ll need to calculate from the results of the Marvel Battle Royale Twitter fan poll:
- Total votes per round
- Average total votes
- A flag for whether a Marvel character received higher than the average total votes
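The three derived variables above can be sketched in a few lines. The vote counts and hero names below are placeholders, not the real poll results.

```python
from statistics import mean

# Hypothetical vote counts per round, NOT the real poll results.
votes_by_round = {
    "Hero A": [1200, 1500, 1800],
    "Hero B": [900],
    "Hero C": [1100, 1300],
    "Hero D": [700],
}

# Total votes each character collected across the rounds they appeared in.
total_votes = {name: sum(rounds) for name, rounds in votes_by_round.items()}

# Average of those totals across all characters in the poll.
average_total = mean(total_votes.values())

# 1 if the character beat the average total, else 0.
above_average = {name: int(total > average_total) for name, total in total_votes.items()}
print(average_total, above_average)
```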
This flag I’ll use as my dependent variable and my independent variables will be the Marvel Contest of Champions statistics.
What will this do? This will predict the likelihood a Marvel Character would receive higher than the average total votes in the Marvel Battle Royale.
Once this is calculated I’ll receive an output of coefficients, which I can apply to the rest of the Marvel characters who weren’t in the Marvel Battle Royale to create a propensity score.
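Applying the coefficients to characters outside the poll is just the logistic formula evaluated on their stats. In this sketch the coefficient values and the standardized inputs are invented for illustration, and `propensity_score` is an assumed helper name.

```python
from math import exp

# Hypothetical coefficients, NOT the fitted values from this analysis.
coefficients = {"intercept": -0.5, "power_index": 0.8, "health": 0.3, "attack": 0.6}

def propensity_score(stats):
    """Logistic transform of the intercept plus the weighted (standardized) stats."""
    z = coefficients["intercept"] + sum(coefficients[name] * value
                                        for name, value in stats.items())
    return 1.0 / (1.0 + exp(-z))

# Hypothetical standardized stats for two characters who were not in the poll.
unscored = {
    "Hero X": {"power_index": 1.0, "health": 0.5, "attack": 1.2},
    "Hero Y": {"power_index": -0.4, "health": 0.1, "attack": -0.8},
}
scores = {name: propensity_score(stats) for name, stats in unscored.items()}
print({name: round(score, 3) for name, score in scores.items()})
```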
Now let’s backtrack a little bit and see why I’m going with a propensity model as opposed to grouping by opinion, e.g. putting all the top attackers in the same category.
The top 3 characters based on Attack are Rocket Raccoon, Spider-Man (Symbiote), and Blade.
In the above histogram, if you look all the way to the far right you’ll notice they are the data points on their own little island.
Well, what if I just grouped everyone by Health? This data visualization looks more promising, but most likely there would be overlap on the other attributes and you wouldn’t be able to implement this successfully.
The power index by definition could be suitable, but from the top 3 selected on power index I can tell this rating isn’t an index in the vein of what I would typically use an index for (time-series forecasting). It looks similar to the Pokémon Go Combat Point system: a measure of a character’s ability to use their full potential.
One use of a propensity score is to create similar groups, based on the likelihood of performing a behavior.
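One simple way to turn scores into lookalikes is to pair each character with whoever has the closest score. The scores below are placeholders rather than the fitted values, and `closest_match` is a name I’m assuming for this sketch.

```python
# Hypothetical propensity scores, NOT the fitted values from this analysis.
scores = {
    "Hero A": 0.81,
    "Hero B": 0.79,
    "Hero C": 0.35,
    "Hero D": 0.33,
    "Hero E": 0.60,
}

def closest_match(name):
    """Return the other character whose propensity score is nearest to this one."""
    others = (other for other in scores if other != name)
    return min(others, key=lambda other: abs(scores[other] - scores[name]))

matches = {name: closest_match(name) for name in scores}
print(matches)
```

Two characters can land next to each other here even when their individual stats look nothing alike, because the score collapses every attribute into one likelihood.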
In this case Doctor Octopus and Thor (Ragnarok) are statistically the same in the Marvel Contest of Champions skill set. For those of you who want to go down an interesting rabbit hole, you can find YouTube videos on why Doctor Octopus should be in a demi-god tier.
This propensity score approach literally put Doctor Octopus in the same tier as a demi-god!
Medusa by power index alone would be close to Thanos, but factoring in all skill sets, she is statistically closer to Gwenpool, Cable, and Nightcrawler than she is to the Mad Titan.
Now for the crazy but statistically significant section. Howard the Duck (I’m hoping he gets a show on Disney+) and Old Man Logan are a propensity score match.
An example like this is where many in data science begin to argue: when does subject matter expertise come into play? We can argue significance forever, on any topic, but we can agree that all Marvel Champions have value if played correctly.
Before I dive into this week’s data story, let me state why I love the Nintendo Switch. I personally feel there’s a need for video games to be a social event, and couch co-op is a must have feature. The Nintendo Switch offers several games which meet this need.
My family loves playing video games and most of all we love playing video games together.
Of the Nintendo games I’ve grown up on and played over the years, Mario Kart is by far one of my favorites. I’ll admit my wife shows me how it’s done.
What I do find interesting about the Nintendo Switch is the Joy-Con controllers: there’s a learning curve (though they’re a huge improvement on the Wii-mote) and most veteran gamers prefer an alternative.
One alternative is the wireless controller, very similar in format to the Xbox controller. I picked up the Yoshi version for my wife; she loves it and personally feels it improves her game-play.
I thought it was time to put this notion to the test: what impact, if any, does a wireless controller have on game-play performance versus using a Joy-Con?
Mario Kart seemed like the logical choice for this experiment: it’s a multiplayer game, you can standardize your users (via ride type and modifications), and performance is measured as a continuous variable, points.
A total of 8 trials were run under these conditions:
- 50 cc race length
Halfway through the trials one gamer switched to the wired controller (Test group) while the other gamer stayed on the single Joy-Con (Control group).
Results were documented and the ETL process began; points scored each race would be used as the key performance indicator.
I next ran a linear regression (great for evaluating an A/B test), with my dependent variable being the points scored after the event (introducing the wired controller) and two independent variables: Treatment and Pre Points Scored.
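As a rough illustration of that model (post points explained by a treatment flag plus pre points), here is an ordinary least squares fit using only the standard library. The data is constructed for the example with a known treatment effect of 3 points, so it is not the recorded race results, and `solve` and `ols` are helper names I’m assuming.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    x = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * target for r, target in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# Constructed example: post = 2 + 3 * treatment + 0.5 * pre (no noise).
# Each row is (treatment flag, pre points scored).
rows = [(0, 10), (0, 12), (0, 8), (0, 11), (1, 9), (1, 12), (1, 10), (1, 7)]
post = [7.0, 8.0, 6.0, 7.5, 9.5, 11.0, 10.0, 8.5]

intercept, treatment_effect, pre_effect = ols(rows, post)
print(round(intercept, 3), round(treatment_effect, 3), round(pre_effect, 3))
```

Including pre points as a covariate is what lets the treatment coefficient be read as the lift from the controller after accounting for how each gamer was already performing.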
In this model I wasn’t concerned with the r-squared value or the significance level of each variable. The sample was not large enough; this was a closed-circuit, small-market test.
The model itself did prove to be significant, which is a good indicator I can continue with the results. Evaluating my Q-Q plot, I see the model fits well; the trend goes through all the data points.
In my summary of fit I notice there is a positive relationship between treatment (group) and post points scored. At first glance this says you improve your Mario Kart game-play performance if you play with a wireless controller.
To complete this story I want to know my upper confidence limit, so I know by how many points performance could improve and whether this is enough to move me up the rankings.
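The upper limit is the coefficient estimate plus a critical value times its standard error. The sketch below uses a normal approximation from the standard library; with a sample this small a t critical value would give a wider interval, so treat this as a simplification. The estimate and standard error passed in are placeholders, not the values from my model output.

```python
from statistics import NormalDist

def upper_confidence_limit(estimate, standard_error, confidence=0.95):
    """Upper end of a two-sided interval using a normal critical value."""
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    return estimate + critical * standard_error

# Hypothetical treatment coefficient and standard error.
upper = upper_confidence_limit(estimate=2.0, standard_error=1.0)
print(round(upper, 2))
```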
Using a wired controller has the potential to increase a gamer’s point performance by over six points each race.
The average points differential between race placement is 1.2 points. This 6-point increase is enough to move you roughly 4 places, depending on your historic placement.
What have we learned from diving into the Mario Kart Data?
The controller you play with matters: switching to a traditional wired controller can potentially improve your point score by 6.5 points, which depending on your average race placement can move you up 4 places in the final standings.
Observing the CPU controlled racers, Shy Guy performed the best with an average final placement of 2.8. The heavy class overall was the weakest group but without Bowser, it could have been worse. Bowser’s average final placement was 4th.
After you have consumed this meal, I hope you take these findings and enjoy your next Mario Kart Grand Prix. Also as always enjoy the featured pancake recipe below!
“When you play the game of thrones, you win or you die.” — Cersei
Let’s bring this quote to life in what I like to call a survival tree of the fittest. This week’s analysis will focus on character survival in Game of Thrones. Chow down and enjoy!
Winter is coming and you’d like to know your chances of survival in the Game of Thrones universe.
Let’s learn from those who have survived to this point and those who have met their unkindly fate.
To do this I’ll build a classification tree with my event set to whether the character is alive (1 for yes, 0 for no). Classification trees in general test the null hypothesis. When we reach my tree visualization, I’ll assign the color red to instances where a character death is highly probable. Green leaves will indicate it’s highly probable a character survives… as long as all these criteria are met.
Think of this tree as a really morbid family tree, but since the data is Game of Thrones it fits right into place.
The variables readily available to me (hopefully they have importance) are as follows:
- House Affiliation
- Member of nobility
- Marital Status
- Family history of deaths
From the initial read I see that knowing whether a character is popular among fans and whether they are male hold the highest importance in determining survival.
Also, the variables I have available classify 75% of characters correctly (a 25% misclassification rate).
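To give a feel for how a tree decides which variable matters most, here is a sketch that scores candidate splits by how much they reduce Gini impurity. That criterion is an assumption on my part (it is a common default, but other tools use other rules), and the eight records are invented rather than taken from the character data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_gain(records, feature):
    """Impurity reduction from splitting the records on a boolean feature."""
    parent = [r["alive"] for r in records]
    yes = [r["alive"] for r in records if r[feature]]
    no = [r["alive"] for r in records if not r[feature]]
    weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(records)
    return gini(parent) - weighted

# Hypothetical characters, NOT the real data set.
records = [
    {"popular": True, "male": True, "alive": 0},
    {"popular": True, "male": True, "alive": 0},
    {"popular": True, "male": False, "alive": 1},
    {"popular": True, "male": True, "alive": 0},
    {"popular": False, "male": True, "alive": 1},
    {"popular": False, "male": False, "alive": 1},
    {"popular": False, "male": True, "alive": 1},
    {"popular": False, "male": False, "alive": 1},
]

gains = {feature: split_gain(records, feature) for feature in ("popular", "male")}
best_first_split = max(gains, key=gains.get)
print(gains, best_first_split)
```

The variable with the largest gain becomes the first split, and the same scoring repeats inside each branch.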
Let’s say you moved to Westeros. Out of the gate you have a 25.4% chance of meeting your end. At those odds I’m taking my chances, but I should stay under the radar as much as I can, because the data warrants it.
If you become a popular character or are an integral part of the story, your death becomes more meaningful and your probability of survival is worse than a coin flip.
So let’s say you’re a likeable character (you can’t help it); not all is lost, as long as you’re female. The highest survival rate belongs to the popular female character group. This is a classic tale of high risk, high reward.
A classification tree is a great way to visualize your data, and now I’ll walk us through this Game of Thrones survival tree.
Let’s start at the very top: the tree assumes everyone has a 75% chance of survival. As the tree splits, this is where the interesting part begins and our data story begins to unfold.
If you are a popular character you flow to the left side of the tree, and your survival rate of 75% drops to 48%.
Staying to the left side of the tree there is another important split, are you a male or female? Female characters have a higher probability of surviving (87% if you’re popular and 76% if you’re under the radar).
If you’re male and popular you have a 42% chance of survival (we’re looking at you, Peter Dinklage).
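The walk through the tree can be written down as a small lookup. The rates are the ones quoted in this post; the leaf for under-the-radar male characters isn’t quoted, so the sketch leaves it as `None` rather than guessing. The function name and argument layout are my own assumptions.

```python
# Survival rates as quoted in this post. The under-the-radar male leaf is not
# quoted, so it stays None instead of being filled in.
BASELINE = 0.75
POPULAR_RATE = 0.48
LEAVES = {
    (True, "female"): 0.87,
    (True, "male"): 0.42,
    (False, "female"): 0.76,
    (False, "male"): None,
}

def survival_rate(popular=None, gender=None):
    """Follow the tree as far as the known answers allow."""
    if popular is None:
        return BASELINE  # root of the tree: no information yet
    if gender is None:
        return POPULAR_RATE if popular else None  # only the popular branch is quoted
    return LEAVES[(popular, gender)]

print(survival_rate(), survival_rate(True), survival_rate(True, "female"))
```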
Now here’s the largest caveat to take with this classification tree: I’m assuming it will no longer be relevant after the final season. Winter is coming and most likely our characters will meet their end at the hands of the White Walkers.
What have we learned from diving into the Game of Thrones Data?
Everyone starts off at a 75% survival rate, and as your popularity grows your survival rate lessens by 27 percentage points (to 48%). If you’re a popular male, your survival rate sits 33 points below the starting rate (42%). If you’re a popular female character, your survival rate is 45 points higher than your male counterparts’ (87% versus 42%).
An interesting tidbit…If you become popular and you are a female (hopefully the mother of dragons) you boast the highest survival rate of anyone in this universe, 87%.
After you have consumed this meal, I hope you take these findings and enjoy your episode of Game of Thrones. 🙂 Also as always enjoy the featured pancake recipe below!