How crazy would it be if I told you Howard the Duck and Old Man Logan are closer to each other in skill sets than they are to any other Marvel characters? Or that Thor and Dr. Octopus are lookalikes as well? Let’s answer these questions together by wrangling some readily available data.
If I’ve learned anything from my career in data science, it’s this: 80% of the work is data gathering and ETL, and 20% is analysis.
Nothing proves this statement truer than finding data on Marvel characters’ skill sets on a normalized scale. In this data story I’ll be using data from Marvel Contest of Champions (power index levels, health, and attack) and the Marvel Battle Royale (a Twitter fan poll of the greatest superheroes).
I’ll also need to calculate a few more variables from the results of the Marvel Battle Royale Twitter fan poll:
Total votes per round
Average total votes
A flag for whether each Marvel character received higher-than-average total votes
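A minimal sketch of these three variables in plain Python. It assumes "total votes" means a character’s votes summed across every round they appeared in; the poll rows and the names `poll_rows`, `total_votes_by_character`, and `above_average_flags` are hypothetical placeholders for illustration, not the real Marvel Battle Royale results.

```python
from statistics import mean

# Hypothetical placeholder rows, NOT real Marvel Battle Royale data:
# (character, round, votes received in that round)
poll_rows = [
    ("Character A", 1, 120),
    ("Character A", 2, 80),
    ("Character B", 1, 40),
    ("Character C", 1, 200),
    ("Character C", 2, 150),
    ("Character C", 3, 90),
]


def total_votes_by_character(rows):
    """Sum each character's votes across every round they appeared in."""
    totals = {}
    for character, _round, votes in rows:
        totals[character] = totals.get(character, 0) + votes
    return totals


def above_average_flags(totals):
    """Return 1 if a character's total beats the average total, else 0."""
    average_total = mean(totals.values())
    return {name: int(total > average_total) for name, total in totals.items()}


totals = total_votes_by_character(poll_rows)
flags = above_average_flags(totals)
print(totals)  # {'Character A': 200, 'Character B': 40, 'Character C': 440}
print(flags)   # {'Character A': 0, 'Character B': 0, 'Character C': 1}
```

The 0/1 flag is what the model in the next step treats as its dependent variable.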
I’ll use this flag as my dependent variable, and the Marvel Contest of Champions statistics as my independent variables.
What will this do? The model predicts the likelihood that a Marvel character receives higher-than-average total votes in the Marvel Battle Royale.
Once this is calculated, the model outputs a set of coefficients, which I can apply to the rest of the Marvel characters who weren’t in the Marvel Battle Royale to create a propensity score.
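The post doesn’t name the model type; logistic regression is a common choice for predicting a binary flag, so the sketch below assumes it. It fits the model with plain gradient descent on standardized features (power index, health, attack), then applies the learned coefficients to characters outside the poll. All stat values are made-up placeholders, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_scaler(rows):
    """Column means and standard deviations, so every stat shares one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; returns (intercept, coefficients)."""
    n = len(X[0])
    b, w = 0.0, [0.0] * n
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(n):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / len(X)
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
    return b, w


def propensity(row, b, w, means, stds):
    """Apply the fitted coefficients to one character's raw stats."""
    x = scale(row, means, stds)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical [power_index, health, attack] for polled characters, with flags.
train_raw = [[5000, 20000, 1500], [4800, 21000, 1400],
             [3000, 15000, 900], [2800, 14000, 1000]]
train_flags = [1, 1, 0, 0]

means, stds = fit_scaler(train_raw)
b, w = fit_logistic([scale(r, means, stds) for r in train_raw], train_flags)

# Score hypothetical characters who were not in the poll.
high_score = propensity([4900, 20500, 1450], b, w, means, stds)
low_score = propensity([2900, 14500, 950], b, w, means, stds)
```

Standardizing first keeps health (tens of thousands) from swamping attack (hundreds) during fitting; a library such as scikit-learn would do the same job with less code.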
Now let’s backtrack a little and see why I’m going with a propensity model as opposed to grouping by opinion, i.e., putting all the top attackers in the same category.
The top 3 characters based on attack are Rocket Raccoon, Spider-Man (Symbiote), and Blade.
In the histogram above, look all the way to the far right: these three are the data points sitting on their own little island.
Well, what if I just grouped everyone by health? This visualization looks more promising, but most likely the groups would overlap on the other attributes, and you wouldn’t be able to implement this successfully.
By definition, the power index could be suitable, but from the top 3 characters by power index I can tell this rating isn’t an index in the sense I would typically use one (time-series forecasting). Instead, it looks similar to the Pokemon Go Combat Power (CP) system: a rating of a character’s ability to reach their full potential.
One use of a propensity score is to create similar groups, based on the likelihood of performing a behavior.
In this case, Doctor Octopus and Thor (Ragnarok) are statistically the same in the Marvel Contest of Champions skill set. For those of you who want to go down an interesting rabbit hole, you can find YouTube videos on why Doctor Octopus should be in a demi-god tier.
This propensity score approach literally put Doctor Octopus in the same tier as a demi-god!
By power index alone, Medusa would be close to Thanos, but factoring in all skill sets, she is statistically closer to Gwenpool, Cable, and Nightcrawler than she is to the Mad Titan.
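The grouping idea can be sketched in two ways: match each character to the one with the closest propensity score, or cut the scores into equal-width tiers. This is a minimal illustration, not necessarily how the matches above were produced; the scores and the helpers `nearest_match` and `assign_tiers` are hypothetical, and real propensity score matching often adds a caliper (a maximum allowed score distance), which this sketch omits.

```python
# Hypothetical propensity scores, NOT the actual model output.
scores = {
    "Character A": 0.91,
    "Character B": 0.89,
    "Character C": 0.42,
    "Character D": 0.40,
    "Character E": 0.05,
}


def nearest_match(scores):
    """Pair each character with the other character whose score is closest."""
    matches = {}
    for name, score in scores.items():
        others = (other for other in scores if other != name)
        matches[name] = min(others, key=lambda other: abs(scores[other] - score))
    return matches


def assign_tiers(scores, n_tiers=3):
    """Bucket scores in [0, 1] into equal-width tiers, 0 being the lowest."""
    return {name: min(int(s * n_tiers), n_tiers - 1) for name, s in scores.items()}


matches = nearest_match(scores)
tiers = assign_tiers(scores)
```

Here Character A and Character B end up as each other’s match and share the top tier, even if their individual stats differ, which mirrors how two very different characters can land together once every skill is folded into one score.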
Now for the crazy but statistically significant section. Howard the Duck (I’m hoping he gets a show on Disney+) and Old Man Logan are a propensity score match.
An example like this is where many in data science begin to argue: when does subject matter expertise come into play? We can argue significance forever, on any topic, but we can agree that all Marvel Champions have value if played correctly.