Marvel Comics, Propensity Modeling, Regression Modeling

Recipe 013: Marvel Comics Propensity Score

FerraraTom

How crazy would it be if I told you Howard the Duck and Old Man Logan are closer to each other in skill sets than they are to any other Marvel characters? Or that Thor and Dr. Octopus are statistical lookalikes as well? Let's answer these questions together by wrangling some readily available data.


If I've learned anything from my career in data science, it's this: 80% of the work is data gathering and ETL, and 20% is analysis.

Nothing proves this more than trying to find data on Marvel characters' skill sets on a normalized scale. In this data story I'll be using data from Marvel Contest of Champions (power index, health, and attack ratings) and the Marvel Battle Royale (a Twitter fan poll of the greatest superheroes).

A few more variables I'll need to calculate from the results of the Marvel Battle Royale Twitter fan poll:

Total votes per round

Average total votes

A flag for whether each Marvel character received more than the average total votes
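
As a minimal sketch of these three variables, assuming each character's votes are recorded for every round they appeared in (the names and vote counts below are made up, not the actual poll results), the flag could be computed like this:

```python
# Hypothetical per-round vote counts; characters knocked out early have fewer rounds.
# Assumption: "total votes" means a character's votes summed across their rounds.
votes_by_round = {
    "Character A": [1200, 950],
    "Character B": [400, 300],
    "Character C": [800],
}


def above_average_flags(votes_by_round):
    """Flag (1/0) whether each character's total votes exceed the average total."""
    # Total votes per character, summed across the rounds they appeared in
    totals = {name: sum(rounds) for name, rounds in votes_by_round.items()}
    # Average total votes across all characters in the poll
    average_total = sum(totals.values()) / len(totals)
    return {name: int(total > average_total) for name, total in totals.items()}


flags = above_average_flags(votes_by_round)
print(flags)  # {'Character A': 1, 'Character B': 0, 'Character C': 0}
```

Here the totals are 2150, 700, and 800, so the average is about 1217 and only Character A is flagged. Summing across rounds is one reading of "total votes"; a per-round average would treat early eliminations differently, so the choice affects which characters end up flagged.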

I'll use this flag as my dependent variable, and the Marvel Contest of Champions statistics as my independent variables.

What will this do? It predicts the likelihood that a Marvel character would receive higher than the average total votes in the Marvel Battle Royale.

Once the model is fit, I'll get a set of coefficients that I can apply to the rest of the Marvel characters who weren't in the Marvel Battle Royale to create a propensity score.
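
The post doesn't name the regression tool behind these coefficients. As a sketch, assuming a logistic regression on hypothetical power index, health, and attack values, the fit-then-score workflow might look like this in plain Python (a real project would more likely use a library such as scikit-learn or statsmodels):

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_scaler(rows):
    """Column means and standard deviations, so stats on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Assumed model: logistic regression fit by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def propensity(x, b, w):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical (power index, health, attack) for poll characters, with the
# above-average-votes flag as the last value. Not real game data.
train = [
    (4000, 30000, 2500, 1),
    (3800, 28000, 2400, 1),
    (3500, 25000, 2000, 0),
    (2000, 20000, 1500, 0),
    (4200, 32000, 2600, 1),
    (2500, 22000, 1700, 0),
]
stats = [row[:3] for row in train]
y = [row[3] for row in train]
means, stds = fit_scaler(stats)
b, w = fit_logistic([scale(r, means, stds) for r in stats], y)

# Apply the fitted coefficients to characters who were not in the poll
unseen = {"Character X": (4100, 31000, 2550), "Character Y": (2200, 21000, 1600)}
scores = {name: propensity(scale(s, means, stds), b, w) for name, s in unseen.items()}
print(scores)
```

Note that the unseen characters are standardized with the poll characters' means and standard deviations, so their scores sit on the same scale the coefficients were learned on.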


[Figure: histogram of character Attack ratings]

Now let's backtrack a little and look at why I'm going with a propensity model rather than grouping by opinion, e.g., putting all the top attackers in the same category.

The top 3 characters based on Attack are Rocket Raccoon, Spider-Man (Symbiote), and Blade.

In the histogram above, if you look all the way to the far right, you'll notice these three are the data points on their own little island.

[Figure: characters by Health rating]

Well, what if I just grouped everyone by Health? This visualization looks more promising, but most likely the groups would overlap on the other attributes, so a Health-only grouping wouldn't hold up.


The power index could be suitable by definition, but from the top 3 characters by power index I can tell this rating isn't an index in the sense I would typically use one (time-series forecasting). It looks similar to the Pokémon Go Combat Power (CP) system: a measure of a character's ability to use their full potential.


One use of a propensity score is to create groups of similar members based on their likelihood of performing a behavior.
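
Once every character has a score, one common way to form these groups is nearest-neighbor matching on the score, optionally with a caliper (a maximum allowed score gap). The post doesn't state its matching rule, so treat this as an assumed choice, shown with made-up scores:

```python
def nearest_match(scores, name, caliper=None):
    """Return the character whose propensity score is closest to `name`'s.

    Assumed rule: nearest-neighbor matching. If `caliper` is given and even the
    closest score is farther away than that, return None (no acceptable match).
    """
    target = scores[name]
    candidates = [other for other in scores if other != name]
    best = min(candidates, key=lambda other: abs(scores[other] - target))
    if caliper is not None and abs(scores[best] - target) > caliper:
        return None
    return best


# Hypothetical propensity scores, not the ones from this analysis
scores = {
    "Character A": 0.82,
    "Character B": 0.80,
    "Character C": 0.35,
    "Character D": 0.33,
    "Character E": 0.60,
}

print(nearest_match(scores, "Character A"))                # Character B
print(nearest_match(scores, "Character C"))                # Character D
print(nearest_match(scores, "Character E", caliper=0.05))  # None
```

The caliper keeps an outlier like Character E from being forced into a poor match; without one, every character is paired with someone, however distant.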

In this case, Doctor Octopus and Thor (Ragnarok) are statistically the same in their Marvel Contest of Champions skill sets. For those of you who want to go down an interesting rabbit hole, you can find YouTube videos on why Doctor Octopus should be in a demi-god tier.

This propensity score approach literally put Doctor Octopus in the same tier as a demi-god!


By power index alone, Medusa would be close to Thanos, but factoring in all skill sets, she is statistically closer to Gwenpool, Cable, and Nightcrawler than she is to the Mad Titan.


Now for the crazy but statistically significant part: Howard the Duck (I'm hoping he gets a show on Disney+) and Old Man Logan are a propensity score match.

An example like this is where many in data science begin to argue: when does subject matter expertise come into play? We can argue significance forever, on any topic, but we can all agree that every Marvel champion has value if played correctly.


DC Comics, K-Means Clustering, Logistic Regression, Propensity Modeling

Recipe 011: DC Super Hero Throw Down: Propensity Modeling

FerraraTom

I want you to remember, Clark…In all the years to come… in your most private moments… I want you to remember my hand at your throat… I want you to remember the one man who beat you.

Chilling quote, isn't it? Batman says it to Superman in The Dark Knight Returns, a comic book miniseries written and drawn by Frank Miller.

One of the greatest debates in comic book lore, and a fun discussion to have, is pitting two superheroes against each other: who wins and why? The data story below introduces a data science approach to settling this debate. To have fun with it, I've thrown characters from the video game Injustice 2 into a Superhero Throw Down Tournament.


Before we dive into the tournament and the results of the throw down, I’d like to touch on the approach: Propensity modeling.

Propensity modeling has been around since 1983 and is a statistical approach to measuring uplift (think return on investment). The goal is to measure uplift across similar, or matched, groups.

At the heart of this approach are two machine learning techniques: segmentation (here, k-means clustering) and probability estimation (here, logistic regression).

Why propensity modeling for this exercise? I wanted to rank my superheroes for the bracket using statistics (e.g., Batman is not getting a number one seed).

Thirty-five characters were segmented on strength, ability, defense, and health. For the propensity score, I gathered ranking information from crowdsourced websites and surveys, which I used to give each character an intangible skill score. My reasoning was that I wanted the medium of comics to do the majority of the work for me: comics are stories, and the narrative drives the inner core of a character. I'm assuming that the higher a character ranks on a fan-sourced website, the better written and more timeless they are.
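
The post's tags name k-means clustering as the segmentation method but don't give the number of clusters or the stat values. Here is a minimal plain-Python sketch of Lloyd's k-means on four hypothetical stats (strength, ability, defense, health) with k = 2 and a deterministic farthest-first start; both of those choices are assumptions for illustration:

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    # Assumed initialization (farthest-first): start at the first point, then
    # repeatedly add the point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


# Hypothetical (strength, ability, defense, health) values, not Injustice 2 data
stats = {
    "Hero 1": (95, 90, 92, 94),
    "Hero 2": (93, 88, 95, 90),
    "Hero 3": (40, 60, 45, 50),
    "Hero 4": (42, 58, 40, 55),
    "Hero 5": (90, 85, 88, 91),
    "Hero 6": (38, 65, 50, 48),
}
names = list(stats)
labels, _ = kmeans([stats[n] for n in names], k=2)
print(dict(zip(names, labels)))
# {'Hero 1': 0, 'Hero 2': 0, 'Hero 3': 1, 'Hero 4': 1, 'Hero 5': 0, 'Hero 6': 1}
```

No standardization is applied because these four made-up stats share a similar range; attributes on very different scales should be standardized first, or the largest one dominates the distances.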

The next step was to take the mean of the intangible skill scores and flag the characters above the average (this flag is the dependent variable for the logistic regression that calculates the propensity score).
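
Written out, with s_i the intangible skill score of character i among n characters and x_i that character's game attributes, the flag and the logistic regression propensity score take the standard forms below (the post doesn't report its fitted coefficients):

```latex
\bar{s} = \frac{1}{n}\sum_{j=1}^{n} s_j,
\qquad
y_i =
\begin{cases}
1 & \text{if } s_i > \bar{s},\\
0 & \text{otherwise,}
\end{cases}
\qquad
\hat{e}(\mathbf{x}_i) = \Pr(y_i = 1 \mid \mathbf{x}_i)
= \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i\right)\right)}
```

Here ê(x_i) is character i's propensity score, and β_0 and β are estimated from the flags y_i.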

What went into the propensity model? The skill sets gathered from the Injustice 2 game. The assumption here is that a character with Superman's skill set would be written much differently than, say, Catwoman.

Now it’s time for our throw down.

The top four characters by propensity score were:

Cyborg

Supergirl

Aquaman

Black Adam
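
The post doesn't spell out how match-ups were paired beyond seeding by propensity score. A minimal sketch, assuming conventional seeding where the highest score plays the lowest, using made-up scores:

```python
def seed_bracket(scores):
    """Rank by propensity score (seed 1 = highest) and pair seed k with seed n + 1 - k.

    Assumed pairing rule. With an odd number of characters, the middle seed is
    left unpaired (a bye).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    return [(ranked[k], ranked[n - 1 - k]) for k in range(n // 2)]


# Hypothetical scores for one region of the bracket
region = {"Character A": 0.91, "Character B": 0.74, "Character C": 0.52, "Character D": 0.30}
print(seed_bracket(region))  # [('Character A', 'Character D'), ('Character B', 'Character C')]
```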

To determine the winner of each throw-down, characters were pitted against each other in 11 categories.
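
The post doesn't list the categories or a tie-breaking rule. As a sketch, assuming every category is numeric and higher is better, a throw-down reduces to a category-by-category comparison that yields a wins-ties-losses record:

```python
def compare(a_stats, b_stats):
    """Return (wins, ties, losses) for fighter A, one comparison per category.

    Assumption: every category is numeric and a higher value wins it.
    """
    wins = ties = losses = 0
    for category, a_value in a_stats.items():
        b_value = b_stats[category]
        if a_value > b_value:
            wins += 1
        elif a_value == b_value:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses


def winner(name_a, a_stats, name_b, b_stats):
    """The fighter with more category wins; None on a dead heat (tie-break rule unknown)."""
    wins, _, losses = compare(a_stats, b_stats)
    if wins == losses:
        return None
    return name_a if wins > losses else name_b


# Hypothetical categories and values, not the tournament's data
fighter_a = {"strength": 90, "defense": 70, "health": 85, "ability": 60}
fighter_b = {"strength": 80, "defense": 70, "health": 90, "ability": 50}

wins, ties, losses = compare(fighter_a, fighter_b)
print(f"{wins}-{ties}-{losses}")                                # 2-1-1
print(winner("Fighter A", fighter_a, "Fighter B", fighter_b))  # Fighter A
```

A record like 2-1-1 reads as two wins, one tie, and one loss for the first fighter.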


Round 1 Takeaways:

Our number one seed, Cyborg, nearly lost to Atrocitus. The result was 6-2-5, read as six wins, two ties, and five losses.

There were no upsets in the first round of play.  A few characters did not win a single category in their match-ups:

Harley Quinn (vs. Captain Cold)

Green Arrow (vs. Batman)

Black Manta (vs. Black Canary)

These three characters were ill-equipped to take on their opponents; it is possible they would have advanced against a different opponent.


Round 2 Takeaways:

Cyborg (our number one seed) defeated Captain Cold by a larger margin (+3 winning categories) than in his previous match-up against Atrocitus, but he won one fewer category.

We begin to see upsets in Round 2:

Robin defeated Black Adam by 1 winning category.

Wonder Woman defeated Firestorm by 4 winning categories.

Batman defeated Supergirl by 3 winning categories.

By propensity score these were upsets, but from a comic book debate standpoint you could argue otherwise; e.g., given enough time to prepare, Batman could defeat Supergirl.


Round 3 Takeaways:

Cyborg falls to Superman, losing by 4 categories. This was the toughest fight Superman had faced in the tournament to date (in both previous rounds he won 9 categories).

The upsets keep coming in:

Robin sneaks in another win by 1 winning category (over Brainiac).

Wonder Woman defeats the top seed in her region of the bracket (Aquaman) by 4 winning categories.

Batman defeats Green Lantern by 3 winning categories.


Final 4 Takeaways:

Robin's Cinderella story comes to an end at the hands of Superman, who won 9 categories. Still, Robin fared better than Superman's earlier opponents who also conceded 9 categories: Robin won 2 categories.

Batman was able to upset Wonder Woman by 2 winning categories. We're set for a championship round featuring the original who-wins debate: Batman versus Superman!


Our winner is…

Superman defeats Batman, but not in a landslide. Batman lost by two categories, yet he won 5 categories; previously, the most categories any opponent had won against Superman was 3.


What did we learn from diving into the DC data? Comic book writing and fan perception go a long way in determining who wins a throw-down debate. With propensity modeling, we can create a more even playing field and limit the number of unfair battles.
