DC Comics, K-Means Clustering, Logistic Regression, Propensity Modeling

Recipe 011: DC Super Hero Throw Down: Propensity Modeling

FerraraTom

I want you to remember, Clark… In all the years to come… in your most private moments… I want you to remember my hand at your throat… I want you to remember the one man who beat you.

Chilling quote, isn’t it? Batman says it to Superman in The Dark Knight Returns, a comic book miniseries written and drawn by Frank Miller.

One of the greatest debates in comic book lore, and a fun discussion to have, is pitting two superheroes against each other… Who wins, and why? The data story below introduces a data science approach to answering this debate. To have fun with it, I’ve thrown characters from the video game Injustice 2 into a Superhero Throw Down Tournament.


012_pic

 

 


010_pic

Before we dive into the tournament and the results of the throw down, I’d like to touch on the approach: propensity modeling.

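Propensity modeling has been around since 1983. It is a statistical approach to measuring uplift (think return on investment). A propensity score is the probability that a member receives a treatment, or belongs to a group of interest, given its observed characteristics. Comparing members with similar scores (matched groups) lets you measure uplift between groups that are otherwise alike.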
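To make the matched-groups idea concrete, here is a minimal Python sketch of 1-nearest-neighbour propensity-score matching. It illustrates the general technique only and is not the method used to build the bracket below. The `matched_uplift` function and every score and outcome in it are hypothetical. Each treated unit is paired with the control unit whose propensity score is closest, and the average outcome difference across those pairs is the estimated uplift.

```python
# 1-nearest-neighbour propensity-score matching (with replacement).
# The (propensity_score, outcome) pairs below are made-up illustration data.
treated = [(0.80, 12.0), (0.55, 9.0), (0.30, 7.0)]
control = [(0.78, 10.0), (0.52, 8.5), (0.35, 6.0), (0.10, 3.0)]


def matched_uplift(treated, control):
    """Average outcome difference between treated units and their matches."""
    diffs = []
    for score, outcome in treated:
        # Pair with the control unit whose propensity score is closest.
        _, matched_outcome = min(control, key=lambda c: abs(c[0] - score))
        diffs.append(outcome - matched_outcome)
    return sum(diffs) / len(diffs)


print(round(matched_uplift(treated, control), 3))  # 1.167
```

Matching with replacement (a control unit can be reused) keeps the sketch short. The three matched differences are 2.0, 0.5 and 1.0, so the estimated uplift is their mean.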

At the heart of this approach are two machine learning techniques: segmentation (k-means clustering) and probability estimation (logistic regression).

Why propensity modeling for this exercise? I wanted to rank my superheroes for the bracket using statistics (i.e., Batman is not getting a number one seed).

I segmented 35 characters on strength, ability, defense and health. For the propensity score, I gathered ranking information from crowd-sourced websites and surveys and used it to give each character an intangible skill score. My reasoning was that I wanted the medium of comics to do most of the work for me. Comics are stories, and the narrative drives the inner core of a character. I’m assuming that the higher a character ranks on a fan-sourced website, the better written and more timeless that character is.
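The segmentation step can be sketched with a bare-bones k-means (Lloyd’s algorithm) in pure Python. This sketch rests on several assumptions. The six characters and their stat values are placeholders, not Injustice 2 data. The choice of k = 3 is arbitrary. Seeding with the first k points, rather than random or k-means++ initialization, is only there to make the output reproducible.

```python
import math

# Placeholder stat lines (strength, ability, defense, health) for six
# hypothetical characters; NOT the Injustice 2 values used in the post.
stats = {
    "Powerhouse 1": [9, 8, 9, 9],
    "Mid-tier 1": [5, 7, 4, 5],
    "Street-level 1": [2, 3, 2, 3],
    "Powerhouse 2": [8, 9, 8, 9],
    "Mid-tier 2": [4, 6, 3, 4],
    "Street-level 2": [3, 2, 3, 2],
}


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm, seeded with the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(list(stats.values()), k=3)
for name, label in zip(stats, labels):
    print(f"{name}: segment {label}")
```

With these placeholder numbers, the two powerhouses land in segment 0, the two mid-tier characters in segment 1 and the two street-level characters in segment 2.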

The next step was to take the mean of the intangible skill scores and flag the characters above that average. This flag is the dependent variable for the logistic regression that calculates the propensity score.
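The flag described above can be sketched in a few lines of Python. The names and intangible skill scores are hypothetical placeholders, not the crowd-sourced values from the post.

```python
from statistics import mean

# Placeholder intangible skill scores (higher = ranked higher by fans);
# hypothetical values, not the crowd-sourced scores from the post.
intangible = {"Hero A": 92, "Hero B": 75, "Hero C": 60, "Hero D": 41, "Hero E": 32}

avg = mean(list(intangible.values()))
# 1 if the character is above the average, else 0: the binary dependent
# variable for the logistic regression.
flag = {name: int(score > avg) for name, score in intangible.items()}
print(avg, flag)
```

Here the average is 60. Hero C sits exactly at the mean, so it is not flagged, because the rule is strictly “above the average.”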

What went into the propensity model? The skill sets gathered from the Injustice game. The assumption here is that a character with Superman’s skill set would be written much differently than, say, Catwoman.
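Here is a minimal sketch of how the propensity scores and seeds could be produced. It fits a logistic regression that predicts the above-average flag from game stats, then ranks characters by predicted probability. The post doesn’t say which tool was used, so this sketch swaps in a hand-rolled batch gradient descent. The characters, stats, flags, learning rate and epoch count are all hypothetical.

```python
import math

# Placeholder inputs for six hypothetical characters: game stats
# (strength, ability, defense, health) and the above-average flag (1/0).
stats = {
    "Powerhouse 1": [9, 8, 9, 9],
    "Mid-tier 1": [5, 7, 4, 5],
    "Street-level 1": [2, 3, 2, 3],
    "Powerhouse 2": [8, 9, 8, 9],
    "Mid-tier 2": [4, 6, 3, 4],
    "Street-level 2": [3, 2, 3, 2],
}
flag = {"Powerhouse 1": 1, "Mid-tier 1": 1, "Street-level 1": 0,
        "Powerhouse 2": 1, "Mid-tier 2": 0, "Street-level 2": 0}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Scale stats to 0-1 so a fixed learning rate behaves reasonably.
X = [[v / 10 for v in row] for row in stats.values()]
y = [flag[name] for name in stats]
w, b = fit_logistic(X, y)

# Propensity score = predicted probability of being an above-average character.
propensity = {
    name: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    for name, x in zip(stats, X)
}
seeds = sorted(propensity, key=propensity.get, reverse=True)
for seed, name in enumerate(seeds, start=1):
    print(seed, name, round(propensity[name], 3))
```

Sorting by propensity score in descending order gives the seed order: the highest score becomes the number one seed.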

011_pic


Now it’s time for our throw down.

001_pic

The top four characters by propensity score were:

Cyborg

Supergirl

Aquaman

Black Adam

To determine the winner of each throw down, characters were pitted against each other in 11 categories.
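A single throw down can be sketched as a category-by-category comparison that tallies wins, ties and losses. The post doesn’t list its 11 categories, so the five category names and all values below are hypothetical. The winner rule, where whoever wins more categories advances, is an assumption drawn from how the results are described.

```python
# Hypothetical category values for two placeholder characters; the post's
# actual 11 categories aren't named, so these five are made up.
hero_a = {"strength": 9, "speed": 7, "defense": 8, "health": 9, "intellect": 5}
hero_b = {"strength": 4, "speed": 6, "defense": 8, "health": 5, "intellect": 10}


def throw_down(a, b):
    """Return (wins, ties, losses) for character a against character b.

    Assumes both dicts share the same categories and higher values are better.
    """
    wins = sum(a[c] > b[c] for c in a)
    ties = sum(a[c] == b[c] for c in a)
    losses = sum(a[c] < b[c] for c in a)
    return wins, ties, losses


def winner(name_a, a, name_b, b):
    """Assumed rule: whoever wins more categories advances."""
    wins, _, losses = throw_down(a, b)
    if wins > losses:
        return name_a
    if losses > wins:
        return name_b
    return None  # dead even; no tiebreaker is described


record = throw_down(hero_a, hero_b)
print("-".join(map(str, record)))  # 3-1-1 (wins-ties-losses)
print(winner("Hero A", hero_a, "Hero B", hero_b))  # Hero A
```

The difference between wins and losses (here 3 - 1 = 2) is the “winning categories” margin the round summaries use.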


Round 1 Takeaways:

002_pic

Our number one seed, Cyborg, nearly lost to Atrocitus. The result was 6-2-5, read as six wins, two ties and five losses.

There were no upsets in the first round of play.  A few characters did not win a single category in their match-ups:

Harley Quinn (vs. Captain Cold)

Green Arrow (vs. Batman)

Black Manta (vs. Black Canary)

These three characters were ill-equipped to take on their opponents; they might have advanced against a different opponent.

003_pic


Round 2 Takeaways:

004_pic

Cyborg (our number one seed) beat Captain Cold by a wider margin (+3 winning categories) than in his Round 1 match-up against Atrocitus, even though he won one fewer category.

We begin to see upsets in Round 2:

Robin defeated Black Adam by 1 winning category. Wonder Woman defeated Firestorm by 4 winning categories. Batman defeated Supergirl by 3 winning categories.

On propensity scores these were upsets, but from a comic book debate standpoint you could argue for each of them. For example, given enough time to prepare, Batman could defeat Supergirl.
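Here an upset means that the character with the lower propensity score won. That check can be sketched as a simple filter over match results. The names, scores and results below are hypothetical placeholders, not the tournament’s actual propensity scores.

```python
# Placeholder propensity scores and results; hypothetical, not the post's data.
propensity = {"Hero A": 0.91, "Hero B": 0.74, "Hero C": 0.38, "Hero D": 0.22}
results = [("Hero A", "Hero D"), ("Hero C", "Hero B")]  # (winner, loser)


def upsets(results, propensity):
    """Return the matches in which the winner had the lower propensity score."""
    return [(w, l) for w, l in results if propensity[w] < propensity[l]]


print(upsets(results, propensity))  # [('Hero C', 'Hero B')]
```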

005_pic


Round 3 Takeaways:

006_pic

Cyborg falls to Superman, losing by 4 categories. This was the toughest fight Superman had faced in the tournament to date (in both previous rounds he won 9 categories).

The upsets keep coming in:

Robin sneaks in another win by 1 winning category, this time over Brainiac. Wonder Woman defeats the top seed in her region of the bracket (Aquaman) by 4 winning categories. Batman defeats Green Lantern by 3 winning categories.

007_pic


Final 4 Takeaways:

008_pic

Robin’s Cinderella story comes to an end at the hands of Superman, who won 9 categories. Still, Robin fared better than Superman’s earlier opponents who also conceded 9 categories: Robin won 2.

Batman upset Wonder Woman by 2 winning categories. We’re set for a championship round featuring the original “who wins” debate… Batman versus Superman!

batman-vs-superman-movie


Our winner is…

009_pic

Superman defeats Batman, but not in a landslide. Batman lost by two categories, yet he won 5 categories; before this match, no opponent had won more than 3 categories against Superman.


What did we learn from diving into the DC data? Comic book writing and fan perception go a long way in determining who wins a throw-down debate. Propensity modeling gives us a more even playing field and limits the number of unfair battles.


005

SupermanPancakesW


003_008
