Marvel Comics, Propensity Modeling, Regression Modeling

TBCC 2019 Avengers, Algorithms, and Analytics: Panel Recap



This Panel was held on:

Friday, August 2, 2019 at 9 PM – 10 PM

During the Tampa Bay Comic Convention 2019, held at the Tampa Convention Center.

The Panelists were:

Tom Ferrara (@pancake_analytics), Kalyn Hundley (@kehundley08), Andy Polak (@polak_andy)



I want to take a quick moment to discuss the panelists.  I love bringing as many different points of view as possible to these data science panels.  Without this variety, a panel is more of a lecture and less of a discussion.  This mix of panelists gave the audience the data science view, the tech industry view and the biological sciences view.  The best part is that the Avengers brought us all together.


When I pitched this panel, the idea was: what happens when a data scientist gets hold of the Infinity Gauntlet?  Pictured above is a visual representation of how I’m going to use each stone.

Use the Time Stone to predict the box office sales for the MCU and determine the top influencers for success.

Use the Power Stone to eliminate low hanging fruit.

Use the Soul Stone to uncover the underlying attributes of the Marvel universe.

Use the Space Stone to transport the Marvel universe to their closest match.

Use the Reality Stone to show you the Marvel universe in a new light, perfectly balanced.

Use the Mind Stone to convince you this matching worked.

Time and Power Stones: What is influencing the MCU box office success?


I walked those in attendance through the output of a regression model I built to unlock the key influences on the Marvel Cinematic Universe's box office sales.

Considered influencers:

  • Rotten Tomatoes Scores (Critic and Audience)
  • Movie Release
  • Time since last MCU release
  • Solo Movie Releases
  • Was Iron Man in the movie?

Two Key Influencers stand out:

Having Iron Man in an MCU Movie drives in $100.5MM

Being further along in the series drives in at least $216.8MM.  Story development matters, and here’s the statistical proof!
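To make the regression idea concrete, here is a minimal sketch in Python of fitting an ordinary least squares model on two of the influencers listed above (position in the series and whether Iron Man appears).  This is not the model from the panel.  The rows are toy numbers generated from a known formula so the fit can be checked, and the names `fit_ols`, `toy_movies` and `toy_box_office` are hypothetical.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data, NOT real box office figures.
# Each row: (release order in the series, 1 if Iron Man appears else 0)
toy_movies = [(1, 1), (2, 0), (3, 1), (4, 0), (5, 0), (6, 1), (7, 0), (8, 1)]
# Generated as 100 + 20 * order + 50 * iron_man (arbitrary units)
toy_box_office = [170, 140, 210, 180, 200, 270, 240, 310]

intercept, per_release, iron_man_lift = fit_ols(toy_movies, toy_box_office)
print(f"intercept={intercept:.1f}  per release={per_release:.1f}  "
      f"Iron Man lift={iron_man_lift:.1f}")
```

Each coefficient reads the same way as the two findings above: it is the estimated change in box office for a one-unit change in that influencer, holding the others fixed.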

Soul and Space Stones: Refitting the Marvel Power Scale


During this panel I walked the crowd through the output of a second machine learning algorithm, a propensity score.

Ingredients in the batter:

  • Marvel Contest of Champions (MCC) Power Index Levels
  • MCC Health
  • MCC Attack
  • Marvel Battle Royale (MBR) Twitter Poll: total votes per round, average total votes

Flipping the pancakes:

Predict the likelihood Twitter would vote for a character

Re-purpose this score by applying it to characters not in the MBR Twitter Poll

Reality and Mind Stones: Perfectly Balancing the Marvel Universe


This approach goes beyond ranking by attack or defense alone.  It takes all of those attributes together, along with fan opinion.

If you only look at attack… you get skewed results

If you only look at defense… you get skewed results

A little bit of good… a little bit of crazy…

Old Man Howard the Duck?

Doctor Octopus the Demi-God?

Marvel Rapid Fire: Marvel Analytics Comparisons


This was one of my all-time favorite segments out of all the comic cons I’ve had the pleasure of paneling at.  I would quickly show the audience an analytics technique and then its Marvel equivalent.  I think this is very effective in reinforcing learning and opening up data science to a new audience.

Everything we just went through was a machine learning technique

Machine Learning is the Taskmaster of Data Science

Learns from past data, trains, and attempts to apply this training to new data

When something new is introduced it takes time to catch up

A/B Testing and Incremental ROI is the plot of Civil War


A neural network is Ultron… learns from observational data & figures its own solution


Dr. Strange ran a logistic regression to find out the odds on Titan


Into the Spider-Verse was the perfect implementation of a random forest


Game Time: Marvel Team-Up: Overview


One of the best ways to reinforce learning is through a game.  During this panel I wanted to reinforce the learning from the propensity score.

I asked for 5 volunteers.  On the screen were 3 Marvel characters.  2 of the characters on screen were lookalikes (statistically speaking).  Volunteers did their best to convince the panel which two characters should “Team-Up”, or in other words, to identify the 2 statistically closest characters.

For participating, all volunteers received a HeroClix figure of their choice.

I want to personally thank everyone who attended the panel at the Tampa Bay Comic Convention.  I look forward to meeting again in 2020.



Recipe 013: Marvel Comics Propensity Score


How crazy would it be if I told you Howard the Duck and Old Man Logan are closer to each other in skill sets than they are to any other Marvel characters?  Or how about Thor and Dr. Octopus are lookalikes as well?  Let’s answer these questions together by wrangling some readily available data.






If I’ve learned anything from my career in data science it’s this: 80% of the work is data gathering and ETL work, and 20% is analysis.

Nothing illustrates this better than finding data on Marvel characters' skill sets on a normalized scale.  In this data story I’ll be using data from Marvel Contest of Champions (power index levels, health and attack) and the Marvel Battle Royale (a Twitter fan poll of the greatest superheroes).

A few more variables I’ll need to calculate around the results of the Marvel Battle Royale Twitter Fan Poll:

Total votes per round

Average total votes

A flag for whether a character received more than the average total votes per Marvel character

This flag I’ll use as my dependent variable and my independent variables will be the Marvel Contest of Champions statistics.
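As a rough sketch of how those three calculated variables could be derived, assuming each polled character has a list of vote counts (one per round they appeared in), something like the following would do.  The character names and vote counts are placeholders, not the real poll results, and comparing each character's total against the average total across characters is my reading of the flag definition.

```python
# Placeholder vote counts per round; NOT the real Marvel Battle Royale results.
votes_by_round = {
    "Character A": [120, 340, 560],  # advanced through three rounds
    "Character B": [90],             # eliminated in round one
    "Character C": [150, 210],
    "Character D": [60],
}

# Total votes each character collected across all of their rounds
total_votes = {name: sum(rounds) for name, rounds in votes_by_round.items()}

# Average of those totals across every polled character
avg_total_votes = sum(total_votes.values()) / len(total_votes)

# Dependent variable: 1 if the character beat the average, else 0
above_avg_flag = {
    name: int(total > avg_total_votes) for name, total in total_votes.items()
}

print(total_votes)
print(avg_total_votes)
print(above_avg_flag)
```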

What will this do?  This will predict the likelihood a Marvel Character would receive higher than the average total votes in the Marvel Battle Royale.

Once this is calculated I’ll receive an output of coefficients, which I can apply to the rest of the Marvel characters who weren’t in the Marvel Battle Royale to create a propensity score.
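The fit-then-apply step can be sketched in plain Python as a small logistic regression trained by gradient descent.  Treat this as an illustration under assumptions rather than the exact recipe: the stats and flags below are invented, the helper names (`standardizer`, `fit_logistic`, `propensity`) are hypothetical, and standardizing the inputs is my own choice to keep the gradient descent stable.

```python
import math
from statistics import mean, pstdev


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def standardizer(rows):
    """Build a function that rescales a row using the training mean and std."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return lambda row: [(v - m) / s for v, m, s in zip(row, mus, sds)]


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the logistic loss; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity(w, x):
    """Predicted probability of beating the average vote total."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Invented (power index, health, attack) rows for polled characters
polled_stats = [
    (1000, 4000, 300), (1200, 5000, 350), (3000, 11000, 900),
    (3500, 13000, 1000), (2000, 8000, 600), (2800, 9000, 800),
]
polled_flags = [0, 0, 1, 1, 0, 1]  # invented above-average-votes flags

scale = standardizer(polled_stats)
weights = fit_logistic([scale(r) for r in polled_stats], polled_flags)

# Apply the fitted coefficients to characters that were never in the poll
unpolled = {
    "Unpolled Character X": (3200, 12000, 950),
    "Unpolled Character Y": (1100, 4500, 320),
}
propensity_scores = {
    name: propensity(weights, scale(stats)) for name, stats in unpolled.items()
}
print(propensity_scores)
```

The key point is that the coefficients are learned only from characters with poll results, while the score itself can be computed for any character that has the three game statistics.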



Now let’s backtrack a little bit and see why I’m going with a propensity model as opposed to a grouping by opinion, i.e. putting all the top attackers in the same category.

The top 3 characters based on Attack are Rocket Raccoon, Spider-Man (Symbiote), and Blade.

In the above histogram, if you look all the way to the far right you’ll notice they are the data points on their own little island.




Well, what if I just grouped everyone by Health?  This data visualization looks more promising, but most likely there would be overlap on the other attributes and you wouldn’t be able to implement this successfully.



The power index could, by definition, be suitable.  But from the top 3 characters selected on power index, I can tell this rating isn’t an index in the vein of what I would typically use an index for (time-series forecasting).  It looks similar to the Pokémon GO Combat Power system: a measure of a character's ability to use their full potential.



One use of a propensity score is to create similar groups, based on the likelihood of performing a behavior.

In this case Doctor Octopus and Thor (Ragnarok) are statistically the same in the Marvel Contest of Champions skill set.  For those of you who want to go down an interesting rabbit hole, you can find YouTube videos on why Doctor Octopus should be in a demi-god tier.

This propensity score approach literally put Doctor Octopus in the same tier as a demi-god!



Medusa by power index alone would be close to Thanos but factoring all skill sets, she is statistically closer to Gwenpool, Cable, and Nightcrawler than she is to the Mad Titan.
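Once every character has a propensity score, finding lookalikes like the ones above comes down to a nearest-neighbor search on that single number.  Here is a minimal sketch of that idea.  The scores are placeholders rather than the real model output, and `closest_pair` and `nearest_match` are hypothetical helper names.

```python
from itertools import combinations


def closest_pair(scores):
    """Return the two names whose propensity scores are nearest each other."""
    return min(
        combinations(sorted(scores), 2),
        key=lambda pair: abs(scores[pair[0]] - scores[pair[1]]),
    )


def nearest_match(name, scores):
    """Return the other character whose score is closest to `name`'s score."""
    others = (n for n in scores if n != name)
    return min(others, key=lambda n: abs(scores[n] - scores[name]))


# Placeholder propensity scores; NOT the real model output.
toy_scores = {
    "Character A": 0.82,
    "Character B": 0.79,
    "Character C": 0.31,
    "Character D": 0.35,
}

team_up = closest_pair(toy_scores)
print(team_up)
print(nearest_match("Character C", toy_scores))
```

Matching on the score rather than on a single stat is what lets two characters with different attack or health values still land next to each other.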



Now for the crazy but statistically significant section.  Howard the Duck (I’m hoping he gets a show on Disney+) and Old Man Logan are a propensity score match.

An example like this is where many in data science begin to argue: when does subject matter expertise come into play?  We can argue significance forever, on any topic, but we can agree that all Marvel Champions have value if played correctly.







Regression Modeling

Recipe 002: Marvel Cinematic Universe Regression Model


There is no argument against the Marvel Cinematic Universe being a financial success.  I’ll try to identify variables that contribute to box office success.  The goal is to fit a regression model to Box Office USD for Marvel Cinematic Universe movie releases.
*At the time of cooking, Ant-Man and the Wasp did not have finalized Box Office USD data, so this movie was excluded. – TF







Thanks for stopping by and chowing down on this recipe (click the link for a reader-friendly PDF version of this recipe)

Now try this delicious pancake recipe (with the Iron Man gold and red finish) courtesy of Crème De La Crumb (Link Below):