Marvel Comics, Propensity Modeling, Regression Modeling

Recipe 013: Marvel Comics Propensity Score

FerraraTom

How crazy would it be if I told you Howard the Duck and Old Man Logan are closer to each other in skill sets than they are to any other Marvel characters?  Or that Thor and Doctor Octopus are lookalikes as well?  Let’s answer these questions together by wrangling some readily available data.


 


If I’ve learned anything from my career in data science it’s this: 80% of the work is data gathering and ETL work, and 20% is analysis.

Nothing proves this statement truer than finding data on Marvel characters’ skill sets on a normalized scale.  In this data story I’ll be using data from Marvel Contest of Champions (power index levels, health and attack) and the Marvel Battle Royale (a Twitter fan poll of the greatest superheroes).

A few more variables I’ll need to calculate from the results of the Marvel Battle Royale Twitter fan poll:

Total votes for each round

Average total votes

A flag for whether a Marvel character received higher than average total votes

This flag I’ll use as my dependent variable and my independent variables will be the Marvel Contest of Champions statistics.
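As a rough sketch of how those three variables could be derived, here is a small Python example. The vote counts are made up purely for illustration, and summing a character's votes across rounds is my assumption about how the totals were built.

```python
from statistics import mean

# Hypothetical poll results: votes a character received in each round.
# The characters are real, but every number here is invented.
round_votes = {
    "Thor": [120, 340, 510],
    "Howard the Duck": [45],
    "Medusa": [80, 95],
}

# Total votes per character across the rounds they appeared in.
total_votes = {name: sum(votes) for name, votes in round_votes.items()}

# Average total votes across all polled characters.
average_total = mean(total_votes.values())

# Flag: 1 if the character beat the average, else 0.
# This flag plays the role of the dependent variable.
above_average = {name: int(total > average_total)
                 for name, total in total_votes.items()}
```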

What will this do?  This will predict the likelihood a Marvel Character would receive higher than the average total votes in the Marvel Battle Royale.

Once this is calculated I’ll receive an output of coefficients which I can apply to the rest of the Marvel Characters who weren’t in the Marvel Battle Royale to create a propensity score.
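To make the "fit on polled characters, score everyone else" idea concrete, here is a minimal sketch in plain Python. It is not the tool or the data used for this recipe: the statistics are invented and scaled to 0-1, the characters are placeholders, and the fitting routine is a simple gradient descent standing in for whatever a statistics package would do.

```python
import math

def predict(weights, features):
    """Probability from an intercept (weights[0]) plus one weight per feature."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict(weights, features) - label
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights

# Made-up (power index, health, attack) values for polled characters,
# with the above-average-votes flag as the label.
polled_stats = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.6, 0.5, 0.6],
                [0.5, 0.6, 0.5], [0.3, 0.4, 0.2], [0.2, 0.3, 0.4]]
polled_flags = [1, 1, 1, 0, 0, 0]

weights = fit_logistic(polled_stats, polled_flags)

# Apply the fitted coefficients to hypothetical characters outside the poll.
unpolled = {"Character A": [0.85, 0.85, 0.85],
            "Character B": [0.25, 0.25, 0.25]}
propensity = {name: predict(weights, stats) for name, stats in unpolled.items()}
```

The values in `propensity` are the propensity scores: each character's modeled likelihood of beating the average vote total, even though they never appeared in the poll.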


 


Now let’s backtrack a little bit and see why I’m going with a propensity model as opposed to a grouping by opinion, i.e. putting all the top attackers in the same category.

The top 3 characters based on Attack are Rocket Raccoon, Spider-Man (Symbiote), and Blade.

In the above histogram, if you look all the way to the far right you’ll notice they are the data points on their own little island.


 

 


Well what if I just grouped everyone by Health?  This data visualization looks more promising, but most likely there would be overlap on the other attributes and you wouldn’t be able to implement this successfully.


 


The power index could, by definition, be suitable.  But from the top 3 characters selected on power index I can tell this rating isn’t an index in the sense I would typically use one (time-series forecasting).  It looks to be similar to the Pokémon Go Combat Points system: a measure of a character’s ability to use their full potential.


 


One use of a propensity score is to create similar groups, based on the likelihood of performing a behavior.
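A simple way to picture this grouping is nearest-neighbor matching on the score. The sketch below uses invented scores, chosen only so the pairings echo the ones discussed in this recipe; the real scores are not reproduced here.

```python
# Invented propensity scores, for illustration only.
scores = {
    "Doctor Octopus": 0.81,
    "Thor (Ragnarok)": 0.80,
    "Medusa": 0.55,
    "Gwenpool": 0.56,
    "Thanos": 0.95,
}

def closest_match(name, scores):
    """Return the other character whose score is nearest to this one."""
    others = (other for other in scores if other != name)
    return min(others, key=lambda other: abs(scores[other] - scores[name]))

match_for_ock = closest_match("Doctor Octopus", scores)
match_for_medusa = closest_match("Medusa", scores)
```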

In this case Doctor Octopus and Thor (Ragnarok) are statistically the same in the Marvel Contest of Champions skill set.  For those of you who want to go down an interesting rabbit hole, you can find YouTube videos on why Doctor Octopus should be in a demi-god tier.

This propensity score approach literally put Doctor Octopus in the same tier as a demi-god!


 


Medusa by power index alone would be close to Thanos, but factoring in all skill sets, she is statistically closer to Gwenpool, Cable, and Nightcrawler than she is to the Mad Titan.


 


Now for the crazy but statistically significant section.  Howard the Duck (I’m hoping he gets a show on Disney+) and Old Man Logan are a propensity score match.

An example like this is where many in data science begin to argue: when does subject matter expertise come into play?  We can argue significance forever, on any topic, but we can agree that all Marvel Champions have value if played correctly.



nintendo, Regression Modeling, Super Mario

Recipe 010: Mario Kart Game-play Improvement Controller Trials

FerraraTom

Before I dive into this week’s data story, let me state why I love the Nintendo Switch.  I personally feel there’s a need for video games to be a social event, and couch co-op is a must have feature.  The Nintendo Switch offers several games which meet this need.

My family loves playing video games and most of all we love playing video games together.

Of the Nintendo games I’ve grown up on and have played over the years, Mario Kart is by far one of my favorites.  I’ll admit my wife shows me how it’s done.



What I do find interesting about the Nintendo Switch is the Joy-Con controllers: there’s a learning curve (but a huge improvement on the Wii-mote) and most veteran gamers prefer an alternative.

One alternative is the wireless controller, very similar to the Xbox controller format.  I did pick up the Yoshi version for my wife; she loves it and personally feels it improves her game-play.

I thought it was time to put this notion to the test: what impact, if any, does a wireless controller have on game-play performance versus using a Joy-Con?

Mario Kart seemed like the logical choice for this experiment: it’s a multiplayer game, you can standardize your users (via ride type and modifications), and performance is measured in a continuous variable of points.


A total of 8 trials were run under these conditions:

-Standard Kart

-Standard Wheels

-Standard Flyer

-Mushroom Cup

-50 cc length race

-2 gamers

Halfway through the trials, one gamer switched to the wired controller (Test group) while the other gamer stayed on the single Joy-Con (Control group).

Results were documented and the ETL process began; points scored in each race would be used as the key performance indicator.

I next ran a linear regression (great for evaluating an A/B test), with my dependent variable being the points scored after the event (introducing the wired controller) and two independent variables: Treatment and Pre Points Scored.
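For readers who want to see the shape of that regression, here is a minimal sketch in plain Python. The race data is a toy set I constructed so that the treatment effect is exactly 3 points; it is not the data from these trials, and the small `ols` helper is only a stand-in for a statistics package.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * target for row, target in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [value / pivot for value in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Toy rows: (treatment, pre points scored, post points scored).
# Built so that post = 1 + pre + 3 * treatment, with no noise.
races = [(0, 7, 8), (0, 9, 10), (0, 10, 11), (0, 12, 13),
         (1, 6, 10), (1, 8, 12), (1, 9, 13), (1, 10, 14)]

X = [[1.0, treatment, pre] for treatment, pre, _ in races]  # 1.0 = intercept
y = [post for _, _, post in races]

intercept, treatment_effect, pre_effect = ols(X, y)
```

Including Pre Points Scored as a covariate means the Treatment coefficient is read as the change in post-switch points for gamers who started from the same baseline.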


In this model I wasn’t concerned with the r-squared value or the significance level of each variable.  The sample data was not large enough; this was a closed-circuit, small-market test.

The model itself did show to be significant, which is a good indicator I can continue with the results.  Evaluating my Q’s graph, I see the model fits well: the trend goes through all the data points.

In my summary fit I notice there is a positive relationship between treatment (group) and post points scored.  At first glance this says you improve your Mario Kart game-play performance if you play with a wireless controller.

To complete this story I want to know my upper confidence level, to see by how many points performance could improve and whether this is enough to move me up the rankings.

Using a wired controller has the potential to increase a gamer’s point performance by over six points each race.

The average points differential between race placement is 1.2 points.  This 6-point increase is enough to move you roughly 4 places, depending on your historic placement.
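Here is a small sketch of how a points gain translates into places. The per-race points table is an assumption on my part (the 12-racer table as I recall it from Mario Kart 8 Deluxe); the recipe itself does not list it. Under that table the average gap between adjacent places works out to about 1.27 points, in the same neighborhood as the 1.2 quoted above.

```python
# Assumed per-race points for places 1 through 12 (not taken from this post).
POINTS = [15, 12, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# Average gap between adjacent places under the assumed table.
average_gap = (POINTS[0] - POINTS[-1]) / (len(POINTS) - 1)

def place_after_gain(start_place, gain):
    """Best place whose points do not exceed the starting points plus the gain."""
    target = POINTS[start_place - 1] + gain
    for place, points in enumerate(POINTS, start=1):
        if points <= target:
            return place
    return len(POINTS)

# Places gained from a 6-point boost, for a few starting placements.
places_gained = {start: start - place_after_gain(start, 6)
                 for start in (3, 6, 10)}
```

Because the gaps are wider at the top of the table, the same 6 points is worth fewer places for a racer who already finishes near the front, which is why the result depends on your historic placement.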


What have we learned from diving into the Mario Kart Data?

The controller you play with matters: switching to a traditional wired controller can potentially improve your point score by 6.5 points, which depending on your average race placement can move you up 4 places in the final standings.

Observing the CPU controlled racers, Shy Guy performed the best with an average final placement of 2.8.  The heavy class overall was the weakest group but without Bowser, it could have been worse.  Bowser’s average final placement was 4th.

 

After you have consumed this meal, I hope you take these findings and enjoy your next Mario Kart Grand Prix.  Also as always enjoy the featured pancake recipe below!



Board Games, Logistic Regression, Regression Modeling

Recipe 008: Likelihood a Board Game Is Universally Loved

FerraraTom

For this week’s analysis I’m taking a different approach to the introduction.  I reached out to @missionboardgame to write the foreword.  They are a couple from Turkey who try their best to inspire people to join the board game community.  Without further ado, here is their overview of the modern board gaming climate:

We think a successful modern board game should include the following features:

✔️Your decisions should have an impact on the game progress.
✔️Minimal randomness.
✔️As little player elimination as possible throughout the game.

In addition to those, theme, artwork and mechanics are also significant for our decisions when purchasing board games. Therefore, our favorite game is Robinson Crusoe: Adventures on the Cursed Island. It is a cooperative survival game where you are trapped on a deserted island. Each decision you have made previously has an outcome afterwards. The harmony between the theme and the rules is perfectly arranged so that you feel very integrated into the game. This way, every action you take seems meaningful and logical. Also we love feeling the cooperation among us since we are usually 2 players. – Mission Board Game


Countless nights I’ve played board games among friends and family.  Every New Year’s Eve my family and I play Monopoly.  A few reasons: the game-play length, the number of players, and the simplified game-play.  I have 5 siblings, so saying it’s difficult to find a game for all of us to play is an understatement.

Why we enjoy board games is an interesting topic.  Is it the theme of the game?  Is it the number of players required?  Has the game received universal praise from critics?  Is it a common game most households own, and we grew up playing?

All the above-mentioned variables I’ll throw into a logistic regression model and use the Bayes theory of probabilities to determine the probability that a board game player will rank a game higher than the average score.
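As a quick illustration of how a logistic regression turns variables into a probability, here is a sketch with hypothetical coefficients. The variable names, intercept and values are all invented for the example; they are not the output of this model.

```python
import math

# Hypothetical intercept and coefficients on the log-odds scale.
intercept = -0.4
coefficients = {"fantasy_theme": -0.8, "playtime_2h_plus": 1.2}

def probability_above_average(features):
    """Convert a game's features into a probability via the logistic function."""
    log_odds = intercept + sum(coefficients[name] * value
                               for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

# Odds ratios: above 1 raises the odds of an above-average rating, below 1 lowers them.
odds_ratios = {name: math.exp(value) for name, value in coefficients.items()}

long_game = probability_above_average({"fantasy_theme": 0, "playtime_2h_plus": 1})
fantasy_game = probability_above_average({"fantasy_theme": 1, "playtime_2h_plus": 0})
```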

During the first read I see the model is statistically significant based on a p-value of less than .05.  A few things stand out to me immediately:

1.) Not all variables have a positive relationship to a highly scored board game

2.) There are some strong social elements going on here (i.e. the longer the play, the higher the impact, which may imply that games which encourage discussion are rated higher)

3.) Fantasy themed board games are not ranked high (I have a D&D and video games impact theory)


Before jumping into the positive relationships, I’d like to touch briefly on the negative relationship independent variables.

1.) Fantasy Theme: I included this variable in the model expecting to see a very high positive correlation, but I was very wrong.  To quote Rick and Morty: “Sometimes science is more art than science.”  In the spirit of the quote, I’ll assume there are threats to the fantasy themed board game genre, in the form of Role-Playing Video Games.  The storytelling in this medium has progressed so much in the last decade that it outpaces anything a board game could offer.

In other words the target audience is leaving.

2.) Major of voters:  This variable is all about the number of users who share their ranking.  A rule of thumb for rankings, reviews and ratings is that those who go through the effort of expressing their opinion either love or hate the product.  The upper and lower confidence levels mirror each other because of this skewness.


Next, I’ll discuss the positive relationship independent variables (focusing on those with the highest impact):

1.) Board games with an average game-play of at least two hours have the highest positive impact on a user rating a board game above average.  What makes a game have a long game-play?

Multiple reasons: more players involved, more game-play mechanics, and most importantly more discussion.  The soul of any good board game is bringing people together.

2.) The second highest impact comes from the average score displayed on BoardGameGeek.  The reason behind this is users see this rating first before submitting their rating.  Think of it like the Rotten Tomatoes effect: people want to feel like they have universally accepted opinions.  Take the beginning of this data story for example: I mentioned Monopoly is a family tradition of mine, and this potentially could have swayed your opinion on this board game.  Possibly you would rate it higher than a game that is, say, fantasy themed, based on this model output.

For your own reference, this model has an accuracy rate of above 70%.
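For anyone unsure what an accuracy rate means for a model like this, here is a minimal sketch. The probabilities and outcomes are invented, and the 0.5 cutoff is an assumption; the cutoff behind the figure above is not stated.

```python
def accuracy(probabilities, actual, threshold=0.5):
    """Share of games where the predicted class matches what actually happened."""
    predicted = [int(p >= threshold) for p in probabilities]
    correct = sum(int(p == a) for p, a in zip(predicted, actual))
    return correct / len(actual)

# Made-up model outputs and true above-average flags for ten games.
model_probabilities = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.65, 0.35]
actual_flags = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

model_accuracy = accuracy(model_probabilities, actual_flags)
```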


What have we learned from diving into the Board Game Data? 

Board games are most successful when they encourage the spirit and soul of “game night”, a gathering of friends and family discussing and enjoying each other’s time.  Adventure and exploration themes make up the majority of the top ten highly successful board game genres.  Longer game-play does not mean the game is like pulling teeth or the pace is slow.

It is more of an indicator of the number of players required and the storytelling the game brings in driving a great game night experience.

 

After you have consumed this meal, I hope you take these findings and enjoy your next game night.  Also as always enjoy the featured pancake recipe below!



https://boardgamegeek.com/



disney, Mickey Mouse, Regression Modeling, Theme Parks

Recipe 006: Walt Disney World Parks and Resorts Revenue Influencer

FerraraTom

It all started with a mouse.  This mouse is turning 90 this year and Mickey Mouse has made his impact on society.  To celebrate, what better meal to cook up this week than Walt Disney World Data?  I’ll be challenging myself to identify influencers on the Parks and Resorts Division’s yearly revenue.




My first approach was to identify what happens during the year the revenue occurs:

The number of Animated Movies released by Disney

The number of Animated Movies featuring Disney Princesses

The number of Attractions added at all four main theme parks, then parsed out by individual park

The first run was not an effective model: most of the variability in the data was not accounted for, and there were no independent variables of significance.

So my next approach was: how do I capture word of mouth on movies and attractions?  Secondly, how do I incorporate when Disney starts charging admission to children (currently, children 2 years and younger enter the parks for free)?

To knock out two birds with one stone, I settled on testing a rolling 3-year average of all behaviors.  The results were very favorable: 67% of the variability is explained and I have interesting independent variables of significance to make a telling data story.
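A rolling 3-year average is simple to compute. The sketch below uses made-up attraction counts and assumes the window covers the current year plus the two before it; the exact windowing used for the model is not spelled out above.

```python
def rolling_mean(values, window=3):
    """Average of each value with the (window - 1) values that precede it."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Made-up counts of new attractions opened in five consecutive years.
attractions_per_year = [2, 0, 4, 3, 5]

# One value per year from the third year onward.
three_year_average = rolling_mean(attractions_per_year)
```

Smoothing this way lets a ride or movie keep contributing to the following years, which is one way to approximate word of mouth.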



If you’re a subscriber to this blog and enjoy the Stacks of Stats, you’ll recognize my preference for Q graphs.

There are some curls at the tails but most of the data fits well, so there won’t be a need to run a more complex model.
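If the Q graph here is the usual normal quantile (Q-Q) plot of residuals, which is my assumption, its points can be built with nothing but the Python standard library. The residuals below are invented for illustration.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair each standardized residual with its theoretical normal quantile."""
    n = len(residuals)
    center, spread = mean(residuals), stdev(residuals)
    points = []
    for rank, value in enumerate(sorted(residuals), start=1):
        # Plotting position (rank - 0.5) / n keeps probabilities inside (0, 1).
        theoretical = NormalDist().inv_cdf((rank - 0.5) / n)
        points.append((theoretical, (value - center) / spread))
    return points

# Made-up regression residuals.
residuals = [-2.1, -0.7, -0.3, 0.0, 0.2, 0.4, 0.9, 1.6]
points = qq_points(residuals)
```

When the points sit close to a straight diagonal line, the residuals look roughly normal; curls at the ends mean the tails stray from that.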

Let’s take a bite into the initial read before assessing the financial impact of all these fun Disney variables.

I’ll caveat this: significance is in the eye of the beholder, and is up to the interpretation of the storyteller and data scientist.  The first read shows the 3-year average of total park attractions having the highest relationship to revenue and, inversely, the number of attractions opened at EPCOT has significance but a negative impact on yearly revenue.

I’ll dive more into the individual impacts later, but I want to utilize my upper and lower bounds.



The output of this model shows the impact in millions USD.  Analyzing the cone, this is where our fairy tale begins to take shape.

Potentially the average number of attractions introduced at all four major parks can drive in $1.6 million USD.

With the Magic Kingdom driving most of this impact:

New attractions added at the Magic Kingdom can drive in $4.5 million USD.

The average number of Disney Princess movies does have more of an impact than factoring in Disney releasing an animated movie as the only criterion.  What’s intriguing is the variability of our upper and lower bounds: there is a possibility there could be a loss of $50.6M.


What could be driving the inverse effect?  Multiple reasons:

1. The quality of the movie releases

2. The presence, or in this case non-presence, of a meet and greet at the theme park

3. The global economic climate (less international travel impacts this!)



What have we learned from diving into the Walt Disney Data?

There’s a reason WDW is investing in new IP based rides at Epcot and Hollywood Studios: the rides they’ve been launching there are outdated with their audience and currently drive the lowest impact on yearly revenue.  I anticipate Epcot will see steady growth in impact once Guardians of the Galaxy and Ratatouille open and a few years have passed.

Finally, a Princess Animated Movie drives in 1 million USD more than a regular animated movie release.


What could be the reasoning?  I’d guesstimate rides introduced at the Magic Kingdom (which drive in +4.5M USD) are having a downstream effect on the Princess impact.  Most Princess interactions take place at the Magic Kingdom.

After you have consumed this meal, I hope you take these findings and wish Mickey Mouse a Happy 90th Birthday. :)  Also as always enjoy the featured pancake recipe below!



https://disneyworld.disney.go.com/



Regression Modeling

Recipe 002: Marvel Cinematic Universe Regression Model

 


FerraraTom

There is no argument against the Marvel Cinematic Universe being a financial success.  I’ll try to identify variables which can equate to box office success. The goal is to fit a regression model to Box Office USD for Marvel Cinematic Universe movie releases.
*At the time of cooking, Ant-Man and the Wasp did not have finalized Box Office USD data, so this movie was excluded. – TF




Thanks for stopping by and chowing down on this Recipe (click the link for a reader-friendly PDF version of this recipe)

Now try this delicious pancake recipe (with the Iron Man Gold and Red finish) courtesy of Crème De La Crumb (Link Below):
