Classification Tree, Game of Thrones, Tree Based Models

Recipe: 009 Game of Thrones Survival of the Fittest

FerraraTom


“When you play the game of thrones, you win or you die.” — Cersei

Let’s bring this quote to life in what I like to call a survival tree of the fittest.  This week’s analysis will focus on character survival in Game of Thrones.  Chow down and enjoy!



Winter is coming and you’d like to know your chances of survival in the Game of Thrones universe.

Let’s learn from those who have survived to this point and those who have met their unkindly fate.

To do this I’ll build a classification tree with my event set to whether the character is alive (1 for yes, 0 for no).  Classification trees split the data on the variables that best separate the two outcomes.  When we reach my tree visualization, I’ll assign the color red to leaves where a character death is highly probable.  Green leaves will indicate a character is highly probable to survive… as long as all of these criteria are met.

Think of this tree as a really morbid family tree, but since the data is Game of Thrones it fits right into place.

The variables I have readily available (hopefully they have importance) are as follows:

  • House Affiliation
  • Member of nobility
  • Marital Status
  • Gender
  • Family history of deaths
  • Popularity
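For anyone who wants to taste-test how a tree picks its first split, here is a minimal sketch. It is not the tool behind this recipe (the post does not say which software was used); it assumes a Gini impurity criterion, which is one common choice, and the rows are made-up toy data, not the real character data. The names `gini`, `split_impurity`, `best_split`, and `toy_rows` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = dead, 1 = alive)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(rows, feature, target="alive"):
    """Weighted Gini impurity after splitting rows on a yes/no feature."""
    yes = [r[target] for r in rows if r[feature] == 1]
    no = [r[target] for r in rows if r[feature] == 0]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


def best_split(rows, features, target="alive"):
    """Return the feature whose split leaves the purest child nodes."""
    return min(features, key=lambda f: split_impurity(rows, f, target))


# Made-up toy rows for illustration only, NOT the real Game of Thrones data.
toy_rows = [
    {"popular": 1, "male": 1, "alive": 0},
    {"popular": 1, "male": 1, "alive": 0},
    {"popular": 1, "male": 0, "alive": 1},
    {"popular": 0, "male": 1, "alive": 1},
    {"popular": 0, "male": 0, "alive": 1},
    {"popular": 0, "male": 1, "alive": 1},
]

print(best_split(toy_rows, ["popular", "male"]))  # popular
```

A full tree repeats this step inside each child node until the leaves are pure enough or too small to split further.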


From the initial read I see that whether a character is popular among fans and whether they are male hold the highest importance in determining survival.

Also, the variables I have available classify about 75% of characters correctly (a 25% misclassification rate).

Let’s say you moved to Westeros: out of the gate you have a 25.4% chance of meeting your end.  At those odds I’m taking my chances, but I should stay under the radar as much as I can, because the data warrants it.

If you become a popular character or are an integral part of the story, your death becomes more meaningful and your probability of survival is worse than a coin flip.

So let’s say you’re a likeable character (you can’t help it). Not all is lost, as long as you’re a female.  The highest survival rate is the popular female character group.  This is a classic tale of high risk, high reward.


A classification tree is a great way to visualize your data, and now I’ll walk us through this Game of Thrones survival tree.

Let’s start at the very top: the tree assumes everyone has a 75% chance of survival.  As the tree splits, this is where the interesting part begins and our data story starts to unfold.

If you are a popular character you flow to the left side of the tree, your survival rate of 75% now drops to 48%.

Staying to the left side of the tree there is another important split, are you a male or female?  Female characters have a higher probability of surviving (87% if you’re popular and 76% if you’re under the radar).

If you’re a male and you’re popular you have a 42% chance of survival (we’re looking at you, Peter Dinklage).
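The walk down the tree can be written as a small lookup. This is only a sketch of the path logic using the rates quoted in this walk-through; the function name `survival_rate` is hypothetical, and it returns `None` for branches whose rate is not quoted here (for example, under-the-radar male characters).

```python
def survival_rate(popular=None, female=None):
    """Survival rates quoted in the walk-through; None where none is quoted."""
    if popular is None:
        return 0.75  # top of the tree: everyone
    if female is None:
        return 0.48 if popular else None  # popularity split only
    if popular:
        return 0.87 if female else 0.42  # popular female vs popular male
    return 0.76 if female else None  # under-the-radar male is not quoted


print(survival_rate())                            # 0.75
print(survival_rate(popular=True))                # 0.48
print(survival_rate(popular=True, female=False))  # 0.42
```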

Now here’s the largest caveat to take with this classification tree: I’m assuming it will no longer be relevant after the final season.  Winter is coming, and most likely our characters will meet their end at the hands of White Walkers.



What have we learned from diving into the Game of Thrones Data?

Everyone starts off at a 75% survival rate, and as your popularity grows your survival rate lessens by 27 percentage points, to 48%.  If you’re also male, your survival rate sits at 42%, 33 points below where you started.  If you’re a popular female character, your survival rate is 45 points higher than your male counterparts’.
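The gaps in this recap line up with simple subtraction of the rates in the tree, so they read most naturally as percentage points rather than relative changes. A quick check, using only the rates quoted in this post (variable names are my own):

```python
# Survival rates from the tree, in percent.
baseline, popular, popular_male, popular_female = 75, 48, 42, 87

print(baseline - popular)             # 27 points lost by becoming popular
print(baseline - popular_male)        # 33 points lost by a popular male
print(popular_female - popular_male)  # 45 point gap, popular female vs male

# The same female vs male gap expressed as a ratio instead of points.
relative = popular_female / popular_male
print(round(relative, 2))             # 2.07
```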

An interesting tidbit…If you become popular and you are a female (hopefully the mother of dragons) you boast the highest survival rate of anyone in this universe, 87%.

 

After you have consumed this meal, I hope you take these findings and enjoy your episode of Game of Thrones. :)  Also, as always, enjoy the featured pancake recipe below!



https://gameofthrones.fandom.com/wiki/Game_of_Thrones_Wiki



disney, Mickey Mouse, Regression Modeling, Theme Parks

Recipe: 006 Walt Disney World Parks and Resorts Revenue Influencer

FerraraTom

It all started with a mouse.  This mouse is turning 90 this year, and Mickey Mouse has made his impact on society.  To celebrate, what better meal to cook up this week than Walt Disney World data?  I’ll be challenging myself to identify influencers on the Parks and Resorts Division’s yearly revenue.



With Mickey Mouse turning 90 years old this year, what better meal to cook up this week than Walt Disney World data?  I’ll be challenging myself to identify influencers on the Parks and Resorts Division’s yearly revenue.

My first approach was to identify what happens during the year the revenue occurs:

  • The number of animated movies released by Disney
  • The number of animated movies featuring Disney Princesses
  • The number of attractions added at all four main theme parks, then parsed out by individual park

The first run was not an effective model: most of the variability in the data was not accounted for, and there were no independent variables of significance.

So my next approach was to ask: how do I capture word of mouth on movies and attractions?  Secondly, how do I incorporate the age at which Disney starts charging admission to children (currently, children 2 years and younger enter the parks for free)?

To kill two birds with one stone, I settled on testing a rolling 3-year average of all behaviors.  The results were very favorable: 67% of the variability is explained, and I have interesting significant independent variables to tell a data story.
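A rolling 3-year average can be sketched in a few lines. This version assumes a trailing window (the current year plus the two before it); the post does not say whether the window was trailing or centered, and the yearly counts below are hypothetical, not the real Disney data. The function name `rolling_mean` is my own.

```python
def rolling_mean(values, window=3):
    """Trailing rolling mean; the first window-1 entries are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical attractions opened per year, for illustration only.
attractions_per_year = [3, 0, 3, 3, 6, 0]
print(rolling_mean(attractions_per_year))
# [None, None, 2.0, 2.0, 4.0, 3.0]
```

Smoothing like this lets a movie or attraction keep influencing the model for a couple of years after its debut, which is one way to stand in for word of mouth.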



If you’re a subscriber to this blog and enjoy the Stacks of Stats, you’ll recognize my preference for Q-Q graphs.

There are some curls at the tails, but most of the data fits well, so there won’t be a need to run a more complex model.

Let’s take a bite into the initial read before assessing the financial impact of all these fun Disney variables.

I’ll caveat this: significance is in the eye of the beholder and is up to the interpretation of the storyteller and data scientist.  The first read shows the 3-year average of total park attractions having the strongest relationship to revenue; inversely, the number of attractions opened at EPCOT is significant but has a negative impact on yearly revenue.

I’ll dive more into the individual impacts later, but I want to utilize my upper and lower bounds.



The output of this model shows the impact in millions USD.  Analyzing the cone, this is where our fairy tale begins to take shape.

Potentially, the average number of attractions introduced at all four major parks can drive in $1.6 million USD.

With the Magic Kingdom driving most of this impact:

New attractions added at the Magic Kingdom can drive in $4.5 million USD.

The average number of Disney Princess movies does have more of an impact than counting any Disney animated movie release as the only criterion.  What’s intriguing is the variability between our upper and lower bounds: there is a possibility of a loss of $50.6M.


What could be driving the inverse effect?  Multiple reasons:

1. The quality of the movie releases

2. The presence or, in this case, absence of a meet and greet at the theme park

3. The global economic climate (less international travel impacts this!)



What have we learned from diving into the Walt Disney Data?

There’s a reason WDW is investing in new IP-based rides at Epcot and Hollywood Studios: the rides launched there have been out of date with their audience, and they currently drive the lowest impact on yearly revenue.  I anticipate Epcot will see steady growth in impact once Guardians of the Galaxy and Ratatouille open and a few years have passed.

Finally, a Princess animated movie drives in 1 million USD more than a regular animated movie release.


What could be the reasoning?  I’d guesstimate rides introduced at the Magic Kingdom (which drive in +4.5M USD) are having a downstream effect on the Princess impact.  Most Princess interactions take place at the Magic Kingdom.

After you have consumed this meal, I hope you take these findings and wish Mickey Mouse a Happy 90th Birthday. :)  Also, as always, enjoy the featured pancake recipe below!



https://disneyworld.disney.go.com/



E-Sports, Logistic Regression, Overwatch

Recipe: 005 Overwatch League Inaugural Season Logistic Regression

FerraraTom

I’m excited to tackle the Overwatch League and my first dig into E-sports in general.  I’ve attended several conventions, including gaming conventions, and I will get this out of the way now:

I thought I was decent at video games… these athletes have shown I’m a very casual player.  This is a good thing; it was a pleasure to witness their craft.

The focus of this week is the probability of an individual player making the playoffs.  Thrown into this meal were statistics based around player preferences and game-play performance.  To determine the variables for the final mix, I threw in some confounding factors and profiling stats before going very heavy on player performance.
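To show how a logistic regression turns player stats into a playoff probability, here is a minimal sketch of the scoring step only (not the fitting). The stat names and coefficient values are hypothetical placeholders, not the fitted model from this recipe.

```python
import math


def playoff_probability(features, coefficients, intercept=0.0):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(coef * value))))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
coefs = {"eliminations_per_10min": 0.5, "deaths_per_10min": -0.5}

balanced = {"eliminations_per_10min": 2.0, "deaths_per_10min": 2.0}
print(playoff_probability(balanced, coefs))  # 0.5

strong = {"eliminations_per_10min": 4.0, "deaths_per_10min": 2.0}
print(playoff_probability(strong, coefs) > 0.5)  # True
```

A positive coefficient pushes the probability above a coin flip as the stat grows; a negative one pulls it below.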



https://overwatchleague.com/en-us/

https://playoverwatch.com/en-us/



Classification Tree, Harry Potter, Tree Based Models

Recipe: 003 Harry Potter: Did Voldemort Get-cha? Classification Tree

 

FerraraTom

“It does not do well to dwell on dreams and forget to live.” – Albus Dumbledore, Harry Potter and the Sorcerer’s Stone

In this post we won’t dwell, but we’ll analyze and learn.  I ask that you play along and imagine yourself receiving your acceptance letter to Hogwarts (well, let’s be honest here, we’ve all imagined this at one point or another).

So you’ve hopped off the Hogwarts Express, ready for your studies and the fight against the dark arts. Oh wait… nobody told you about the dark arts and all the threats looming your way? Ever wonder whether the budget only allowed for owls to deliver acceptance letters? This week we’ll dive into the greatest threat in the Harry Potter Universe, Lord Voldemort.



Regression Modeling

Recipe: 002 Marvel Cinematic Universe Regression Model

 


FerraraTom

There is no argument against the Marvel Cinematic Universe being a financial success.  I’ll try to identify variables which can equate to box office success. The goal is to fit a regression model to Box Office USD for Marvel Cinematic Universe movie releases.
*At the time of cooking, Ant-Man and the Wasp did not have finalized Box Office USD data (this movie was excluded). – TF
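As a taste of what fitting a regression to box office looks like, here is a minimal one-variable least-squares sketch. The real recipe presumably uses several variables; the single predictor and every number below are hypothetical toy values, not actual Marvel data, and the helper names `fit_line` and `r_squared` are my own.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variability in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values: a predictor (x) and box office (y), arbitrary units.
predictor = [1, 2, 3, 4]
box_office = [3, 5, 7, 10]

slope, intercept = fit_line(predictor, box_office)
print(slope, intercept, r_squared(predictor, box_office, slope, intercept))
```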




Thanks for stopping by and chowing down on this Recipe (click the link for a reader-friendly PDF version of this recipe)

Now try this delicious pancake recipe (with the Iron Man gold and red finish) courtesy of Crème De La Crumb (link below):

002007

 

K-Means Clustering, Pokemon Go

Recipe: 001 Pokémon Go K-Means Clustering Segmentation

 


FerraraTom

Here’s a treat for all the Pokémon Go players out there.  You’ll find below a recipe for a cluster analysis intended to guide you in building the most cost-effective team of Pokémon. The goal of this recipe is to segment Kanto (GEN 1) Pokémon which can be found in the wild, with an emphasis on return on investment (ROI), or in this case Candy Cost investment and Gym Battle return.  Hope you enjoy! – Tom
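For the curious, here is a bare-bones sketch of what K-means does under the hood, written as plain Lloyd's algorithm with starting centroids supplied by the caller. The points are hypothetical (candy cost, gym battle return) pairs already scaled to the 0 to 1 range, not real Pokémon stats, and the function name `kmeans` is my own.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical (scaled candy cost, scaled gym battle return) pairs.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]

labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Scaling both measures before clustering matters, because otherwise whichever one has the larger numbers dominates the distance.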



Good Old Fashioned Pancake Recipe:

https://www.allrecipes.com/recipe/21014/good-old-fashioned-pancakes/

Cool Pokémon Pancake Art:

Support our theme for this week’s analytics:

https://www.pokemongo.com/en-us/

https://www.pokemon.com/us/

Follow this blog on Instagram: Pancake_analytics