Pancake Analytics Galar Guide

001.png


Welcome! Whether you’re a long-time follower or new to the page, I’m glad to have you here.  First off, let me get some updates out of the way and give a little refresher on what this site is all about.

My Pokemon data science YouTube channel is progressing nicely: the intro is almost complete and the format for each video is almost nailed down.

A little background on me:

I’m a manager of a decision sciences department and I want to give back to the comic con community.  I’ve been attending comic cons since high school (I won’t carbon date myself here, but let’s just say my favorite video game is Civ II).

I want to teach, share, engage and learn from the comic con family, as well as share my years of analytics experience with aspiring analysts or those just looking to improve and understand the data behind their fandom.

This post is a new flow and strays from what I typically do.  The idea here is to end with a Pokemon team recommendation for those playing the Shield version and those playing Sword (personally I picked up Sword, love it).


002

I’ll walk you through a machine learning approach to building an optimized Pokemon team.  The focus here is getting through the main story (in a different post I’ll give recommendations for a competitive team).

Legendary and mythical Pokemon are excluded from our selection.  They are in a league of their own, and it wouldn’t be informative if I told you to use them as a top tier.

I have the underlying data for the Pokemon in the Galar region:

Base attack, base defense, base HP, base special attack, base special defense, base speed, version exclusives, and location in the Galar region.

In this approach I will avoid recommending Pokemon which must be caught in the “wild” area, but the “wild” area is still useful for leveling up your team.  I highly recommend you use the raiding system in the wild to earn candy rewards and level up your Pokemon team quickly.

The final team recommendations will take into consideration the moves they can learn, the gym leaders you face, as well as the champion at the end of the main story.  I’ll exclude the post game (a hilariously ridiculous post game by the way; most of the game has this quirky charm to it, and I love it).

If you’re curious and want to do your own research: as the image above shows, I’ll be performing K-means clustering, which falls under unsupervised machine learning.


003

A K-means clustering algorithm will group our Pokemon into clear tiers.  Without running this algorithm we’re in the top left corner, and we can barely see our Yamper.

He’s a good boy, so we want to see him clearly.  The more groups we choose, the more clearly we can see where Yamper ranks among all the Galar region Pokemon.

004

The above elbow curve helps in deciding how many tiers we should select.  When choosing the number of groups or clusters, you need to take a few things into consideration:

How many clusters can I reasonably speak to?

What will give me a clear distinction between my Pokemon?

How quickly can I validate the clusters? (If I selected 12 clusters, it would take a while for them to pass the sniff test.)
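If you want to draw your own elbow curve, here is a minimal sketch in plain Python.  It assumes the stats are already standardized; the function names and the nine sample points are made up for illustration and are not the Galar data or the code behind the chart above.  For each candidate number of clusters it runs a basic K-means and records the within-cluster sum of squares, which is the value an elbow curve plots.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    # Index of the centroid closest to the point.
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))

def kmeans_inertia(points, k, iterations=50, seed=0):
    """Run a basic K-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its old centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return sum(squared_distance(p, centroids[nearest(p, centroids)]) for p in points)

def elbow_curve(points, max_k):
    return {k: kmeans_inertia(points, k) for k in range(1, max_k + 1)}

# Hypothetical standardized stat pairs forming three loose groups.
points = [[-1.0, -1.1], [-0.9, -1.0], [-1.1, -0.9],
          [0.0, 0.1], [0.1, 0.0], [-0.1, -0.1],
          [1.0, 1.1], [1.1, 0.9], [0.9, 1.0]]
curve = elbow_curve(points, 5)
for k, inertia in curve.items():
    print(k, round(inertia, 3))
```

The number of clusters where the printed values stop dropping sharply is the elbow.  The three questions above still decide whether that number is one you can actually talk through.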


005

This bubble chart is a good visual representation of our clusters and the Pokemon of the Galar region.  A K-means cluster puts the focus on trends in the data, so all of those base stats I mentioned before are standardized.

Standardizing our variables means each variable is rescaled to a mean of zero (and, in the usual z-score form, a standard deviation of one).  This is helpful if the ranges of your underlying data are very different.  As it pertains to Pokemon, this makes sure I have a good mix and I’m not only bucketing based on attackers and HP.
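Here is a small sketch of that standardizing step, assuming the common z-score form (subtract the mean, divide by the standard deviation).  The stat values are hypothetical, and the function name is mine, not part of any particular library.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: subtract the mean, divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical base stats for four Pokemon (illustrative numbers only).
base_hp = [50, 70, 90, 110]
base_speed = [30, 60, 90, 120]

z_hp = standardize(base_hp)
z_speed = standardize(base_speed)
print([round(z, 2) for z in z_hp])
print([round(z, 2) for z in z_speed])
```

Both columns end up on the same scale even though speed had the wider raw range, so neither stat dominates the clustering just because its numbers are bigger.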

The above bubble chart has a lot of overlap in our clusters.  And how reasonable is it to bring a stage 1 Pokemon on your final team?

So I’ve got some data clean up to do.  I’m going to limit my data to final stage Pokemon and reduce the number of clusters.  I like to keep it at 3, so I can talk about high, medium and low value.

006

Now this is a much easier to read bubble chart.  Let me give a quick reference of what this chart tells us.

Cinderace is statistically more like Dragapult than it is like Coalossal, even though Cinderace and Coalossal are both fire types.  I want to move you away from the typing discussion for now.
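One way to sanity check a statement like this is to measure the straight-line distance between stat lines.  The sketch below is an assumption-heavy illustration: the base stats are typed from memory of the published Sword and Shield values (check them against your own data set), and it compares raw stats rather than the standardized values the clustering works with.

```python
from math import dist

# Base stats in the order HP, Attack, Defense, Sp. Atk, Sp. Def, Speed.
# Typed from memory of the published stats; verify before relying on them.
stats = {
    "Cinderace": [80, 116, 75, 65, 75, 119],
    "Dragapult": [88, 120, 75, 100, 75, 142],
    "Coalossal": [110, 80, 120, 80, 90, 30],
}

# Euclidean distance: smaller means the two stat lines look more alike.
to_dragapult = dist(stats["Cinderace"], stats["Dragapult"])
to_coalossal = dist(stats["Cinderace"], stats["Coalossal"])
print(round(to_dragapult, 1), round(to_coalossal, 1))
```

Cinderace lands much closer to Dragapult than to Coalossal, which is the same story the bubble chart tells without looking at typing at all.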


007

Here’s a high level view of our clusters.  To keep on theme, instead of high, medium, and low I’ve given the clusters Poké Ball themed names.

Now for each base stat there is a high/medium/low rating, but this is in comparison to the other clusters.
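A rough sketch of how a relative high/medium/low table like this can be built: average each stat within each cluster, then rank the clusters against each other stat by stat.  The rows, the cluster labels A/B/C and the function name are hypothetical, and the code assumes exactly three clusters.

```python
from statistics import mean

def cluster_profile(rows, labels, stat_names):
    """Average each stat per cluster, then rank clusters low/medium/high per stat."""
    clusters = sorted(set(labels))
    averages = {
        c: [mean(r[i] for r, lab in zip(rows, labels) if lab == c)
            for i in range(len(stat_names))]
        for c in clusters
    }
    tiers = ["low", "medium", "high"]  # assumes exactly three clusters
    profile = {c: {} for c in clusters}
    for i, name in enumerate(stat_names):
        ranked = sorted(clusters, key=lambda c: averages[c][i])
        for tier, c in zip(tiers, ranked):
            profile[c][name] = tier
    return profile

# Hypothetical [hp, attack] rows and cluster assignments.
rows = [[100, 110], [95, 105], [60, 70], [65, 75], [80, 50], [85, 55]]
labels = ["A", "A", "B", "B", "C", "C"]
profile = cluster_profile(rows, labels, ["hp", "attack"])
print(profile)
```

Because the ratings are ranks between clusters, a cluster rated low on a stat can still contain individual Pokemon with a respectable raw number.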


008

The ultra cluster at a glance:

61% of the Pokemon in the Galar region fall here, and they excel in HP and attack.  They are not a handicap in any other category.  These should be the bulk of your team, and they complement the other clusters well if you are dipping into those.


009

The quick cluster at a glance:

11% of the Pokemon in the Galar region, they excel in defense and special defense.  They are the weakest cluster in all other categories.  I assigned them a quick ball because most likely you are catching these Pokemon to fill your pokedex in the post game.

A quick reminder about these clusters: the focus here is the main story, not the competitive scene.  Sorry, Trick Room fans.


010

The premier cluster at a glance:

28% of the Pokemon in the Galar region fall here, and they excel in special attack and speed.  They struggle in defense and special defense.  There are a lot of gems in this cluster, but you’ll need to level them up, have a proper move-set and do the damage quickly.  Don’t drag out the battle with one of these Pokemon.


Now what you’re really here for, the team recommendation.

The outcome of this is to use the findings from the K-means cluster and recommend a team based on the game version you are playing (either sword or shield).  All Pokemon must be catch-able beyond the “wild”.  Finally a starter must be selected.
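The rules above boil down to a filter and a sort.  Here is a hedged sketch of that idea; the field names (`exclusive`, `wild_area_only`, `score`), the placeholder Pokemon and the scores are all hypothetical, and the real recommendation also weighs moves and gym match-ups, which this leaves out.

```python
def recommend_team(candidates, version, starter, size=6):
    """Pick the starter plus the highest-scoring eligible Pokemon for one game version."""
    eligible = [
        p for p in candidates
        if p["exclusive"] in (None, version)   # drop the other version's exclusives
        and not p["wild_area_only"]            # must be catchable outside the wild area
        and p["name"] != starter
    ]
    eligible.sort(key=lambda p: p["score"], reverse=True)
    return [starter] + [p["name"] for p in eligible[:size - 1]]

# Placeholder candidates with made-up cluster-based scores.
candidates = [
    {"name": "A", "score": 0.90, "exclusive": None, "wild_area_only": False},
    {"name": "B", "score": 0.80, "exclusive": "sword", "wild_area_only": False},
    {"name": "C", "score": 0.85, "exclusive": "shield", "wild_area_only": False},
    {"name": "D", "score": 0.95, "exclusive": None, "wild_area_only": True},
    {"name": "E", "score": 0.70, "exclusive": None, "wild_area_only": False},
    {"name": "F", "score": 0.60, "exclusive": None, "wild_area_only": False},
    {"name": "G", "score": 0.50, "exclusive": None, "wild_area_only": False},
    {"name": "H", "score": 0.40, "exclusive": None, "wild_area_only": False},
]

sword_team = recommend_team(candidates, "sword", "Starter")
shield_team = recommend_team(candidates, "shield", "Starter")
print(sword_team)
print(shield_team)
```

Note that the top scorer, D, never makes either team because it can only be caught in the wild area.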

You’ll see two charts below: a sword version and shield version.  Both will have you rounding out your final team before the fifth gym.


011.png


012


Thank you for reading through, and feel free to reach out to me on Instagram (@pancake_analytics) if you’d like the code that produces this or the data set, are interested in an upcoming comic con panel, or just want to learn more about your fandom through data science.


028

Uncategorized

AFO 2019 Player One, Power Ups, & Probabilities: Panel Recap

012


001


logo


Before I share the entire Anime Festival Orlando (AFO) 2019 Panel I’d like to give some insight on the nerves I had going into this panel and how the audience helped me get into the groove.

I opted to go solo on this panel.  Normally I have guest panelists join me, so the nerves were at an all-time high.

Could I keep the entire room engaged for a data science panel?  Would the flow drastically change?

I was set up and ready to go early, and had great discussions with those who sat in early; we discussed whether or not to pick up Let’s GO Pikachu/Eevee.  One of the attendees was even referred to the panel by friends who attended my Tampa Comic Convention panels!

This was a first and a good gut check for me that what I’m trying to accomplish with Pancake Analytics is a good thing and is going over well.

I can’t thank the community we’re building here together enough!


028


This panel was held on: Saturday, August 10, 2019 at 8:30 PM – 9:30 PM

In Orlando, Fl during AFO 2019.


Our journey begins…

002

The steps on our Pokémon Journey:

  • New Point Of View on Pokémon
  • Field Researchers & Learning from them
  • Pokémon Team Recommendations

A New Point of View on Pokémon : Overview

003

A k-means cluster uncovers trends within our Pokémon data to understand the relational similarities and differences on key in-game attributes.

The more clusters the clearer our picture becomes and the deeper we can understand the Pokémon throughout our journey.


A New Point of View on Pokémon : The Results

004

A Brief overview of the approach:

Standardize your variables (set each variable to a mean of zero)

Analyze your elbow curve (look for where the line plot elbows)

Validate your clusters (perform a univariate analysis on core KPIs for each cluster)

3 Distinct Groups:

High – Highest in all categories except for base defense and hp

Medium – Highest on defense, middle ground in everything else

Low – Only high on hp


What does this tell us about the starters?

005

The output of the k-means clusters can be used to help determine your approach from the very beginning.

Reading the pyramid:

Easy path: (Build your team around this Pokemon & steamroll the competition)

Greninja, Swampert, & Sceptile

Hard path: (You need to acquire complementary Pokemon; you learn more about Pokemon this way)

Serperior, Meganium, Torterra, & Chesnaught


How do we implement this scoring?

006

I needed more data to implement this approach.

I reached out to my Instagram followers with a survey, and volunteers were given:

5 Questions:

What’s your ideal team of 6 Pokémon?

What year did you start playing Pokémon?

Do you play Pokémon GO?

How many Pokémon games have you played?

Do you play the Pokémon TCG?


Implementing the scoring: Trust The Process

007

A propensity model is a statistical scorecard that is used to predict the behavior of your customer or prospect base. Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.

I used this model to predict whether a Pokemon would be selected in the survey, and used these results to recommend Pokemon a survey participant didn’t select but which would statistically give them the same results when playing.
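To make the idea concrete, here is a minimal sketch of a propensity model as a logistic regression fit with plain gradient descent.  Everything in it is an assumption for illustration: the two standardized stats, the picked/not-picked labels and the function names are invented, and the post does not say which features or fitting method the real model used.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(rows, outcomes, learning_rate=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for row, y in zip(rows, outcomes):
            x = [1.0] + list(row)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights

def propensity(weights, row):
    """Estimated probability (0 to 1) that this Pokemon gets picked."""
    x = [1.0] + list(row)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical standardized [attack, speed] rows; 1 = picked in the survey.
rows = [[1.2, 0.8], [0.9, 1.1], [0.5, 0.7], [-0.4, -0.2],
        [-1.0, -0.9], [-1.2, -1.5], [0.2, -0.1], [-0.3, 0.4]]
picked = [1, 1, 1, 0, 0, 0, 1, 0]

weights = fit_logistic(rows, picked)
strong = propensity(weights, [1.0, 1.0])
weak = propensity(weights, [-1.0, -1.0])
print(round(strong, 2), round(weak, 2))
```

Once the weights are fit, the same `propensity` call can score Pokemon nobody in the survey mentioned, which is what makes the recommendation step possible.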

This is the whole Pokémon journey coming full circle.

The Pokémon Professor has done their own research and builds a model.

The field research team assists the Pokémon Professor with gathering new data.

The Pokémon Professor uses the model to assist the field research team.


Here are the results of my recommendation model:

008.png

009

010


029

Is Ash getting better with each season?

I’ve analyzed all of Ash’s teams throughout the anime (from Kanto through XYZ).  I want to answer the question… Is Ash getting better with each season?

The first challenge was: how do we define success, and what data science methodology do we use?

One area I feel gets overlooked in data science is the performance analytics realm, using univariate and multivariate statistical analysis.

Univariate and multivariate represent two approaches to statistical analysis. Univariate involves the analysis of a single variable while multivariate analysis examines two or more variables. Most multivariate analysis involves a dependent variable and multiple independent variables.

011.png

How do we determine success?

Base stats seem like a good starting point.

But as you can see one Pokémon can throw off our data… cough…  cough … Greninja cough … cough


Here’s how we do it: use the Pokémon GO approach

012

As much as I feel Pokémon GO has flaws which shouldn’t get a pass, their CP attribute holds the answer to standardizing and scaling Ash’s teams.

What is CP in Pokémon Go?

CP (combat power) is not related to how much damage a Pokémon deals when attacking gyms; it is a single number that combines attack, defense and stamina (HP).
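For reference, here is a sketch of the CP formula as it is commonly documented by the Pokémon GO community, where attack, defense and stamina already include any IVs and the CP multiplier depends on level.  This is my assumption about the kind of calculation involved; the post does not show the exact scoring applied to Ash’s teams, and the team numbers below are made up.

```python
from math import floor, sqrt

def combat_power(attack, defense, stamina, cp_multiplier):
    """Community-documented Pokemon GO CP formula (floored, minimum of 10)."""
    raw = attack * sqrt(defense) * sqrt(stamina) * cp_multiplier ** 2 / 10
    return max(10, floor(raw))

def team_average_cp(team, cp_multiplier):
    """Average CP across a team of (attack, defense, stamina) tuples."""
    return sum(combat_power(a, d, s, cp_multiplier) for a, d, s in team) / len(team)

# Hypothetical stat lines; one glass cannon does not dominate the average
# as much as it would if we ranked the team on attack alone.
team = [(200, 100, 100), (120, 150, 160), (110, 140, 170)]
print(combat_power(100, 100, 100, 1.0))
print(round(team_average_cp(team, 0.79), 1))
```

Because defense and stamina enter through square roots, CP rewards attack most but still pulls a one-dimensional attacker back toward the pack.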

013

014

Using this approach helps level the field for those teams where Ash was heavy in one attribute, or when he only had one strong Pokemon.


From beginning to end Ash increased his CP by 8%

ash-hat-pikachu-169

015

His best rotation was in Sinnoh

  • He evolved the most Pokémon compared to his other teams.
  • He evolved 3 Pokémon all the way to their final evolution.
  • 3 of his Pokémon fall into our High cluster.

016

His worst rotation was in Johto

  • He evolved only one Pokémon (Noctowl he found).
  • He attempted to build a team similar to the one he had in Kanto.
  • Only 1 of his Pokémon falls into our High cluster.

Game Time: Let’s GO! Wonder Trade: Overview

I personally feel one of the best ways to reinforce learning is through a game.  During all of my panels I like to play a game that reinforces a machine learning technique, in this case the propensity model.

Those who participated received a rare Pokemon TCG EX/GX individual card, an unopened Unified Minds TCG booster pack, and a gift certificate to Burger King (a meal on me).

Food is usually hard to come by at a convention, so I went back to my younger roots and thought, well, I would have loved to get a free meal at a convention.

017.png

5 Volunteers

On the screen will be 3 of Ash’s Pokémon

2 Pokémon are look-a-likes (statistically speaking)

Volunteers will do their best to convince me which two Pokémon are look-a-likes and who should be wonder traded

For participating, volunteers receive a fabulous prize



028

Classification Tree, E-Sports, K-Means Clustering, Logistic Regression, NBA2k, nintendo, Overwatch, Propensity Modeling, Regression Modeling, Super Mario, Tree Based Models

TBCC 2019 Player One, Power Ups, & Probabilities: Panel Recap

012


001.png


This panel was held on: Saturday, August 3, 2019 at 3 PM – 4 PM
And here was the pitch:
Join the data science debate of the most critically acclaimed video games vs the nostalgia of games we grew up with. The data science team at Pancake Breakfast: A Stack Of Stats will be serving up supporting data and driving the discussion for both sides of the debate. Panelists will debate greatest video game of all time or overrated!
The panelists were myself and Stephen (an indie game developer).  Obviously Steve had the advantage going into this debate, but it was really fun and the audience was very engaged, probably one of our best Q&A sessions of all time.
015

Video Game Recommendation Engine – This is how we do it

These are data science panels and we started off this panel with a video game recommendation engine.  I had Stephen fill out a survey prior to the panel and from his results I built a recommendation model, with the goal of selecting games he has not played (he’s played a lot of games, so not an easy task) and would rate above average.

002

How are we going to build this recommendation?  Through Propensity scoring!

A propensity score is an estimated probability that a data point might have the predicted outcome.

  • One of our panelists completed a survey and had to rank video games they have played
  • Their responses were linked to our ancillary data (critics score, user score, and genres)
  • Our model shot out a score between 0 and 1. The closer to 1 the more likely this game would be enjoyed by the panelist.
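As a sketch of that last step, the snippet below turns a weighted sum of a game’s attributes into a score between 0 and 1 with the logistic function and ranks unplayed games by it.  The coefficients, intercept, feature names and games are all hypothetical; they are not the values from the panel’s model.

```python
from math import exp

def propensity_score(features, coefficients, intercept):
    """Logistic score between 0 and 1; closer to 1 means more likely to be enjoyed."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))

# Hypothetical fitted coefficients (user score weighted above critic score).
coefficients = {"critic_score": 0.02, "user_score": 0.06, "is_action_adventure": 0.8}
intercept = -6.0

# Hypothetical games the panelist has not played yet.
unplayed = {
    "Game A": {"critic_score": 90, "user_score": 70, "is_action_adventure": 0},
    "Game B": {"critic_score": 75, "user_score": 92, "is_action_adventure": 1},
    "Game C": {"critic_score": 60, "user_score": 55, "is_action_adventure": 0},
}

scores = {name: propensity_score(f, coefficients, intercept) for name, f in unplayed.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(name, round(scores[name], 2))
```

The game at the top of the ranking is the recommendation; in this made-up example a strong user score and a matching genre beat a higher critic score.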

003


 

Video Game Recommendation Engine – The Output

004.png

For this panelist, the survey told us this about their gaming preferences:

They value User Score more than the Critics Score.

Their preferred genre is Action Adventure.

Their preferred platform is the PS2.

005


Video Game Debate: Overview

006

On the screen will be a video game, with some profiling data.

Panelists will debate the impact, perceived value and replay value of the featured game.

Crowd will decide who made the better argument.

This is the meat of the panel.  On the screen is also the IGN review headline and rating; Stephen and I would take turns and argue whether the game deserved its ranking.


Goldeneye 007

007

Stephen went first and argued that Goldeneye does not deserve this high of a rating, and his key point was on the replay value.  I attempted to argue for its value at the time of release.  The crowd sided with Stephen.


Pokémon Gold & Silver

008

I went first this round and argued for the rating; this was a very pro Pokémon crowd.  Stephen brought up good points on where he thinks the series should go and why adding another region is not the answer.  The crowd sided with me.


Ultimate Marvel vs. Capcom 3

009

Stephen chose to argue for this game; I wanted to throw a curve-ball in this debate.  It would have been very obvious if we chose Marvel vs. Capcom 2, too easy.  I argued that it wasn’t even the best in the series, and the best in the series is actually X-Men vs. Street Fighter.


Halo Combat Evolved

010

Stephen was on team Halo for this one.  I love Halo as well, but the crowd did not.  That was a shock to us, but maybe Halo doesn’t have replay value?  Or everyone is getting tired of the series.


Battle Dome: Overview

011

Two games go in… only one comes out

Panelists will argue for a game, they cannot both argue for the same game

The crowd decides who had the best argument

This was a fun and challenging section of our panel.  I won’t go into details on this section, but I do want to try something out.  As a test to see who is interacting with my page by reading the data stories, I have a special giveaway.

Here are the rules: you must have an Instagram account, and you must be following my Instagram account: @pancake_analytics.

To enter, you need to read through the battle dome section, screenshot your favorite match-up and post it to Instagram.

In this post I want you to tag @pancake_analytics and caption the post with “Who do you have in this Battle Dome match-up?”.

This giveaway will end on December 31st, 2019 and the winner will receive a GameStop gift card from me to use on your next video game purchase in the new year!

Here’s the disclaimer I have to post:

Per Instagram rules, we must mention this is in no way sponsored, administered, or associated with Instagram, Inc. By entering, entrants confirm they are 13+ years of age, release Instagram of responsibility, and agree to Instagram’s terms of use. Good luck!!!!!

Here are the battle dome match-ups:


012


013


014


I want to personally thank everyone who attended the panel in Tampa, at the Tampa Comic Convention.  I look forward to meeting again in 2020.


003_008

K-Means Clustering, Logistic Regression, nintendo, Propensity Modeling, Regression Modeling, Super Mario

TBCC 2019 Smash Brothers, Segmentation & Strategy: Panel Recap

012


003


This Panel was held on:

Friday, August 2, 2019 at 7:30 PM – 8:30 PM

During the Tampa Bay Comic Convention 2019, held at the Tampa Convention Center.

The Panelists were:

Tom Ferrara (@pancake_analytics) , Kalyn Hundley (@kehundley08), Andy Polak (@polak_andy)

001

I want to take a quick moment to discuss the panelists.  I love giving as many different points of view as possible to these data science panels.  Without this variety of points of view it’s more of a lecture and less of a discussion.  This mix of panelists gave the audience the data science view, the tech industry view and the biological sciences view.  Best part about this is Smash Brothers brought us all together.


Changing the Tier Conversation

004.png

One of the main objectives of this panel was getting a discussion going on tier selection in Smash: how do we base tier selection on data science, and how do we validate our findings against one of the best players in the game?

A k-means cluster uncovers trends within our Smash Brothers data to understand the relational similarities and differences on key in-game attributes.

The more clusters the clearer our picture becomes and the deeper we can understand the pros and cons of each main selection.


005.png

A brief overview of a k-means cluster:

  • Standardize your variables
  • Analyze your elbow curve
  • Validate your clusters

Treat each game release as a new product launch or a change in the market.

You would re-score your data to understand the current market, and then track which characters migrate between clusters to understand how the meta-game has changed.
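Here is a small sketch of what tracking that migration can look like once each character has a cluster in both games.  The fighters are placeholders, and reusing the same cluster names for the Wii U game is my assumption; the real Wii U clusters are not listed in this recap.

```python
from collections import Counter

# Hypothetical cluster assignments for the same characters in each release.
wiiu = {"Fighter 1": "Dashers", "Fighter 2": "Dashers",
        "Fighter 3": "Floaters", "Fighter 4": "Air Tanks"}
ultimate = {"Fighter 1": "Speedsters", "Fighter 2": "Dashers",
            "Fighter 3": "Floaters", "Fighter 4": "Air Tanks"}

# Only characters present in both games can migrate.
shared = [name for name in wiiu if name in ultimate]

# Count how many characters moved from each old cluster to each new one.
migration = Counter((wiiu[name], ultimate[name]) for name in shared)
moved = [name for name in shared if wiiu[name] != ultimate[name]]

for (old, new), count in sorted(migration.items()):
    print(f"{old} -> {new}: {count}")
print("Changed cluster:", moved)
```

The off-diagonal rows (old cluster different from new cluster) are where the meta-game shifted between releases.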


006

We end up with five unique clusters:

Floaters:

This group is the slowest by run speed and lightest by weight.

Jack Of All Trades:

They are middle ground on everything; there is no distinct trend.

Dashers:

Like the Jack of All Trades group but faster.

Air Tanks:

Fast in aerial attacks and the heaviest of the characters.

Speedsters:

This group is the fastest and the lightest.


007

A propensity model is a statistical scorecard that is used to predict the behavior of your customer or prospect base. Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.

So who should be your main?  In this segment I rely on industry knowledge as well (ZeRo’s tiers as the dependent variable).   I’ll build a propensity score with the following independent variables:

  • Change in air acceleration
  • Base air acceleration
  • Base speed in the air
  • Base Run Speed
  • Character Weight
  • Ultimate Smash Bros. Cluster
  • Wii-U Smash Bros. Cluster
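Two of those variables are cluster memberships, which are categories rather than numbers, so they need to be turned into 0/1 columns before a regression can use them.  The sketch below shows one common way to do that (dummy coding with a baseline category); the field names and the fighter’s values are hypothetical, and I am assuming the Wii U clusters share the same names.

```python
CLUSTERS = ["Jack Of All Trades", "Floaters", "Dashers", "Air Tanks", "Speedsters"]
NUMERIC = ["air_accel_change", "base_air_accel", "air_speed", "run_speed", "weight"]

def dummy_code(cluster):
    """One 0/1 column per cluster, dropping the first cluster as the baseline."""
    return [1.0 if cluster == name else 0.0 for name in CLUSTERS[1:]]

def feature_row(character):
    """Numeric stats followed by dummy-coded Ultimate and Wii U clusters."""
    numeric = [float(character[field]) for field in NUMERIC]
    return (numeric
            + dummy_code(character["ultimate_cluster"])
            + dummy_code(character["wiiu_cluster"]))

# Hypothetical fighter; the numbers are placeholders, not real frame data.
fighter = {
    "air_accel_change": 0.01, "base_air_accel": 0.08, "air_speed": 1.1,
    "run_speed": 1.6, "weight": 98,
    "ultimate_cluster": "Dashers", "wiiu_cluster": "Jack Of All Trades",
}
row = feature_row(fighter)
print(row)
```

Dropping one cluster as the baseline keeps the columns from being perfectly redundant; each cluster coefficient then reads as a difference from that baseline group.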

008


What makes these three stand above the crowd?

They are middle ground on weight and fast air accelerators.

What are the differences between the three?

Wario has a slow run speed.

Palutena is the lightest.

Yoshi is the middle ground of this group.


The Curious Case of Ganondorf

009

Ganondorf has more in common with Jigglypuff than he does with Bowser.

The reason is that he’s quicker and adapts better in aerial attacks and in falling than Bowser can.

On the flip side, I can also say that in Super Smash Bros. Ultimate, Bowser more accurately represents how he’s viewed in the Super Mario franchise.


Game Time: Name that segment: Overview

010

I personally feel one of the best ways to reinforce learning is through a game.  For this panel I decided to reinforce the k-means segmentation and wanted volunteers to guess which segment the 3 characters on the screen fall into.

Here was the overview:

5 Volunteers

On the screen will be 3 characters

All 3 characters belong to the same segment

Volunteers will do their best to convince the panel of which segment the characters fall into:

  • Floaters
  • Jack of All Trades
  • Dashers
  • Air Tanks
  • Speedsters

For participating, volunteers receive a fabulous prize.

For this particular game the prize was an amiibo of their choice that works with Smash Ultimate for the Nintendo Switch.


I want to personally thank everyone who attended the panel in Tampa, at the Tampa Comic Convention.  I look forward to meeting again in 2020.


003_008

Marvel Comics, Propensity Modeling, Regression Modeling

TBCC 2019 Avengers, Algorithms, and Analytics: Panel Recap

012


002


This Panel was held on:

Friday, August 2, 2019 at 9 PM – 10 PM

During the Tampa Bay Comic Convention 2019, held at the Tampa Convention Center.

The Panelists were:

Tom Ferrara (@pancake_analytics) , Kalyn Hundley (@kehundley08), Andy Polak (@polak_andy)

013

 

I want to take a quick moment to discuss the panelists.  I love giving as many different points of view as possible to these data science panels.  Without this variety of points of view it’s more of a lecture and less of a discussion.  This mix of panelists gave the audience the data science view, the tech industry view and the biological sciences view.  Best part about this is the Avengers brought us all together.


003

When I pitched this panel the idea was what happens when a data scientist gets hold of the infinity gauntlet?  Pictured above is a visual representation of how I’m going to use each stone.

Use the Time Stone to predict the box office sales for the MCU and determine the top influencers for success.

Use the Power Stone to eliminate low hanging fruit.

Use the Soul Stone to uncover the underlying attributes of the marvel universe.

Use the Space Stone to transport the marvel universe to their closest match.

Use the Reality Stone to show you the marvel universe in a new light, perfectly balanced.

Use the Mind Stone to convince you this matching worked.


Time and Power Stones: What is influencing the MCU box office success?

004.png

I walked those in attendance through the output of a regression model I built to unlock the key influences on the Marvel Cinematic Universe’s box office sales.

Considered influencers:

  • Rotten Tomatoes Scores (Critic and Audience)
  • Movie Release
  • Time since last MCU release
  • Solo Movie Releases
  • Was Iron Man in the movie?

Two Key Influencers stand out:

Having Iron Man in an MCU movie drives in $100.5MM.

Being further along in the series drives in at least $216.8MM.  Story development matters, and here’s the statistical proof!
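For anyone wondering how a regression hands back dollar figures like these, here is a minimal least-squares sketch with a yes/no Iron Man flag and a release-order variable.  The box office numbers are invented so the answer is known in advance (a base of 300, plus 100 for Iron Man, plus 25 per release), and the helper names are mine; the panel’s model used more inputs than this.

```python
def solve(matrix, vector):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up movies: [has_iron_man, release_order] and box office in $MM.
rows = [[1, 1], [0, 2], [0, 3], [1, 4], [0, 5], [1, 6]]
box_office = [425, 350, 375, 500, 425, 550]

intercept, iron_man_effect, per_release_effect = ols(rows, box_office)
print(round(intercept, 1), round(iron_man_effect, 1), round(per_release_effect, 1))
```

The coefficient on the Iron Man flag reads as the extra box office associated with having him in the movie while holding release order constant, which is how a statement like the one above is worded.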


Soul and Space Stones: Refitting the Marvel Power Scale

005

During this panel I walked the crowd through the output of a second machine learning algorithm, a propensity score.

Ingredients in the batter:

  • Marvel Contest of Champions (MCC) Power Index Levels
  • MCC Health
  • MCC Attack
  • Marvel Battle Royale (MBR) Twitter Poll: total votes per round, average total votes

Flipping the pancakes:

Predict the likelihood Twitter would vote for a character

Repurpose this score and apply it to characters not in the MBR Twitter Poll


Reality and Mind Stones: Perfectly Balancing the Marvel Universe

006

This approach goes beyond ranking by attack, or defense.  This approach takes all those attributes together as well as the fan opinion.

If you only look at attack… you get skewed results

If you only look at defense… you get skewed results

A little bit of good… a little bit of crazy…

Old Man Howard the Duck?

Doctor Octopus the Demi-God?


Marvel Rapid Fire: Marvel Analytics Comparisons

007.png

This was one of my all time favorite segments out of all the comic cons I’ve had the pleasure of paneling at.  Quickly I would show the audience an analytics technique and show them the Marvel equivalent.  I think this technique is very effective in reinforcing our learning and opening up data science to a new audience.

Everything we just went through was a machine learning technique

Machine Learning is the Taskmaster of Data Science

Learns from past data, trains, and attempts to apply this training to new data

When something new is introduced it takes time to catch up


A/B Testing and Incremental ROI is the plot of Civil War

008


A neural network is Ultron… learns from observational data & figures its own solution

009


Dr. Strange ran a logistic regression to find out the odds on Titan

010


Into the Spider-Verse was the perfect implementation of a random forest

011


Game Time: Marvel Team-Up: Overview

012


One of the best ways to reinforce learning is through a game.  During this panel I wanted to reinforce the learning from the propensity score.

I asked for 5 volunteers.  On the screen were 3 marvel characters.  2 characters on screen were look-a-likes (statistically speaking).  Volunteers did their best to convince the panel of which two characters should “Team-Up” or in other words identify the 2 statistically closest characters.
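The “look-a-like” call can be checked with a few lines of code: compare every pair on screen and keep the pair whose scores are closest.  The character names and propensity scores below are placeholders, not results from the panel.

```python
from itertools import combinations

def closest_pair(scores):
    """Return the two names whose scores are nearest to each other."""
    return min(combinations(scores, 2),
               key=lambda pair: abs(scores[pair[0]] - scores[pair[1]]))

# Hypothetical propensity scores for the three characters on screen.
scores = {"Character A": 0.81, "Character B": 0.78, "Character C": 0.35}

team_up = closest_pair(scores)
print(team_up)
```

With only three characters there are just three pairs to compare, which is why the game works well as a quick audience exercise.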

For participating, all volunteers received a HeroClix figure of their choice.


I want to personally thank everyone who attended the panel in Tampa, at the Tampa Comic Convention.  I look forward to meeting again in 2020.


003_008

Uncategorized

TBCC 2019 The Pokemon Journey Panel

012


Welcome to the first recap of the Comic Con Data Science panels run by the crew at Pancake Analytics.  Before I dive into the recap of The Pokemon Journey panel held at the Tampa Bay Comic Convention 2019, I’d like to give a quick overview of why I’ve chosen this path.

One question I get asked often is where I got the idea to apply the fundamentals of data science to comics, video games and all fanfare.

The answer is simple to me and is a core pillar of Pancake Analytics.  I want to teach, share, engage and learn from the comic con family.

I want to TEACH those who attend my panels or interact with this page an introduction to data science and how it can improve areas of your life you are passionate in.

I want to SHARE my years of analytics experience with aspiring analysts and those scared of statistics.

I want to ENGAGE with fans of comics, video games, anime, theme parks, all things geek! I’m one of you and love our conversations.

I want to LEARN your point of view of the topics I discuss.  How do we have a high level discussion about data that doesn’t feel like a math class?

If any of these core pillars resonate with you, I hope you enjoy the content I produce and continue to join the discussions.


001


The Pokemon Journey at TBCC2019 was held on Saturday, August 3, 2019 at 7:30 PM – 8:30 PM.
The pitch of the panel was as follows:
Going to Tampa Bay Comic Con⁉️

Join us in the lighthearted data science discussion of Pokémon. Journey from Kanto to the Alola region through machine learning. This panel is more helpful than a Pokédex.

The panelists were myself and Steve (an indie game developer).  Here’s a commissioned piece I got from a comic con artist:
014

002
Above is a visual representation of the Pokemon Journey we are about to embark on.

The steps on our Pokémon Journey:

  • New Point Of View on Pokémon
  • Field Researchers & Learning from them
  • Pokémon Team Recommendations

During the new point of view on Pokémon section, I walked the audience through a K-means clustering algorithm to reset Pokémon tiers and move us away from only grouping Pokémon together by typing.

During the Field Researchers & Learning from them section, I walked the audience through how to utilize survey data to build a recommendation engine (companies as large as Amazon and Netflix use this technique).

During the Pokémon Team Recommendations section, I walked the audience through the output of the recommendation model and real life scenarios of recommended teams.


 

003

A k-means cluster uncovers trends within our Pokémon data to understand the relational similarities and differences on key in-game attributes.

The more clusters the clearer our picture becomes and the deeper we can understand the Pokémon throughout our journey.

When you pick up a Pokémon game for the first time ever you are in the left square.  Running this algorithm will get you to the bottom right, a clear picture, sooner.


004


 

A Brief overview of the approach:

Standardize your variables (bring your variables to a mean of zero)

Analyze your elbow curve

Validate your clusters

3 Distinct Groups:

High – Highest in all categories except for base defense and hp

Medium – Highest on defense, middle ground in everything else

Low – Only high on hp


015


005


What does this tell us about the starters?

The output of the k-means clusters can be used to help determine your approach from the very beginning.

Reading the pyramid:

Easy path:

Greninja, Swampert, & Sceptile

Hard path:

Serperior, Meganium, Torterra, & Chesnaught


013


006


How do we implement this scoring?

I needed more data to implement this approach.

5 Questions:

What’s your ideal team of 6 Pokémon?

What year did you start playing Pokémon?

Do you play Pokémon GO?

How many Pokémon games have you played?

Do you play the Pokémon TCG?


007

This approach recommends a new squad of Pokémon to the field researcher!

Implementing the scoring: Trust The Process

A propensity model is a statistical scorecard that is used to predict the behavior of your customer or prospect base. Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.

This is the whole Pokémon journey coming full circle.

The Pokémon Professor has done their own research and builds a model.

The field research team assists the Pokémon Professor with gathering new data.

The Pokémon Professor uses the model to assist the field research team.


Here’s the model at work, the input and recommendations:

008.png


009


010


011

During my data science panels I like to reinforce the learning through a game, and participants get a prize from my own personal collection.  For this specific panel participants received an unopened pack of Team Up from the Pokemon TCG, and a Pokemon EX TCG individual card.

Here’s an overview of the game:

5 Volunteers

On the screen will be 3 Pokémon

2 Characters are look-a-likes (statistically speaking)

Volunteers will do their best to convince the panel of which two characters are look-a-likes and who should be wonder traded

For participating, volunteers receive a fabulous prize
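One way to read "look-a-likes (statistically speaking)" is as the two Pokémon whose stats sit closest together. The sketch below assumes Euclidean distance on already standardized stats; the three Pokémon and their numbers are placeholders, not real game data, and the panel may have used a different similarity measure.

```python
import math
from itertools import combinations

# Hypothetical, already standardized stats (attack, defense, speed).
# Names and values are placeholders, not real game data.
candidates = {
    "Pokemon A": (1.1, -0.2, 0.8),
    "Pokemon B": (1.0, -0.1, 0.9),
    "Pokemon C": (-0.9, 1.4, -1.0),
}


def closest_pair(rows):
    """Return the two names whose stat vectors are nearest to each other."""
    return min(
        combinations(rows, 2),
        key=lambda pair: math.dist(rows[pair[0]], rows[pair[1]]),
    )


lookalikes = closest_pair(candidates)
odd_one_out = (set(candidates) - set(lookalikes)).pop()

print("Look-a-likes:", lookalikes)
print("Odd one out:", odd_one_out)
```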


I want to personally thank everyone who attended the panel in Tampa, at the Tampa Comic Convention.  I look forward to meeting again in 2020.


003_008

K-Means Clustering, Pokemon Go

Recipe 015: Pokemon Gen 3 K-means Clustering

logo

Take Charge of your Destiny!

In this data story I’ll be showing you how a self-guided (unsupervised) machine learning algorithm can select the best Pokemon squad for the Hoenn region.

At the end of this data story you’ll have six Pokemon to look out for in Pokemon GO, as well as an understanding of why the Bagon Community Day was the best to date!


025


007_hoenn_region

As with the generation 2 games, the generation 3 games brought a wave of changes, especially to the data structure.

Listed below are what I feel to be some of the major changes which affect the data of Hoenn region Pokemon.

Main Features added from Generation 2:
A complete overhaul of the Pokemon data structure:
Individual personality value
Abilities and Nature
The IV system went from 0-15 to 0-31
Passive damage such as Poison, Burn and Leech Seed is resolved at the end of the turn instead of immediately
135 new Pokemon introduced
103 new moves were introduced
Weather can now be found on the field and activates at the start of a battle
Double Battles

009_pokemon_double_battles

I’d like to call out double battles, as one of the main ingredients in my Pokemon evaluation soup is Experience Growth Rate.

Double battles allow for more and quicker experience.

In other words all Pokemon can gain more experience earlier on in the game.

021

If you recall, when I looked at the data of the Johto Pokemon, we were introduced to some very strong bugs.

Now in the Hoenn region we are introduced to weaker bugs.

This was done to counteract the impact of Heracross and Shuckle.

Catch the bugs below for Pokedex completion, but you’re not going to have them on your main team.

022

Weak bugs aside, you do get one of the most powerful dragons (if not the most powerful): Salamence.  If you play Pokemon Go, you most likely took advantage of what was, in my opinion, the best Pokemon Go Community Day to date (held on 4/13/2019).

008_community_day_01

One of my favorite sayings and mottos is “Stay away from the brand names.”  What does this mean and how does it apply to Pokemon?  It means don’t buy into popular opinion; let the facts and data support your choices.

What’s all you hear about on community days?  If you screamed “shinies” then yes… that’s all you hear about.  How many shinies did you catch?  What’s your highest CP shiny?  I’ll trade for shinies.

Don’t be distracted by the brand name of community day; go for more than shinies.  Play in an area that has several PokeStops and cover from the weather.

During the Bagon community day you should have been catching every Bagon spawn, not only clicking in to see if it’s shiny.  Salamence is the goal; you want to be the mother of dragons (yes, I’m hyped for Game of Thrones).

008_community_day


006_scatter_plot

Sticking to the theme of “Stay Away from Brand Names”, applying a k-means clustering algorithm will look for trends in the data and give us a group of Elite Pokemon we should replay Pokemon Ruby and Sapphire with and keep an eye out for in Pokemon Go.

How do we get to the ideal Pokemon team?  By applying a self-guided machine learning approach: k-means clustering.  Now, you can’t jump ahead and run the algorithm against your raw data.  The first step is to standardize your data, because you want to give each of your attributes an equal weight.

Take for instance:

I want a well-balanced team; I don’t want a team that is elite on attack but weak on defense.
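Here is a minimal sketch of what standardizing looks like, assuming a z-score (subtract the attribute's mean, divide by its standard deviation). The Pokemon names and base stats are placeholders I made up for the example, and z-scoring is only one common way to put attributes on an equal footing.

```python
from statistics import mean, pstdev

# Hypothetical base stats for illustration only (not real game data).
team = {
    "Pokemon A": {"attack": 120, "defense": 60, "hp": 70},
    "Pokemon B": {"attack": 80, "defense": 100, "hp": 90},
    "Pokemon C": {"attack": 40, "defense": 80, "hp": 110},
}


def standardize(rows):
    """Return z-scores per attribute: (value - mean) / standard deviation."""
    attributes = list(next(iter(rows.values())))
    scaled = {name: {} for name in rows}
    for attr in attributes:
        values = [stats[attr] for stats in rows.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, stats in rows.items():
            # Guard against a constant column, which would divide by zero.
            scaled[name][attr] = (stats[attr] - mu) / sigma if sigma else 0.0
    return scaled


scaled_team = standardize(team)
for name, stats in scaled_team.items():
    print(name, {attr: round(z, 2) for attr, z in stats.items()})
```

After this step every attribute has a mean of 0 and a standard deviation of 1, so a wide-ranging stat like attack can no longer dominate the distance calculations that k-means relies on.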

After the data is standardized and I run the k-means algorithm, the result is the scatter plot above.  The cluster at the top right and far right is the segment I want to build my team from.  You can win with any of the other segments, but with this one you can 100% steamroll the competition.
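For anyone curious what the algorithm does under the hood, here is a small sketch of k-means (Lloyd's algorithm) in plain Python. The points are made-up, already standardized (attack, defense) pairs, and the number of clusters, the seed, and the "highest centroid wins" rule for picking the elite segment are my assumptions for the example, not the exact setup behind the scatter plot.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical standardized (attack, defense) values, not real game data.
points = [(1.2, 1.0), (1.0, 1.3), (0.9, 1.1),
          (-1.0, -0.9), (-1.2, -1.1), (-0.9, -1.3)]

labels, centroids = kmeans(points, k=2)

# Treat the cluster whose centroid is highest overall as the "elite" segment.
elite = max(range(len(centroids)), key=lambda c: sum(centroids[c]))
print("labels:", labels, "elite cluster:", elite)
```

In practice you would feed in all six standardized base stats rather than two, and try a few values of k before settling on the segments.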

Below I’ve included visual representations of the top attackers and defenders in each cluster.


017


018


019


020


This is great, and I love infographics… but what do we do with this knowledge?  Well, we can build a team.

Your team building begins from the very beginning.

I’ll cut to the chase… you should choose Torchic (sorry, Swampert fans).

026

Why Torchic? Well, I’m concerned about team structure and, most importantly, a showdown with Slaking (Fighting moves are a must).  Below you can see the full recommendation of what your final team should look like.  You should also target all of these in Pokemon GO.

 

024


005

027


028