
Pancake Analytics Galar Guide

001.png


Welcome!  Whether you’re a long-time follower or new to the page, I’m glad to have you here.  First off, let me get some updates out of the way and give a quick refresher on what this site is all about.

My Pokemon data science YouTube channel is progressing nicely: the intro is almost complete and the format for each video is almost nailed down.

A little background on me:

I’m a manager of a decision sciences department and I want to give back to the comic con community.  I’ve been attending comic cons since high school (I won’t carbon date myself here but let’s just say my favorite video game is Civ II).

I want to teach, share, engage and learn from the comic con family, as well as share my years of analytics experience with aspiring analysts or those just looking to improve and understand the data behind their fandom.

This post follows a new flow and strays from what I typically do.  The idea here is to end with a Pokemon team recommendation for those playing the Shield version and those playing Sword (personally I picked up Sword, love it).


002

I’ll walk you through a machine learning approach to building an optimized Pokemon team.  The focus here is getting through the main story (in a different post I’ll give recommendations for a competitive team).

Legendary and mythical Pokemon will be excluded from our selection.  I exclude these because they are in a league of their own and it wouldn’t be informative if I told you to use those as a top tier.

I have the underlying data for the Pokemon in the Galar region:

Base attack, base defense, base HP, base special attack, base special defense, base speed, version exclusives, and location in the Galar region.

In this approach I will avoid recommending Pokemon which must be caught in the “wild” area, but the “wild” area is still useful for leveling up your team.  I highly recommend you use the raiding system in the wild to earn candy rewards and level up your Pokemon team quickly.

The final team recommendations will take into consideration the moves they can learn, the gym leaders you face, as well as the champion at the end of the main story.  I’ll exclude the post game (hilariously ridiculous post game by the way, as most of the game has this quirky charm to it, I love it).

If you’re curious and want to do your own research, the technique in the image above is K-means clustering, which falls under unsupervised machine learning.


003

A K-means clustering algorithm will group our Pokemon into clear tiers.  Without running this algorithm we’re in the top left corner, and we can barely see our Yamper.

He’s a good boy, so we want to see him clearly.  The more groups we choose, the clearer we can see where Yamper ranks among all the Galar region Pokemon.

004

The above elbow curve helps in deciding how many tiers we should select.  When choosing the number of groups or clusters, you need to take a few things into consideration:

How many clusters can I reasonably speak to?

What will give me a clear distinction between my Pokemon?

How quickly can I validate the clusters? (With 12 clusters, it would take a while for them to pass the sniff test.)


005

This bubble chart is a good visual representation of our clusters and the Pokemon of the Galar region.  A K-means cluster puts the focus on trends in the data, so all of those base stats I mentioned before are standardized.

Standardizing our variables means each variable is rescaled to a mean of zero and a standard deviation of one.  This is helpful when the ranges of your underlying variables are very different.  As it pertains to Pokemon, this makes sure I have a good mix and I’m not only bucketing based on attackers and HP.

The above bubble chart has a lot of overlap in our clusters.  Besides, how reasonable is it to bring a stage 1 Pokemon onto your final team?

So I’ve got some data clean up to do.  I’m going to limit my data to final stage Pokemon and reduce the number of clusters.  I like to keep it at 3, so I can talk about high, medium and low value.
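If you want to try the standardizing step yourself, here is a minimal sketch using only Python's standard library.  The `standardize` helper and the `toy_stats` rows are made up for illustration; they are not the data set or code behind the charts.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    return [
        # A column with no spread (stdev of zero) is mapped to 0.0 to avoid dividing by zero.
        tuple((value - m) / s if s else 0.0 for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical (hp, attack, speed) rows, invented for this example.
toy_stats = [(60, 45, 26), (80, 116, 119), (110, 80, 30), (90, 120, 140)]
scaled = standardize(toy_stats)

# After scaling, every column has a mean of about 0 and a standard deviation of about 1.
for column in zip(*scaled):
    print(round(mean(column), 6), round(pstdev(column), 6))
```

Once every stat is on the same scale, no single stat with a wide range can dominate the distance calculation that K-means relies on.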
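Here is a rough sketch of that clean up step followed by a 3-cluster K-means, written in plain Python so it runs anywhere.  Everything in it is an assumption for illustration: the `roster` rows are hypothetical, the stats are pretend-standardized, and the farthest-point starting centroids are simply a choice that keeps the toy example repeatable.  In practice a library such as scikit-learn would do this work.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next starting centroid: the point farthest from those already chosen.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical rows: (name, is_final_stage, standardized attack, standardized defense).
roster = [
    ("mon_a", True, 1.2, -0.8),
    ("mon_b", True, 1.0, -1.0),
    ("mon_c", True, 1.3, -0.9),
    ("mon_d", True, -1.0, 1.1),
    ("mon_e", True, -1.2, 0.9),
    ("mon_f", True, 0.0, 0.0),
    ("mon_g", True, 0.1, -0.1),
    ("mon_h", True, -0.1, 0.2),
    ("mon_x", False, -1.5, -1.5),
    ("mon_y", False, -1.4, -1.6),
]

# Clean up: keep final-stage rows only, then cluster them into 3 groups.
final_stage = [row for row in roster if row[1]]
points = [(row[2], row[3]) for row in final_stage]
labels, centroids = kmeans(points, k=3)
tiers = {row[0]: label for row, label in zip(final_stage, labels)}
print(tiers)
```

The cluster numbers themselves carry no meaning; which one is "high", "medium" or "low" only becomes clear once you look at the average stats inside each cluster.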

006

Now this is a much easier to read bubble chart.  Let me give a quick reference of what this chart tells us.

Cinderace is statistically more like Dragapult than it is like Coalossal, although Cinderace and Coalossal are both fire types.  I want to move you away from the typing discussion for now.
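One simple way to sanity check a claim like this is to measure the straight-line distance between stat lines.  The sketch below assumes plain Euclidean distance on raw base stats, which is a simplification of what the clustering does on standardized data.  The stats are typed in by hand, so verify them against your own data set before relying on them.

```python
import math

# (hp, attack, defense, sp. attack, sp. defense, speed), entered by hand.
base_stats = {
    "Cinderace": (80, 116, 75, 65, 75, 119),
    "Dragapult": (88, 120, 75, 100, 75, 142),
    "Coalossal": (110, 80, 120, 80, 90, 30),
}

def stat_distance(first, second):
    """Euclidean distance between two Pokemon's base stat lines."""
    return math.dist(base_stats[first], base_stats[second])

to_dragapult = stat_distance("Cinderace", "Dragapult")
to_coalossal = stat_distance("Cinderace", "Coalossal")

# A smaller distance means the two stat lines look more alike.
print(round(to_dragapult, 1), round(to_coalossal, 1))
```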


007

Here’s a high-level view of our clusters.  To keep with the theme, instead of high, medium, and low I’ve given the clusters Poke Ball themed names.

Now for each base stat there is a high/medium/low rating, but this is in comparison to the other clusters.
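A relative rating like this can be sketched by averaging each stat inside a cluster and then ranking the clusters against each other.  The rows below are hypothetical, and the `cluster_profile` helper is an illustration that assumes exactly three clusters; it is not the code behind the chart.

```python
from statistics import mean

# Hypothetical (cluster name, stats) rows, invented for this example.
rows = [
    ("ultra", {"hp": 95, "defense": 80, "speed": 70}),
    ("ultra", {"hp": 90, "defense": 85, "speed": 75}),
    ("quick", {"hp": 60, "defense": 110, "speed": 30}),
    ("quick", {"hp": 65, "defense": 120, "speed": 40}),
    ("premier", {"hp": 70, "defense": 55, "speed": 115}),
    ("premier", {"hp": 75, "defense": 60, "speed": 120}),
]

def cluster_profile(rows, stats):
    """Rate each cluster low/medium/high per stat, relative to the other clusters."""
    by_cluster = {}
    for cluster, values in rows:
        by_cluster.setdefault(cluster, []).append(values)
    averages = {
        cluster: {stat: mean(member[stat] for member in members) for stat in stats}
        for cluster, members in by_cluster.items()
    }
    ratings = ("low", "medium", "high")  # assumes exactly three clusters
    profile = {cluster: {} for cluster in averages}
    for stat in stats:
        # Rank clusters from the lowest to the highest average for this stat.
        ranked = sorted(averages, key=lambda cluster: averages[cluster][stat])
        for position, cluster in enumerate(ranked):
            profile[cluster][stat] = ratings[position]
    return profile

profile = cluster_profile(rows, ["hp", "defense", "speed"])
for cluster, rating in profile.items():
    print(cluster, rating)
```

Because the ratings are ranks, a "low" here only means lowest of the three clusters, not weak in absolute terms.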


008

The ultra cluster at a glance:

61% of the Pokemon in the Galar region, they excel in HP and attack.  They aren’t handicapped in any other category.  These should be the bulk of your team and are a good complement if you are dipping into the other clusters.


009

The quick cluster at a glance:

11% of the Pokemon in the Galar region, they excel in defense and special defense.  They are the weakest cluster in all other categories.  I assigned them a quick ball because most likely you are catching these Pokemon to fill your pokedex in the post game.

A quick reminder about these clusters: the focus here is the main story, not the competitive scene.  Sorry, trick room fans.


010

The premier cluster at a glance:

28% of the Pokemon in the Galar region, they excel in special attack and speed.  They struggle in defense and special defense.  There are a lot of gems in this cluster, but you’ll need to level them up, have a proper move-set and do the damage quickly.  Don’t drag out the battle with one of these Pokemon.


Now what you’re really here for, the team recommendation.

The goal is to use the findings from the K-means cluster and recommend a team based on the game version you are playing (either Sword or Shield).  All Pokemon must be catch-able beyond the “wild”.  Finally, a starter must be selected.

You’ll see two charts below: a sword version and shield version.  Both will have you rounding out your final team before the fifth gym.


011.png


012


Thank you for reading through, and feel free to reach out to me on Instagram (@pancake_analytics) if you’d like the code that produces this or the data set, are interested in an upcoming comic con panel, or just want to learn more about your fandom through data science.


028


AFO 2019 Player One, Power Ups, & Probabilities: Panel Recap

012


001


logo


Before I share the entire Anime Festival Orlando (AFO) 2019 Panel I’d like to give some insight on the nerves I had going into this panel and how the audience helped me get into the groove.

I opted to go solo on this panel.  Normally I have guest panelists join me, so the nerves were at an all-time high.

Could I keep the entire room engaged for a data science panel?  Would the flow drastically change?

I was set up and ready to go early, and had great discussions with those who sat in early.  We discussed whether or not to pick up Let’s GO Pikachu/Eevee.  One of the attendees had even been referred to the panel by friends who attended my Tampa Comic Convention panels!

This was a first and a good gut check for me that what I’m trying to accomplish with Pancake Analytics is a good thing and is going over well.

I can’t thank the community we’re building here together enough!


028


This panel was held on: Saturday, August 10, 2019 at 8:30 PM – 9:30 PM

In Orlando, FL during AFO 2019.


Our journey begins…

002

The steps on our Pokémon Journey:

  • New Point Of View on Pokémon
  • Field Researchers & Learning from them
  • Pokémon Team Recommendations

A New Point of View on Pokémon : Overview

003

K-means clustering uncovers trends within our Pokémon data, helping us understand the relational similarities and differences on key in-game attributes.

The more clusters the clearer our picture becomes and the deeper we can understand the Pokémon throughout our journey.


A New Point of View on Pokémon : The Results

004

A Brief overview of the approach:

Standardize your variables (Set each variable to a mean of zero and a standard deviation of one)

Analyze your elbow curve (Look for when the line plot elbows)

Validate your clusters (Perform a univariate analysis on core KPIs for each cluster)

3 Distinct Groups:

High – Highest in all categories except for base defense and hp

Medium – Highest on defense, middle ground in everything else

Low – Only high on hp


What does this tell us about the starters?

005

The output of the k-means clusters can be used to help determine your approach from the very beginning.

Reading the pyramid:

Easy path: (Build your team around this Pokemon & steamroll the competition)

Greninja, Swampert, & Sceptile

Hard path: (Need to acquire complementary Pokemon, you learn more about Pokemon this way)

Serperior, Meganium, Torterra, & Chesnaught


How do we implement this scoring?

006

I needed more data to implement this approach.

I reached out to my Instagram followers with a survey, and volunteers were given:

5 Questions:

What’s your ideal team of 6 Pokémon?

What year did you start playing Pokémon?

Do you play Pokémon GO?

How many Pokémon games have you played?

Do you play the Pokémon TCG?


Implementing the scoring: Trust The Process

007

A propensity model is a statistical scorecard that is used to predict the behavior of your customer or prospect base. Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.

I used this model to predict if a Pokemon would be selected in the survey, and used these results to recommend Pokemon a survey participant didn’t select but which would give them statistically the same results when playing.

This is the whole Pokémon journey coming full circle.
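To make the idea concrete, here is a minimal sketch of a propensity score built with logistic regression, fitted by plain gradient descent.  This is an assumption about how such a model could look, not the model from the panel: the features, the survey answers and the candidate names are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(features, picked, learning_rate=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, picked):
            # Error between the predicted probability and the 0/1 survey answer.
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def propensity(x, weights, bias):
    """Estimated probability that a Pokemon with features x gets picked."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical (standardized attack, standardized speed) and whether it was picked.
features = [(1.2, 0.9), (0.8, 1.1), (0.5, 0.2), (-0.3, 0.4),
            (-0.9, -0.7), (-1.1, -1.0), (0.9, -0.2), (-0.6, -0.1)]
picked = [1, 1, 1, 0, 0, 0, 1, 0]
weights, bias = fit_propensity(features, picked)

# Score Pokemon the participant did not select and recommend the likeliest fits.
candidates = {
    "candidate_a": (1.0, 0.8),
    "candidate_b": (-0.8, -0.5),
    "candidate_c": (0.3, 0.6),
}
ranked = sorted(
    candidates,
    key=lambda name: propensity(candidates[name], weights, bias),
    reverse=True,
)
print(ranked)
```

The recommendation step is just the last few lines: score everything the participant left out and surface the highest scores.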

The Pokémon Professor has done their own research and builds a model.

The field research team assists the Pokémon Professor with gathering new data.

The Pokémon Professor uses the model to assist the field research team.


Here are the results of my recommendation model:

008.png

009

010


029

Is Ash getting better with each season?

I’ve analyzed all of Ash’s teams throughout the anime (from Kanto through XYZ).  I want to answer the question… Is Ash getting better with each season?

The first challenge: how do we define success, and what data science methodology do we use?

One area I feel gets overlooked in data science is the performance analytics realm, using univariate and multivariate statistical analysis.

Univariate and multivariate represent two approaches to statistical analysis. Univariate involves the analysis of a single variable while multivariate analysis examines two or more variables. Most multivariate analysis involves a dependent variable and multiple independent variables.

011.png

How do we determine success?

Base stats seem like a good starting point.

But as you can see one Pokémon can throw off our data… cough…  cough … Greninja cough … cough


Here’s how we do it, use the Pokémon GO Approach

012

As much as I feel Pokémon GO has flaws which shouldn’t get a pass, their CP attribute holds the answer to standardizing and scaling Ash’s teams.

What is CP in Pokémon Go?

CP (combat power) is not related to how much damage a Pokémon deals when attacking gyms, but is a combination of attack, defense and stamina (HP).
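The exact calculation used for Ash’s teams isn’t spelled out here, so treat the block below as a sketch of the idea only.  It assumes the shape of the commonly cited community formula for Pokémon GO (attack times the square roots of defense and stamina, divided by 10) and leaves out the level multiplier and individual values.  The team numbers are hypothetical.

```python
import math

def cp_style_score(attack, defense, stamina):
    """CP-style blend: attack counts fully, defense and stamina are square-rooted."""
    return attack * math.sqrt(defense) * math.sqrt(stamina) / 10

def team_score(team):
    """Average CP-style score across a team of (attack, defense, stamina) tuples."""
    return sum(cp_style_score(*stats) for stats in team) / len(team)

# A lopsided attacker scores lower than a balanced Pokemon with the same stat total.
glass_cannon = cp_style_score(150, 50, 50)
balanced = cp_style_score(100, 100, 100)

# Hypothetical early and later teams, invented for this example.
early_team = [(80, 60, 60), (90, 70, 70)]
later_team = [(100, 80, 80), (90, 70, 70)]
percent_change = (team_score(later_team) - team_score(early_team)) / team_score(early_team) * 100
print(round(glass_cannon), round(balanced), round(percent_change, 1))
```

Blending the three stats into one number is what keeps a single heavy hitter from carrying an otherwise thin team.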

013

014

Using this approach helps level the field for those teams where Ash was heavy in one attribute, or when he only had one strong Pokemon.


From beginning to end Ash increased his CP by 8%

ash-hat-pikachu-169

015

His best rotation was in Sinnoh

  • He evolved the most Pokémon compared to his other teams.
  • He evolved 3 Pokémon all the way to their final evolution.
  • 3 of his Pokémon fall into our High cluster.

016

His worst rotation was in Johto

  • He evolved only one Pokémon (Noctowl he found).
  • He attempted to build a team similar to the one he had in Kanto.
  • Only 1 of his Pokémon falls into our High cluster.

Game Time: Let’s GO! Wonder Trade: Overview

I personally feel one of the best ways to reinforce learning is through a game.  During all of my panels I like to play a game that reinforces a machine learning technique, in this case the propensity model.

Those who participated received a rare Pokemon TCG EX/GX individual card, an unopened Unified Minds TCG booster pack, and a gift certificate to Burger King (a meal on me).

Food is usually hard to come by at a convention, so I went back to my younger roots and thought, well, I would have loved to get a free meal at a convention.

017.png

5 Volunteers

On the screen will be 3 of Ash’s Pokémon

2 Pokémon are look-a-likes (statistically speaking)

Volunteers will do their best to convince me of which two Pokémon are look-a-likes and who should be wonder traded

For participating, volunteers receive a fabulous prize



028


TBCC 2019 The Pokemon Journey Panel

012


Welcome to the first recap of the Comic Con Data Science panels run by the crew at Pancake Analytics.  Before I dive into the recap of The Pokemon Journey panel held at the Tampa Bay Comic Convention 2019, I’d like to give a quick overview of why I’ve chosen this path.

One question I get asked often is where I got the idea to apply the fundamentals of data science to comics, video games and all things fandom.

The answer is simple to me and is a core pillar of Pancake Analytics.  I want to teach, share, engage and learn from the comic con family.

I want to TEACH those who attend my panels or interact with this page an introduction to data science and how it can improve areas of your life you are passionate about.

I want to SHARE my years of analytics experience with aspiring analysts and those scared of statistics.

I want to ENGAGE with fans of comics, video games, anime, theme parks, all things geek! I’m one of you and love our conversations.

I want to LEARN your point of view on the topics I discuss.  How do we have a high level discussion about data that doesn’t feel like a math class?

If any of these core pillars resonate with you, I hope you enjoy the content I produce and continue to join the discussions.


001


The Pokemon Journey at TBCC2019 was held on Saturday, August 3, 2019 at 7:30 PM – 8:30 PM.
The pitch of the panel was as follows:
Going to Tampa Bay Comic Con⁉️

Join us in the lighthearted data science discussion of Pokémon. Journey from Kanto to the Alola region through machine learning. This panel is more helpful than a Pokédex.

The panelists were myself and Steve (an indie game developer).  Here’s a commissioned piece I got from a comic con artist:
014

002
Above is a visual representation of the Pokemon Journey we are about to embark on.

The steps on our Pokémon Journey:

  • New Point Of View on Pokémon
  • Field Researchers & Learning from them
  • Pokémon Team Recommendations

During the new point of view on Pokémon section, I walked the audience through a K-means clustering algorithm to reset Pokémon tiers and move us away from only grouping Pokémon together by typing.

During the Field Researchers & Learning from them section, I walked the audience through how to utilize survey data to build a recommendation engine (companies as large as Amazon and Netflix use this technique).

During the Pokémon Team Recommendations section, I walked the audience through the output of the recommendation model and real life scenarios of recommended teams.


 

003

K-means clustering uncovers trends within our Pokémon data, helping us understand the relational similarities and differences on key in-game attributes.

The more clusters the clearer our picture becomes and the deeper we can understand the Pokémon throughout our journey.

When you pick up a Pokémon game for the first time ever you are in the left square.  Running this algorithm will get you to the bottom right sooner, a clear picture.


004


 

A Brief overview of the approach:

Standardize your variables (bring your variables to a mean of zero and a standard deviation of one)

Analyze your elbow curve

Validate your clusters

3 Distinct Groups:

High – Highest in all categories except for base defense and hp

Medium – Highest on defense, middle ground in everything else

Low – Only high on hp


015


005


What does this tell us about the starters?

The output of the k-means clusters can be used to help determine your approach from the very beginning.

Reading the pyramid:

Easy path:

Greninja, Swampert, & Sceptile

Hard path:

Serperior, Meganium, Torterra, & Chesnaught


013


006


How do we implement this scoring?

I needed more data to implement this approach.

5 Questions:

What’s your ideal team of 6 Pokémon?

What year did you start playing Pokémon?

Do you play Pokémon GO?

How many Pokémon games have you played?

Do you play the Pokémon TCG?


007

This approach recommends a new squad of Pokémon to the field researcher!

Implementing the scoring: Trust The Process

A propensity model is a statistical scorecard that is used to predict the behavior of your customer or prospect base. Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.

This is the whole Pokémon journey coming full circle.

The Pokémon Professor has done their own research and builds a model.

The field research team assists the Pokémon Professor with gathering new data.

The Pokémon Professor uses the model to assist the field research team.


Here’s the model at work, the input and recommendations:

008.png


009


010


011

During my data science panels I like to reinforce the learning through a game, and participants get a prize from my own personal collection.  For this specific panel participants received an unopened pack of Team Up from the Pokemon TCG, and a Pokemon EX TCG individual card.

Here’s an overview of the game:

5 Volunteers

On the screen will be 3 Pokémon

2 Characters are look-a-likes (statistically speaking)

Volunteers will do their best to convince the panel of which two characters are look-a-likes and who should be wonder traded

For participating, volunteers receive a fabulous prize


I want to personally thank everyone who attended the panel in Tampa, at the Tampa Comic Convention.  I look forward to meeting again in 2020.


003_008


Recipe 012: Pokemon Gen 2 K-means Clustering

logo

Thanks for coming for a bite, let’s dig into some pancakes and the data science behind the Pokemon of the Johto Region.  How do they differ from the Kanto Region?  What’s the importance of introducing two new Pokemon Types?  Finally, how will speaking about the trends in our data help us understand the relational differences and similarities beyond general Pokemon typing?


012_receipe


 

gyarados_en_265x240

Pokemon Gold and Silver ushered in a new era for the Pokemon series, and listed below are a few changes (not listing all the influential game changes in this post) which still have a large influence to this day:

The introduction of Shinies (Shiny Gyarados shown above)

Gender types

Eggs, breeding and babies

The experience bar

Two new Pokemon types: Dark and Steel

increase_in_bugs

I want to touch specifically on two items in the above list, how they affect the overall re-balancing of the Pokemon universe (see above the increase of stronger Bug type Pokemon), and how they drive the differences between generation 1 and generation 2.

Eggs, breeding and babies

Two new Pokemon types: Dark and Steel

How do eggs, breeding and babies influence the trends in our data?  For instance, there are more Normal types added to the mix (+1%), but the average base attack (-8%) and base defense (-2%, even with the introduction of Blissey!) have both declined versus generation 1 (Red and Blue).

rebalancing

How does the introduction of two new Pokemon types, Dark and Steel, influence our trends?  For those of you who have played Gold and/or Silver, you know this is the longest game in the Pokemon series to date because you also travel back to the Kanto region (where Psychic and Ice types reign supreme!).

Dark type Pokemon are super effective against Psychic and Ghost types.  They’re vulnerable to Fighting, Bug and Fairy types.

Steel type Pokemon are super effective against Rock, Ice and Fairy types (and they resist Dragon type attacks).  They’re vulnerable to Fighting, Ground, and Fire.

Bug type Pokemon are super effective against Psychic, Grass, and Dark.  They’re vulnerable to Flying, Rock and Fire.
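For anyone who wants to play with these matchups, they fit in a small lookup table.  This sketch covers only the three types discussed here, using the matchups from the current games’ type chart; the names `SUPER_EFFECTIVE`, `WEAK_TO` and `counters` are just for illustration.

```python
# Attacking type -> defending types it hits super effectively.
SUPER_EFFECTIVE = {
    "Dark": {"Psychic", "Ghost"},
    "Steel": {"Rock", "Ice", "Fairy"},
    "Bug": {"Psychic", "Grass", "Dark"},
}

# Defending type -> attacking types it is vulnerable to.
WEAK_TO = {
    "Dark": {"Fighting", "Bug", "Fairy"},
    "Steel": {"Fighting", "Ground", "Fire"},
    "Bug": {"Flying", "Rock", "Fire"},
}

def counters(defending_type):
    """Attacking types (out of the three above) that are super effective against defending_type."""
    return sorted(
        attacker
        for attacker, targets in SUPER_EFFECTIVE.items()
        if defending_type in targets
    )

print(counters("Psychic"))
print(counters("Dark"))
```

Looking up Psychic shows why both Dark and Bug matter for the trip back to Kanto, and looking up Dark shows Bug acting as the check on Dark.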

Dark and Steel types were introduced to re-balance the game and give the player the tools to be prepared for the Kanto region challenges.  In doing so, new and stronger Bug type Pokemon (think Heracross and Shuckle) were introduced as a check on those trainers who go on a full-on attack against Psychic type Pokemon (Dark and Grass types [counters to Mewtwo]).

Now that we’ve dug into the differences in our data between generation 1 and generation 2, we can begin focusing on generation 2 and how we can apply guided machine learning to building the best Pokemon Johto team we can!


011_remove_outliers

While training this model, I uncovered a segment full of only legendary Pokemon.  Although you can get these Pokemon in the game, I will be removing them from this analysis for a few reasons:

They’re overpowered compared to the rest of the population.

They’re meant as a reward.

It’s not very insightful to know the legendary dogs have more in common with other legendary Pokemon than with a baby Pokemon.

Let’s continue…


010_standardize_vars

In my segmentation I’ll be throwing in several key performance indicators for Pokemon value throughout the game ranging from base attack to experience growth rate.  How do I get these vastly different attributes on the same scale?

Through standardization!  Standardizing my variables to a mean of zero and a standard deviation of one will put a heavier weight on the trends within the data, as opposed to the individual scale of each variable.

002_amount_of_clusters_plot

How do I determine the proper number of clusters?  I’ll analyze this elbow graph and look for the area where my within-cluster sum of squares begins to bend (as an elbow would).

At first glance I begin to see the shift at 4 groups, then a slight change at 5 groups and a vast difference at 6 groups.  What does this tell me?  Possibly one of the clusters has high deviation and variability on the attributes selected for clustering.
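The numbers behind an elbow graph can be sketched in a few lines: run K-means for a range of cluster counts and record the within-cluster sum of squares each time.  The block below is a toy version in plain Python.  The `blobs` points are invented (four obvious groups) and the K-means helper uses a farthest-point start purely so the example is repeatable; none of it is the Johto data or the original code.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its own cluster's centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

# Hypothetical data: four tight groups of three points each.
blobs = [
    (0, 0), (0, 1), (1, 0),
    (10, 0), (10, 1), (11, 0),
    (0, 10), (0, 11), (1, 10),
    (10, 10), (10, 11), (11, 10),
]

wcss_by_k = {}
for k in range(1, 7):
    labels, centroids = kmeans(blobs, k)
    wcss_by_k[k] = within_cluster_ss(blobs, labels, centroids)

# Plot these values against k and look for where the curve bends.
for k in sorted(wcss_by_k):
    print(k, round(wcss_by_k[k], 2))
```

In this toy data the sum of squares collapses once k reaches the true number of groups and barely moves afterwards, which is the bend you are looking for.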


001_comp_plot

Understanding I might have a group with high variability and seeing there isn’t a large difference from 4 groups to 5 groups, I decide to plot a 4 cluster solution.

Visualizing our data in this way (plotting the top two components [which account for 60.33% of the variability in the data]) shows me two things:

The relationships between Pokemon beyond general type.

My group to the far right, if I ran a 6 cluster solution would have large overlap and possibly a smaller cluster smack in the middle of it.
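If you’re wondering where a figure like "the top two components explain X% of the variability" comes from, here is a sketch of the calculation.  It assumes the components are principal components of the covariance matrix and finds them with power iteration so that it needs no outside libraries.  The `rows` are hypothetical and hand-built so the answer is easy to check; the helper names are my own.

```python
import math
from statistics import mean

def covariance_matrix(rows):
    """Sample covariance matrix of the columns in rows."""
    columns = list(zip(*rows))
    centered = []
    for col in columns:
        col_mean = mean(col)
        centered.append([value - col_mean for value in col])
    n = len(rows)
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
        for ci in centered
    ]

def top_eigenpair(matrix, steps=500):
    """Power iteration: repeatedly multiply a vector by the matrix and renormalize."""
    size = len(matrix)
    # Uneven starting vector; assumed not to be orthogonal to the top component.
    vector = [float(i + 1) for i in range(size)]
    for _ in range(steps):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0:
            return 0.0, vector
        vector = [p / norm for p in product]
    product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
    eigenvalue = sum(p * v for p, v in zip(product, vector))
    return eigenvalue, vector

def explained_by_top_two(rows):
    """Share of total variance captured by the first two principal components."""
    cov = covariance_matrix(rows)
    size = len(cov)
    total = sum(cov[i][i] for i in range(size))
    first_value, first_vector = top_eigenpair(cov)
    # Deflation: remove the first component so the next run finds the second.
    deflated = [
        [cov[i][j] - first_value * first_vector[i] * first_vector[j] for j in range(size)]
        for i in range(size)
    ]
    second_value, _ = top_eigenpair(deflated)
    return (first_value + second_value) / total

# Hypothetical rows with three uncorrelated stats of very different spread.
rows = [(2, 1, 0.5), (2, -1, -0.5), (-2, 1, -0.5), (-2, -1, 0.5)]
share = explained_by_top_two(rows)
print(round(share * 100, 2))
```

The closer this share is to 100%, the less the two-dimensional plot is hiding.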

Now that we’ve done this let’s learn about the Johto Pokemon…


003_elite_info

My top-tiered Pokemon group is a clustering of elite scored attributes, which explains the variability.  Above you can see the type breakdown and the top base attack and top base defense Pokemon within the group.  I like this display because it puts the emphasis on how introducing Dark, Steel, and stronger Bug types has influenced the Pokemon universe.  During a previous analysis (which can be found in the kitchen!) I took the same approach for the Kanto Pokemon, and Psychic types were the top attackers.

004_valuable_info

The next tier is the Valuable tier.  Pokemon fall in this tier because they are borderline elite in one attribute but overall well balanced.  Think of these Pokemon as the Jack of All Trades.

005_medium_info

The Medium value tier has more variability in Pokemon type.  These are Pokemon which evolve in most cases (all three starters fall in this group) but not all (see Dunsparce).  Pokemon in this tier, if left as is and never evolved, will never migrate to the upper tiers.

006_low_info

All Pokemon have value when trained to their full potential and this is why my bottom tier is called Low Value.  Pokemon in this tier will take time and patience but do offer unique attribute scores which can be useful at higher levels.  As seen above Granbull’s family tree begins in this tier.  There’s an opportunity to migrate from the Low value tier to the Valuable tier if you train, train, train!!!


009_shuckle_003

Now that we’ve gone through this exercise what unique findings can we come up with?  Possibly something you didn’t already know.

Shuckle has more in common with Tyranitar than Miltank.

Shuckle’s unique combination of Elite base defense and hp outweighs its lower-scored attacks, letting it take its place among the Pokemon powerhouses of the Johto region.

Thank you for reading this data story and if you have follow-ups or would like to continue the discussion direct message me on Instagram @pancake_analytics !

Enjoy your breakfast!


005

panackes_yum


 

003_008