Recipe 012: Pokemon Gen 2 K-means Clustering


Thanks for coming for a bite. Let’s dig into some pancakes and the data science behind the Pokemon of the Johto Region.  How do they differ from the Pokemon of the Kanto Region?  Why does the introduction of two new Pokemon types matter?  Finally, we’ll see how the trends in our data help us understand the relational differences and similarities between Pokemon beyond their general typing!




Pokemon Gold and Silver ushered in a new era for the Pokemon series. Listed below are a few of the changes (not every influential one) that still have a large influence to this day:

The introduction of Shinies (Shiny Gyarados shown above)

Gender types

Eggs, breeding and babies

The experience bar

Two new Pokemon types: Dark and Steel


I want to touch specifically on two items in the above list, how they affect the overall re-balancing of the Pokemon universe (see above the increase in stronger Bug type Pokemon), and how they drive the differences between generation 1 and generation 2.

Eggs, breeding and babies

Two new Pokemon types: Dark and Steel

How do eggs, breeding and babies influence the trends in our data?  For instance, more Normal types were added to the mix (+1%), but the average base attack (-8%) and base defense (-2%, even with the introduction of Blissey!) have both declined versus generation 1 (Red and Blue).
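Here is a minimal sketch of how a generation-over-generation percentage like the ones above can be computed. The stat values are made up for illustration (they are not real Pokedex numbers), and `pct_change` is a helper name I'm assuming for this example.

```python
from statistics import mean

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

# Made-up base attack values, for illustration only (not real Pokedex data).
gen1_attack = [50, 80, 100, 70]
gen2_attack = [45, 75, 90, 66]

change = pct_change(mean(gen1_attack), mean(gen2_attack))
print(f"Average base attack change: {change:+.1f}%")
```

Swap in the real base stats per generation and the same two lines give you the comparison for attack, defense, or any other attribute.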


How does the introduction of two new Pokemon types, Dark and Steel, influence our trends?  Those of you who have played Gold and/or Silver know this is the longest entry in the Pokemon series to date, because you also travel back to the Kanto region (where Psychic and Ice types reign supreme!).

Dark type Pokemon are super effective against Psychic and Ghost types.  They’re vulnerable to Fighting, Bug and Fairy types.

Steel type Pokemon are super effective against Rock, Ice, and Fairy types.  They’re vulnerable to Fighting, Ground, and Fire.

Bug type Pokemon are super effective against Psychic, Grass, and Dark.  They’re vulnerable to Flying, Rock and Fire.
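The three matchups above can be sketched as a small lookup. This only covers Dark, Steel, and Bug as described in this post (using the current type chart for Steel's offensive matchups); the dictionary and function names are my own for illustration.

```python
# Offensive and defensive matchups for the three types discussed above.
SUPER_EFFECTIVE_AGAINST = {
    "Dark": {"Psychic", "Ghost"},
    "Steel": {"Rock", "Ice", "Fairy"},
    "Bug": {"Psychic", "Grass", "Dark"},
}
WEAK_TO = {
    "Dark": {"Fighting", "Bug", "Fairy"},
    "Steel": {"Fighting", "Ground", "Fire"},
    "Bug": {"Flying", "Rock", "Fire"},
}

def counters(target_type):
    """Return the listed types that hit target_type super effectively."""
    return sorted(
        attacker
        for attacker, targets in SUPER_EFFECTIVE_AGAINST.items()
        if target_type in targets
    )

# Sanity check: the two tables agree wherever they overlap.
for attacker, targets in SUPER_EFFECTIVE_AGAINST.items():
    for target in targets:
        if target in WEAK_TO:
            assert attacker in WEAK_TO[target]

print(counters("Psychic"))  # ['Bug', 'Dark']
print(counters("Dark"))     # ['Bug']
```

Notice that Bug shows up as a counter to both Psychic and Dark, which is the balancing act this section is about.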

Dark and Steel types were introduced to re-balance the game and give the player the tools to be prepared for the Kanto region challenges.  In doing so, new and stronger Bug type Pokemon (think Heracross and Shuckle) were introduced as a check on trainers who go on a full-on attack against Psychic type Pokemon with Dark and Grass types (counters to Mewtwo).

Now that we’ve dug into the differences in our data between generation 1 and generation 2, we can focus on generation 2 and how we can apply guided machine learning to build the best Johto team we can!


While training this model, I uncovered a segment made up entirely of legendary Pokemon. Although you can get these Pokemon in the game, I will be removing them from this analysis for a few reasons:

They’re overpowered compared to the rest of the population.

They’re meant as a reward.

It’s not very insightful to know the legendary dogs have more in common with other legendary Pokemon than with a baby Pokemon.
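As a rough sketch, dropping the legendaries before clustering can look like the filter below. The exact exclusion list is my assumption for this example (the three legendary dogs, the two box legendaries, and Celebi), and the row format with a `name` key is hypothetical.

```python
# Assumed exclusion list: Johto legendary and mythical Pokemon.
LEGENDARIES = {"Raikou", "Entei", "Suicune", "Lugia", "Ho-Oh", "Celebi"}

def drop_legendaries(pokemon_rows):
    """Keep only the rows whose name is not in the exclusion list."""
    return [row for row in pokemon_rows if row["name"] not in LEGENDARIES]

# Tiny sample with names only; a real table would also carry the stats.
sample = [
    {"name": "Togepi"},
    {"name": "Entei"},
    {"name": "Heracross"},
    {"name": "Lugia"},
]
kept = drop_legendaries(sample)
print([row["name"] for row in kept])  # ['Togepi', 'Heracross']
```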

Let’s continue…


In my segmentation I’ll be throwing in several key performance indicators for Pokemon value throughout the game ranging from base attack to experience growth rate.  How do I get these vastly different attributes on the same scale?

Through standardization!  Standardizing my variables to a mean of zero and a standard deviation of one puts the weight on the trends within the data, as opposed to the raw scale of each variable.
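To make that concrete, here is a minimal z-score sketch. The two columns are illustrative numbers I picked to show very different scales, not values from my data set, and `standardize` is just a helper name for this example.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Illustrative columns on very different scales.
base_attack = [30, 60, 90, 120]
exp_to_level_100 = [800_000, 1_000_000, 1_000_000, 1_250_000]

z_attack = standardize(base_attack)
z_exp = standardize(exp_to_level_100)
print([round(z, 2) for z in z_attack])
print([round(z, 2) for z in z_exp])
```

After this step both columns live on the same scale, so a distance-based method like k-means is not dominated by whichever attribute happens to have the biggest raw numbers.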


How do I determine the proper number of clusters?  I’ll analyze this elbow graph and look for the area where my sum of squares begins to bend (as an elbow would).

At first glance I begin to see the shift at 4 groups, then a slight change at 5 groups and a vast difference at 6 groups.  What does this tell me? Possibly that one of the clusters has high deviation and variability on the attributes selected for clustering.
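For anyone who wants to see where an elbow curve comes from, below is a small pure-Python sketch: run k-means for several values of k and record the within-cluster sum of squares each time. This is a simplified stand-in (plain Lloyd's algorithm with random restarts on toy 2-D points), not the tool I used for the chart, and `kmeans_wcss` is a name made up for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wcss(points, k, n_init=10, n_iter=50, seed=0):
    """Run Lloyd's k-means n_init times; return the lowest
    within-cluster sum of squares found."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(n_iter):
            # Assignment step: each point joins its nearest center.
            groups = [[] for _ in range(k)]
            for p in points:
                idx = min(range(k), key=lambda i: sq_dist(p, centers[i]))
                groups[idx].append(p)
            # Update step: move each center to the mean of its group
            # (an empty group keeps its old center).
            new_centers = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                for i, g in enumerate(groups)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wcss)
    return best

# Toy data: three tight, well separated blobs (illustrative only).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

curve = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, wcss in curve.items():
    print(k, round(wcss, 2))
```

On data like this the sum of squares drops sharply up to the true number of groups and then flattens out, and that bend is the elbow you read off the chart.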


Understanding that I might have a group with high variability, and seeing there isn’t a large difference from 4 groups to 5 groups, I decided to plot a 4 cluster solution.

Visualizing our data in this way (plotting the top two components, which account for 60.33% of the variability in the data) shows me two things:

The relationships between Pokemon beyond general type.

If I ran a 6 cluster solution, my group to the far right would have a large overlap and possibly a smaller cluster smack in the middle of it.
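The "top two components" and the share of variability they explain come from a principal component analysis. Here is a hedged sketch of that calculation using NumPy's SVD on random placeholder data; it is not my actual pipeline, and `top_two_components` is a name I'm assuming for the example.

```python
import numpy as np

def top_two_components(X):
    """Project the rows of X onto the first two principal components.

    Returns the 2-D scores and the share of total variance they explain.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column
    # SVD of the centered data: rows of Vt are the principal axes,
    # and the squared singular values are proportional to the variances.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Xc @ Vt[:2].T
    return scores, float(explained[:2].sum())

# Placeholder data: 50 rows, 5 standardized attributes (random, not Pokemon).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

scores, share = top_two_components(X)
print(scores.shape)
print(f"Top two components explain {share:.2%} of the variance")
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by cluster label, gives the kind of 2-D view described above.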

Now that we’ve done this let’s learn about the Johto Pokemon…


My top tier Pokemon group is a clustering of elite scored attributes, which explains the variability.  Above you can see the type breakdown and the top base attack and top base defense Pokemon within the group.  I like this display because it puts the emphasis on how introducing Dark, Steel, and stronger Bug types has influenced the Pokemon universe.  In a previous analysis (which can be found in the kitchen!) I took the same approach for the Kanto Pokemon, and Psychic types were the top attackers.


The next tier is the Valuable tier. Pokemon fall in this tier because they are borderline elite in one attribute but overall well balanced.  Think of these Pokemon as the Jacks of All Trades.


The Medium Value tier has more variability in Pokemon type. In most cases these are Pokemon which evolve (all three starters fall in this group), but not all (see Dunsparce).  Pokemon in this tier, if left as is and never evolved, will never migrate to the upper tiers.


All Pokemon have value when trained to their full potential, and this is why my bottom tier is called Low Value.  Pokemon in this tier will take time and patience but do offer unique attribute scores which can be useful at higher levels.  As seen above, Granbull’s family tree begins in this tier.  There’s an opportunity to migrate from the Low Value tier to the Valuable tier if you train, train, train!!!


Now that we’ve gone through this exercise what unique findings can we come up with?  Possibly something you didn’t already know.

Shuckle has more in common with Tyranitar than Miltank.

Shuckle’s unique combination of elite base defense and hp outweighs its lower scored attacks, letting it take its place among the Pokemon powerhouses of the Johto region.

Thank you for reading this data story. If you have follow-ups or would like to continue the discussion, direct message me on Instagram @pancake_analytics!

Enjoy your breakfast!




