Welcome! Whether you’re a long-time follower or new to the page, I’m glad to have you here. First off, let me get some updates out of the way and give a little refresher on what this site is all about.
My Pokemon data science YouTube channel is progressing nicely: the intro is almost complete and the format for each video is almost nailed down.
A little background on me:
I’m a manager of a decision sciences department and I want to give back to the comic con community. I’ve been attending comic cons since high school (I won’t carbon date myself here, but let’s just say my favorite video game is Civ II).
I want to teach, share, engage and learn from the comic con family, as well as share my years of analytics experience with aspiring analysts or those just looking to improve and understand the data behind their fandom.
This post is a new flow and a departure from what I typically do. The idea here is to end with a Pokemon team recommendation for those playing the shield version and those playing sword (personally I picked up sword, love it).
I’ll walk you through a machine learning approach to building an optimized Pokemon team. The focus here is getting through the main story (in a different post I’ll give recommendations for a competitive team).
Excluded from our selection will be legendary and mythical Pokemon. I exclude these because they are in a league of their own, and it wouldn’t be informative if I told you to use those as a top tier.
I have the underlying data for the Pokemon in the Galar region:
Base attack, base defense, base HP, base special attack, base special defense, base speed, version exclusives, and location in the Galar region.
In this approach I will avoid recommending Pokemon which must be caught in the “wild” area, but the “wild” area is still useful for leveling up your team. I highly recommend you use the raiding system in the wild area to earn candy rewards and level up your Pokemon team quickly.
The final team recommendations will take into consideration the moves they can learn, the gym leaders you face, as well as the champion at the end of the main story. I’ll exclude the post game (hilariously ridiculous post game by the way; like most of the game, it has this quirky charm to it, and I love it).
If you’re curious and want to do your own research, the technique I’m performing in the image above is K-means clustering, which falls under unsupervised machine learning.
A K-means clustering algorithm will group our Pokemon into clear tiers. Without running this algorithm we’re in the top left corner, and we can barely see our Yamper.
He’s a good boy, so we want to see him clearly, and the more groups we choose, the more clearly we can see where Yamper ranks among all the Galar region Pokemon.
The above elbow curve helps in deciding how many tiers we should select. When choosing the number of groups or clusters, you need to take a few things into consideration:
How many clusters can I reasonably speak to?
What will give me a clear distinction between my Pokemon?
How quickly can I validate the clusters? (With 12 clusters, it would take a while for them to pass the sniff test.)
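For anyone who wants to see what sits behind an elbow curve, here’s a minimal sketch in plain Python. It is not the code behind the charts in this post. The function names (kmeans, inertia), the fixed seed, and the toy points are all assumptions made up for illustration; in practice a library such as scikit-learn’s KMeans handles this for you. The idea is to run K-means for several values of k and record the within-cluster sum of squared distances each time.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm (illustrative); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct rows
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (the y-axis of an elbow curve)."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Made-up 2-D points forming two obvious groups; NOT real Pokemon stats.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

elbow = {}
for k in range(1, 5):
    labels, centroids = kmeans(points, k)
    elbow[k] = inertia(points, labels, centroids)

for k, value in elbow.items():
    print(k, round(value, 2))
```

On toy data with two well-separated groups like this, the value at k=1 is far larger than at k=2. On a real elbow curve you look for the k where the curve stops dropping sharply, then weigh that against the three questions above.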
This bubble chart is a good visual representation of our clusters and the Pokemon of the Galar region. A K-means cluster puts the focus on trends in the data, so all of those base stats I mentioned before are standardized.
Standardizing our variables means each variable is rescaled so its mean is zero and its standard deviation is one. This is helpful if the ranges of your underlying data are very different. As it pertains to Pokemon, this makes sure I have a good mix and I’m not only bucketing based on attackers and HP.
The above bubble chart has a lot of overlap in our clusters. Also, how reasonable is it to bring a stage 1 Pokemon onto your final team?
So I’ve got some data clean up to do. I’m going to limit my data to final stage Pokemon and reduce the number of clusters. I like to keep it at 3, so I can talk about high, medium and low value.
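Here’s a small sketch of what standardizing looks like, assuming the usual z-score recipe (subtract the mean, divide by the standard deviation). The standardize helper and the numbers are made up for illustration and are not real base stats.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical columns on very different ranges (illustration only).
hp = [40, 60, 80, 100]
speed = [10, 20, 30, 40]

hp_z = standardize(hp)
speed_z = standardize(speed)

print([round(z, 3) for z in hp_z])
print([round(z, 3) for z in speed_z])
```

After standardizing, both columns sit on the same scale, so a stat with a bigger raw range no longer dominates the distance calculation that K-means relies on.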
Now this is a much easier to read bubble chart. Let me give a quick reference of what this chart tells us.
Cinderace is statistically more like Dragapult than it is like Coalossal, even though Cinderace and Coalossal are both fire types. I want to move you away from the typing discussion for now.
Here’s a high level view of our clusters. To keep on theme, instead of high, medium, and low I’ve given the clusters poke-ball themed names.
Now for each base stat there is a high/medium/low rating, but this is in comparison to the other clusters.
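One way to produce that kind of relative high/medium/low rating is to average each stat within a cluster and rank the clusters against each other. The sketch below assumes exactly three clusters; profile_clusters, the field names, and the numbers are hypothetical and only there to show the idea.

```python
from statistics import mean


def profile_clusters(rows, labels, stat_names):
    """Grade each cluster's mean per stat relative to the other clusters.

    Assumes exactly three clusters, so the grades line up with low/medium/high.
    """
    clusters = sorted(set(labels))
    grades = ["low", "medium", "high"]
    profile = {c: {} for c in clusters}
    for stat in stat_names:
        # Average this stat over the members of each cluster.
        means = {
            c: mean(r[stat] for r, lab in zip(rows, labels) if lab == c)
            for c in clusters
        }
        # Sort clusters from weakest to strongest on this stat.
        ranked = sorted(clusters, key=lambda c: means[c])
        for grade, c in zip(grades, ranked):
            profile[c][stat] = grade
    return profile


# Made-up rows and cluster labels (illustration only, not real base stats).
rows = [
    {"attack": 90, "speed": 40},
    {"attack": 100, "speed": 50},
    {"attack": 50, "speed": 30},
    {"attack": 60, "speed": 20},
    {"attack": 70, "speed": 110},
    {"attack": 80, "speed": 90},
]
labels = [0, 0, 1, 1, 2, 2]

profile = profile_clusters(rows, labels, ["attack", "speed"])
print(profile)
```

Note that a "low" here only means lowest of the three clusters on that stat. It says nothing about whether the stat is low in absolute terms.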
The ultra cluster at a glance:
61% of the Pokemon in the Galar region. They excel in HP and attack, and they are not a handicap in any other category. These should be the bulk of your team, and they complement well if you are dipping into the other clusters.
The quick cluster at a glance:
11% of the Pokemon in the Galar region. They excel in defense and special defense, and they are the weakest cluster in all other categories. I assigned them a quick ball because most likely you are catching these Pokemon to fill your pokedex in the post game.
A quick reminder about these clusters: the focus here is the main story, not the competitive scene. Sorry, trick room fans.
The premier cluster at a glance:
28% of the Pokemon in the Galar region. They excel in special attack and speed, and they struggle in defense and special defense. There are a lot of gems in this cluster, but you’ll need to level them up, have a proper move-set and do the damage quickly. Don’t drag out the battle with one of these Pokemon.
Now what you’re really here for, the team recommendation.
The goal is to use the findings from the K-means clustering to recommend a team based on the game version you are playing (either sword or shield). All Pokemon must be catchable outside the “wild” area. Finally, a starter must be selected.
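Those eligibility rules can be written down as a simple filter. This is only a sketch of the constraints, not the full selection (which also weighs moves and gym leaders). The field names (versions, wild_area_only, is_starter) and the placeholder entries are assumptions invented for the example.

```python
def eligible(mon, version):
    """True if the Pokemon is available in this version outside the wild area."""
    return version in mon["versions"] and not mon["wild_area_only"]


def candidate_pool(pokedex, version):
    """Split the eligible Pokemon into starters and everything else."""
    pool = [m for m in pokedex if eligible(m, version)]
    starters = [m["name"] for m in pool if m["is_starter"]]
    others = [m["name"] for m in pool if not m["is_starter"]]
    return starters, others


# Placeholder entries (hypothetical, not real Pokemon data).
pokedex = [
    {"name": "starter_a", "versions": {"sword", "shield"},
     "wild_area_only": False, "is_starter": True},
    {"name": "sword_only", "versions": {"sword"},
     "wild_area_only": False, "is_starter": False},
    {"name": "shield_only", "versions": {"shield"},
     "wild_area_only": False, "is_starter": False},
    {"name": "wild_only", "versions": {"sword", "shield"},
     "wild_area_only": True, "is_starter": False},
]

sword_starters, sword_others = candidate_pool(pokedex, "sword")
print(sword_starters, sword_others)
```

From a pool like this you would pick one starter and fill the remaining five slots, leaning on the cluster findings to decide which candidates to favor.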
You’ll see two charts below: a sword version and shield version. Both will have you rounding out your final team before the fifth gym.
Thank you for reading through, and feel free to reach out to me on instagram (@pancake_analytics) if you’d like the code that produces this or the data set, if you’re interested in an upcoming comic con panel, or if you just want to learn more about your fandom through data science.