







Pancake Breakfast: Stacks of Stats
Serving Up Analytics Fandom With A Side Of Bacon
Hi everyone! It’s Tom from Pancake Analytics, and it’s been a while, but I’m proud to bring you a fresh stack of analysis. Today’s analysis is on the Pokémon trading card market: the recent boom, its decline, and how we can use the historic sales data to evaluate the sets and cards we love to collect moving forward.
The boom of October 2020 was actually hinted at in January 2019 by forbes.com in their article titled “Trading Cards Continue to Trounce the S&P 500 As Alternative Investments”. A line from this particular article which I find eye-opening is:
“If a decade ago you had put your money in vintage and modern trading cards instead of the stock market, your payoff would be more than twice as big.”
A few more eye-opening statistics from this article: the difference between the PWCC Top 500 Index 10-year ROI and the S&P 10-year ROI is 94%. That’s where our “twice as big” takeaway comes from.
What is the reason behind this? Well maybe people have a deeper emotional connection to Charizard than they do to Amazon shares.
I’ve sourced Pokemon graded cards data from a site I recommend everyone visit: Pokemonprice.com.
Using this data and trending it over time here’s what I’m able to conclude from this analysis:
Regardless of the “Boom”, vintage cards grow. Let’s analyze the base set 1st edition non holo sales to explain this statement.
The average monthly two-year growth of this set is 78% (excluding the boom data), and this is on par with last year’s two-year growth (2019 vs 2017) of 80%.
Pokémon cards have shown steady growth over time, in line with the rest of the trading card industry. Although the monthly increase has declined by 7%, this speaks more to the purchasing power of the market. Not everyone can afford to purchase a thousand-dollar card.
How much did the boom affect sales? Looking at total sales over time, the boom begins in April 2020, when there is a 98% increase in the total sold value of this set. Previously there was an increase of 45% in 2017 and 22% in 2018. In other words, the boom doubled the growth in a short period of time, and this was never going to be sustainable.
We can learn and project sales from our data. The approach I’ll use is a Holt-Winters time series model. This approach works particularly well with sales data, especially if there’s a seasonal aspect (i.e. high sales volumes over Christmas).
A Holt-Winters model uses three factors: a trend, a typical value and a cyclical repeating pattern. The trend is a slope over time; think, what is our monthly change in sales? The typical value (often called the level) puts a baseline on recent sales; in this case I’ll use the average sale. Finally, for the cyclical repeating pattern, which is a drawn-out way to say seasonality, I’ll use our monthly sales figures.
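If you want to see how those three factors fit together, here’s a minimal Python sketch of additive Holt-Winters smoothing. It is not the code behind the forecast in this post. The function name, the smoothing weights (`alpha`, `beta`, `gamma`), the way the starting values are set and the sales numbers are all assumptions made for illustration.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Forecast `horizon` periods ahead with additive Holt-Winters smoothing."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of history")

    first, second = series[:m], series[m:2 * m]
    first_mean = sum(first) / m
    # Trend: average per-period change between the first two seasons.
    trend = (sum(second) - sum(first)) / m ** 2
    # Seasonality: how far each period of the first season sits from the
    # trend line drawn through that season's midpoint.
    midpoint = (m - 1) / 2
    seasonal = [first[i] - (first_mean + (i - midpoint) * trend) for i in range(m)]
    # Level (the "typical value") as of the last period of the first season.
    level = first_mean + midpoint * trend

    for t in range(m, len(series)):
        season = seasonal[t % m]
        previous_level = level
        # Each factor is a weighted blend of new evidence and its old value.
        level = alpha * (series[t] - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (series[t] - level) + (1 - gamma) * season

    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % m] for h in range(horizon)]


# Made-up monthly sales totals with a yearly cycle (not the Pokemonprice.com data).
monthly_sales = [120, 110, 115, 118, 125, 130, 128, 135, 140, 150, 170, 210,
                 150, 138, 142, 147, 155, 160, 158, 166, 172, 185, 208, 255]
forecast = holt_winters_additive(monthly_sales, season_length=12,
                                 alpha=0.4, beta=0.1, gamma=0.3, horizon=6)
print([round(value) for value in forecast])
```

In practice you would usually let a statistics library choose the smoothing weights for you; they are fixed by hand here only to keep the sketch short.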
We have our forecast built and it works. Some might say it’s super effective; statistically, it’s excellent. Mean Absolute Percentage Error (also known as MAPE) will be our definition of success for this model, and it stands at 8%. Anything below 10% is generally considered an excellent forecast, and this is out of sample.
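For reference, MAPE is simple to compute yourself: take the absolute error of each forecast as a share of the actual value, then average. A minimal sketch follows; the function name and the sample numbers are made up for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Made-up numbers for illustration only.
actual_sales = [200, 250, 400]
forecast_sales = [190, 275, 380]
print(f"MAPE: {mape(actual_sales, forecast_sales):.1f}%")
```

One thing to keep in mind: because each error is divided by the actual value, MAPE breaks down when actual sales are zero or very close to it.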
What I mean when I say out of sample is that I used historic data to predict values I already know occurred. This is best practice when building a forecast, so please train and test your models.
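Here’s a rough sketch of what an out-of-sample check can look like in Python. The helper names, the simple "repeat last season" baseline forecaster and the sales figures are all assumptions for illustration; the idea is just that the model only ever sees the training window, and is scored on the months that were held back.

```python
def split_by_time(series, test_size):
    """Hold out the most recent `test_size` observations; never shuffle a time series."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def seasonal_naive(train, season_length, horizon):
    """Baseline forecaster: repeat the last full season seen in training."""
    last_season = train[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def out_of_sample_mape(series, test_size, forecaster):
    """Fit on the training window only, then score against the held-out values."""
    train, test = split_by_time(series, test_size)
    predictions = forecaster(train, len(test))
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(test, predictions)) / len(test)


# Made-up quarterly sales with a repeating pattern.
sales = [100, 120, 90, 150, 110, 130, 95, 160, 115, 140, 100, 170]
score = out_of_sample_mape(sales, test_size=4,
                           forecaster=lambda train, h: seasonal_naive(train, 4, h))
print(f"Out-of-sample MAPE: {score:.1f}%")
```

Any forecaster that takes the training data and a horizon can be dropped into `out_of_sample_mape`, which makes it easy to compare a fancier model against the naive baseline.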
The forecast has an excellent MAPE even with the boom occurring. The orange line is our forecast, and the blue line is the actual sales. I’m not confident the boom will be sustainable. You can see our forecast predicts sales which align more closely with the yearly growth I’ve mentioned previously. (See the images below for the forecast lines.)
The boom won’t last, but the two-year growth continues. This is promising for long-term investing in Pokémon trading cards.
I’ll show two scenarios, the first scenario being the most unlikely which is the “Boom” is the new norm.
If the “Boom” is sustainable, the average growth in sales per card year over year, 2021 vs 2020, is more than double, 2.25 times more.
This average growth is $251 USD, which is roughly the equivalent of two modern booster boxes at retail price.
Now for the most likely scenario, the “Boom” ends.
If the “Boom” ends in early 2021, the market will be down year over year by 38%. There’s a bright side: this would still be an 84% growth compared to two years ago (2021 vs 2019). The average two-year growth would increase by 6%, still outpacing the stock market (as called out by forbes.com in 2019).
Our forecast can be used to evaluate individual sets. This is where the fun of collecting what you love meets data science.
To show this application I’ve evaluated 3 sets which I feel are different from each other (variants, anime characters, etc.):
Jungle 1st Edition Holos
Gym Heroes Unlimited
Neo Genesis 1st Edition
Using the model I built and applying it to the individual sets, here’s my evaluation and recommendation:
Jungle 1st edition holos will show a growth of around 2.6x, that’s an increase of roughly $12K for the entire set value, by the end of December 2021.
Gym Heroes Unlimited non holos will show a growth of around 1.5x, that’s an increase of roughly $1.3K for the entire set value, by the end of December 2021.
Neo Genesis 1st Edition non holos will show a growth of around 2.8x, that’s an increase of roughly $39K for the entire set value, by the end of December 2021.
At first glance you would think to yourself, well, let’s invest in the Neo Genesis 1st Edition cards and forget the rest. I wouldn’t agree with you 100% if that’s all you want to do, but I would agree with you 100% if the first game you played in the series was Gold/Silver. The reason is that you have an emotional attachment to the Pokémon in that game, and you are progressing in your career and personal life to the point where you can invest in trading cards. There will be more individuals like you, so the market is there.
I will say: don’t collect something you are not a subject matter expert on, and don’t collect something there isn’t a market for.
When I see these three sets compared to each other, I actually prefer Gym Heroes as an investment. The growth is there, you can argue it’s more sustainable because it’s only a $1K increase, and I personally feel the emotional attachment to gym leaders goes beyond the main series of video games. It leaks into the anime, cosplay at comic cons, and even some of the mobile apps (think Pokémon Masters). Also, from an ROI standpoint, you can acquire a lot of great cards from this set right now for a very low investment.
Long-term investments in Pokémon make sense. Trading cards as a whole have outperformed the stock market over the past 10 years. The growth began well before the “Boom”; you can see it begin in 2018 and stay through 2019. The “Boom” won’t last, even though it’s currently doubling the price of cards. Let’s assume the “Boom” ends in early 2021: the two-year growth still increases.
If you have time to invest, and can wait for a 10-year return, I’d recommend you do.
Welcome, and whether you’re a long-time follower or new to the page, I’m glad to have you here. First off, let me get some updates out of the way and give a little bit of a refresher on what this site is all about.
My Pokemon data science YouTube channel is progressing along nicely: the intro is almost complete and the format for each video is almost nailed down.
A little background on me:
I’m a manager of a decision sciences department and I want to give back to the comic con community. I’ve been attending comic cons since high school (I won’t carbon date myself here, but let’s just say my favorite video game is Civ II).
I want to teach, share, engage and learn from the comic con family, as well as share my years of analytics experience with aspiring analysts or those just looking to improve and understand the data behind their fandom.
This post is a new flow and strays from what I typically do. The idea here is to end with a Pokemon team recommendation for those playing the Shield version and those playing Sword (personally I picked up Sword, and I love it).
I’ll walk you through a machine learning approach to building an optimized Pokemon team. The focus here is getting through the main story (in a different post I’ll give recommendations for a competitive team).
Excluded from our selection will be legendary and mythical Pokemon. I exclude these because they are in a league of their own, and it wouldn’t be informative if I told you to use those as a top tier.
I have the underlying data for the Pokemon in the Galar region:
Base attack, base defense, base HP, base special attack, base special defense, base speed, version exclusives, and location in the Galar region.
In this approach I will avoid recommending Pokemon which must be caught in the “wild” area, but the “wild” area is still useful for leveling up your team. I highly recommend you use the raiding system in the wild to earn candy rewards and level up your Pokemon team quickly.
The final team recommendations will take into consideration the moves they can learn, the gym leaders you face, as well as the champion at the end of the main story. I’ll exclude the post game (hilariously ridiculous post game by the way, as most of the game has this quirky charm to it, and I love it).
If you’re curious and want to do your own research: in the image above I’ll be performing a K-means clustering, which falls under unsupervised machine learning.
A K-means clustering algorithm will group our Pokemon into clear tiers. Without running this algorithm we’re in the top left corner, and we can barely see our Yamper.
He’s a good boy, so we want to see him clearly and the more groups we choose the clearer we can see where Yamper ranks among all the Galar region Pokemon.
The above elbow curve helps in deciding how many tiers we should select. When choosing the number of groups or clusters, you need to take a few things into consideration:
How many clusters can I reasonably speak to?
What will give me a clear distinction between my Pokemon?
How quickly can I validate the clusters? (If I selected 12 clusters, it would take a while for them to pass the sniff test.)
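For those who want to try building an elbow curve, here’s a small, dependency-free Python sketch of K-means and the curve itself. It is an illustration under assumptions, not the code used for the charts in this post: the function names, the deterministic way the starting centroids are picked and the toy data points are all made up.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            for p in points]


def kmeans(points, k, iterations=100):
    # Deterministic farthest-point start, chosen here so the sketch is
    # reproducible; libraries typically use a randomised k-means++ start.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(tuple(max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids))))
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, inertia


def elbow_curve(points, max_k):
    """Within-cluster sum of squares for k = 1..max_k."""
    return [(k, kmeans(points, k)[2]) for k in range(1, max_k + 1)]


# Three made-up, well-separated groups of stat pairs.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
for k, inertia in elbow_curve(points, 5):
    print(k, round(inertia, 1))
```

The "elbow" is the value of k after which the within-cluster sum of squares stops dropping sharply; with this toy data the big drop ends at three clusters.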
This bubble chart is a good visual representation of our clusters and the Pokemon of the Galar region. A K-means cluster puts the focus on trends in the data, so all of those base stats I mentioned before are standardized.
Standardizing our variables means the mean of each variable is set to zero (and, typically, its standard deviation to one). This is helpful if the ranges of your underlying data are very different. As it pertains to Pokemon, this makes sure I have a good mix and I’m not only bucketing based on attackers and HP.
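As a quick illustration, here’s a minimal Python sketch of standardizing each stat column into z-scores using only the standard library. The function names and the stat values are made up for the example; they are not the Galar data.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a column to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no signal
    return [(v - mu) / sigma for v in values]


def standardize_table(table):
    """Standardize every stat column of a {stat: [values]} table independently."""
    return {stat: standardize(column) for stat, column in table.items()}


# Made-up base stats on very different scales.
stats = {"hp": [40, 60, 80, 100], "speed": [10, 90, 50, 130]}
scaled = standardize_table(stats)
print({stat: [round(z, 2) for z in column] for stat, column in scaled.items()})
```

After this step a one-unit difference means "one standard deviation" for every stat, so a wide-ranging stat like speed can’t dominate the distance calculation inside K-means.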
The above bubble chart has a lot of overlap in our clusters, and how reasonable is it to bring a stage 1 Pokemon onto your final team?
So I’ve got some data clean-up to do. I’m going to limit my data to final-stage Pokemon and reduce the number of clusters. I like to keep it at 3, so I can talk about high, medium and low value.
Now this is a much easier to read bubble chart. Let me give a quick reference of what this chart tells us.
Cinderace is statistically more like Dragapult than it is like Coalossal, although Cinderace and Coalossal are both fire types. I want to move you away from the typing discussion for now.
Here’s a high-level view of our clusters. To keep on theme, instead of high, medium, and low I’ve given the clusters Poké Ball themed names.
Now, for each base stat there is a high/medium/low rating, but this is in comparison to the other clusters.
The ultra cluster at a glance:
61% of the Pokemon in the Galar region. They excel in HP and attack, and they are not a handicap in any other category. These should be the bulk of your team, and they are good complements if you are dipping into the other clusters.
The quick cluster at a glance:
11% of the Pokemon in the Galar region, they excel in defense and special defense. They are the weakest cluster in all other categories. I assigned them a quick ball because most likely you are catching these Pokemon to fill your pokedex in the post game.
A quick reminder about these clusters: the focus here is the main story, not the competitive scene. Sorry, Trick Room fans.
The premier cluster at a glance:
28% of the Pokemon in the Galar region. They excel in special attack and speed, and they struggle in defense and special defense. There are a lot of gems in this cluster, but you’ll need to level them up, have a proper move-set and do the damage quickly. Don’t drag out the battle with one of these Pokemon.
Now what you’re really here for, the team recommendation.
The outcome of this is to use the findings from the K-means cluster and recommend a team based on the game version you are playing (either sword or shield). All Pokemon must be catch-able beyond the “wild”. Finally a starter must be selected.
You’ll see two charts below: a sword version and shield version. Both will have you rounding out your final team before the fifth gym.
Thank you for reading through, and feel free to reach out to me on Instagram (@pancake_analytics) if you’d like the code that produces this or the data set, are interested in an upcoming comic con panel, or just want to learn more about your fandom through data science.
Before I share the entire Anime Festival Orlando (AFO) 2019 Panel I’d like to give some insight on the nerves I had going into this panel and how the audience helped me get into the groove.
I opted to go solo on this panel. Normally I have guest panelists join me, so the nerves were at an all-time high.
Could I keep the entire room engaged for a data science panel? Would the flow drastically change?
I was set up and ready to go early, and had great discussions with those who sat in early; we discussed whether or not to pick up Let’s Go Pikachu/Eevee. One of the attendees was even referred to the panel by friends who attended my Tampa Comic Convention panels!
This was a first and a good gut check for me that what I’m trying to accomplish with Pancake Analytics is a good thing and is going over well.
I can’t thank the community we’re building here together enough!
This panel was held on Saturday, August 10, 2019 at 8:30 PM – 9:30 PM
in Orlando, FL during AFO 2019.
The steps on our Pokémon Journey:
A k-means cluster uncovers trends within our Pokémon data to understand the relational similarities and differences on key in game attributes.
The more clusters the clearer our picture becomes and the deeper we can understand the Pokémon throughout our journey.
A Brief overview of the approach:
Standardize your variables (Set each variable to mean of zero)
Analyze your elbow curve (Look for when the line plot elbows)
Validate your clusters (Perform a univariate analysis on core KPIs for each cluster)
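The validation step is the one people tend to skip, so here’s a hedged Python sketch of what a per-cluster univariate check could look like: average each stat within each cluster and see whether the groups tell a clear story. The function name, the column names and the numbers are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def cluster_profile(rows, labels):
    """Average every numeric column within each cluster label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {column: mean([member[column] for member in members])
                for column in members[0]}
        for label, members in grouped.items()
    }


# Made-up stats and cluster labels, purely for illustration.
rows = [
    {"hp": 45, "attack": 110, "defense": 60},
    {"hp": 55, "attack": 120, "defense": 70},
    {"hp": 120, "attack": 60, "defense": 65},
    {"hp": 60, "attack": 65, "defense": 130},
]
labels = ["high", "high", "low", "medium"]
for label, profile in cluster_profile(rows, labels).items():
    print(label, profile)
```

If two clusters come back with nearly identical averages on the stats you care about, that is a hint to go back to the elbow curve and try fewer clusters.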
3 Distinct Groups:
High – Highest in all categories except for base defense and hp
Medium – Highest on defense, middle ground in everything else
Low – Only high on hp
The output of the k-means clusters can be used to help determine your approach from the very beginning.
Reading the pyramid:
Easy path: (Build your team around this Pokemon & steamroll grind the competition)
Greninja, Swampert, & Sceptile
Hard path: (Need to acquire complementary Pokemon, you learn more about Pokemon this way)
Serperior, Meganium, Torterra, & Chesnaught
I needed more data to implement this approach.
I reached out to my Instagram followers with a survey, and volunteers were given:
5 Questions:
What’s your ideal team of 6 Pokémon?
What year did you start playing Pokémon?
Do you play Pokémon GO?
How many Pokémon games have you played?
Do you play the Pokémon TCG?
A propensity model is a statistical scorecard that is used to predict the behavior of your customer or prospect base. Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.
I used this model to predict whether a Pokemon would be selected in the survey, and used these results to recommend Pokemon a survey participant didn’t select but which would give them statistically the same results when playing.
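To make the idea concrete, here’s a small Python sketch of a propensity model. The post doesn’t say which model form was used, so this assumes a plain logistic regression fitted by gradient descent; the function names, the single standardized feature, the learning settings and the toy survey data are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def propensity(x, weights, bias):
    """Estimated probability (0 to 1) that a Pokemon with features `x` gets picked."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, outcomes, learning_rate=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights, bias, n = [0.0] * len(features[0]), 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, outcomes):
            error = propensity(x, weights, bias) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def recommend(candidates, already_selected, weights, bias, top_n=3):
    """Rank Pokemon the participant did not pick by their propensity score."""
    unseen = {name: x for name, x in candidates.items() if name not in already_selected}
    return sorted(unseen, key=lambda name: propensity(unseen[name], weights, bias),
                  reverse=True)[:top_n]


# Hypothetical training data: one standardized stat per Pokemon, and whether
# survey participants selected it (1) or not (0).
features = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
selected = [0, 0, 1, 0, 1, 1]
weights, bias = fit_logistic(features, selected)

candidates = {"Pokemon A": [1.8], "Pokemon B": [-1.2], "Pokemon C": [0.9]}
print(recommend(candidates, already_selected={"Pokemon A"}, weights=weights, bias=bias))
```

The recommendation step is just "score everything, drop what the participant already chose, and keep the top few", which is how a model trained on one group’s picks can suggest look-alike Pokemon to someone else.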
This is the whole Pokémon journey coming to a full circle.
The Pokémon Professor has done their own research and builds a model.
The field research team assists the Pokémon Professor with gathering new data.
The Pokémon Professor uses the model to assist the field research team.
Here are results of my recommendation model:
I’ve analyzed all of Ash’s teams throughout the anime (from Kanto through XYZ). I want to answer the question… Is Ash getting better with each season?
First challenge was how do we define success and what data science methodology do we use?
One area I feel gets over looked in data science is the performance analytics realm, using univariate and multivariate statistical analysis.
Univariate and multivariate represent two approaches to statistical analysis. Univariate involves the analysis of a single variable, while multivariate analysis examines two or more variables. Most multivariate analysis involves a dependent variable and multiple independent variables.
How do we determine success?
Base stats seem like a good starting point.
But as you can see one Pokémon can throw off our data… cough… cough … Greninja cough … cough
As much as I feel Pokémon GO has flaws which shouldn’t get a pass, their CP attribute holds the answer to standardizing and scaling Ash’s teams.
What is CP in Pokémon GO?
CP (combat power) is not related to how much damage a Pokémon deals when attacking gyms, but is a combination of attack, defense and stamina (HP).
Using this approach helps level the field for those teams where Ash was heavy in one attribute, or when he only had one strong Pokemon.
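For the curious, here’s a rough Python sketch of a CP-style score. It follows the formula commonly cited by the Pokémon GO community (attack times the square roots of defense and stamina, scaled by a level-dependent multiplier), which I’m treating as an assumption rather than an official definition. The panel doesn’t spell out the exact calculation used for Ash’s teams, so the function names, the `cp_multiplier` value and the stat numbers below are purely illustrative.

```python
import math


def combat_power(attack, defense, stamina, cp_multiplier):
    """CP-style score: attack counts fully, defense and stamina by square root."""
    raw = attack * math.sqrt(defense) * math.sqrt(stamina) * cp_multiplier ** 2 / 10
    return max(10, math.floor(raw))  # the commonly cited formula floors at 10


def team_score(team, cp_multiplier):
    """Average CP-style score across a team, so one outlier can't carry it alone."""
    scores = [combat_power(*stats, cp_multiplier) for stats in team.values()]
    return sum(scores) / len(scores)


# Made-up (attack, defense, stamina) values; the multiplier is a placeholder.
team = {
    "Pokemon A": (220, 180, 200),
    "Pokemon B": (150, 150, 150),
    "Pokemon C": (110, 95, 120),
}
print(round(team_score(team, cp_multiplier=0.79)))
```

Because defense and stamina enter through square roots, a team that is lopsided toward one attribute gets pulled back toward the pack, which is the leveling effect described above.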
His best rotation was in Sinnoh
His worst rotation was in Johto
I personally feel one of the best ways to reinforce learning is through a game. During all of my panels I like to play a game that reinforces a machine learning technique, in this case the propensity model.
Those who participated received a rare Pokemon TCG EX/GX individual card, an unopened Unified Minds TCG booster pack, and a gift certificate to Burger King (a meal on me).
Food is usually hard to come by at a convention, so I went back to my younger roots, and thought well I would have loved to get a free meal at a convention.
5 Volunteers
On the screen will be 3 of Ash’s Pokémon
2 Pokémon are look-a-likes (statistically speaking)
Volunteers will do their best to convince me which two Pokémon are look-a-likes and who should be wonder traded
For participating volunteers receive a fabulous prize
These are data science panels and we started off this panel with a video game recommendation engine. I had Stephen fill out a survey prior to the panel and from his results I built a recommendation model, with the goal of selecting games he has not played (he’s played a lot of games, so not an easy task) and would rate above average.
How are we going to build this recommendation? Through Propensity scoring!
A propensity score is an estimated probability that a data point might have the predicted outcome.
For this panelist, the survey told us this about their gaming preferences:
They value User Score more than the Critics Score.
Their preferred genre is Action Adventure.
Their preferred platform is the PS2.
On the screen will be a video game, with some profiling data.
Panelists will debate the impact, perceived value and replay value of the featured game.
Crowd will decide who made the better argument.
This is the meat of the panel. On the screen is also the IGN review headline and rating; Stephen and I would take turns arguing whether the game deserved its ranking.
Stephen went first and argued that GoldenEye does not deserve this high of a rating, and his key point was replay value. I attempted to argue that we should value it as of its time of release. The crowd sided with Stephen.
I went first this round and argued for the rating, and this was a very pro-Pokémon crowd. Stephen brought up good points on where he thinks the series should go, and that adding another region is not the answer. The crowd sided with me.
Stephen chose to argue for this game, and I wanted to throw a curveball in this debate. It would have been very obvious if we chose Marvel vs. Capcom 2, too easy. I argued that it wasn’t even the best in the series, and that the best in the series is actually X-Men vs. Street Fighter.
Stephen was on team Halo for this one. I love Halo as well, but the crowd did not. That was a shock to us, but maybe Halo doesn’t have replay value? Or maybe everyone is getting tired of the series.
Two games go in… only one comes out
Panelists will argue for a game, they cannot both argue for the same game
The crowd decides who had the best argument
This was a fun and challenging section of our panel. I won’t go into details on this section, but I do want to try something out. As a test to see who is interacting with my page by reading the data stories, I have a special giveaway.
Here are the rules: you must have an Instagram account, and you must be following my Instagram account, @pancake_analytics.
To enter, you need to read through the battle dome section, screenshot your favorite match-up and post it to Instagram.
In this post I want you to tag @pancake_analytics and caption the post with “Who do you have in this Battle Dome match-up?”.
This giveaway will end on December 31st, 2019 and the winner will receive a GameStop gift card from me, to use on your next video game purchase in the new year!
Here’s the disclaimer I have to post:
Per Instagram rules, we must mention this is in no way sponsored, administered, or associated with Instagram, Inc. By entering, entrants confirm they are 13+ years of age, release Instagram of responsibility, and agree to Instagram’s terms of use. Good luck!!!!!
Here are the battle dome match-ups:
I want to personally thank everyone who attended the panel in Tampa, at the Tampa Comic Convention. I look forward to meeting again in 2020.