
Unlock the Secrets of NBA Collectibles: This Data-Driven Model Predicts the Next Big Stars in the Card Market!
Clickbait title aside, can data reveal which NBA players will become collectible icons? In this case study, I'll explore how data and logistic regression can answer that question. Using performance metrics, career achievements, and sales data from CardLadder, I'll build a machine learning model that predicts whether an NBA player is likely to have long-term collectability.
This analysis not only identifies which players are already collectible but also offers a framework for predicting future collectible stars.
Why Does Player Collectability Matter?
In the trading card market, long-term collectability translates into value, desirability, and relevance. Players like Michael Jordan or Kobe Bryant have cemented themselves as legends not only in sports history but also in the collectibles space. Knowing which current players will follow their path is valuable for collectors and investors.
How the Data Was Prepared and Sourced
Player stats were sourced from Basketball Reference and represent career averages and totals for each player. Metrics such as points per game, assists, field goal percentage, and player efficiency rating (PER) were used to model collectability.
The dataset covers:
Active/inactive players: 77
All-Star appearances: 493
All-NBA selections: 358
Solving for Career Length Impact
To account for the impact of player career length on collectability, I used the following approach:
Estimated Win Shares (WS): Basketball Reference provides an estimate of a player’s contribution to team wins throughout their career.
Career Efficiency Metric: I computed Win Shares per Game as Estimated Win Shares divided by Total Games Played (WS per Game = WS / Games Played).
This metric balances players with long careers and those with shorter, impactful careers. A player’s win share per game provides insight into how efficiently they contributed to their team over time, irrespective of how many seasons they played.
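A minimal pandas sketch of the Win Shares per Game calculation, assuming a career-totals table with Basketball Reference-style WS and G columns (the column names are an assumption; adjust them to match your export). The rows are placeholders for illustration, not real players.

```python
import pandas as pd


def add_ws_per_game(df: pd.DataFrame, ws_col: str = "WS", games_col: str = "G") -> pd.DataFrame:
    """Return a copy of df with WS_per_game = career Win Shares / career games played."""
    out = df.copy()
    # Treat zero games as missing so we never divide by zero (result becomes NaN).
    games = out[games_col].where(out[games_col] > 0)
    out["WS_per_game"] = out[ws_col] / games
    return out


# Placeholder rows only: not real career numbers.
players = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "WS": [150.0, 60.0],
    "G": [1200, 300],
})
print(add_ws_per_game(players)[["Player", "WS_per_game"]])
```

Player B ends up with the higher rate (0.2 vs. 0.125) despite a much shorter career, which is the balancing effect described above.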
Creating the Collectability Metric
For this analysis, the dependent variable (collect) is a binary field, derived as follows. I used CardLadder, which tracks secondary-market values of trading cards, and specifically its player index values. I calculated an average index value per card for each player, then applied a weighting formula that accounts for sales volume and the total number of cards in each player's index. Players with a weighted score above the market average were assigned 1 (collectible), while those below the average were assigned 0 (not collectible).
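The exact weighting formula isn't published in this post, so the sketch below is only one possible version of the labeling step. The weights (W_VALUE, W_VOLUME, W_CARDS), the column names (index_value, num_cards, sales_volume), and the min-max scaling are all hypothetical assumptions, not the method used for the results here. The market rows are placeholders, not CardLadder data.

```python
import pandas as pd

# Hypothetical weights: the real weighting formula is not given.
W_VALUE, W_VOLUME, W_CARDS = 0.5, 0.3, 0.2


def _min_max(s: pd.Series) -> pd.Series:
    """Scale a column to [0, 1] so the weights act on comparable ranges (an assumption)."""
    span = s.max() - s.min()
    # A constant column carries no ranking information, so map it to 0.
    return (s - s.min()) / span if span else s * 0.0


def label_collectability(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Average index value per card for each player.
    out["value_per_card"] = out["index_value"] / out["num_cards"]
    out["weighted_score"] = (
        W_VALUE * _min_max(out["value_per_card"])
        + W_VOLUME * _min_max(out["sales_volume"])
        + W_CARDS * _min_max(out["num_cards"])
    )
    # 1 = weighted score above the market-wide average, 0 otherwise.
    out["collect"] = (out["weighted_score"] > out["weighted_score"].mean()).astype(int)
    return out


# Placeholder rows only: not real market data.
market = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "index_value": [9000.0, 1200.0, 4000.0],
    "num_cards": [30, 20, 25],
    "sales_volume": [500, 80, 300],
})
print(label_collectability(market)[["Player", "weighted_score", "collect"]])
```

Comparing against the mean of the weighted score mirrors the "above the market average" rule; swapping in a different weighting only changes how weighted_score is computed.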
Something I don't normally do, but should do more often
This may seem out of the norm for me, but I want to hold myself accountable to what I consider a key pillar of Pancake Analytics: education. Below, I break out the code used for this analysis rather than only showing the output and my interpretation of it.
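As a rough illustration of the modeling step, here is a minimal scikit-learn sketch of a logistic regression on collect. The feature names (PER, MVP, FG_pct, WS_per_game, all_star, all_nba) are assumptions about how the table might be laid out, and the data is randomly generated stand-in data, not real player stats, so the printed numbers mean nothing about actual players. Features are standardized first so that coefficient sizes (reported here as odds ratios per one standard deviation) can be compared across features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column names for the modeling table.
FEATURES = ["PER", "MVP", "FG_pct", "WS_per_game", "all_star", "all_nba"]


def fit_collectability_model(df, features=FEATURES, target="collect", seed=0):
    """Fit a standardized logistic regression and return (model, test accuracy, odds ratios)."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=0.25, random_state=seed, stratify=df[target]
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    # exp(coefficient) = change in odds of collect = 1 per one-standard-deviation increase.
    coefs = model.named_steps["logisticregression"].coef_[0]
    odds = pd.Series(np.exp(coefs), index=features).sort_values(ascending=False)
    return model, accuracy, odds


# Synthetic stand-in data (NOT real stats); 77 rows to mirror the player count.
rng = np.random.default_rng(42)
n = 77
demo = pd.DataFrame({
    "PER": rng.normal(18, 4, n),
    "MVP": rng.poisson(0.2, n),
    "FG_pct": rng.normal(0.46, 0.03, n),
    "WS_per_game": rng.normal(0.12, 0.04, n),
    "all_star": rng.poisson(5, n),
    "all_nba": rng.poisson(3, n),
})
logit = 0.4 * (demo["PER"] - 18) + 1.5 * demo["MVP"] + 20 * (demo["FG_pct"] - 0.46)
demo["collect"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model, accuracy, odds = fit_collectability_model(demo)
print(f"held-out accuracy: {accuracy:.2f}")
print(odds)
```

Odds ratios above 1 push a player toward collect = 1 and those below 1 push away from it. Without standardization, a coefficient on FG_pct (a 0 to 1 fraction) and one on All-Star appearances (a count) would not be directly comparable.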










Conclusion
This logistic regression model offers insight into which NBA players are likely to become collectible icons. PER and MVP awards have the strongest influence on collectability, while lower shooting percentages reduce the likelihood.
With this model, collectors can make data-driven predictions about future stars, helping them stay ahead in the trading card market.
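To turn a fitted model into a ranked watch list, one option is to score current players with predict_proba and sort by the probability of collect = 1. The sketch below assumes a scikit-learn classifier trained on 0/1 labels. The training data and the two prospects are synthetic placeholders, and rank_prospects is a hypothetical helper, not part of any library.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def rank_prospects(model, prospects: pd.DataFrame, features) -> pd.DataFrame:
    """Attach P(collect = 1) to each prospect and sort from most to least likely."""
    out = prospects.copy()
    # With classes [0, 1], column 1 of predict_proba is the probability of collect = 1.
    out["p_collectable"] = model.predict_proba(out[features])[:, 1]
    return out.sort_values("p_collectable", ascending=False).reset_index(drop=True)


# Synthetic training data (NOT real stats), just to have a fitted model.
rng = np.random.default_rng(7)
features = ["PER", "MVP"]
train = pd.DataFrame({"PER": rng.normal(18, 4, 200), "MVP": rng.poisson(0.3, 200)})
train["collect"] = (train["PER"] + 3 * train["MVP"] + rng.normal(0, 2, 200) > 19).astype(int)
model = LogisticRegression(max_iter=1000).fit(train[features], train["collect"])

# Hypothetical current players to score.
prospects = pd.DataFrame({"Player": ["Prospect X", "Prospect Y"], "PER": [25.0, 12.0], "MVP": [0, 0]})
print(rank_prospects(model, prospects, features))
```

A probability is more useful for a watch list than the hard 0/1 prediction, because it separates a near-miss from a long shot.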

















