Data in Baseball
Will a baseball player improve or decline over time? How can you tell when it’s time to let them go? Data, again, can help us answer these questions. But first, an interesting fact: baseball was among the first sports to collect detailed data on its players. This is one of the reasons why many statisticians and data scientists are so fond of baseball.
Doppelgänger Searching
Now, to answer the questions posed above, we can predict a player’s success by searching for their doppelgängers. That’s right: we can look for other players who are very similar to our chosen player and see how they performed, on the assumption that the chosen player will likely follow a similar path. For example, suppose we want to predict the performance of a 25-year-old baseball player A and we know his statistics (e.g. his height, weight, and home runs). We would try to find a player whose past closely matches his. We might find a 45-year-old retired player B who had the same stats as player A when he was 25. Then we can look at player B’s performance after age 25, which gives us a prediction of player A’s future performance.
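To make the idea concrete, here is a minimal sketch of a doppelgänger search in Python. Everything in it is an assumption for illustration: the players, the numbers, the choice of three stats, and the use of plain Euclidean distance on standardised stats. It is not how any real system does it, just the simplest version of "find the most similar past player and look at what happened next".

```python
import math

# Hypothetical, made-up stat lines at age 25: (height_cm, weight_kg, home_runs)
stats_at_25 = {
    "B": (185, 90, 30),
    "C": (175, 80, 5),
    "D": (190, 100, 45),
}

# Hypothetical home runs per season after age 25 for the same players
later_home_runs = {
    "B": [32, 28, 25],
    "C": [6, 4, 3],
    "D": [40, 38, 30],
}


def column_scales(rows):
    """Return per-column mean and standard deviation of the stat lines."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 if a column is constant, to avoid dividing by zero
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return means, stds


def find_doppelgangers(target, pool, k=1):
    """Return the names of the k players in pool most similar to target."""
    means, stds = column_scales(list(pool.values()))

    def standardise(row):
        # Put every stat on the same scale so no single stat dominates
        return [(x - m) / s for x, m, s in zip(row, means, stds)]

    z_target = standardise(target)
    ranked = sorted(
        pool, key=lambda name: math.dist(z_target, standardise(pool[name]))
    )
    return ranked[:k]


player_a = (186, 91, 29)  # hypothetical 25-year-old player A
closest = find_doppelgangers(player_a, stats_at_25, k=1)
prediction = later_home_runs[closest[0]]
print(closest, prediction)
```

With these made-up numbers, player B is the closest match, so B’s later seasons serve as the rough forecast for A. In practice you would probably average over several doppelgängers (a larger `k`) rather than trust a single one.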
Predicting Success
Data analyst Nate Silver developed a model, PECOTA, that does what has been described above. It analyses each player’s stats and finds that player’s doppelgängers, which helps us predict the trajectory of their career.
Further Applications
‘Doppelgänger searching’ is not only used in baseball, though. Companies like Amazon and Netflix also use it to recommend products or movies you might like based on your doppelgängers. In this scenario, your doppelgängers would be the people whose shopping habits are very similar to yours, or those who enjoy watching the movies that you like.
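The same idea can be sketched for recommendations. The users, the movie titles, and the choice of Jaccard similarity (shared likes divided by total distinct likes) are all assumptions made up for this example; real recommender systems at companies like these are far more elaborate.

```python
# Hypothetical users and the movies each one liked
likes = {
    "you": {"Moneyball", "Rocky", "Rudy"},
    "ana": {"Moneyball", "Rocky", "Rudy", "Hoosiers"},
    "ben": {"Alien", "Rocky"},
}


def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes):
    """Find the user's closest doppelgänger and suggest what they liked that the user has not seen."""
    others = [u for u in likes if u != user]
    twin = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return twin, sorted(likes[twin] - likes[user])


twin, picks = recommend("you", likes)
print(twin, picks)
```

Here "ana" shares three of four movies with "you", so she is the doppelgänger, and the one movie she liked that "you" have not seen becomes the recommendation.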
‘Doppelgänger searching’ is closely related to ‘collaborative filtering’, something I have written about previously. If you would like to check it out, here’s the link.