Predicting fantasy points is hard. On average the models are only off by about 6.5 points, but can miss pretty big when players do very well or poorly, which is really what you usually care about. This is inheritely a challenging problem - how do you use past data to try and determine a future performance that differs from that past (like a break out game or a slump)? On some level, I imagine this is impossible, but think we could maybe do better. For instance, my model doesn’t account for injury at all. Knowing that a player is coming off an injury could be very valuable. Or knowing that a key offensive lineman is out could really affect a RBs performance. I think the NFL has a lot of cool work that could be done with the right data and some deep thought about how these data could be used to understand performance.
Second, I was very happy to gain some more insight into what variables seem to matter when trying to predict a player’s fantasy points. In the future, I will be paying a lot more attention to a player’s opponents.
Lastly, I think it is really cool that with only a handful of variables a model could be developed that performs very similar to ESPN’s prediction model. This is what I love about data science. I was able to use data to gain a much deeper undertanding of an area in which I am far from an expert. I have no idea how ESPN creates their predictions, but I imagine they put some effort into it, and I think it is awesome that with just my laptop and some data I could create a competing model. If anyone from ESPN is reading this…I’d love to connect and learn some more about your model :).
P.S. - I will try and post some updates on the performance of the models as the season progresses if anyone is interested.