The debate
I have this ongoing debate with some friends, which also seems to be unresolved for people who have looked at it for much longer than we have. The question is: can model-based decision-making reliably outperform humans in a fantasy football draft?
On the one hand, there’s a cottage industry of people making forecasts and developing strategies for how to order draft picks. For player values, the usual suspects like ESPN, CBS, NFL, Yahoo Sports, and etc., produce predictions of expected player stats for the season; and there are also groups like fantasyfootballanalytics.net for example, who produce their own ensembled predictions, all with convenient R libraries and years tuning through of trial and error to get things working smoothly. For drafting the players, there’s a widely referenced value based drafting (VBD) strategy, first described in Fantasy Forecast Magazine in 2001, and then updated in this write-up by the footballguys in 2005. There’s also this surprisingly frequently referenced comparison of bidding policies from 2012, but I won’t get there in this post because the scope of our question is limited to snake drafts where pick order matters, but all players cost the same.
On the other hand, player value predictions aren’t perfect. For example, this fivethirtyeight article points out that ESPN fantasy projections for top running backs before the season overpredict performance, such that the top 12 players are in general ranked 2-3x lower in the actual season. There are also criticisms of VBD, as for example in this post on ESPN, saying the resulting teams “smell funny.” And then there are posts like this one on reddit, which managed to identify a combination of 9 reasonable players that underperformed by 80 points on some random week, underlining how noisy the player performance forecasts can be, albeit without saying how many combinations were considered to find this one bad team, or how many teams would have instead overperformed by a similar margin.
So to try to settle the debate, I tested if a vanilla VBD drafting policy could reliably produce rosters that come out on top in a league of typical fantasy football players. To do this, I pulled average player draft positions across thousands of fantasy leagues collected in myfantasyleague.com, and used these as input parameters to simulate leagues of 11 human drafters to compete with 1 VBD drafter. I measured performance by evaluating the actual fantasy points collected by the best set of starters on each team and then finding the rank of the VBD drafter.
TL;DR
In general, VBD gives a slight edge, but it could be an occasional slam-dunk when players drafted early by humans turn out to be lemons.
###
Results
I simulated drafts from three seasons (2014-2016), with the VBD drafter placed in any of the 12 positions. Across all leagues, VBD rosters had a mean rank of 4.68 when evaluating fantasy points observed at the end of the season. This is significantly higher than a rank of 6 that we would expect if VBD drafters were indistinguishable from typical human drafters (Chi. Sq. p.val=2.2e-16). Unfortunately, that is also substantially worse than how well we would have expected to VBD rosters to rank based on preseason projections, for which the mean rank is 2.05. So VBD does tip the scales slightly, but it isn’t exactly a silver bullet.
Interestingly there was a fair amount of variability between how VBD performed in the three years. Both expected ranks and observed ranks were noticeably higher in 2015 than 2014 and 2016.
I’m not sure how to explain this yearly trend. The projections were downloaded from fantasyfootballanalytics.net separately for the three years, so maybe there is an issue with the data. Looking at the quality of preseason forecasts though, the Pearson correlation and the mean absolute error (MAE) were not dramatically different in 2015 than in other years. So maybe forecast quality should be evaluated differently, or there might be a better explanation. |Position|Season|Pearson correlation|Mean Absolute Error| |QB|2014|0.7124|60.4863|
0.7124
QB | 2015 | 0.5562 | 68.4075 |
0.5562
QB | 2016 | 0.6117 | 62.9405 |
0.6117
RB | 2014 | 0.4459 | 56.5074 |
0.4459
RB | 2015 | 0.5165 | 52.4999 |
0.5165
RB | 2016 | 0.5724 | 53.1948 |
0.5724
TE | 2014 | 0.4884 | 43.3151 |
0.4884
TE | 2015 | 0.6593 | 32.0161 |
0.6593
TE | 2016 | 0.6001 | 35.3129 |
0.6001
WR | 2014 | 0.6390 | 45.4527 |
0.6390
WR | 2015 | 0.6362 | 48.7875 |
0.6362
WR | 2016 | 0.5236 | 48.1142 |
0.5236
There was also a relationship with VBD draft turn, performing better with turns in the middle of the rotation. What stood out in 2015 is the first two draft positions resulted in some of the worst outcomes and late picks didn’t hurt performance like they did in other years. But, that doesn’t exactly point to what differentiated that year, since early picks also performed poorly in 2014.
Another explanation might be that in 2015 there were some obvious top picks in the first two round that ended up bombing. To get a sense if this is the case, we can look at the correlation between expected – observed points and draft position for each year. A correlation of 0 would indicate that across draft positions there wasn’t a trend with underperformance. A negative correlation might indicate that players drafted early underperformed to a greater extent than players drafted later. |year|correlation| |2014|-0.236|
-0.236
2015 | -0.318 |
-0.318
2016 | -0.258 |
-0.258
Turns out that this was indeed the case. Players drafted early in real fantasy football leagues underperformed more in 2015 than other years. The biggest offender that year was Le’Veon Bell, who injured his knee in November. But in all, 8 of the first 24 players drafted (first 2 rounds) underperformed by more than 100 points, compared to 2 and 4 players in the other two years.
Ok, so I wouldn’t exactly say the case is closed here, but at this point, I’d like to submit this as a first argument in favor of model-based drafting being superior.
###
And here are the deets, in case anyone wants to know more
League overview. I simulated a standard snake draft in a league with 12 teams. One team drafts players with a VBD-like policy, the remaining 11 have a simulated baseline policy. In all, I simulated 100 drafts for each year and each possible VBD draft position.
Evaluation. After the drafts, I identified the best set of starters on each team using actually observed fantasy points for each player and then compared how the VBD rosters rank in each league. In other words, assuming the teams are fixed after the draft, and the one best roster is selected by some oracle, how well does a VBD roster perform relative to baseline rosters? Another artifact is that I’m not considering defense, special teams, and kickers because people don’t generally draft them meaningfully early, and their performance doesn’t correlate well year to year.
Simulated human drafter. On each baseline draft turn, for every undrafted player, I sample from a Poisson distribution with lambda set to that player’s mean draft position, and then take the lowest sampled value, resolving ties by choosing the player with the lower mean draft position. This is done without picking backup players until all starters are selected. |position|num starters in position|max players in position| |QB|1|3|
3
WR | 2 | 5 |
5
RB | 2 | 5 |
5
TE | 1 | 3 |
3
FLEX (WR/TE/RB) | 1 |
VBD policy.Here I’m just going to follow a vanilla version of what’s in the footballguys’ VBD description. On each turn, a player is drafted by taking top player in the position with the greatest discounted incremental value given by:
where and is the points value of the top player in the position after the first 100 picks.
Though there is VBD rule 7: “Know When to Deviate from VBD Principles”, and in practice doing this may give a team without a quarterback for example. So to be fair, we’ll also apply the same policy of not drafting backup players until there is a starter in every position.
The data and le code. I downloaded the yearly fantasy projections from the projections tool here, the player rankings from myfantasyleague.com, and actual player performance from the NFL using nflscrapR. After a bit of munging, I joined the three years from each source into the final dataset here. The draft simulation and results were obtained using this script.
Like this:
Like Loading…