As with any sport, the question of who the best competitors of all time are in Mixed Martial Arts (MMA) is hotly debated among MMA fans. And unlike in tournament-based sports such as tennis, or sports where results can be objectively measured such as track and field, it is a question that is much harder to answer in MMA. Firstly, fighters compete in different weight classes and organizations, often making direct comparison impossible. Secondly, even when competitors are in the same weight class in the same organization, comparison can be difficult simply because of the relatively low number of matches an average competitor has – not everyone is competing against everyone else. All of this leaves a lot of room for speculation and what-if scenarios. Indeed, any ranking of fighters’ skills is highly subjective, dependent on the criteria set forth and, as a consequence, highly debatable.
What we can do, though, is take a purely statistical approach: estimate which fighters are the likeliest to have the highest skill level, considering their wins and the quality of their opponents. This blog post intends to do just that, using state-of-the-art statistical methods that have been successfully applied in similar settings, for example to analyze and rank chess players (see TrueSkill Through Time: Revisiting the History of Chess) and to rank players in competitive online games.
Dataset
The data we will utilise is obtained from the Sherdog fight finder, a comprehensive database of MMA matches dating back to 1980 (with the first match between Casemiro Martins and Rickson Gracie). The dataset includes results of over 200,000 matches and over 94,000 fighters in total. The following chart displays the number of matches per year in the dataset.
In this data, we have full details of match outcomes, including the win type (if the fight ended with a win for one of the contestants), but also information about matches ending in no contest or disqualification. Since the study tries to estimate skill, we ignore data about both NC and DQ matches (even though the latter means a win for one of the fighters, it is not possible to infer the skill level from that fact alone, Jon Jones vs Matt Hamill being a good example). All computations are done on a snapshot of the data as of Dec. 20, 2015.
Skill inference
Our goal is to assign a skill level to each fighter, based on observed match outcomes. As mentioned earlier, the methodology we will employ is based on the same Bayesian statistical approach, called TrueSkill, that Microsoft uses on Xbox to determine the skill level of players so that they can be matched up optimally in competitive multiplayer games such as Halo. The same approach has also been employed to rank chess players using historic chess match outcome data from 1850 to 2006 (see TrueSkill Through Time: Revisiting the History of Chess).
We rely on Bayesian inference to infer the skill of competitors based on the observed match outcomes. As the skills of fighters are unknown, they are assigned a probability distribution, in this case a Gaussian with mean (\mu) and variance (\sigma^2). The mean (\mu) of a fighter’s skill specifies the average performance of the fighter. Our uncertainty about the skill level is specified by the variance (\sigma^2). The match outcome is determined by the performance of both contestants. Since the performance of a fighter can vary from match to match (good days and bad days), it can be thought of as a noisy version of skill. The winner of the match is the one with the higher performance in that particular match.
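To make the “performance as a noisy version of skill” idea concrete, here is a minimal sketch in Python (the post’s actual model is built in Infer.NET, and the performance standard deviation used here is an illustrative placeholder, not a value inferred from the data):

```python
import random

def simulate_match(skill_1, skill_2, performance_sigma=120.0, seed=None):
    # Each fighter's performance on the night is a noisy draw around
    # his underlying skill; the higher draw wins the match.
    rng = random.Random(seed)
    p1 = rng.gauss(skill_1, performance_sigma)
    p2 = rng.gauss(skill_2, performance_sigma)
    return 1 if p1 > p2 else 2
```

Inference runs this logic in reverse: given the observed winners, it updates the belief over each fighter’s skill distribution.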
The following is a high-level summary of the model we base the statistical inference on:
- Each fighter’s skill is a normally distributed (Gaussian) random variable (the mean and variance of which we will learn from the data). Since a fighter’s skill changes over the years, we have a skill ranking for a fighter for each year he is active. These are the values we want to infer.
- In every match, a fighter’s performance comes from skill, i.e. is drawn from a Gaussian ( \mathcal{N}(skill, performanceVariance)), where (performanceVariance) is another hidden variable, learned from data.
- A fight ends in a decision victory for fighter (f_1) if his performance (p_1) is greater than the performance of fighter (f_2) plus some threshold (decisionThreshold), i.e. (p_1 > p_2 + decisionThreshold).
- If the performance difference is less than the decision threshold, the fight ends in a draw.
- A fight ends in a finish victory (submission, KO or TKO) for fighter (f_1) if ( p_1 > p_2 + finishThreshold).
- Both the finish and decision thresholds are themselves hidden variables that are learned from data.
- The finish threshold is constrained to be strictly larger than the decision threshold, i.e. (finishThreshold > decisionThreshold).
- Two contestants in a match are assumed to be on a somewhat similar level, i.e. their skill levels are very unlikely to be vastly different. In other words, UFC-level fighters are in general matched up with other UFC-level fighters, not low-level amateurs. This assumption helps us infer the skill level of fighters who have a very small number of fights. Thus for each match, we assume (\mid skill_1 – skill_2 \mid \lt matchupThreshold ). The value of the matchup threshold is again a hidden variable learned from data.
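The generative story in the bullet points above can be sketched as follows. This is a simplified Python sketch: the sigma and threshold values are illustrative placeholders, since in the real model they are hidden variables learned from the data.

```python
import random

def sample_outcome(skill_1, skill_2,
                   performance_sigma=120.0,
                   decision_threshold=30.0,
                   finish_threshold=90.0,
                   seed=None):
    # The finish threshold must strictly exceed the decision threshold.
    assert finish_threshold > decision_threshold
    rng = random.Random(seed)
    # Performances are noisy draws around the underlying skills.
    p1 = rng.gauss(skill_1, performance_sigma)
    p2 = rng.gauss(skill_2, performance_sigma)
    diff = p1 - p2
    if diff > finish_threshold:
        return "finish win for fighter 1"
    if diff > decision_threshold:
        return "decision win for fighter 1"
    if diff < -finish_threshold:
        return "finish win for fighter 2"
    if diff < -decision_threshold:
        return "decision win for fighter 2"
    # Performance difference within the decision band: a draw.
    return "draw"
```

Bayesian inference inverts this sampler: given the observed outcome types, it jointly updates the skills and the hidden threshold and variance variables.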
To implement the model we use Infer.NET, a Bayesian inference framework. The model is in large part based on the chess analysis model built by Microsoft Research on the same toolset. The code for the model can be found on GitHub. The final inference is done on the full Sherdog fight database.
Dealing with weight classes
While most matches are within a weight class, there are also matches where contestants from different weight classes are matched up. In this case, the heavier fighter has an advantage on average. This translates into a bias, where the average skill levels of fighters in higher weight classes will be observed to be higher than those of lower ones. To create a true pound-for-pound ranking, we can normalize the skill by weight class, by dividing each fighter’s skill level by the average skill level of the given weight class.
The mean skill level over all fighters is 1000 (since this is chosen as the prior/baseline). The following are the weight class averages (NA denotes no information about a fighter’s weight class).
| Weight class | Average inferred skill |
|---|---|
| Heavyweight | 1159 |
| Light Heavyweight | 1141 |
| Middleweight | 1140 |
| Welterweight | 1132 |
| Lightweight | 1119 |
| Featherweight | 1115 |
| Bantamweight | 1085 |
| Flyweight | 1085 |
| NA | 893 |
A note about this table: it shouldn’t be interpreted in absolute terms (“flyweights are only 6% weaker than heavyweights”). Rather, this is a bias-correction table for the pound-for-pound ranking. In p4p terms, every weight class should have the same average skill level. Since across-weight-class matches bias the numbers, this is the table we can use to correct them.
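As a concrete illustration, the correction can be applied like this. The division by the class average is what the post describes; rescaling back to the 1000 baseline is my assumption, added so the corrected numbers stay on the original scale.

```python
# Average inferred skill per weight class, copied from the table above.
WEIGHT_CLASS_AVG = {
    "Heavyweight": 1159, "Light Heavyweight": 1141, "Middleweight": 1140,
    "Welterweight": 1132, "Lightweight": 1119, "Featherweight": 1115,
    "Bantamweight": 1085, "Flyweight": 1085,
}

OVERALL_MEAN = 1000  # the prior/baseline mean skill over all fighters

def p4p_skill(skill, weight_class):
    # Divide by the class average so every weight class ends up with the
    # same mean, then rescale to the overall baseline.
    return skill / WEIGHT_CLASS_AVG[weight_class] * OVERALL_MEAN
```

After this correction, a flyweight and a heavyweight with the same raw rating relative to their class averages receive the same p4p score.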
Jon Jones is the best MMA fighter ever
Long story short: statistically, Jon Jones has the highest skill ranking of any fighter in the database. The following graph shows the computed skill level of the top 10 male fighters out of the 94,000 fighters in the database.
Jones stands quite a bit above the other fighters in terms of his skill rating. Where other top fighters’ ratings are very close to each other, Jon Jones’s rating is clearly higher. Statistically, this isn’t a surprise: he is without a loss (the model disregards DQ losses as irrelevant to skill), and has wins over extremely high-level competition.
Looking at the rest of the list, there are some interesting results. While Daniel Cormier isn’t typically ranked among the very top, statistically speaking, he should be. His only loss is to the number one, Jones, and almost all of his wins are over very high-quality opponents. The combined record of his opponents is the best in the entire top 10.
Anderson Silva’s position as an all-time great is statistically somewhat hurt by the start and end of his career: his two losses to Chris Weidman and his losses to relatively weak fighters early in his career mean that, from a purely statistical point of view, he does not quite reach Jon Jones’s level. But even so, he is among the very top.
Ben Askren is definitely a surprise on the list. His undefeated record is the main reason he is this high: since we have not observed a loss for him, his probable skill is very high from a statistical perspective. This is in some sense a drawback of the model, but fundamentally, it is difficult to draw conclusions about undefeated fighters. Khabib Nurmagomedov is another example of this. Normally, he isn’t considered p4p top 10. Statistically, however, it makes a lot of sense, as he is 22-0 with a very solid win list, including the current champion.
Two fighters have made meteoric rises in the rankings. Rafael dos Anjos and Conor McGregor, starting from a relatively low baseline (due to their lower-level competition and losses early in their careers), have risen to the top 10 very quickly, thanks to their very strong wins in the recent year(s).
One thing that should be emphasized is that the statistical skill levels of the top fighters are actually extremely close. For example, Conor McGregor entered the top 10 ranking only after beating Aldo, who was top 5 prior to that. This means a single fight can change the rankings very significantly.
Additionally, as mentioned earlier, there is also a variance associated with each inferred skill, which reflects our statistical uncertainty in the estimation. And this uncertainty is (naturally) very high, due to the low number of fights each fighter has (unlike, say, chess, where the number of games can be in the thousands for a competitor). This is illustrated by the following chart, where we plot 95% confidence intervals around the mean for the fighters often considered to be the top 3 of all time. As can be seen, the intervals are fairly wide and overlapping. What this means is that while Jones has the highest average skill given the match outcomes, we cannot conclude with absolute certainty that his skill level is above the other top-ranked fighters, simply that it is likely to be the case. And indeed, a slight underperformance in one of his fights (such as a loss to Gustafsson) would have changed the rankings drastically, showing how fragile the ranking estimation actually is. This all boils down to the fact that since the number of fights of a typical competitor is fairly small and there is a lot of “luck” involved, small mistakes can make large changes in the overall picture. Our model is well able to capture and quantify this uncertainty.
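For reference, the intervals in the chart are standard two-sided Gaussian 95% intervals. A quick sketch of how they, and their overlap, can be computed from an inferred mean and standard deviation:

```python
def ci95(mu, sigma):
    # Two-sided 95% interval of a Gaussian estimate: mu +/- 1.96 * sigma.
    return (mu - 1.96 * sigma, mu + 1.96 * sigma)

def overlaps(a, b):
    # Overlapping (low, high) intervals mean we cannot statistically
    # separate the two skill estimates at this confidence level.
    return a[0] <= b[1] and b[0] <= a[1]
```

With the wide sigmas produced by so few fights per fighter, the top fighters’ intervals overlap heavily, which is exactly what the chart shows.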
Where is Demetrious Johnson?
Many consider the long-reigning flyweight champion Demetrious Johnson to be among the all-time best, so him not being in the top 10 statistically is somewhat surprising. Looking at the data further, the main reason for this seems to be the overall competition, which surprisingly seems slightly weaker in his division. For example, his opponents’ and their opponents’ combined win percentage is 62.9%. For comparison, Chris Weidman’s opponents’ and their opponents’ win ratio is 64.6%; for McGregor it is 64.1%. This translates to Demetrious’s opponents ranking on average lower than the opponents of the fighters in the top 10, which in turn translates to his score being lower.
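The strength-of-schedule number used above can be computed roughly as follows. This is a sketch with hypothetical data structures: `records` (name to win/loss counts) and `opponents` (name to list of opponent names) are assumed shapes, not the actual Sherdog schema.

```python
def combined_opponent_win_pct(fighter, records, opponents):
    # Pool the fighter's opponents and their opponents, then compute
    # the combined win percentage over that pool.
    pool = set()
    for opp in opponents.get(fighter, []):
        pool.add(opp)
        pool.update(opponents.get(opp, []))
    pool.discard(fighter)  # don't count the fighter's own record
    wins = sum(records[n][0] for n in pool)
    losses = sum(records[n][1] for n in pool)
    return 100.0 * wins / (wins + losses)
```

A higher pooled percentage indicates the fighter has, on average, faced opposition with stronger records.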
Summary
Bayesian inference provides an excellent toolset for inferring hidden parameters in data, especially when the amount of data to draw conclusions from is small. As a result, it is especially useful in a setting such as reasoning about the skill levels of top MMA competitors. Based on the available data, Jon Jones is the best ever from a statistical perspective. But it has to be kept in mind that the variance (uncertainty) in the estimates is very high, and small changes in the data, such as one win or loss, can significantly alter the rankings. This reflects real-world intuition well: a loss or two can easily change the perception of a fighter in the list of all-time greats.
The code for the model, to replicate the results or develop them further, can be found on GitHub.