Hi everyone, it seems my last post (some visuals on stalling in the 1v1 ladder) generated a lot of community interest. I mentioned I have a few blog posts upcoming, and I figured I would offer a few more interesting tidbits from those posts as a bit of a teaser.
Without further ado, here are some visuals on how well Warzone's Elo rating system predicts match outcomes. It's far better than I expected.
A much deeper analysis will be presented in the blog posts, but for this post I will simply open with a few facts.
1) A key idea behind Elo is that ratings are calculated so that the difference between two players' ratings represents their relative win probabilities when they face off.
2) These visuals take a simple approach to measuring the rating system's accuracy. The predicted win probabilities for each game are grouped into small bins (intervals of 1%). Think of it as measuring the prediction accuracy of Warzone's ratings separately for games where the projected win probability of the superior player is in the interval (50%, 51%] (and hence the opponent's projected win probability is in [49%, 50%)), then repeating the same for the superior-player interval (51%, 52%] vs. [48%, 49%), and so on. It then checks whether the win rates of those superior players line up with what they should be (i.e. in games where the superior player is predicted to win 84% of the time, is their win rate 84%?). I first provide a plot of exactly this. Then I provide a related plot showing the errors for each of those intervals (the difference between the actual and predicted win rates). Smaller binning will be examined further in the blog posts.
3) This analysis was conducted on all 1v1 ladder Warzone games where both players had ratings, which amounts to 140,635 games. If you are curious about separating these based on when Warzone changed the expiration window on games, see my upcoming blog post :)
4) I am also providing a histogram of the counts of games won by players with each predicted win probability. It's not super informative beyond showing that its mean is larger than .5 and that Warzone's 1v1 ladder does a pretty good job of creating games that are as even as possible, but it can be nice to look at.
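For anyone who wants to try the binning idea from point 2 on their own data, here is a rough Python sketch. It is not the code behind these plots. It assumes the textbook Elo expected-score formula with a scale of 400 (Warzone's actual rating computation may differ), a hypothetical input format of (rating_1, rating_2, player_1_won) tuples, and it skips games with an exactly 50% prediction since those have no superior player. The function names are made up for illustration.

```python
import math
from collections import defaultdict


def elo_expected(rating_a, rating_b, scale=400.0):
    """Textbook Elo expected score of A against B.

    The scale of 400 is an assumption; Warzone's ratings may use
    a different formula or constant.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def calibration_by_bin(games):
    """Group games into 1% bins by the favourite's predicted win probability.

    games: iterable of (rating_1, rating_2, player_1_won) tuples
           (a hypothetical input format).
    Returns {upper: (n_games, mean_predicted, actual_win_rate, error)},
    where the bin is (upper - 1, upper] in percent and
    error = actual_win_rate - mean_predicted.
    """
    # Per bin: [game count, sum of predicted probabilities, favourite wins]
    totals = defaultdict(lambda: [0, 0.0, 0])
    for r1, r2, p1_won in games:
        p1 = elo_expected(r1, r2)
        if p1 == 0.5:
            continue  # assumption: no superior player, so the game is skipped
        p_fav = max(p1, 1.0 - p1)
        fav_won = p1_won if p1 > 0.5 else not p1_won
        upper = math.ceil(p_fav * 100)  # e.g. 0.909 falls in (90, 91]
        bucket = totals[upper]
        bucket[0] += 1
        bucket[1] += p_fav
        bucket[2] += int(fav_won)

    result = {}
    for upper in sorted(totals):
        n, pred_sum, wins = totals[upper]
        predicted = pred_sum / n
        actual = wins / n
        result[upper] = (n, predicted, actual, actual - predicted)
    return result


if __name__ == "__main__":
    # Made-up toy games, not real ladder data.
    toy_games = [(1900, 1500, True), (1500, 1900, False), (1900, 1500, False)]
    for upper, (n, predicted, actual, error) in calibration_by_bin(toy_games).items():
        print(f"({upper - 1}%, {upper}%]: n={n} predicted={predicted:.3f} "
              f"actual={actual:.3f} error={error:+.3f}")
```

With real data, the per-bin error values are what the second plot shows, and the per-bin game counts give a rough analogue of the histogram in point 4.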
I am going to leave the analysis out of this (saving it for the blog), other than saying that those error values are way lower than I think many would expect. Warzone's Elo ratings actually do a really good job. But please do offer up your own takes.
Edited 2/16/2019 04:53:39