
Posts 1 - 19 of 19   
Changing one number can improve the Ladder: 8/22/2022 23:49:07


l4v.r0v 
Level 59
TL;DR: We can improve the ladder experience for virtually everyone- competitive players, newbies, casuals alike- by changing the starting rating on Bayeselo ladders like the 1 v 1 Ladder from 0 to 1500.

---

Starting from the bottom is why we're here
The purpose of a system is what it does. The 1 v 1 Ladder does 2 things:
a) Ranking: it rates and ranks participants based on demonstrated skill
b) Matchmaking: it tries to pair participants with players close to their skill level

This is all pretty simple to accomplish with ratings based on game data. But what about when you're just starting out? You have played no games- whom should you get matchmade with?

On many other games using Elo-like systems- online chess (Lichess), online tetris (tetr.io)- the default assumption for new players is that they're somewhere in the middle. When you start out, your pseudo-rating is around the average rating, so you get paired with roughly the average player on the ladder. If you beat them, then you next face above-average players; if you lose, you explore the lower half, like a binary search- an efficient way to figure out your rating.

On Warlight, however, a decision was made a decade ago to avoid confusing players. To keep players from wondering why they have a rating when they haven't played any games, new ladder teams have a starting rating of 0 (cosmetically meaning "no rating"). This isn't just cosmetic, though! When you start, you actually get matchmade with the very bottom of the ladder.

This sets off a vicious cycle. Most new players are not, in fact, 1500 Elo points worse than the average ladder player. But starting from 0, they get matchmade with the lowest rated players (usually rated something like 600-900). Then, the first data point the ladder has to work with? "Beat a 600 rated player." Bayeselo is roughly a naive Bayesian maximum likelihood approximator for Elo ratings- but you can just think of it as like a person looking at your results and finding a reasonable rating. When you beat a 600, it just gives you a rating a little bit better than 600. Then you get matchmade with someone rated, say, 800. Now it's "beat a 600 and an 800." Another easy win, for most. Then you face a player rated 1000. Your initial rating has a lingering effect- deflating your rating for many games.
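To make the difference between climbing from the bottom and searching from the middle concrete, here is a toy sketch. Everything in it is an assumption made for illustration: results are deterministic (you always beat lower-rated opponents and lose to higher-rated ones), the bottom-up climb moves 200 points per game (mirroring the 600, 800, 1000 sequence above), and the middle-out search halves its step after every game. It is not how Bayeselo actually computes ratings.

```python
def games_until_close(true_rating, start, step, halve, tolerance=100, max_games=50):
    """Count games until the matched opponent is within `tolerance` of true_rating.

    Toy model (all assumptions): results are deterministic, and the next
    opponent is simply `step` points above or below the previous one.
    """
    opponent = start
    for games in range(1, max_games + 1):
        if abs(opponent - true_rating) <= tolerance:
            return games  # this game is the first close match
        won = true_rating > opponent
        opponent += step if won else -step
        if halve:
            step = max(step // 2, 1)  # binary-search style narrowing
    return max_games


for true_rating in (1100, 1500, 1900):
    bottom_up = games_until_close(true_rating, start=600, step=200, halve=False)
    middle_out = games_until_close(true_rating, start=1500, step=400, halve=True)
    print(true_rating, "bottom-up:", bottom_up, "middle-out:", middle_out)
```

In this model a player whose true rating really is near the bottom still finds a close match sooner when starting from the bottom; the point is that such players are the minority.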

This breaks the ladder- for everyone
The complaints people have about Bayeselo- enabling ladder runs, making ratings game-able- actually come from the 0-rating, not Bayeselo itself. But it's not just competitive players that get hurt. The 0-rating leads to low match quality for all players early in their ladder careers and hurts newbies long term.

This is what the first 5 games look like (based on manual sampling in Oct. 2021):


As you can see, new ladder players are highly likely to leave if they just keep getting matched with substantially stronger or weaker players- it's a grind. People want enjoyable games against opponents roughly their own caliber; starting from 0 and working your way up is an inefficient search that gets in the way of that. Even below-average skill players have to go through a bunch of games before getting good, enjoyable matches.

But even for players who are low-skill, this is bad! If you're rated 900, you're on the frontlines to face players rated 0. Those 0-rated players are just players with no games, who could be newbies, average-skill players, or elite players on ladder runs. You can look at the ladder history of low-rated players (https://www.warzone.com/LadderGames?ID=0&LadderTeamID=25722) and notice that they face more high-skill players than you'd expect.

And for newbies? 0-rated players' closest-rated opponents are other 0-rated players, so they also roll the dice. Among the very low rated players, you also have players on long boot streaks, like Scrooge McDuck (https://www.warzone.com/LadderTeam?LadderTeamID=13793). If you look at Scrooge McDuck's two ongoing games, they're against a legitimately low-rated player and a newbie, both of whom are just going to either waste their time waiting for an inactive player to get booted or get crushed if McDuck comes back online.

Starting from 0 hurts competitive and casual players alike by making the ladder exploitable, giving lower-skill players the worst matchmaking, and making the recent-join experience a grind on the ladder even for below-average-skill players.

Area man discovers ONE WEIRD TRICK. Ladder cheaters HATE him!!!

What if we started from 1500 instead? 1500 is probably optimistic right now- the average new-join is probably below-average skill compared to existing ladder players- but 1) that might not be true, since elite players go on ladder runs all the time; and 2) it's a chicken-and-egg problem because the ladder disproportionately attracts high-skill, competitive players, and isn't as attractive to casuals just looking for fun multi-day games rather than a serious try-hard experience.

Even with that source of error, 1500 will work better for:

a) Competitive players: starting from 1500 will get in the way of ladder runs, stalling, etc. Stalling is most powerful when the ladder has limited data to rate you on- e.g., when you've played 15 weak players and 5 players closer to your caliber, the outcome of any one of those 5 games is more significant. If you set the starting rating to 1500, there'll still be a slight grind for elite players- they'll play an average-skill player first- but instead of taking 10-15 games to get quality matches, they'll get them after perhaps just 5. That means more data, less power to ladder runs and stalling, and a healthier environment that puts players who've been on the ladder longer at less of a disadvantage.

b) Newbies just starting out: starting from the middle is a more efficient search strategy to find someone's true rating. Most newbies are still better than the worst-ranked players on the ladder. Starting at 1500 will also protect them from the randomness of facing other 0-rated players (who could have any skill level), the disappointment of facing players on boot streaks, the missed learning opportunities of only facing very weak players, and the baggage of having a bunch of old games against low-rated players dragging them down even if they improve later.

c) Casual and non-high-skill players: starting from 1500 will again reduce the grind they have to go through before getting quality games and make their opponent pool less random (if you lose to a 0 just starting out or a 600-rated player who snaps their boot streak, that drags you down hard and your rating gets pegged to theirs).

As far as cosmetics go, starting new players at 1500 or some other middle rating is a common pattern for Elo-like ladders. Players on Lichess, tetr.io, etc., are all familiar with it, so I think any confusion can simply be explained away. Alternatively, the displayed rating can be decoupled from the actual starting rating: the ladder could just say "Not yet rated" but use 1500 under the hood for matchmaking. (It doesn't have to be 1500- if you think that's too ambitious, even 1300 or 1100 would be significant improvements over 0.)
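A minimal sketch of what that decoupling could look like. The class, field, and method names are hypothetical and are not Warzone's actual data model; the 1500 default is the value proposed in this post.

```python
from dataclasses import dataclass
from typing import Optional

PROVISIONAL_RATING = 1500  # assumed default; 1300 or 1100 would work the same way


@dataclass
class LadderTeam:
    """Hypothetical record, not Warzone's actual data model."""
    name: str
    computed_rating: Optional[float] = None  # None until a rated game finishes

    def matchmaking_rating(self) -> float:
        # Used only to pick opponents; never shown to the player.
        if self.computed_rating is None:
            return PROVISIONAL_RATING
        return self.computed_rating

    def displayed_rating(self) -> str:
        # What the ladder page would show.
        if self.computed_rating is None:
            return "Not yet rated"
        return str(round(self.computed_rating))


newcomer = LadderTeam("new player")
print(newcomer.displayed_rating(), newcomer.matchmaking_rating())
```

The design choice is that the "no rating" label stays purely cosmetic, so it no longer decides who a new player gets matched with.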
Changing one number can improve the Ladder: 8/23/2022 00:44:18


(deleted) 
Level 60
(yup, I read the whole post)

Reminds me of Baseball. All batsmen start at 1.000. "You're batting a thousand", as the saying goes.

Would that ^ be applicable?
Changing one number can improve the Ladder: 8/23/2022 00:52:40


l4v.r0v 
Level 59
(thanks for reading the whole post!)

Yes, sort of. Except here the starting guess rating is sticky- imagine if you had a batsman who'd really bat 0.200 but they started at 1.000. They make an out and instead of their batting percentage going to 0.000 (0 hits/1 at-bat), you update your estimate to 0.900. Then another out and it's 0.850... and so on, so that it takes many at-bats to get anywhere close to their actual batting percentage.

The ladder ratings have a vicious-cycle problem- since your rating estimate informs matchmaking- that makes them sticky in a way the 1.000 batting percentage is not. This stickiness is what enables ladder manipulation, makes the early ladder experience unpleasant, and causes lopsided matchmaking for low-rated players. The stickiness is what makes it a problem.
Changing one number can improve the Ladder: 8/23/2022 01:21:41


alexclusive 
Level 65
Yes please!
Changing one number can improve the Ladder: 8/23/2022 02:12:06


Master Turtle 
Level 62
User voice it
Changing one number can improve the Ladder: 8/23/2022 02:15:59


l4v.r0v 
Level 59
User voice it
I'd like to discuss with the community first, suss out disagreements, and build enthusiastic consensus.

Uservoice without the community overwhelmingly backing an idea is just sending a racehorse to the glue factory before he can hit his prime.

If you want to help, I'd appreciate everyone's answers to: What would it take for this proposal to earn 3 of your Uservoice votes? If/when changing the 1 v 1 Ladder starting rating to 1500 has 100+ committed votes, I'll Uservoice it. Barring that, perhaps Fizzer might find the logic of this thread persuasive & if we're lucky the change actually is easy to implement & improving the ladder onboarding experience will lead to a growth of the 1 v 1 Ladder playerbase. What I absolutely do not want to do is to submit a Uservoice, dust off my hands, and say my part is done, while the idea rots in the "Hot" queue for 3 years alongside suggestions for CoD: Warzone improvements and discount Ray-Ban ads.

Edited 8/23/2022 07:17:36
Changing one number can improve the Ladder: 8/23/2022 11:01:13


καλλιστηι 
Level 62
Maybe let people choose their Elo from several choices?
Changing one number can improve the Ladder: 8/23/2022 15:35:06


(deleted) 
Level 60
As an experiment, I joined 1v1. Two matches at a time. First one, a level 65 player and the second match is a level 56 player. 😩
Changing one number can improve the Ladder: 8/23/2022 15:49:27


καλλιστηι 
Level 62
Don't you need level ~55 to join a ladder?
Changing one number can improve the Ladder: 8/23/2022 15:50:21


JK_3 
Level 63
You can unlock it by buying the Strat package as well
Changing one number can improve the Ladder: 8/23/2022 15:54:19


l4v.r0v 
Level 59
HangFire has membership.

@Kallisti: that's the chess.com solution and it would even work elegantly for WZ because the first rating won't cause rating drift (lower/raise the average) since Bayeselo will fit the rating into its distribution as soon as one game is played. That'd be even better, if you could choose from starting ratings of 1200, 1500, 1800. There's a potential for abuse (ladder runs/stallers) and somewhat higher implementation cost.

@HangFire: Levels are noisy indicators of skill. You can use Elo ratings instead- they will tell you players' expected win probabilities. You're not rated yet but you can compare the expected win probabilities against an average player (rated 1500).

The ladder has a level requirement to join, so you will not find many players below Level 50 on there.
Changing one number can improve the Ladder: 8/23/2022 18:30:29


FiveSmith 
Level 60
Was surprised to see the data, that having 5 wins is just as detrimental to future engagement as 5 losses. If there is no sampling error, then the suggestion is 100% the way to go.

You can count on my uservoices. (BTW, it would be fun to read a data-driven analysis of that uservoice effectiveness)

What is the reasoning for choosing the 1500 figure as starting for WZ? Currently in SEAD ratings are 1k max and in RT only 40 people have 1.5k+ ratings.

Edited 8/23/2022 18:46:55
Changing one number can improve the Ladder: 8/23/2022 19:10:56


l4v.r0v 
Level 59
There's a lot to respond to, so forgive me for my verbosity & asides. I'll start from the bottom, since that's the most interesting. I'll also embolden my direct responses to your questions, but I can't resist the urge to elaborate since skill rating systems are so beautiful.

What is the reasoning for choosing the 1500 figure as starting for WZ? Currently in SEAD ratings are 1k max and in RT only 40 people have 1.5k+ ratings.
1500 is roughly the average 1v1 ladder rating, essentially by definition.

Elo (and Bayeselo, which is more or less a smarter way to compute Elo-like ratings) interpret skill as a competitor-level parameter that, given a skill difference, predicts win probability between two competitors. For example, a 2000-rated player should beat a 1500-rated player 95% of the time. A 1500-rated player has the same odds against a 1000-rated player, because the win probability is just based on the 500-point rating difference. Since only the difference matters, the mean rating is an arbitrary hyperparameter. 1500 is just a traditional value.
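As a quick check of those numbers, here is the standard logistic Elo expectation on the conventional 400-point scale. This is an assumption for illustration: Bayeselo's exact model and Warzone's configuration may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo curve.

    Assumes the conventional 400-point scale; only the rating difference matters.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Same 500-point gap, same expectation, regardless of where the ratings sit.
print(expected_score(2000, 1500))
print(expected_score(1500, 1000))
```

Under that assumption a 500-point gap comes out to roughly 0.947, in line with the 95% figure above.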

Starting at 1500 means the null hypothesis for new players is that their skill level is roughly similar to the average participant on the ladder. (This is probably optimistic at present, but not by much, and it'll self-correct if the ladder expands due to an improved onboarding experience.) For comparison, right now, with players starting at 0, the ladder's null hypothesis for new players is (unintentionally) "this is by far the worst player this ladder has ever seen." Since the null hypothesis influences early matchmaking, which influences the updated rating, which influences the matchmaking, and so on, the null hypothesis is sticky- especially when it's extreme (since "beat a very weak player" [you could be at any skill level except the very bottom] provides less information gain than "beat an average player" [you're probably above average] or "lost to an average player," [you're probably below average] making the updates conservative and therefore the ladder much slower to 'learn' a player's true rating).

As an aside/tangent, you can interpret the mean rating as the number of points a new player adds to the system total- which you can use for interesting analyses, like figuring out ladder rating inflation. Suppose a ladder has only one player- they'll just be rated 1500. Now suppose someone else joins whose "true" rating (on the existing ladder) is 1300- i.e., they're 200 points worse than the other player. 1500 & 1300 average to 1400, not 1500, so, in the steady state, the 1300 player also inflates everyone's ratings by (-1 * their rating difference from the mean before they joined) / (# of players, including themselves), which here is (-1 * -200) / 2 = 100. Instead of {1500, 1300} you get {1600, 1400} to preserve the mean, meaning the 1300-turned-1400 player inflated everyone's ratings by 100 points. Of course, in the real world, we never reach this steady state, and I'm also not sure about how faithfully Bayeselo preserves the mean, so there are some hurdles to actually measuring ladder rating inflation this way. But this, for example, means that the same ladder ratings are more impressive/indicate higher skill with alexclusive participating in the ladder (i.e., having any unexpired games) than with alexclusive not participating, because alexclusive drags everyone else's ratings down whenever he joins.
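The two-player example can be reproduced with a few lines. This sketch assumes an idealized ladder that holds the mean at exactly 1500 and has reached steady state, neither of which is guaranteed for the real Bayeselo ladder; the function name is made up for illustration.

```python
def recenter(true_ratings, target_mean=1500.0):
    """Shift every rating by the same amount so the pool averages target_mean.

    Returns the shifted ratings and the shift itself (the per-player inflation).
    """
    shift = target_mean - sum(true_ratings) / len(true_ratings)
    return [r + shift for r in true_ratings], shift


# One 1500 player plus a newcomer who is 200 points weaker.
ratings, inflation = recenter([1500, 1300])
print(ratings, inflation)
```

A newcomer stronger than the mean gives a negative shift, which is the deflation effect described above.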

Currently in SEAD ratings are 1k max and in RT only 40 people have 1.5k+ ratings.
SEAD and Quickmatch use a different rating system from the Bayeselo-based ladders, called TrueSkill, which was designed by Microsoft for multiplayer Xbox Live lobbies. It's a clever machine learning system that builds on the lessons from Elo & Glicko and supports fairly complex cases (like figuring out individual ratings from FFAs and team games or even a mix of those with individual games); you can find more about it at https://www.moserware.com/assets/computing-your-skill/The%20Math%20Behind%20TrueSkill.pdf.

Warzone also doesn't use TrueSkill faithfully or (imo) correctly because TrueSkill on QM has persistent rating inflation and significant rounding errors, exacerbated by QM matchmaking (RT matchmaking is very greedy, while MD matchmaking pools can get shallow, leading to lopsided matches). When your Quickmatch match rating is below 500, you only lose half the points you're "supposed" to w/ TrueSkill. This leads to an asymmetric rating exchange- you might lose 5 points while your opponent gains 10, increasing the system total by 5 points. (Note that the TrueSkill 0 starting rating is misleading; otherwise QM ratings would be deflating rather than inflating- I think what's happening is that the displayed rating is mu-3*sigma, where each player's rating is actually a distribution parameterized by mean mu and standard deviation sigma, so the mu is some nonzero value). Anyhow, the QM average rating drifts upwards, which leads to rating inflation.
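A small sketch of the two mechanisms guessed at above. The mu = 25 and sigma = 25/3 values are the defaults from the TrueSkill paper; whether Warzone uses them (or some rescaled version) is not known from this thread and is an assumption here, as are the function names and the exact form of the halved-loss rule.

```python
def displayed_rating(mu: float, sigma: float) -> float:
    """Conservative skill estimate: mean minus three standard deviations."""
    return mu - 3.0 * sigma


def net_points_created(winner_gain: float, nominal_loss: float,
                       loser_rating: float, floor: float = 500.0) -> float:
    """Points added to the system total by one game under the halved-loss rule.

    Assumption: a loser rated below `floor` loses only half the nominal amount.
    """
    actual_loss = nominal_loss / 2.0 if loser_rating < floor else nominal_loss
    return winner_gain - actual_loss


# A brand-new player with the paper's default prior displays as (about) 0,
# even though the underlying mean mu is 25.
print(displayed_rating(25.0, 25.0 / 3.0))

# Winner gains 10, a sub-500 loser gives up only 5: the total grows by 5.
print(net_points_created(10, 10, loser_rating=400))
```

If this reading is right, the displayed 0 is only a cautious lower bound on a new player's skill, not the value the system actually assumes.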

QM ratings are also weird for two other reasons:
1) Rounding. You can't lose more than 10 points or gain less than 1. This means that lopsided matches have incorrect stakes. +1/-10 stakes imply a win probability of ~91%. The real win probability for most lopsided matchups (e.g., a top player facing a newbie on their first game) is usually higher than 91%, so top players' ratings drift steadily upward because the rating update stakes are more favorable to them than they should be.
2) QM is slow to learn. QM can't update your rating by more than 10, so it hasn't actually correctly learned the ratings of top players. Since it hasn't learned those players' ratings, it also struggles to learn the ratings of their opponents (mediocre players, everyone else), meaning that there's a large effort/participation component to QM ratings. This is why the QM leaderboards are such noisy indicators of skill- some players get to ratings like 1000 fast because their true rating is very high (and then stall due to lack of participation or quality competition), other players get to 1000 slow because they've just been playing a lot and QM has been learning them, and QM ratings don't let you tell those players apart.
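The ~91% figure in point 1 is the break-even win probability for +1/-10 stakes. The sketch below just works through that arithmetic; the function names are made up, and the 0.97 in the usage line is an example value, not a measured one.

```python
def break_even_win_probability(gain: float, loss: float) -> float:
    """Win probability at which the favorite's expected rating change is zero.

    Solves p * gain - (1 - p) * loss = 0, giving p = loss / (gain + loss).
    """
    return loss / (gain + loss)


def expected_change(win_probability: float, gain: float, loss: float) -> float:
    """Average rating change per game for a given true win probability."""
    return win_probability * gain - (1.0 - win_probability) * loss


print(break_even_win_probability(1, 10))  # 10/11
print(expected_change(0.97, 1, 10))       # example: favorite who really wins 97% of the time
```

Any true win probability above 10/11 makes the expected change positive, which is the steady upward drift for top players described in point 1.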

that having 5 wins
It's because not all wins are of equal enjoyment/quality. It feels great to win- when it's against an opponent you could've lost to. But the first 5 ladder games for many players are generally lopsided & predictable. It's like if FC Bayern München had to start each season by playing the bottom-division teams in German football and work upwards- those become more rituals or chores than games.

I noticed this because I first experienced it personally: https://www.warzone.com/LadderGames?ID=0&LadderTeamID=4100 -> when I return to the ladder periodically, I have to start with some unsatisfying games- like ones where opponents don't even make 6 picks (https://www.warzone.com/MultiPlayer?GameID=23128033) or just get booted (https://www.warzone.com/MultiPlayer?GameID=23207875). Having to work through 10 of those games before getting an opponent close to my own level is demotivating and usually I leave before finishing 20 games for this reason.

The surprising thing to me is that this isn't just a skilled-player problem but also applies to average players, because 0 is such a massive underestimate. A 0-rated player is expected to lose to a player on a boot streak ~97% of the time, which isn't even possible. It's just a bad null hypothesis to start with, even though functionally 0 is more like 600 since the ladder rating distribution doesn't go down to 0. But 600 is still a massive underestimate even for the vast majority of first-time newbies.

If there is no sampling error,
I sampled by looking (in Oct. 2021) at the most recently created 1v1 ladder teams. It's possible there is sampling error, but it's unlikely + doesn't affect the overall conclusion, which is derived from first principles. If you'd like to run a bigger analysis, that'd be awesome :) although you might want to sample differently (since my sample was just first-time ladder joiners, but there's also old players rejoining the ladder and leaving before they complete their 20 games). The empirical analysis is useful for identifying the effect- and for building a case that the ladder participation will substantially increase if the 0 were changed to a 1500 or some other reasonable rating. I think the mechanical analysis is the crux of the argument, though- starting from the bottom makes ladder onboarding grueling.

Edited 8/23/2022 19:26:11
Changing one number can improve the Ladder: 8/27/2022 07:59:40


Beep Beep I'm A Jeep 
Level 64
I agree with this suggestion!
Changing one number can improve the Ladder: 8/31/2022 22:12:34


│ [20] │MASTER│ Rikku │ I love my wife │ • apex │
Level 61
FIZZER HATES YOU REPENT BEFORE IT IS TOO LATE
Changing one number can improve the Ladder: 9/26/2022 22:24:39


JK_3 
Level 63
is anyone going to make a joke about putting this on uservoice?
Changing one number can improve the Ladder: 9/26/2022 22:44:18


l4v.r0v 
Level 59
Turtle already did. Since this thread has been largely ignored, I suspect it would be an uphill battle to continue advocating for it. Given that, I suggest abandoning the suggestion since the MTL already provides a 1v1 ladder with respectable design and strategic merit, although it's somewhat smaller & quite a bit less accessible for the average player.

The requested change itself would take minutes, but convincing One Guy could take months of concerted effort.

Edited 9/26/2022 22:44:50
Changing one number can improve the Ladder: 9/28/2022 03:58:22


alexclusive 
Level 65
An MTL button where the ladder pages are could help new players find it. Fizzer seems open to implementing MTL things, considering that he added the MTL trophy.
Changing one number can improve the Ladder: 9/28/2022 06:58:03


l4v.r0v 
Level 59
Uservoice it.