@Plat: I don't think that's trivial to sort out. TL;DR: It's not the rating system that sucks, it's the data that the rating system has to work with. The simplest quickfix would be to have ladder ratings start at something other than 0. My suggestion is to let new ladder teams pick between 800, 1100, 1500, and 1600 (default 800) as their starting ratings so they can more quickly face opponents around their own level, giving the rating algorithm data it can actually work with.
Counterproposal to Sephiroth: better APIs and community event support, so the community can run its own events in place of the existing ladders/Clan Wars that drive the complaints in this thread.
That team being #1 passes the eye test- they're 15-0, including a win against the current #1 team- but the rating system here is constrained by the quality and quantity of data, which is constrained by matchmaking, the size and pace of the ladder, and the starting rating being 0.
A fix for this would be to set the starting rating on the 3v3 ladder to something other than 0- possibly 1500, because the 3v3 ladder generally doesn't attract newbies and it's fairly safe to assume that the average new 3v3 ladder team is at about the average skill level of the ladder. The Bayeselo ladders (1v1, 2v2, 3v3, Seasonal) have new players start at 0- i.e., the ladder assumes every new player is by far the worst player it has ever seen (except for the Seasonal, where most players start at 0 together so it sort of works out). Looking at actual ladder performance of newbies, at least on 1v1, a more accurate starting rating would be 900-1300 (FCC Discord chat about this: https://discord.com/channels/391085979756134411/430908503117529088/901256614638669865).
This matters because the initial rating affects matchmaking, which seems to be what happened here. Here are the sanity checker spreadsheets for the top 3 teams:
You can see all 3 are probably underrated (the first two slightly, the third significantly). But let's look at it from another perspective. Who should have the higher rating:
a) Team #1, who has gone 11-1 by curbstomping a bunch of 1700-1950-rated teams and has 1 loss to Team #2, or
b) Team #3, who has gone 15-0 by curbstomping a bunch of 1350-1750-rated teams and also has a win against Team #1?
Bayeselo has weird results because it has poor data. Both of these teams only have one matchup against a team of their own caliber. Matchmaking is hard on a small ladder with slow-paced games- it's a compounding, chicken-and-egg problem between matchmaking and rating: bad matchmaking leads to inaccurate ratings, and inaccurate ratings make it harder to matchmake. All 3 of these teams have horrendous
What's holding back Team #3 is simply that their average opponent is really bad, and plowing through a bunch of low-rated players doesn't mean much because you're expected to win ~95% of the time against someone rated that much lower than you- there's low information gain. Bayeselo has to draw its conclusions from a dozen really weak datapoints and one quality datapoint, so it's a question of how much weight it puts on the one quality datapoint vs. the weak ones (which would have Team #1 be rated significantly higher than Team #3 since it's plowed through slightly better opponents).
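To put rough numbers on the "low information gain" point, here's a minimal sketch. It assumes the standard logistic Elo curve on a 400-point scale, which is only an approximation of what Bayeselo actually does (Bayeselo has its own model and parameters), and it uses the entropy of the win/loss outcome as a stand-in for how much a single game can tell the rating system. The function names and the example rating gaps are mine, not anything from the ladder code.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A vs. B under the standard logistic Elo curve.

    Assumption: plain Elo on a 400-point scale, not Bayeselo's exact model.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def outcome_entropy_bits(p: float) -> float:
    """Shannon entropy (in bits) of a win/loss outcome with win probability p.

    Used here as a rough proxy for how informative one game result can be.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Illustrative rating gaps between the favorite and the underdog.
for gap in (0, 200, 500):
    p = expected_score(1500 + gap, 1500)
    bits = outcome_entropy_bits(p)
    print(f"gap={gap:>3}  expected win={p:.3f}  entropy={bits:.3f} bits")
```

Under these assumptions an evenly matched game carries a full bit of uncertainty, while a game against someone about 500 points lower (where the favorite wins roughly 95% of the time) carries well under half of that, so a dozen such wins add up to surprisingly little.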
You could make Bayeselo more aggressive (by increasing its equivalent of a learning rate, if it has one), but that would also make it noisier for early teams (ratings would swing by huge amounts) and incentivize stalling, ladder runs, etc. Although Bayeselo exacerbates the problem here (see https://www.remi-coulom.fr/Bayesian-Elo/ - "Bayeselo behaves correctly when opponents' ratings are far apart"; Bayeselo is less aggressive than traditional Elo in rewarding players for beating opponents much worse than them), the problem isn't at the rating level and can't be fixed well at the rating level.
One solution I like is what chess.com does: When you start, you get to choose your initial rating (like a difficulty level). Ladders could do this and let players choose between 800, 1200, 1500, and 1600. This would influence their early matchmaking and let players have good match quality early on. From what I've seen, the current starting rating of 0 might also be driving players to leave the 1v1 ladder instead of grinding through a bunch of newbies:
I'm not even good and it takes me at least 12-15 games on a 1v1 ladder run to play someone around my own level. As AI demonstrated (https://www.warzone.com/Profile?p=19129760430), this can lead to cases where players get ranked on the 1v1 ladder with wildly inaccurate ratings because they've only had a handful of matches against players around their own caliber. Rating systems can only do so much until they have data they can work with- the outcomes of roughly evenly-matched games.
From old forum posts, it seems the ladder used to start players at 1500 but that got changed to 0 by Fizzer to avoid confusing players about having a rating before they played any games. This introduced the new issue of grinding through newbies when you join a ladder. We can address both problems by letting teams choose their starting rating (or just changing the starting ratings on the ladder- I propose 1100 on 1v1, 1500 on 2v2 and 3v3, and 2000 on Seasonal) and having the UI list "Not yet rated" rather than the starting rating for teams with no finished games. It makes no sense to change the ladder behavior just to fix a minor communication problem.
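Here's a rough sketch of what separating the two concerns could look like: one number used for matchmaking, and a separate label shown in the UI. Everything in it is hypothetical (the `LadderTeam` class, the ladder keys, the method names); I have no idea how the actual ladder code is structured. The starting ratings are just the ones proposed above.

```python
from dataclasses import dataclass
from typing import Optional

# Starting ratings proposed in this post; the dictionary keys are made-up labels.
PROPOSED_START = {"1v1": 1100, "2v2": 1500, "3v3": 1500, "seasonal": 2000}


@dataclass
class LadderTeam:
    """Hypothetical ladder entry; not the real Warzone data model."""

    ladder: str
    finished_games: int = 0
    rating: Optional[float] = None  # None until the rating system assigns one

    def matchmaking_rating(self) -> float:
        """Rating used internally to pick opponents."""
        if self.rating is not None:
            return self.rating
        return PROPOSED_START[self.ladder]

    def displayed_rating(self) -> str:
        """What the UI shows: no number until a game has finished."""
        if self.finished_games == 0:
            return "Not yet rated"
        return str(round(self.matchmaking_rating()))


new_team = LadderTeam(ladder="3v3")
print(new_team.matchmaking_rating(), new_team.displayed_rating())
```

The point is only that the display problem and the matchmaking problem don't have to share a fix: a brand-new 3v3 team can be matched as if it were rated 1500 while the UI still shows "Not yet rated".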
(ETA: I'm not agreeing with the premise of this post; I don't think it would work to make the source code available someplace like GitHub. You would still have to go through Fizzer for code review and would still run into the same need-to-convince-one-guy-and-his-simps-about-every-obvious-little-thing problem as at present. Remember how hard it was to have him not break INSS? How much effort has been poured into convincing him Bayeselo is a bad fit for the 1v1 Ladder? How it's taken multiple seasons for the community to get even some obvious minor fixes to CW? That will be every single Pull Request conversation- a half-conversation with a guy who half-reads and half-responds to everything without bothering to understand. Sometimes it'll even get incoherent and totally nonresponsive, like several of the developer responses to Google app reviews- https://play.google.com/store/apps/details?id=com.warlight&hl=en_US&gl=US&reviewId=gp%3AAOqpTOFjkrmakxrJOI6_yfc8WjEY8hCL9JMfcputFAvP6IovbwRMNGVZJap4BmX-GS4jft3m97maTbOPX8JotQ . I don't think Sephiroth's underlying frustrations would actually be satisfied until/unless Fizzer's control of the game is removed from the equation, but then you'd run into all sorts of new problems around lacking an owner. Warzone also probably isn't going to succeed as an open-source project- open-source games don't do that great to begin with, and given the universe of other projects one could contribute to and the general value of the skillset required to competently contribute to something like this, I don't think there'll be enough time-investment to make it work. Realistically, Fizzer would go through a bunch of effort to make the source code available/understandable and ramp up other contributors but not get much out of it. Plus it would make stuff like cheating a lot easier. IMO, the original proposal of this thread would be a nightmare of wasted effort. My counterproposal is that the restrictions on the APIs- the membership requirement, the limitations on Create Game- get lifted, so the community can run community events, replacing CW and the Ladders where most of the complaints stem from.)
Edited 11/27/2021 21:55:26