Discussion is locked - replying not allowed

Posts 1 - 30 of 50
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 19:54:50


crafty35a 
Level 3
First of all, we need a shorthand way to refer to this feature; that's quite a mouthful!

Now, I know I was personally quick to argue that I don't think this method makes sense when it was originally announced in a blog post. But I thought it would be good to get some discussion going, and I just want to provide one example of the odd results you can get with this method.

Currently, NoZone is ranked 4th on the ladder with a 1630 rating. His only result is a loss to Fizzer. Because Fizzer has since achieved a high rating (number 1 on the ladder, currently), this retroactive ratings adjustment seems to be giving NoZone a lot of credit for that loss.

Does this make sense to anyone? I don't think you should ever gain rating points for a loss (and in a typical Elo system, you never do).

(By the way, sorry to single you out, NoZone. You may indeed be a great 1v1 player for all I know, I'm just pointing out the absurdity of the rating method with your current results)
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 20:03:12

Fizzer 
Level 58

Warzone Creator
I agree that situation looks suspect. I'm working on a system now that will increase transparency in how the ratings are calculated.

This will help us understand what's going on more, which is the first step in deciding if we want to change to something else. I'm completely open to changing to a different rating system, but I think we should give it at least a week to settle down. I'm sure I won't be on the top for long :)
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 20:31:21


Ruthless 
Level 36
I think any system that has to accommodate a huge number of people, with a bunch of data to crunch, is going to be very choppy when starting out. I think the system doesn't have enough data yet to "truly" show the correct standings. Like Randy said, let's give it a couple of weeks with a lot more games under its belt.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 21:47:58

Fizzer 
Level 58

Warzone Creator
I agree with Ruthless in that results are very choppy now and it will settle down over time. But there are a couple surprising outcomes.

Let's pick on Blue Precision and NoZone.

BP is undefeated - he has 1 win and 0 losses, but his win is against the lowest-ranked player (who is 0 for 5). As a result, Bayeselo gave him almost nothing for the win and gave him a rating of 1334.

NoZone has no wins - he's 0 for 2. But his losses are against very good players (#1 and #2 rank.) As a result, Bayeselo didn't hold these losses against him and gave him a rating of 1584.

I'm surprised that a winless player can be ranked above an undefeated player. I've looked into it a bit, and as far as I can tell this is the expected result of Bayeselo's algorithm (I verified that first pick advantage is not causing this, that all the games are being accounted for, and that the correct winner is being input for each game).

I'm going to continue to investigate more.
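
To illustrate how a whole-history fit can rank a winless player above an undefeated one, here is a minimal sketch. It is not Bayeselo's code: it fits a plain Bradley-Terry model with a simple iterative update, and it assumes a made-up prior of one virtual draw between every pair of players who met. The player names, the game list, and the fit_ratings helper are all hypothetical.

```python
import math
from collections import defaultdict


def fit_ratings(games, virtual_draws=1.0, iterations=5000, offset=1500.0):
    """Fit one strength per player to the whole game list at once.

    games: list of (winner, loser) pairs.
    virtual_draws: assumed prior, that many draws added per pair that met.
    Returns {player: rating} on an Elo-like scale centred on `offset`.
    """
    wins = defaultdict(float)  # real wins plus half of each virtual draw
    met = defaultdict(float)   # met[(i, j)]: games between i and j, virtual included
    for winner, loser in games:
        wins[winner] += 1.0
        met[(winner, loser)] += 1.0
        met[(loser, winner)] += 1.0
    for i, j in list(met):
        if i < j:  # handle each unordered pair exactly once
            wins[i] += virtual_draws / 2
            wins[j] += virtual_draws / 2
            met[(i, j)] += virtual_draws
            met[(j, i)] += virtual_draws

    players = sorted({p for pair in met for p in pair})
    gamma = {p: 1.0 for p in players}  # Bradley-Terry strengths
    for _ in range(iterations):
        new = {}
        for i in players:
            denom = sum(n / (gamma[i] + gamma[j])
                        for (a, j), n in met.items() if a == i)
            new[i] = wins[i] / denom
        # Rescale so the geometric mean stays at 1 (ratings centre on offset).
        scale = math.exp(sum(math.log(g) for g in new.values()) / len(new))
        gamma = {p: g / scale for p, g in new.items()}
    return {p: offset + 400.0 * math.log10(g) for p, g in gamma.items()}


# Toy data (invented): two strong players, one very weak player,
# a winless player who only met the strong ones, and an undefeated
# player whose single win is against the weak one.
games = (
    [("Top1", "Bottom")] * 5
    + [("Top2", "Bottom")] * 5
    + [("Undefeated", "Bottom"), ("Top1", "Winless"), ("Top2", "Winless")]
)
ratings = fit_ratings(games)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name:<11} {rating:7.1f}")
```

In this toy set the winless player ends up roughly 35 points above the undefeated one, purely because of who each of them played. The size of the effect depends heavily on the assumed prior and on how lopsided the other results are.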
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 22:25:57


NoZone 
Level 6
Hello,
No worries about bringing that up, I noted the same thing with some amusement. 0-2 resulting in an improved rating seemed odd. I think where it comes from is that since these were some of the first games played we were all equally ranked at 1500. So a win/loss doesn't do much directly but the subsequent games have an adverse effect once there are significant differential scores in the mix. I think this is only an issue from this initial phase where everyone is on paper equally ranked with 1500. As soon as a few more games pass, I think it will settle out to something more realistic. Especially if there is the expiration on the past game effects as mentioned elsewhere.
NoZone
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 23:10:45

The Impaller 
Level 9
I agree that this is pretty weird. I'd say give it a chance to balance out after a few weeks or so and see how it looks then, before calling for the pitchforks and torches.

I do think it's odd that you can gain points from a loss and lose points from a win. Any of the ELO systems I've seen are generally set up such that there are a fixed number of points that can be gained or lost in a match (say 40) and it's split based on the relative rankings of the players. So if someone is 200 points higher than another player and they win, they'll get 10 points and the other player will lose 10 points. But if they are even, one player gains 20 and the other loses 20. If it's a 400 point differential and the higher ranked player wins, they may gain only 1 point and the loser only lose 1 point. But if the lower ranked player were to win, they may gain 39 points and the higher ranked player lose 39. Something along those lines is what I've experienced in any of the sites/games I've played that use a similar system.
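
For reference, here is a minimal sketch of the kind of per-game update described above, using the usual logistic Elo curve. The K value of 40 matches the example pool of points; the 400-point scale is the conventional one, and both are assumptions rather than anything this ladder uses. The function names are made up for illustration.

```python
def expected_score(rating, opponent_rating):
    """Expected score (0..1) for `rating` under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(winner, loser, k=40.0):
    """Return (new_winner, new_loser) after one decisive game.

    The winner gains k * (1 - expected score) and the loser drops by
    exactly the same amount, so points are only moved, never created.
    """
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


for winner, loser in [(1500, 1500), (1700, 1500), (1900, 1500), (1500, 1900)]:
    new_w, new_l = elo_update(winner, loser)
    print(f"{winner} beats {loser}: "
          f"winner {new_w - winner:+.1f}, loser {new_l - loser:+.1f}")
```

Under these assumptions the winner gains 20 points between equals, about 10 when a 200-point favorite wins, and a bit under 4 when a 400-point favorite wins, while a 400-point underdog who wins gains about 36. The exact split depends on K and on the curve, but in every case the winner gains and the loser drops.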

I did find it odd that BP was ranked so low. My first thought was "I wonder who he lost to" and then I clicked to find out that he hadn't lost to anyone.

I think that this Bayesian system isn't designed the way we normally expect, which is to award points for wins and take away points for losses, but rather designed purely to format rankings accurately based on true skill. So it may seem weird that it's doing this but it might also balance out to be a much better system in the long run once more data has been collected.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 23:40:45


crafty35a 
Level 3
Right on, Impaller, that's how all ELO systems I've seen in the past work. It looks like the game vs. NoZone was actually Fizzer's last completed game. So the only way his rating changed in the meantime was due to retroactive adjustments. Once those adjustments are made to Fizzer's rating, does that then also change the rating of all of his past opponents? If so, where does the chain end? It could go on forever!
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 23:49:05


NecessaryEagle 
Level 56
I think the problem is the way the retroactive scheme is set up. It seems that when NoZone lost, he lost points, but then his opponent's rank went up, and instead of just redoing the point change from the NoZone game, it transferred extra points to NoZone, which was a higher amount than what he initially lost. In other words, the system should be based on percentages instead of numbers.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 23:58:03

The Impaller 
Level 9
I don't think the chain goes on forever. Rather, it goes back to the beginning, which in this case will always be 3 months, since ratings are only calculated for the last 3 months of play. This means every time ratings are updated, they have to be recalculated for every player, taking into consideration every ladder game in the last 3 months. Fizzer mentioned somewhere that it could take an hour or more to run that calculation, and I'm not surprised, because that could potentially be a lot of data to run through.
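
A minimal sketch of that windowing step, assuming each game is stored as a hypothetical (finished_at, winner, loser) tuple and approximating "3 months" as 90 days. How the ladder actually stores games and measures the window is not stated here, so treat both as assumptions.

```python
from datetime import datetime, timedelta


def games_in_window(games, now, days=90):
    """Keep only the games that finished within the last `days` days.

    games: list of (finished_at, winner, loser) tuples.
    Everything returned here would be fed to the rating fit from
    scratch on each update; older games simply drop out.
    """
    cutoff = now - timedelta(days=days)
    return [game for game in games if game[0] >= cutoff]


games = [
    (datetime(2010, 11, 1), "A", "B"),   # older than the window
    (datetime(2011, 1, 15), "B", "C"),
    (datetime(2011, 2, 20), "A", "C"),
]
recent = games_in_window(games, now=datetime(2011, 2, 21))
print(len(recent), "of", len(games), "games would be rated")
```

Because the full set of games in the window is refit every time, the cost grows with the number of games in the window rather than with the number of games finished since the last update.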
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 01:43:08

Fizzer 
Level 58

Warzone Creator
This post explains how to run your own ladder simulations:

http://blog.warlight.net/index.php/2011/02/running-your-own-ladder-simulations/

This is useful for understanding the ratings.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 03:22:14


Perrin3088 
Level 44
the problem in all likelihood will never be solved for middle-ground players.. any new players in the ladder would affect the current players, and be affected themselves, similarly to how NoZone and BP are currently being affected. Say you're an average player at 1500 score with a history, then someone new joins that does sub par, and loses his first 7 games.. all of a sudden, with an average score, playing people only as good as you *1500 ish*, your rank drops drastically due to it being unable to correctly place the newcomer...

the retroactive ratings would probably work better for people that have history.. IE, for anyone that hasn't been in the ladder for at least a month/X games, games against them would be rated without retroactive adjustments..
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 03:48:52


crafty35a 
Level 3
Well I've been playing around with the rating tool for a while, and I think I've nailed down what the issue is. I believe this tool is tailored towards calculating ratings for a set of players that each have a constant, unchanging playing strength. Why do I think this? First of all, notice the names of the "players" listed in the provided examples (http://remi.coulom.free.fr/Bayesian-Elo): Comet B.68, Dragon 4.7.5, Gandalf 4.32h, etc. These are all fairly well known chess engines (essentially AI programs that play chess).

Logically, it would absolutely make sense to retroactively adjust ratings based on the future performance of opponents, if the "players" were actually specific versions of chess engines. Why? Because these engines have a constant, unchanging strength level. Say a chess engine plays one game today, and another 99 games over the next six months. If we want to calculate the strength of the chess engine at the time of the first game played, every single one of the 100 games should be considered with equal weighting, because *the strength of a chess engine does not change over time*!

But with a human player, that of course is not true. I think this is the fundamental flaw with using this method to rate human players.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 04:51:07


NecessaryEagle 
Level 56
and why is NoZone's rank higher than FBG-Dragon's?
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 04:52:37


NecessaryEagle 
Level 56
the way it looks right now is that losing to a good opponent is better than winning against a bad opponent, so if your first couple games dropped you on the ratings, then it's harder to rise because you don't get placed with higher players
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 05:33:51

The Impaller 
Level 9
It does seem to be that way, however that may be corrected as it rediscovers who is a good player and so forth.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 05:51:03


Perrin3088 
Level 44
it seems to me like the early games modify your rating too much.. IE, you shouldn't be able to drop to 1300/raise to 1700 in just a couple of games.. so as to keep new players more average until their real potential is proven..
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 07:15:12


Perrin3088 
Level 44
Fizzer, why didn't you just put
offset 1500
on the page you linked us so it would automatically show the warlight rating?
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 07:24:21

Fizzer 
Level 58

Warzone Creator
I never noticed the offset command. I'll add that in - thanks!
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 07:36:36


Perrin3088 
Level 44
I also think that when we get a more established ladder, to solve my earlier fear, *3 posts up i think* we could implement a removerare X command before the elo command... it would make it so that new players would have to get at least X games before they influence the ladder, which imho could help keep the average range of players more steady... but ofc' idk, it will always be partly unsteady as long as new people come in so..
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 07:43:52

Fizzer 
Level 58

Warzone Creator
Perrin: I was just thinking the same thing. I was thinking it would be good even now - if the rankings that are displayed now are meaningless, they shouldn't be displayed at all. It's only causing mass panic and confusion.

I think it would be good to hide ranks until you've completed a certain number of games.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:00:30


Perrin3088 
Level 44
I'd also like to point out that only 3 people have completed 5+ games according to that list, lol...


and i am currently running a test ladder using the same names, and a random number generator to determine w/l's and seeing what comes up atm
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:06:14


Perrin3088 
Level 44
hmm.. also, instead of using removerare X from resultset>, you could perhaps change ratings to ratings X.. it would allow the games to still have an effect on the players that actually have had enough games, but the actual new players wouldn't show up until they have reached the threshold..
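
The difference between the two ideas can be sketched roughly as follows. This is only an illustration of the two approaches as described in this thread, not Bayeselo's implementation of either command; the function names and the sample games are made up.

```python
from collections import Counter


def games_played(games):
    """Count how many games each player appears in."""
    counts = Counter()
    for winner, loser in games:
        counts[winner] += 1
        counts[loser] += 1
    return counts


def drop_rare_players(games, min_games):
    """Idea 1: throw away every game involving a player below the threshold."""
    counts = games_played(games)
    return [(w, l) for w, l in games
            if counts[w] >= min_games and counts[l] >= min_games]


def visible_players(games, min_games):
    """Idea 2: rate on all games, but only list players at or above the threshold."""
    counts = games_played(games)
    return sorted(p for p, n in counts.items() if n >= min_games)


games = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("A", "New")]
print(drop_rare_players(games, min_games=3))  # only the A-vs-B games survive
print(visible_players(games, min_games=3))    # A and B listed, all 5 games still count
```

With the first idea, established players also lose the games they played against newcomers, so their own game counts shrink. With the second, those games still inform the fit and only the display is filtered.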
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:12:26


Perrin3088 
Level 44
hmm, and in extra testing, for some reason using removerare 5 on the current ladder is causing the program to hang up on me..? i thought they fixed mm hanging up when players unconnected back in '05?
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:37:33


Perrin3088 
Level 44
hmm.. somehow i never got my test to pass the mm stage in the removerare 5, but one of the times i accidentally did ratings >removerare 5.txt it actually seemed to show up properly, despite not having been mm'd properly..

variances.

ratings 5
Rank Name         Elo    +    - games score oppo. draws
   1 Perrin3088    46  258  258     5   60%   -17    0%
   2 Knoebber    -133  179  179     7   43%   -94    0%
   3 3A6L3BA5T   -337  227  227     5    0%  -108    0%

removerare 5
Rank Name         Elo    +    - games score oppo. draws
   1 Perrin3088    94  282  282     2  100%   -47    0%
   2 Knoebber       0  271  271     2   50%     0    0%
   3 3A6L3BA5T    -94  282  282     2    0%    47    0%


using the real data of course.. so not much data.. :/
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:39:10


Perrin3088 
Level 44
and i ran my test ladder through 15000 games i think, and then ran a new player through 10 games against the middle of the ladder, and the variances seemed to die down a fair bit around 6-9 games or so...


and just for SnG's, here's the results, *based absolutely only on RNG's, lol*

Rank Name                    Elo    +    - games score oppo. draws
   1 Ruthless                 49   20   20  1103   55%     2    0%
   2 CuChulainn               39   20   20  1055   54%     3    0%
   3 NoZone                   36   20   20  1050   54%     2    0%
   4 Adam                     32   20   20  1080   53%     4    0%
   5 GuyMannington            31   19   19  1197   53%     2    0%
   6 KnA+v                    25   19   19  1149   52%     2    0%
   7 Grundie                  25   20   20  1042   52%     4    0%
   8 deweylikedonuts          24   20   20  1098   52%     4    0%
   9 TheImpaller              22   21   21  1003   52%     4    0%
  10 FBGMoDogg                15   20   20  1104   51%     3    0%
  11 PoopSandwich             15   19   19  1171   52%     2    0%
  12 Doushibag                12   20   20  1090   51%     4    0%
  13 Ragingpikey               0   20   20  1072   50%     5    0%
  14 Shiver                   -1   21   21   991   50%     3    0%
  15 Waya                     -5   21   21  1021   49%     5    0%
  16 FBGDragons               -5   20   20  1081   49%     1    0%
  17 chas                    -11   20   20  1045   49%     2    0%
  18 Soyrice                 -11   20   20  1117   49%     1    0%
  19 3A6L3BA5T               -11   20   20  1035   48%     4    0%
  20 Perrin3088              -12   19   19  1164   48%     4    0%
  21 sue                     -14   20   20  1107   48%     5    0%
  22 devilnis                -15   20   20  1059   48%     2    0%
  23 Alcarmacil              -19   21   21  1008   47%     3    0%
  24 crafty35a               -22   21   21  1029   47%     2    0%
  25 Fizzer                  -24   20   20  1083   47%     3    0%
  26 iI,IñsI,IælikIæ¥?Iñndy  -30   21   21   996   46%     3    0%
  27 Knoebber                -38   20   20  1035   46%     2    0%
  28 BluePrecision           -39   21   21  1029   46%     0    0%
  29 new                     -69  190  190    10   40%     4    0%
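
A coin-flip ladder like the one above can be reproduced with a short simulation. This is a minimal sketch under assumptions of its own: 28 placeholder player names, uniformly random pairings, and a fair coin for every result. It does not reproduce the exact setup used for the table.

```python
import random


def simulate_random_ladder(players, n_games, seed=0):
    """Play n_games between random pairs, deciding each by a fair coin flip."""
    rng = random.Random(seed)
    wins = {p: 0 for p in players}
    played = {p: 0 for p in players}
    for _ in range(n_games):
        a, b = rng.sample(players, 2)        # two distinct players
        winner = a if rng.random() < 0.5 else b
        wins[winner] += 1
        played[a] += 1
        played[b] += 1
    return wins, played


players = [f"player{i:02d}" for i in range(28)]
wins, played = simulate_random_ladder(players, 15000)
ranked = sorted(players, key=lambda p: wins[p] / played[p], reverse=True)
for name in ranked[:3] + ranked[-3:]:
    print(f"{name}  games={played[name]:5d}  score={wins[name] / played[name]:.1%}")
```

Even with identical "skill", the top and bottom of such a ladder differ by several percentage points of score after about a thousand games each, which is the sort of spread the table shows.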
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:39:27


Perrin3088 
Level 44
and those turned out ugly :/
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 13:29:17


NoZone 
Level 6
One thing to remember is that it has only been 48 hours. Let it run for a few days and see how the rankings settle down. Once there is a decent number of players with a few games completed, there won't be such wild swings.

That said, it is interesting to see how the nuts and bolts of the ELO ranking are made. I hadn't really thought about the massive recalculation that would need to be performed if you do adjust for previous games. Seems like that would be prohibitive enough to only use very recent games. How soon till that eats up your processing power?

NoZone
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 14:29:53


crafty35a 
Level 3
While I of course agree that the ratings will settle down after a while and begin to stabilize, I think the more interesting discussion is whether the retroactive adjustments make sense at all. As I mentioned briefly in my last post, Bayesian Elo was designed to measure performance between computer chess programs, which have a constant strength level. While the current method will eventually get us close to reasonable ratings, I don't think it makes sense for human players, long-term.

Some benefits to a more standard Elo system, in my opinion:
- It is much more intuitive. You win, you gain rating points. You lose, you lose rating points. The amount depends on the rating spread between the players.
- Since the calculation is then only between the two people involved in the game, I would think it would be possible to immediately update ratings when games complete, which would be a nice touch.
- No retroactive adjustments. If I beat a new WL player, and a month down the line he becomes a top player, there's no reason I should be rewarded for beating him as if he was a pro when we played.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 15:31:36


NoZone 
Level 6
Crafty35a,
I agree that the standard ELO makes much more sense. Possibly this was mentioned earlier, but what was the rationale for selecting the one currently in use?
NoZone
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 17:13:38

Dragons 
Level 56
I don't have a problem with past results having an impact on future games and I understand that it will take a week or two for everything to sort itself out.

With that said, in no way should someone who wins a game lose points or someone who loses a game win points. Dropping 200 points by beating someone who has been beaten by others is wrong. If that is an actual possibility with this system (and not a bug), I don't care how quickly everything will sort itself out, the system needs to be tweaked or scrapped.