Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:00:30


Perrin3088 
Level 49
I'd also like to point out that only 3 people have completed 5+ games according to that list, lol...


and i am currently running a test ladder using the same names, and a random number generator to determine w/l's and seeing what comes up atm
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:06:14


Perrin3088 
Level 49
hmm.. also, perhaps instead of using removerare X from the resultset> prompt you could change ratings to ratings X.. it would allow the games to still have an effect on the players that have actually had enough games, but the new players would not show up until they have reached the threshold..
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:12:26


Perrin3088 
Level 49
hmm, and in extra testing, for some reason using removerare 5 on the current ladder is causing the program to hang up on me..? i thought they fixed mm hanging up when players unconnected back in '05?
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:37:33


Perrin3088 
Level 49
hmm.. somehow i never got my test to pass the mm stage in the removerare 5, but one of the times i accidentally did ratings >removerare 5.txt it actually seemed to show up properly, despite not having been mm'd properly..

variances.

ratings 5
Rank Name Elo + - games score oppo. draws
1 Perrin3088 46 258 258 5 60% -17 0%
2 Knoebber -133 179 179 7 43% -94 0%
3 3A6L3BA5T -337 227 227 5 0% -108 0%

removerare 5
Rank Name Elo + - games score oppo. draws
1 Perrin3088 94 282 282 2 100% -47 0%
2 Knoebber 0 271 271 2 50% 0 0%
3 3A6L3BA5T -94 282 282 2 0% 47 0%


using the real data of course.. so not much data.. :/
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:39:10


Perrin3088 
Level 49
and i ran my test ladder through 15000 games i think, and then ran a new player through 10 games against the middle of the ladder, and the variances seemed to die down a fair bit around 6-9 games or so...


and just for SnG's, here's the results, *based absolutely only on RNG's, lol*

Rank Name Elo + - games score oppo. draws
1 Ruthless 49 20 20 1103 55% 2 0%
2 CuChulainn 39 20 20 1055 54% 3 0%
3 NoZone 36 20 20 1050 54% 2 0%
4 Adam 32 20 20 1080 53% 4 0%
5 GuyMannington 31 19 19 1197 53% 2 0%
6 KnA+v 25 19 19 1149 52% 2 0%
7 Grundie 25 20 20 1042 52% 4 0%
8 deweylikedonuts 24 20 20 1098 52% 4 0%
9 TheImpaller 22 21 21 1003 52% 4 0%
10 FBGMoDogg 15 20 20 1104 51% 3 0%
11 PoopSandwich 15 19 19 1171 52% 2 0%
12 Doushibag 12 20 20 1090 51% 4 0%
13 Ragingpikey 0 20 20 1072 50% 5 0%
14 Shiver -1 21 21 991 50% 3 0%
15 Waya -5 21 21 1021 49% 5 0%
16 FBGDragons -5 20 20 1081 49% 1 0%
17 chas -11 20 20 1045 49% 2 0%
18 Soyrice -11 20 20 1117 49% 1 0%
19 3A6L3BA5T -11 20 20 1035 48% 4 0%
20 Perrin3088 -12 19 19 1164 48% 4 0%
21 sue -14 20 20 1107 48% 5 0%
22 devilnis -15 20 20 1059 48% 2 0%
23 Alcarmacil -19 21 21 1008 47% 3 0%
24 crafty35a -22 21 21 1029 47% 2 0%
25 Fizzer -24 20 20 1083 47% 3 0%
26 iI,IñsI,IælikIæ¥?Iñndy -30 21 21 996 46% 3 0%
27 Knoebber -38 20 20 1035 46% 2 0%
28 BluePrecision -39 21 21 1029 46% 0 0%
29 new -69 190 190 10 40% 4 0%
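
A coin-flip ladder like the one behind the table above can be roughly reproduced with a few lines of Python. This is only a sketch under assumptions: the post does not say how opponents were paired, so uniform random pairing is assumed here, and the function name and seed are made up for the example. It only produces score percentages, not Elo values.

```python
import random


def simulate(num_players=28, num_games=15000, seed=0):
    """Play num_games fair coin flips between randomly paired players.

    Returns a list of (score_fraction, games_played), one entry per player.
    """
    rng = random.Random(seed)
    wins = [0] * num_players
    played = [0] * num_players
    for _ in range(num_games):
        # Assumption: any two distinct players are equally likely to meet.
        a, b = rng.sample(range(num_players), 2)
        winner = a if rng.random() < 0.5 else b
        wins[winner] += 1
        played[a] += 1
        played[b] += 1
    return [(wins[i] / played[i], played[i]) for i in range(num_players)]


results = simulate()
scores = [score for score, _ in results]
print(min(scores), max(scores))
```

With roughly 1000 games per player, the standard error of a 50% score is about 1.5 percentage points, so a spread of a few points either side of 50% is what pure chance should give.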
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 08:39:27


Perrin3088 
Level 49
and those turned out ugly :/
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 13:29:17


NoZone 
Level 6
One thing to remember is that it has only been 48 hours. Let it run for a few days and see how the rankings settle down. Once there is a decent number of players with a few games completed, there won't be such wild swings.

That said, it is interesting to see how the nuts and bolts of the ELO ranking are made. I hadn't really thought about the massive recalculation that would need to be performed if you do adjust for previous games. Seems like that would be prohibitive enough to only use very recent games. How soon till that eats up your processing power?

NoZone
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 14:29:53


crafty35a 
Level 3
While I of course agree that the ratings will settle down after a while and begin to stabilize, I think the more interesting discussion is whether the retroactive adjustments make sense at all. As I mentioned briefly in my last post, Bayesian Elo was designed to measure performance between computer chess programs, which have a constant strength level. While the current method will eventually get us close to reasonable ratings, I don't think it makes sense for human players, long-term.

Some benefits to a more standard Elo system, in my opinion:
- It is much more intuitive. You win, you gain rating points. You lose, you lose rating points. The amount depends on the rating spread between the players.
- Since the calculation is then only between the two people involved in the game, I would think it would be possible to immediately update ratings when games complete, which would be a nice touch.
- No retroactive adjustments. If I beat a new WL player, and a month down the line he becomes a top player, there's no reason I should be rewarded for beating him as if he was a pro when we played.
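
For illustration, the standard per-game update described in the list above can be sketched as follows. This is a minimal sketch, not WarLight's code: the K-factor of 32 and the 400-point logistic scale are conventional chess values assumed here, and the function names are made up for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the usual logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(winner: float, loser: float, k: float = 32.0):
    """Return (new_winner, new_loser) after a single decisive game."""
    # The winner gains exactly what the loser gives up.
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


even = update(1500.0, 1500.0)    # equal ratings: winner takes k/2 points
upset = update(1400.0, 1600.0)   # underdog wins: larger transfer
chalk = update(1600.0, 1400.0)   # favourite wins: smaller transfer
print(even, upset, chalk)
```

Only the two players involved change rating, the winner's gain is always positive, and nothing is revisited once the game is scored.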
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 15:31:36


NoZone 
Level 6
Crafty35a,
I agree that the standard ELO makes much more sense. Possibly this was mentioned earlier, but what was the rationale for selecting the one currently in use?
NoZone
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 17:13:38

Dragons 
Level 56
I don't have a problem with past results having an impact on future games and I understand that it will take a week or two for everything to sort itself out.

With that said, in no way should someone who wins a game lose points or someone who loses a game win points. Dropping 200 points by beating someone who has been beaten by others is wrong. If that is an actual possibility with this system (and not a bug), I don't care how quickly everything will sort itself out, the system needs to be tweaked or scrapped.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 20:58:45

The Impaller 
Level 9
Maybe this is self-correcting, but the system does seem to weight early wins a lot more than later wins. I didn't start playing in the ladder immediately upon inception, so it was later when I first finished some games. At this point, though, I'm 6-0, but my 6 wins are good for a 1558 rating. On the other hand, really early ladder wins, like Waya's 1-0 record, or NoZone's 2-3 have them a good deal higher.

Is there a log of all the commands that get entered into the system to generate the rankings? I am curious if switching the order on results affects the final result.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 21:12:43


Perrin3088 
Level 49
Imp.. the reasoning afaik is cause the early wins are all against everyone.. so Nozones 2-3, the first loss was against fizzer, who won 2-3 the first day, which jumped them both up, then nozone was playing against higher difficulty players, which keeps him at that range with a more modest record

Waya, beat nozone who has that inflated record, thus jumping his record up to match accordingly.. the problem as i see it was that the original players moved around so fast because one game was such a large ratio of their ELO that even the losers are put into a bracket that isn't justified by their w/l record..

once everyone gets a dozen games under their belts the issues should diminish.

as well,
http://blog.warlight.net/index.php/2011/02/running-your-own-ladder-simulations/
has the logs as well as the program used to do it.


and i checked it recently and there are 4 more wins with first pick than there are normally..
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 21:15:07


crafty35a 
Level 3
The log used to generate the rankings can be found here: http://warlight.net/Data/BayeseloLog.txt

But no, the order doesn't matter with this system. The reason the ratings change more at first is because there is little data, so the program is less certain about the true ratings of players, and makes bigger adjustments to compensate for that fact.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 21:17:58

Fizzer 
Level 64

Warzone Creator
The order of the wins doesn't affect the outcomes, and neither does the order you join the ladder or anything like that. You can test this yourself by following the steps in the blog post - it's easy to re-arrange the games and see the effects.
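
One way to see why order cannot matter in a whole-history fit: the calculation only looks at how many times each player beat each other player, not at when. The sketch below is a plain Bradley-Terry maximum-likelihood fit, not Bayeselo itself (no prior, no draw handling, no first-move advantage), and the three-player game list is invented for the example.

```python
import math
import random
from collections import defaultdict


def fit_ratings(games, iterations=500):
    """Fit Bradley-Terry strengths to (winner, loser) pairs, Elo-scaled.

    Assumes every player has at least one win and one loss, otherwise
    the plain maximum-likelihood fit does not exist.
    """
    wins = defaultdict(int)
    met = defaultdict(lambda: defaultdict(int))  # met[a][b] = games a vs b
    for winner, loser in games:
        wins[winner] += 1
        met[winner][loser] += 1
        met[loser][winner] += 1
    players = sorted(met)
    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = sum(met[p][q] / (strength[p] + strength[q])
                        for q in sorted(met[p]))
            updated[p] = wins[p] / denom
        # Strengths are only defined up to a common factor; pin the
        # geometric mean to 1 so the average rating is 0.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {p: s / math.exp(log_mean) for p, s in updated.items()}
    return {p: 400.0 * math.log10(s) for p, s in strength.items()}


games = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C"),
         ("A", "B"), ("B", "A"), ("C", "B")]
shuffled = games[:]
random.Random(0).shuffle(shuffled)

original = fit_ratings(games)
reordered = fit_ratings(shuffled)
max_diff = max(abs(original[p] - reordered[p]) for p in original)
print(original, max_diff)
```

Because the game list is reduced to win counts before any rating is computed, shuffling it leaves the fitted ratings unchanged.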
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 21:39:10

The Impaller 
Level 9
It seems like this system may be designed under the idea that everyone is going to play everyone else at least once. I have a feeling that this system will perform very well if that does occur. I'm curious to see how it pans out if that doesn't occur.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 21:44:34


crafty35a 
Level 3
Impaller, funny that you should say that. Bayesian Elo was designed to rank computer chess AI engines, and the way that typically works is that they play each opponent a set number of times. I think that the system would work better if Fizzer makes ladder games essentially random, instead of trying to pair you up with people near your skill level (ignoring for now what I consider the essential flaw of this system, the retroactive part).
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 21:59:47

The Impaller 
Level 9
I got different results from running the http://warlight.net/Data/BayeseloLog.txt file with different configurations of match results. I ran it first just as it is, and it gave the following output:

Rank Name Elo + - games score oppo. draws
1 Fizzer 440 349 237 4 100% 191 0%
2 Waya 370 496 357 1 100% 235 0%
3 FBGDragons 319 270 230 4 75% 215 0%
4 Doushibag 253 284 221 4 100% 8 0%
5 NoZone 235 195 203 5 40% 272 0%
6 deweylikedonuts 135 279 306 3 33% 218 0%
7 ATrain 123 352 484 1 0% 253 0%
8 Soyrice 118 245 227 3 67% 84 0%
9 Grundie 109 353 356 2 50% 89 0%
10 Ruthless 108 465 298 2 100% -94 0%
11 sue 94 330 330 2 50% 94 0%
12 Perrin3088 75 241 244 6 50% 54 0%
13 PoopSandwich 68 303 236 3 100% -144 0%
14 TheImpaller 58 196 170 4 100% -70 0%
15 GuyMannington 0 324 324 2 50% -12 0%
16 FBGMoDogg -1 343 461 1 0% 118 0%
17 crafty35a -22 271 233 4 75% -147 0%
18 BluePrecision -39 310 279 3 67% -139 0%
19 chas -46 228 217 5 60% -125 0%
20 VampEZSTreet -51 343 461 1 0% 68 0%
21 devilnis -67 351 484 1 0% 58 0%
22 Eitz -67 351 484 1 0% 58 0%
23 BallLightning -73 352 484 1 0% 58 0%
24 Ragingpikey -73 352 484 1 0% 58 0%
25 Knoebber -142 165 168 9 44% -119 0%
26 iI,I±sI,IµlikIµÑ?I±ndy -178 335 335 2 50% -183 0%
27 CuChulainn -201 241 292 4 25% -84 0%
28 Adam -282 218 249 5 20% -101 0%
29 Alcarmacil -284 299 299 2 50% -284 0%
30 Shiver -287 300 300 2 50% -283 0%
31 3A6L3BA5T -343 182 252 8 0% -61 0%
32 KnA+v -348 301 487 2 0% -142 0%

I then ran it reversing the order of all match results. When I say order, I'm referring to when the game was played, the order in the text file. I didn't change, add or remove any actual match results. This was the result I got:

Rank Name Elo + - games score oppo. draws
1 Fizzer 650 345 217 8 100% 277 0%
2 Waya 555 486 303 2 100% 342 0%
3 FBGDragons 469 234 199 8 75% 315 0%
4 Doushibag 408 277 197 8 100% 18 0%
5 NoZone 342 168 176 10 40% 397 0%
6 ATrain 201 300 480 2 0% 408 0%
7 Soyrice 197 220 199 6 67% 140 0%
8 deweylikedonuts 184 250 265 6 33% 317 0%
9 Grundie 154 322 330 4 50% 127 0%
10 Ruthless 146 462 257 4 100% -152 0%
11 PoopSandwich 138 297 214 6 100% -199 0%
12 sue 126 281 281 4 50% 126 0%
13 Perrin3088 114 229 237 12 50% 86 0%
14 TheImpaller 94 190 152 8 100% -109 0%
15 FBGMoDogg 7 294 458 2 0% 197 0%
16 GuyMannington 5 274 275 4 50% -9 0%
17 crafty35a -41 240 204 8 75% -220 0%
18 BluePrecision -44 264 244 6 67% -192 0%
19 VampEZSTreet -52 294 458 2 0% 138 0%
20 chas -89 199 192 10 60% -193 0%
21 devilnis -104 299 480 2 0% 94 0%
22 Eitz -104 299 480 2 0% 94 0%
23 BallLightning -114 300 480 2 0% 94 0%
24 Ragingpikey -114 300 480 2 0% 94 0%
25 Knoebber -215 147 150 18 44% -177 0%
26 iI,I±sI,IµlikIµÑ?I±ndy -268 285 285 4 50% -271 0%
27 CuChulainn -296 205 248 8 25% -129 0%
28 Adam -442 189 216 10 20% -158 0%
29 Alcarmacil -446 249 249 4 50% -445 0%
30 Shiver -449 249 249 4 50% -444 0%
31 3A6L3BA5T -501 168 248 16 0% -88 0%
32 KnA+v -513 256 480 4 0% -215 0%

It's similar, but there are some definite differences. A number of people have changed positions on the ladder. This suggests that the order matches take place does have an effect on the ladder. Maybe this effect goes away later on, but there is one.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 22:07:41

The Impaller 
Level 9
Also, the 2nd result set has a much wider range of values. Fizzer is ranked 2150 in the 2nd one at first place and he's only 1940 in the first one.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 22:14:52


Perrin3088 
Level 49
make sure you run reset in the resultset> before you re-run a test
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 22:19:31


Perrin3088 
Level 49
i just ran it through twice, without reset or changing order, and received

Rank Name Elo + - games score oppo. draws
1 Fizzer 650 262 262 8 100% 277 0%
2 Waya 555 344 344 2 100% 342 0%
3 FBGDragons 469 215 215 8 75% 315 0%
4 Doushibag 408 226 226 8 100% 18 0%
5 NoZone 342 174 174 10 40% 397 0%
6 ATrain 201 338 338 2 0% 408 0%
7 Soyrice 197 208 208 6 67% 140 0%
8 deweylikedonuts 184 257 257 6 33% 317 0%
9 Grundie 154 341 341 4 50% 127 0%
10 Ruthless 146 315 315 4 100% -152 0%
11 PoopSandwich 138 242 242 6 100% -199 0%
12 sue 126 276 276 4 50% 126 0%
13 Perrin3088 114 247 247 12 50% 86 0%
14 TheImpaller 94 169 169 8 100% -109 0%
15 FBGMoDogg 7 329 329 2 0% 197 0%
16 GuyMannington 5 268 268 4 50% -9 0%
17 crafty35a -41 223 223 8 75% -220 0%
18 BluePrecision -44 252 252 6 67% -192 0%
19 VampEZSTreet -52 329 329 2 0% 138 0%
20 chas -89 199 199 10 60% -193 0%
21 devilnis -104 338 338 2 0% 94 0%
22 Eitz -104 338 338 2 0% 94 0%
23 BallLightning -114 338 338 2 0% 94 0%
24 Ragingpikey -114 338 338 2 0% 94 0%
25 Knoebber -215 153 153 18 44% -177 0%
26 iI,IñsI,IælikIæ¥?Iñndy -268 280 280 4 50% -271 0%
27 CuChulainn -296 225 225 8 25% -129 0%
28 Adam -442 201 201 10 20% -158 0%
29 Alcarmacil -446 240 240 4 50% -445 0%
30 Shiver -449 241 241 4 50% -444 0%
31 3A6L3BA5T -501 201 201 16 0% -88 0%
32 KnA+v -513 320 320 4 0% -215 0%

notice Fizzer
1 Fizzer 440 349 237 4 100% 191 0%
1 Fizzer 650 262 262 8 100% 277
....................^^^

4 to 8.. the first example has 4 games and the second has 8, which indicates you doubled the games involved
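
The same check can be done for more than one player by comparing the games column of two result tables. A minimal sketch, assuming rows shaped like the tables pasted in this thread (rank, name, then seven numeric columns); the helper name is made up, and the sample rows are copied from the original run and the reversed-order run posted earlier.

```python
FIRST_RUN = """
Rank Name Elo + - games score oppo. draws
1 Fizzer 440 349 237 4 100% 191 0%
2 Waya 370 496 357 1 100% 235 0%
5 NoZone 235 195 203 5 40% 272 0%
25 Knoebber -142 165 168 9 44% -119 0%
"""

SECOND_RUN = """
Rank Name Elo + - games score oppo. draws
1 Fizzer 650 345 217 8 100% 277 0%
2 Waya 555 486 303 2 100% 342 0%
5 NoZone 342 168 176 10 40% 397 0%
25 Knoebber -215 147 150 18 44% -177 0%
"""


def games_played(table):
    """Return {name: games} for every data row of a ratings table."""
    counts = {}
    for line in table.strip().splitlines():
        fields = line.split()
        if not fields[0].isdigit():
            continue  # skip the header row
        # The last seven fields are Elo, +, -, games, score, oppo., draws.
        name = " ".join(fields[1:-7])
        counts[name] = int(fields[-4])
    return counts


first = games_played(FIRST_RUN)
second = games_played(SECOND_RUN)
ratios = {name: second[name] / first[name] for name in first}
print(ratios)  # every ratio is 2.0 for these rows
```

A ratio of exactly 2 for every player points to the game file having been loaded twice rather than to any effect of game order.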