
Posts 11 - 30 of 43   <<Prev   1  2  3  Next >>   
VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 16:02:17


crafty35a 
Level 3
Since people don't seem to mind the quirks of the Bayesian approach (ratings that change even when you are not playing, strange-looking results with few games completed), I think the obvious solution is to replace Bayeselo with a similar system that is tailored to players whose strength varies over time. This would remove the need to limit rated games to the last three months, while still not penalizing players who have improved their play.

There are two systems that I think would work best:

Whole-History Rating:

- This one has been discussed before. It is similar to Bayeselo (it was actually created by the same person), but it is designed for human players whose strength varies over time, which makes it a perfect choice. The only downside? There is no readily available tool to calculate the ratings, so we would need to either create one ourselves or acquire one from someone who has already done the work. The Arimaa (a chess-like game) community currently uses WHR to calculate its ratings. In the thread discussing their rating system, the user who created the tool declined to release his software as open source, but perhaps we could contact him and ask whether he would be willing to share a tool that we could use? http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1207699394;start=105

- I have also been in contact with someone who is working on a Java implementation of WHR for use in a chess rating competition ( http://www.kaggle.com/ChessRatings2 ). He has shown a real interest in Warlight and a willingness to share his tool when the competition is over. I will invite him to contribute to this thread if he wishes.

TrueSkill Through Time:

- TrueSkill has been briefly discussed. TrueSkill Through Time is, in my opinion, a big improvement, and its results would likely be quite similar to WHR's. One big plus? The source code has been freely released by Microsoft ( http://blogs.technet.com/b/apg/archive/2008/04/05/trueskill-through-time.aspx )! The downside? The released code only compiles on older versions of F#, which I do not have access to. If anyone does, or if anyone is an F# programmer, it probably would not be too difficult to make the changes needed to get it to compile on a current version of F#.

Ultimately, I think either of these systems would be preferable to trying to modify Bayeselo with a continuously decreasing function.
VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 16:21:00

The Impaller 
Level 9
I like trying out the TrueSkill option. It seems designed for an environment very similar to our own.

I'm not a fan of the current system without changes or improvements. I like it in theory, but in practice I'm not sure it's the best fit for the ladder. Under the current system you don't so much work your way to the top by winning as by avoiding losses, which encourages slow play and loss dodging. With a more standard Elo system (and TrueSkill seems closer to standard Elo than to the Bayesian approach) you work your way to the top by winning a lot. That could encourage people to accept their losses and surrender rather than drag games out, because an unfinished game takes up a slot that could be used to try for a victory.
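For anyone less familiar with what "standard Elo" means here, below is a minimal sketch of the classic incremental Elo update. The K-factor of 32 and the function names are illustrative assumptions, not anything the ladder actually uses. The point it shows is that every finished win moves your rating up right away, while a game that is never finished changes nothing.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> float:
    """Return A's new rating after one game.

    score_a is 1.0 for a win and 0.0 for a loss.
    k (the K-factor) is an illustrative assumption; real systems tune it.
    """
    return r_a + k * (score_a - expected_score(r_a, r_b))


rating = 1500.0
# Five straight wins against 1500-rated opponents: each win adds points,
# but a little less each time, because A is increasingly expected to win.
for _ in range(5):
    rating = elo_update(rating, 1500.0, 1.0)
    print(round(rating, 1))
```

Because the update only happens when a game ends, stalling a lost game merely delays the drop; it never removes it.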

My vote goes to TrueSkill Through Time.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 02:34:43

Eitz 
Level 11
Well, granted, I haven't seen that a lot, because I'm like Duke: when I know a game's over and I'm going to lose, I prefer to tear off the band-aid and move on to the next game ASAP, and the people I've played on here have generally been respectful and cut from a similar cloth. I could definitely see it being an incredible nuisance to wait on players who are dodging a loss to protect their rating, since the loss is going to happen sooner or later anyway. If TrueSkill puts more onus on winning rather than 'not losing', then I would definitely change my vote to something to that effect.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 02:36:32

Eitz 
Level 11
I also still really like the idea of only keeping a 3-month track record (but I may be asking too much now ;P)
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 07:18:27

Blue Precision 
Level 32
I voted to keep the system the way it is and here's the main reasons why:

1) I like the fact that your score fluctuates based on how the opponents you've already played go on to perform. This places less emphasis on players' early games. In other words, if Ruthless or a player of similar skill played poorly out of the gate, say losing 6 of his first 10 matches, that is just coincidence, and the standings should reflect it. Yet with TrueSkill, if I were the person who beat him for the 6th time, I would get relatively few points even though Ruthless is an exceptional player. The current system recognizes that Ruthless is an exceptional player who just happened to lose more games early than his skill level would eventually indicate. So why penalize players for the timing of their matches with other players? The rankings already reset and reflect 3-month blocks of time, so you can't use the argument that players can improve drastically over time.

2) I think the ladder, give or take a few spots, is extremely accurate in its list of players' skill to this point. Not to offend anybody lower down, but I truly believe this. And just like in sports standings, this system makes it a tough grind to leapfrog opponents. That is a great thing in my opinion. If ladder rankings are supposed to give an accurate pecking order, then they should be more static than dynamic. The system would be flawed if players jumped around sporadically from day to day based on short winning or losing streaks. To use myself as an example, it has taken me nearly two weeks of consistently playing well to supplant Ruthless and Troll, who held the 12th and 11th spots (I was 13th). I don't see that as a problem but rather as a praiseworthy element of the current statistical mechanism.

3) A lot of what I'm reading is players wanting a system where a 5-game winning streak shoots them up the standings. I think this is rubbish. Once all players participating in the ladder have a decent sample size (I have played well over 50 games now), the swings should be slow; if the system is going to be accurate, people should pass each other at a snail's pace. From what I can tell, that's exactly what is happening.

In my opinion the ladder is extremely enjoyable, has pushed people to get better, and it is very accurate. Let's not change it for the sake of change.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 11:39:00

TeddyFSB 
Level 60
I am undecided on what I prefer, but I would like the system tweaked so that some reasonable number of games is needed to get into the top 10. Right now, getting into the top 10 is too easy for an above-average player: you don't even have to stall, although it can help greatly. You just need a lucky streak at the start, and all of a sudden there is a new, unproven player in the top 10 who can stay there a while if they don't play much. Given the current system, this will happen regularly.

I don't like the top of the ladder diluted this way, so I don't think top 10 should be reachable before playing 30-40 games. As an added benefit, there will be much less stalling then.

I am leaning towards a penalty of ~1000/sqrt(Ngames) applied to all current scores since it is trivial to implement and works sufficiently well.
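A minimal sketch of the proposed penalty, assuming it is simply subtracted from each player's current ladder score before sorting. The scale of 1000 comes from the post; the `Player` type, the example players and their numbers are hypothetical, and treating a player with zero games as unranked is an added assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    rating: float  # current ladder score
    games: int     # number of completed rated games


def penalized_score(p: Player, scale: float = 1000.0) -> float:
    """Subtract scale / sqrt(games): fewer games means a larger penalty."""
    if p.games <= 0:
        return float("-inf")  # assumption: no games means no meaningful score
    return p.rating - scale / math.sqrt(p.games)


def rank(players, scale: float = 1000.0):
    """Order players by penalized score, highest first."""
    return sorted(players, key=lambda p: penalized_score(p, scale), reverse=True)


# Hypothetical data: a hot newcomer versus an established player.
players = [Player("newcomer", 2000, 10), Player("veteran", 1900, 60)]
for p in rank(players):
    print(p.name, round(penalized_score(p)))
```

With these numbers the newcomer loses about 316 points (1000/sqrt(10)) and the veteran only about 129 (1000/sqrt(60)), so the veteran ranks first even though his raw score is lower. The penalty shrinks as games accumulate, so it fades out for regular players.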
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:09:33


crafty35a 
Level 3
The problem with applying an arbitrary penalty to scores on top of the current system is that we would be sacrificing the accuracy of the ratings. If we think it is a big problem to have people in the top 10 with only 10 games completed, then I think the better solution is to increase the length of the provisional period.
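A minimal sketch of the alternative suggested here: leave every score untouched and just require more completed games before a player appears on the ladder. The 10-game default mirrors the threshold mentioned later in the thread for the 2v2 ladder, and 30 echoes the 30-40 games suggested above; the `Player` type and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    rating: float  # current ladder score, left unchanged
    games: int     # number of completed rated games


def ranked_ladder(players, min_games: int = 10):
    """Rank only players who have finished the provisional period."""
    eligible = [p for p in players if p.games >= min_games]
    return sorted(eligible, key=lambda p: p.rating, reverse=True)


players = [Player("newcomer", 2000, 10), Player("veteran", 1900, 60)]
print([p.name for p in ranked_ladder(players)])                # 10-game threshold
print([p.name for p in ranked_ladder(players, min_games=30)])  # longer threshold
```

The trade-off versus a score penalty: ratings stay exactly as the model computes them, but a newcomer is either invisible or fully ranked, with nothing in between.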
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:33:49


Gaia 
Level 25
Now that there’s a restriction on who players can have games with (within 20% of your rating), I’m not so sure we’ll see meteoric rises to the top as before.

I voted to keep the current Bayesian Elo system. I definitely prefer Bayeselo over the traditional Elo model because it’s a more accurate representation of skill level: it uses more relevant data (the wins/losses of your previous opponents over the last 3 months), so ratings are constantly updated to reflect it. In a few weeks the stalled games will be complete and we’ll see even more refined ladder rankings. I would like to see this system play out for at least a while longer. Patience! :)

I would be open to modifying the current Bayesian model after the 3-month mark, but I have yet to see exactly which modifications would be better suited to the WarLight ladder. The decayed-history system seems to benefit those who play fewer games (which will be me once I join the ladder), while penalizing those who play more frequently. On the other hand, I’m strongly opposed to the TrueSkill system, which would penalize decent players who are not able to play a large number of games in a short time frame. The Whole-History Rating system sounds interesting, but how exactly does it account for players’ changing strength? I read an abstract which indicates that this system is more accurate in its predictions than the decayed-history and TrueSkill algorithms, but I’m not sure whether it can or should be applied here.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:40:31

fatguyinalittlecoat 
Level 3
I haven't voted, and I think the current ladder does a pretty good job in terms of accuracy.

My only issue with the ladder as currently constituted is the "Doushi" problem, where players deliberately slow games to a crawl to avoid taking a loss. I'm not talking about someone taking a day to think over a move or playing a few moves past where he should probably surrender. Those things are completely understandable and not a big deal. I'm talking about waiting until just before the boot timer is finished, then making one move (despite the fact that he could make two), then waiting another 3 days to do the same thing. I have a game against Doushibag where he's doing this. I have another game against spikeknights where I suspect he may be doing it as well. I would support any change in the system that creates a disincentive for this strategy.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 15:50:49

TeddyFSB 
Level 60
Crafty, the 1000/sqrt(Ngames) penalty is not arbitrary; it is roughly what the TrueSkill option uses. The uncertainty of your rating is ~400/sqrt(Ngames), so if your score is 1800 after applying the penalty, that means that with 99% probability your skill is at least 1800. Or we can use 650/sqrt(Ngames), which gives 95% probability.
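The arithmetic behind those percentages can be checked with a short sketch. It assumes, as the post does, that the uncertainty in a rating is about 400/sqrt(Ngames), and additionally assumes that uncertainty is normally distributed. Because both the penalty and the uncertainty scale with 1/sqrt(Ngames), the sqrt term cancels, and the penalty is always a fixed number of standard deviations below the estimate.

```python
import math

# Assumption from the post: one standard deviation of rating
# uncertainty is roughly 400 / sqrt(n_games).
SIGMA_SCALE = 400.0


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confidence_for_penalty(penalty_scale: float, sigma_scale: float = SIGMA_SCALE) -> float:
    """One-sided probability that true skill is at least the penalized score.

    penalty = penalty_scale / sqrt(n) and sigma = sigma_scale / sqrt(n),
    so the penalty is penalty_scale / sigma_scale standard deviations
    regardless of n.
    """
    z = penalty_scale / sigma_scale
    return normal_cdf(z)


for scale in (1000, 650):
    print(f"{scale}/sqrt(N): {confidence_for_penalty(scale):.3f}")
```

1000/400 is 2.5 standard deviations, which gives about 0.994, and 650/400 is about 1.6 standard deviations, which gives about 0.948. Both are consistent with the 99% and 95% figures quoted above.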
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 16:21:51

bostonfred 
Level 7
I think the rankings work great! This is exactly how things are supposed to work! That game I started with Doushi back on March 6th should make it to turn eight soon, which will be fun! And while I'm here I wanted to congratulate some people on their rankings in the 2v2 ladder! Congratulations to Blue Precision and Eitz (0-2), on their fourth place ranking in the ladder! And congrats should also go to impaller and waya (2-0) for their fifth place ranking! Because the ladder works great right now!
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 17:29:22


Duke 
Level 5
methinks he might be a wee bit sarcastic in his praise.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 17:36:48


Ruthless 
Level 57
technically there are no rankings yet on the 2v2 ladder because we all need to complete 10 games. The first to get 10 games gets 1st place!

So really, those rankings don't say anything yet, Fred.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 17:44:00


Doushibag 
Level 17
It does say something... I can hear it.. in my head... it tells me things..
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 18:53:17


Math Wolf 
Level 64
There is a player who rocketed to the top 10 and is stalling all his losing games... When I played him, he took his turns almost immediately (less than 5 minutes mostly), but meanwhile it was his turn for more than 2 days in at least 3 other games...
And this player only recently has finished 10 games.

So I think it is quite possible to rocket yourself to the top by stalling. This won't change my point of view, though; I'm still a fan of BayesELO (with a continuous decay over time, as mentioned 47 912 times before).
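To make "continuous decay over time" concrete, here is a minimal sketch of one common way to do it: exponential time weights applied to each game's contribution to an Elo-style (logistic) likelihood. The 45-day half-life, the function names and the example players are hypothetical, and the sketch does not assume anything about whether the bayeselo program itself accepts per-game weights. A real fit would also include a prior, as BayesElo does; that is left out here.

```python
import math

# Hypothetical tuning parameter: a game loses half its weight every 45 days.
HALF_LIFE_DAYS = 45.0


def decay_weight(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: weight 1.0 today, 0.5 after one half-life, and so on."""
    return 0.5 ** (age_days / half_life)


def win_probability(r_a: float, r_b: float) -> float:
    """Logistic (Elo-style) probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def weighted_log_likelihood(ratings, games, today):
    """Sum of time-weighted log-probabilities of the observed results.

    ratings: dict mapping player name -> rating
    games:   list of (winner, loser, day_played) tuples
    A rating fit would pick the ratings that maximize this value,
    so old games pull on the ratings less than recent ones.
    """
    total = 0.0
    for winner, loser, day in games:
        weight = decay_weight(today - day)
        total += weight * math.log(win_probability(ratings[winner], ratings[loser]))
    return total


ratings = {"alice": 1600.0, "bob": 1500.0}
games = [("bob", "alice", 0), ("alice", "bob", 90)]  # a 90-day-old upset, then a recent win
print(weighted_log_likelihood(ratings, games, today=90))
```

Unlike a hard three-month cutoff, no game is ever dropped outright; a 90-day-old result simply counts a quarter as much as one played today.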
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 06:43:11

Blue Precision 
Level 32
Fred, I stated, if you cared to read what I wrote carefully, that the system works great because players with a good sample size of games, like me, have accurate ratings. You may have skimmed too quickly over my example where Ruthless would mistakenly look like a below-average player early on if he happened to lose 6 of his first 10.

What you're sarcastically saying is nothing more than the fact that the system is not accurate for teams that have played two games. No kidding. The teams that beat us were undefeated at the time, so Eitz and I could theoretically be the 3rd strongest team, since we had never lost to a team that had a loss. Again, with this few games the system simply cannot draw accurate conclusions yet. Unless you want a system that is psychic, I'm still not sure what the problem is.

Answer me this: should the player ranked 14th, who started with, say, 1700 pts before going on a 5-game win streak against players ranked 11th, 12th, 13th, 27th and 35th, pass the player ranked 9th with 1800 pts, who just beat the 2nd and 5th but lost to the 1st, 7th and 10th? I would argue no: the 9th-ranked player is known to be capable of beating high seeds, whereas the 14th-ranked player cannot yet be shown to be any better than 11th.

My point in all this is that losses shouldn't be the be-all and end-all, and neither should wins. It's who, specifically, you can beat and who, specifically, you have lost to that counts. In a ladder, rankings are only meaningful relative to the players around you, and for that you need a decent sample size. I repeat, a decent sample size. All together now...
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 14:35:24


crafty35a 
Level 3
I'd just like to strongly urge anyone who has voted "No - keep using Bayesian ELO" to consider changing their vote to "No - don't switch to traditional but I'd like some other rating system that isn't mentioned here."

I honestly cannot think of a single advantage to using Bayesian Elo over either Whole-History Rating or TrueSkill Through Time. We should not have to hack Bayesian Elo by doing things like counting only the last three months of results, or adding a decay function. The improvements provided by these modifications are already **built in** to the other rating systems, *since they do not count all games with equal weight*, regardless of when they occur. That is the fatal flaw of BayesElo, in my opinion. Why try to hack a solution ourselves, when people who specialize in this sort of thing, and who have created some of the most accurate rating systems in the world, have already done the hard work?
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 14:55:02

bostonfred 
Level 7
Blue - I was kidding. Sorry to get your neck up.

Crafty - I agree.
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 16:58:44

Blue Precision 
Level 32
lol, Boston, it wasn't aimed at you (personally). I agree with your points in the main; I just didn't think your argument countered mine at all. And I fear that forums are like newspapers: most people believe what you write, not the point you're trying to prove or your intentions.

My fear is that we change the system and voila, to everyone's shock, it's still not perfect. Then we all cry and moan again about how this other system would solve our problems. Some players play slow, some fast; some prefer many games, some few. No system is going to balance all of this and reward everyone's preferences equally.

My final comment on this thread is that our time would be better spent tweaking what we have, à la my suggestion of a cumulative boot timer across all active games to cure the exploit Doushi discovered (and used), rather than rotating through endless systems that could all have their own ways to be exploited.
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 17:46:12


Duke 
Level 5
Maybe Fizz purposefully uses inferior rating systems to encourage more participation in the forums.