
Posts 1 - 30 of 43   1  2  Next >>   
VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 03:34:07

Fizzer 
Level 58

Warzone Creator
Please vote on whether or not you would like to see WarLight switch from the current Bayesian ELO model to a traditional ELO model.

# [Click here to vote](/Poll.aspx?ID=ELO)

Since this poll pertains to the ladders, only WarLight members may vote. Anyone can view the poll; it just won't accept your vote if you're not a member.

The poll will close on March 27th. You have until then to make up your mind. The voting link above allows you to change your vote at any time before then (simply click the link and vote again and your old vote will be thrown away.)

This poll is not anonymous. If you'd like, you can leave a reply to this thread explaining what you voted and why.
VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 04:43:41


Knoebber 
Level 54
I like having my rating fluctuate even if I haven't completed a game recently. So I say keep it.
VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 05:35:55


Doushibag 
Level 16
How does the current system relate to this one: http://en.wikipedia.org/wiki/TrueSkill
Same thing? Differences?
Curious as to what the different alternatives are and if there are any systems that are specifically designed for a higher luck game as opposed to a zero luck game like chess. I.e., Warlight is going to have more 'upsets' due to the luck factor than chess does, and I wonder if any system can help account for this factor (if it's necessary... not sure it is, but it seems like a relevant difference).
I don't like the current system, but not sure I'd like standard ELO either and if I didn't like either not sure which I'd prefer at this point or if there was some other alternative to both that would work better. Guess I should go read again why you chose to use the current one over standard ELO. Not sure if it was because you felt standard ELO was notably flawed or just that this one was worth testing out as a possible better alternative.
VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 06:16:11

TeddyFSB 
Level 60
Here are the main differences between standard ELO, ELO as used by Warlight and TrueSkill formula:

Standard ELO: you approach mu (your true skill level) in fairly small incremental steps, thus it takes some number of games until you reach mu

Bayesian ELO: mu is evaluated at every step via a maximum likelihood fit, taking all available information into account

TrueSkill: your skill level is determined as mu-n*sigma, so if you are a good player, you take a penalty if you have played a small number of games. At n=3, as used by the Xbox system, the penalty is so huge, I think it would be impractical for Warlight, however a smaller n could be considered as a solution against stalling lost games.
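To make the "small incremental steps" of standard ELO concrete, here is a minimal sketch of the usual update rule. The K-factor of 32 and the 1500 starting rating are assumptions chosen for illustration, not values taken from WarLight.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """One incremental step: move by k times (actual score - expected score).

    k = 32 is an assumed K-factor, not a WarLight setting.
    """
    return rating + k * (score - expected_score(rating, opponent))


def win_streak(start: float = 1500.0, opponent: float = 1500.0, wins: int = 10) -> list:
    """Ratings after each game of a run of straight wins over a fixed-rated opponent."""
    history = []
    rating = start
    for _ in range(wins):
        rating = elo_update(rating, opponent, 1.0)
        history.append(rating)
    return history


if __name__ == "__main__":
    for game, rating in enumerate(win_streak(), start=1):
        print(game, round(rating, 1))
```

Each win moves the rating by a smaller amount than the one before, which is why a strong player needs a fair number of games to get close to mu under this model.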

To demonstrate the effect of n=3, if used right now, it would result in Doushibag dropping from 1st to 10th. Shogun, who has played 12 games, would drop from 6th to 32nd. Top 5 would be (listing old ranking and number of games played):

1. TheImpaller (3, 65)
2. bostonfred (7, 50)
3. Duke (2, 26)
4. TeddyFSB (5, 40)
5. Ruthless (14, 60)
VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 11:19:31

bostonfred 
Level 7
TrueSkill: .... a solution against stalling lost games.

my vote
VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 12:58:35

TeddyFSB 
Level 60
My apologies, I made a mistake in the TrueSkill ratings. I forgot that the errors given by the BayesELO program are actually 2*sigma and not sigma, so my earlier numbers were effectively calculated as if n was equal to 6.

In fact, mu-3*sigma actually gives reasonable results, with Doushibag only dropping to 2nd place, which is exactly where he should be :) Shogun drops to 16th.

If you want to play with numbers without running the program, 1 sigma is approximately equal to 400/sqrt(Ngames). So penalty would be ~300 points at 16 games, 150 points at 64 games, and so on.
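For anyone who wants to try other values, here is a small sketch of that rule of thumb. The function names are made up for this example, and the 400/sqrt(Ngames) approximation is the one quoted above, not something computed by the BayesELO program.

```python
import math


def approx_sigma(n_games: int, scale: float = 400.0) -> float:
    """Rule-of-thumb rating uncertainty: about 400 / sqrt(number of games)."""
    return scale / math.sqrt(n_games)


def penalized_rating(mu: float, n_games: int, n: float = 3.0) -> float:
    """TrueSkill-style conservative rating: mu - n * sigma."""
    return mu - n * approx_sigma(n_games)


if __name__ == "__main__":
    for games in (16, 64):
        # Penalty at n = 3 for the two game counts mentioned in the post.
        print(games, "games ->", round(3 * approx_sigma(games)), "point penalty")
```

Because the penalty shrinks with the square root of the game count, it takes four times as many games to halve it.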
VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 13:14:49


Math Wolf 
Level 63
I'm a fan of BayesELO, but (as mentioned several times before) with a continuously decreasing function added. I do strongly suggest this change. If the code is adaptable, this change can be easily made.

To stabilise ratings when only a small number of games has been played, and to get a ranking already after a few games, a number of fictional games against ELO 1500 can be added (as I've mentioned before too).
If you want a real penalty for a smaller number of games, similar to TrueSkill, you can add games against a fictional ELO 1000 or even lower, for example.
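A rough sketch of the fictional-games idea follows. It is a simplification: it fits one player's rating against opponents whose ratings are held fixed, whereas BayesELO fits everyone jointly, and it assumes the fictional games are counted as draws (the post does not say which result they should have). The function names are made up for the example.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo expected score on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def fit_rating(games, prior_games=0, prior_rating=1500.0, lo=0.0, hi=4000.0):
    """Find the rating whose expected total score matches the actual total score.

    games: list of (opponent_rating, score) with score 1.0 for a win, 0.0 for a loss.
    prior_games: number of fictional draws (an assumption) against prior_rating.
    The search is limited to [lo, hi], so a perfect record with no fictional
    games simply runs into the upper bound.
    """
    all_games = list(games) + [(prior_rating, 0.5)] * prior_games
    actual = sum(score for _, score in all_games)

    def surplus(rating):
        # Positive when the player scored more than this rating predicts.
        return actual - sum(expected_score(rating, opp) for opp, _ in all_games)

    for _ in range(100):  # bisection; surplus decreases as rating increases
        mid = (lo + hi) / 2.0
        if surplus(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    streak = [(1600.0, 1.0)] * 3  # three straight wins over 1600-rated players
    print("no fictional games:     ", round(fit_rating(streak)))
    print("2 draws vs fictional 1500:", round(fit_rating(streak, prior_games=2)))
    print("2 draws vs fictional 1000:", round(fit_rating(streak, prior_games=2, prior_rating=1000.0)))
```

With no fictional games a short unbeaten run has no finite best-fit rating; the fictional games pull it back to something sensible, and anchoring them at 1000 instead of 1500 pulls harder, which is the penalty effect described above.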
VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 18:03:47


Polaris 
Level 55
I'd personally prefer traditional. I'd like to be rewarded or penalized for my wins/losses, and have it remain at that. No continuous effects~
VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 04:57:48

Eitz 
Level 11
I like the current system for the fact that it represents overall skill as a whole and encourages you to play at a consistent level. I also like the fact that it only keeps 3 months of data as it's common knowledge that the more ladder games people take part in, the more the level of play is going to increase and it's nice not having to be penalized for a potential rough start while feeling out the ladder system early on.
VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 14:57:20


chas 
Level 43
As previously stated in other posts, I'm with MathWolf. I really like the BayesELO, but would like to see a continuously decreasing function added.
VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 16:02:17


crafty35a 
Level 3
Since people don't seem to mind the quirks of the Bayesian approach (ratings that change even when you are not playing, strange looking results with few games completed), I think the obvious solution is to replace Bayeselo with a similar system that is tailored to players of time-variable strength. This would remove the need to limit rated games to the last three months only, while still not penalizing players who have improved their play.

There are two systems that I think would work best:

Whole-History Rating:

- This one has been discussed before. It is similar to Bayeselo (actually created by the same person), but is designed for human players with varying levels of strength, which makes it a perfect choice. The only downside? There is no readily available tool to calculate the ratings, so we would need to either create one ourselves, or acquire one from someone who has already done the work. The Arimaa (a chess-like game) community currently uses WHR to calculate its ratings. In the thread discussing the rating system, the user who created the tool has declined to release his software as open source, but perhaps we could contact them and ask if they would be willing to share a tool that we could use? http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1207699394;start=105

- I have also been in contact with someone who is working on a Java implementation of WHR for use in a chess rating competition ( http://www.kaggle.com/ChessRatings2 ). He has indicated a real interest in Warlight and a willingness to share his tool when the competition is over. I will invite him to contribute to this thread if he so desires.

TrueSkill Through Time:

- TrueSkill has been briefly discussed. TrueSkill Through Time is, in my opinion, a big improvement. Results would likely be quite similar to WHR. One big plus? The source code has been freely released by Microsoft ( http://blogs.technet.com/b/apg/archive/2008/04/05/trueskill-through-time.aspx )! The downside? The released code will only compile on older versions of F#, which I do not have access to. If anyone does, or if anyone is an F# programmer, it probably would not be too difficult to make the necessary changes to get it to compile on a current version of F#.

Ultimately, I think either of these systems would be preferable to trying to modify Bayeselo with a continuously decreasing function.
VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 16:21:00

The Impaller 
Level 9
I like trying out the TrueSkill option. It seems designed for a very similar environment as our own.

I'm not a fan of the current system without changes or improvements. I like it in theory, but in practice I'm not sure it's the best for the ladder. The current system is almost such that you don't work your way to the top with wins, but rather by avoiding losses, which encourages slower play and loss dodging. With a more standard ELO system (and TrueSkill seems closer to standard ELO than Bayesian) you work your way to the top by winning a lot. That could encourage people to accept losses and surrender rather than drag them out, because a stalled game takes up a slot that could be used to attempt to achieve a victory.

My vote goes to TrueSkill Through Time.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 02:34:43

Eitz 
Level 11
well granted I haven't seen that a lot cuz I'm like Duke in the fact that when I know a game's over and I'm gonna lose, I prefer to tear off the band-aid and move on to the next game asap, and the games I've had on here have generally been with pretty respectful people who tend to be cut from a similar cloth. I could definitely see it being an incredible nuisance to have to wait for players who are dodging the loss to save their rating, as it's just going to happen sooner or later anyways. If this TrueSkill puts more onus on winning rather than 'not losing' then I would for sure change my vote to something to that effect.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 02:36:32

Eitz 
Level 11
I just also still really like the idea of only keeping a 3 month track record (but I may be asking too much now ;P)
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 07:18:27

Blue Precision 
Level 32
I voted to keep the system the way it is and here are the main reasons why:

1) I like the fact that your score fluctuates based on how the opponents you've already played perform. This places less emphasis on players' early games. In other words, if Ruthless or a player of similar skill played poorly out of the gate, say losing 6 of his first 10 matches, this is just coincidental and the standings should reflect this. Yet with TrueSkill, if I was the person who beat him for the 6th time I would get relatively few points even though Ruthless is an exceptional player. The current system recognizes that Ruthless is an exceptional player and just happened to lose more games early than his skill level would eventually indicate. Therefore, why penalize players on the timing of their matches with other players? The rankings already reset and are reflective of 3-month blocks of time, so you can't use the argument that players can improve drastically over time.

2) I think the ladder, give or take a few spots in positions, is extremely accurate in its list of players' skill to this point. Not to offend anybody lower down, but I truly believe this statement. And just like in sport standings, this system makes it a tough grind to leapfrog opponents. This is a great thing in my opinion. If ladder rankings are supposed to give an accurate pecking order then they should be more static than dynamic. The system would be flawed if players jumped around sporadically from day to day based on a few game winning/losing skids. To use myself as an example, it has taken me nearly two weeks of consistently playing well to supplant Ruthless and Troll who held the 12th and 11th spots (I was 13th). I don't see that as a problem but rather as a praiseworthy element of the current statistical mechanism.

3) A lot of what I'm reading is players wanting to use a system where a 5 game winning streak shoots them up the standings. I think this is rubbish. Once a decent sample size has been established by all players participating in the ladder (I have played well over 50 games now) the swings should be slow; people should pass each other at a snail's pace if the system is going to be accurate. From what I can tell, that's exactly what is happening.

In my opinion the ladder is extremely enjoyable, has pushed people to get better, and it is very accurate. Let's not change it for the sake of change.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 11:39:00

TeddyFSB 
Level 60
I am undecided on what I prefer, but I think I would like the system tweaked so that some reasonable number of games is needed to get into top 10. Right now, getting into top 10 is too easy for an above average player -- you don't even have to employ stalling although it can help greatly. You just need a lucky streak to begin with, and all of a sudden there is a new unproven player in top 10, who can stay there a while if they don't play much. This will happen with regularity given the current system.

I don't like the top of the ladder diluted this way, so I don't think top 10 should be reachable before playing 30-40 games. As an added benefit, there will be much less stalling then.

I am leaning towards a penalty of ~1000/sqrt(Ngames) applied to all current scores since it is trivial to implement and works sufficiently well.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:09:33


crafty35a 
Level 3
The problem with applying an arbitrary penalty to scores on top of the current system is that we would be sacrificing the accuracy of the ratings. If we think it is a big problem to have people in the top 10 with only 10 games completed, then I think the better solution is to increase the length of the provisional period.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:33:49


Gaia 
Level 25
Now that there’s a restriction on who players can have games with (within 20% of your rating), I’m not so sure we’ll see meteoric rises to the top as before.

I voted to keep the current Bayesian ELO system. I definitely prefer bayeselo over the traditional ELO model because it’s a more accurate representation of skill level, which utilizes more relevant data (the wins/ losses of your previous opponents over the last 3 months), resulting in the dynamism of ratings which constantly reflect this. In a few weeks the stalled games will be complete and we’ll see even more refined ladder rankings. I would like to see this system play out for at least a while longer. Patience! :)

I would be open to modification of the current Bayesian model after the 3 month mark, but have yet to see exactly which modifications would be better suited to the WarLight ladder. The decayed-history system seems to benefit those who play fewer games (which will be me once I join the ladder), while penalizing those who play more frequently. Alternatively, I’m strongly opposed to the TrueSkill system, which would penalize decent players who are not able to play a large number of games in a shorter time frame. The Whole-History Rating system sounds interesting, but how exactly does it take into account the varying levels of players? I read an abstract which indicates this system to be more accurate in predictions than decayed-history and TrueSkill algorithms, but I'm not sure if it can or should be applied here.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:40:31

fatguyinalittlecoat 
Level 3
I haven't voted, and I think the current ladder does a pretty good job in terms of accuracy.

My only issue with the ladder as currently constituted is the "Doushi" problem, where players deliberately slow games to a crawl to avoid taking a loss. I'm not talking about someone taking a day to think over a move or playing a few moves past where he should probably surrender. Those things are completely understandable and not a big deal. I'm talking about waiting until just before the boot timer is finished, then making one move (despite the fact that he could make two), then waiting another 3 days to do the same thing. I have a game against Doushibag where he's doing this. I have another game against spikeknights where I suspect he may be doing it as well. I would support any change in the system that creates a disincentive for this strategy.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 15:50:49

TeddyFSB 
Level 60
Crafty, the 1000/sqrt(Ngames) penalty is not arbitrary; this is roughly what the TrueSkill option uses. The uncertainty of your rating is ~400/sqrt(Ngames), so if after applying the penalty your score becomes 1800, what this means is that, with 99% probability, your skill is at least 1800. Or we can use 650/sqrt(Ngames), and then it will be 95% probability.
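The 99% and 95% figures can be checked with a short sketch, assuming the rating error is normally distributed with sigma = 400/sqrt(Ngames) (the approximation quoted in this thread). The function name is made up for the example.

```python
import math


def one_sided_confidence(penalty_coeff: float, sigma_coeff: float = 400.0) -> float:
    """Probability that true skill is at least (mu - penalty).

    penalty = penalty_coeff / sqrt(Ngames) and sigma = sigma_coeff / sqrt(Ngames),
    so the number of games cancels and only the ratio matters.
    Assumes a normally distributed rating error.
    """
    z = penalty_coeff / sigma_coeff  # penalty expressed in sigmas
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


if __name__ == "__main__":
    for coeff in (1000.0, 650.0):
        print(coeff, "->", round(100 * one_sided_confidence(coeff), 1), "%")
```

A coefficient of 1000 corresponds to 2.5 sigma and 650 to about 1.6 sigma, which is where the two confidence levels come from.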
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 16:21:51

bostonfred 
Level 7
I think the rankings work great! This is exactly how things are supposed to work! That game I started with Doushi back on March 6th should make it to turn eight soon, which will be fun! And while I'm here I wanted to congratulate some people on their rankings in the 2v2 ladder! Congratulations to Blue Precision and Eitz (0-2), on their fourth place ranking in the ladder! And congrats should also go to impaller and waya (2-0) for their fifth place ranking! Because the ladder works great right now!
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 17:29:22


Duke 
Level 5
methinks he might be a wee bit sarcastic in his praise.
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 17:36:48


Ruthless 
Level 36
technically there are no rankings yet on the 2v2 ladder because we all need to complete 10 games. The first to get 10 games gets 1st place!

So really, those rankings don't say anything yet fred
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 17:44:00


Doushibag 
Level 16
It does say something... I can hear it.. in my head... it tells me things..
VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 18:53:17


Math Wolf 
Level 63
There is a player who rocketed to the top 10 and is stalling all his losing games... When I played him, he took his turns almost immediately (less than 5 minutes mostly), but meanwhile it was his turn for more than 2 days in at least 3 other games...
And this player only recently has finished 10 games.

So I think it is quite possible to rocket yourself to the top by stalling. This won't change my point of view though, I'm still a fan of BayesELO (with a continuous decay over time, as mentioned 47 912 times before).
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 06:43:11

Blue Precision 
Level 32
Fred, I stated, if you cared to read what I wrote carefully, that the system works great because players with a good sample size of games - like me - have accurate ratings. You may have skimmed too quickly over my example where Ruthless would mistakenly be rated a below average player early on if he happened to lose 6 of his first 10.

What you're sarcastically saying is nothing other than the fact that the system is not accurate with teams that have played two games. No kidding. The teams that beat us were undefeated at the time, so technically Eitz and I could theoretically be the 3rd strongest team, as we had never lost to a team with a loss. Again, with this few games the system simply cannot draw accurate conclusions yet. Unless you want a system that is psychic, I'm still not sure what the problem is.

Answer me this: Should the combatant ranked 14th, who started with say 1700 pts before going on a 5 game win streak against players ranked 11th, 12th, 13th, 27th and 35th, pass the person ranked 9th with 1800 pts, who just beat 2nd and 5th but lost to 1st, 7th and 10th? I would argue no: the 9th ranked person is known to be capable of beating high seeds, whereas the 14th ranked person cannot be known to be any better than 11th.

My point in all this is that losses shouldn't be the be-all and end-all, and neither should wins. It's who - specifically - you can beat and who - specifically - you have lost to that counts. I.e., in a ladder, rankings are only meaningful if you are ranked relative to the others around you, and for this you need a decent sample size. I repeat, a decent sample size. All together now...
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 14:35:24


crafty35a 
Level 3
I'd just like to strongly urge anyone who has voted "No - keep using Bayesian ELO" to consider changing their vote to "No - don't switch to traditional but I'd like some other rating system that isn't mentioned here."

I honestly cannot think of a single advantage to using Bayesian Elo over either Whole-History Rating or TrueSkill Through Time. We should not have to hack Bayesian Elo by doing things like counting only the last three months of results, or adding a decay function. The improvements provided by these modifications are already **built in** to the other rating systems, *since they do not count all games with equal weight*, regardless of when they occur. That is the fatal flaw of BayesElo, in my opinion. Why try to hack a solution ourselves, when people who specialize in this sort of thing, and who have created some of the most accurate rating systems in the world, have already done the hard work?
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 14:55:02

bostonfred 
Level 7
Blue - I was kidding. Sorry to get your neck up.

Crafty - I agree.
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 16:58:44

Blue Precision 
Level 32
lol, Boston, it wasn't aimed at you (personally). I agree with your points in the main, I just didn't think your argument countered mine at all. And I fear that forums are the same as newspapers: most people will believe what you write, not the point you're trying to prove or your intentions.

My fear is we change the system and voilà, to all our shock it's still not perfect. Then we all cry and moan again about how this other system would solve our problems. Some players play slow, some fast, some prefer many games, some few. No system is going to balance all this out and reward everyone's preferences equally.

My final comment on this thread is that our time would be better spent tweaking what we have... à la my suggestion of the cumulative boot timer for all active games to cure Doushi's discovered (and used) system exploit, rather than rotating through endless systems that could all have various ways to exploit them.
VOTE: Should the ladders switch to a traditional ELO model?: 3/23/2011 17:46:12


Duke 
Level 5
Maybe Fizz purposefully uses inferior rating systems to encourage more participation in the forums.