Posts 21 - 36 of 36   <<Prev   1  2  
Skewed rating results: 2/24/2011 23:37:34


Perrin3088 
Level 49
That's the point, Ruthless: it makes presumptions based purely on who you're matched up against.

It takes the fact that b is better than c/d and assumes that, since w was matched to b but not to c/d, w must be better than c/d and closer to b, until further evidence is provided.

In time, when w plays c/d and loses, it would place them properly...

Another downside I see: if you manage to get a falsely high ranking, then even though you keep losing to players with very high ratings, you'll drop only slowly, because it will still think you deserve a high*er* ranking for having been matched up with higher-rated players.

Imho, once it stabilizes some, new players should play their first ten games as if they had a 1500 rating, i.e. against average players (*preferably average vs new, and not new vs new*). That way they will have 10 wins/losses against members at the average difficulty level of the ladder, instead of incidentally getting a high or low rating based mostly on whoever their first match-up was.
Skewed rating results: 2/24/2011 23:43:48


Ruthless 
Level 57
Ah, I didn't see that your example was meant to point out the flaws in the system. I thought you were trying to prove the opposite, which is how I got confused. That makes sense now.
Skewed rating results: 2/25/2011 00:19:58


Perrin3088 
Level 49
MathWolf, I agree wholeheartedly with the fictional 1500 opponent for new players. It is essentially the reason I am pushing for a 1500 base rating for a player's 10 probationary games, as I couldn't figure out a way, with the program Fizzer is currently using, to keep the rating more stable initially while still letting them keep rating changes from early games once they pass the probationary period.
Skewed rating results: 2/25/2011 00:33:08

Fizzer 
Level 64

Warzone Creator
That's an interesting idea, MathWolf. I'll have to experiment with that - thanks!
Skewed rating results: 2/25/2011 02:26:31


NoZone 
Level 6
The 'cheat' proposed by MathWolf looks pretty compelling.

As an aside, what distribution of player ranks should be expected from this ranking system? Won't it be normal? It'd be interesting to know how the distribution differs from the distribution of ranks in a typical ELO scoring system.

NoZone
Skewed rating results: 2/25/2011 02:57:17


crafty35a 
Level 3
One thing I have yet to see in this discussion is anyone arguing that the Bayesian system is actually better than, or preferable to, a more standard Elo system. MathWolf's hack should make things work a bit more smoothly, but taking the long view for a second here: can anyone tell me why we should prefer to use the Bayesian system? I think I've put forward a pretty strong argument that it is inappropriate for human players, and I've really yet to hear anyone argue otherwise.
Skewed rating results: 2/25/2011 03:19:52

The Impaller 
Level 9
I don't see it being necessarily worse. I want to let it run its course for a while. If that doesn't work, then I'm sure Randy will switch to something else, but I don't think people are giving it enough of a chance and are condemning it too fast because it's something they are unfamiliar with or something that doesn't necessarily make immediate intuitive sense.

I can think of a few downsides to standard ELO systems. It can be advantageous to play tons of games to inflate or push your rating really high in some situations and in other situations it can be advantageous to play as few games as you possibly can to prevent losing rating points. Standard ELO systems allow people to inflate their rating by grinding a ton of games until they can win enough in a row, and then sit on that rating by playing as few games as they can. In a competitive game I play that uses an ELO system, this is commonly done, because certain rating levels award you byes or invites to various events. So you can grind a bunch of events until you win enough to get above a certain threshold and then just sit on that rating long enough to get whatever invite or bye you are looking to get from it.

I don't think the Bayesian system has that kind of downside, because it doesn't matter when your wins come, or what order they come in. In the Bayesian system, I also don't think you will have to play a lot of games in order to reach a high enough rating, whereas in a standard ELO system, you may have to play 20+ games to even be in the situation where you have a shot of getting to the top of rating, because rating gain is a much slower process, since there are a fixed number of points you can win.

I think the Bayesian system has a lot of potential. I was down on it at first because it was unfamiliar and was producing really awkward results with low data, but I think with more and more data being pumped in we're going to see very smooth rankings that fairly accurately reflect the skill of the players in the ladder. Let's give it a shot.
Skewed rating results: 2/25/2011 03:50:03

Fizzer 
Level 64

Warzone Creator
I share Impaller's feelings.

The thing that I hate about standard ELO is how someone's current rating affects you a lot, even if they aren't properly rated yet. For example, say Gaia joins the tournament tomorrow and defeats you. Gaia is a very strong player, but her rating was only 1500 since she just joined, so you take a big hit for losing to a 1500 player.

Now say Gaia rises to the top 5 - does it really make sense for everyone she beat on the way there to take a big penalty to their rating? The Bayesian system would return those points to them as she went up the ladder, since it recognized that they really lost to a stronger player.
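
To put rough numbers on that penalty, here is a minimal sketch using the standard Elo expected-score and update formulas. The K-factor of 32 and the 1700/1900 ratings are assumptions picked for illustration; they are not settings from the ladder.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game; score is 1.0 for a win, 0.0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


# Hypothetical established player rated 1700 loses to the newcomer.
# Case 1: the newcomer is still listed at the 1500 starting rating.
loss_vs_1500 = elo_update(1700, 1500, 0.0) - 1700
# Case 2: the same loss, had the newcomer already been listed at 1900.
loss_vs_1900 = elo_update(1700, 1900, 0.0) - 1700

print(f"loss vs 1500-rated newcomer: {loss_vs_1500:.2f}")
print(f"loss vs 1900-rated opponent: {loss_vs_1900:.2f}")
```

Under these assumptions the same defeat costs roughly three times as many points when the winner is still sitting at 1500, and plain Elo never hands that difference back.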

The argument that players' skills change over time is valid, but that's solved in the long term by games expiring after 3 months. If you're better next year than you are now, the ladder will reflect what your skill is next year - today's games won't affect it at all.

I agree, however, that it's flawed if your skills change rapidly, such as over a few days. The ladder really isn't designed for complete newbies - it's designed to be the ultimate competitive arena. If you're still developing your skills, I recommend playing some practice games. The 1v1 auto games have been around for a long time and have given out a ton of practice to lots of players on these settings.
Skewed rating results: 2/25/2011 05:25:08


crafty35a 
Level 3
"The thing that I hate about standard ELO is how someone's current rating affects you a lot, even if they aren't properly rated yet. For example, say Gaia joins the tournament tomorrow and defeats you. Gaia is a very strong player, but her rating was only 1500 since she just joined, so you take a big hit for losing to a 1500 player."

There are some pretty simple ways around that, though. Here is one that I like: new players have a provisional rating until they complete X number of games. While provisionally rated, the new player's rating changes, but their opponents' ratings do not change. Once they are out of the provisional period, the expectation is that their rating will be close to their "true" rating. Since they will be expected to maintain a similar rating to their first true rating, there is no inflation/deflation introduced to the system.

I believe the USCF uses something very similar to this, currently, but it's been a long time since I played so I will have to do some research to verify that.
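
A minimal sketch of the provisional idea on top of plain Elo. The names (`Player`, `record_game`), the 10-game threshold standing in for "X", K = 32, and letting both ratings move when two provisional players meet are all assumptions for illustration; this is not the USCF's actual procedure.

```python
from dataclasses import dataclass

PROVISIONAL_GAMES = 10  # assumption: the "X" from the post
K = 32.0                # assumption: a common Elo K-factor


@dataclass
class Player:
    name: str
    rating: float = 1500.0
    games: int = 0

    @property
    def provisional(self) -> bool:
        return self.games < PROVISIONAL_GAMES


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def record_game(winner: Player, loser: Player) -> None:
    """Apply one result; an established player's rating is frozen
    when the opponent is still provisional."""
    delta = K * (1.0 - expected_score(winner.rating, loser.rating))
    # Decide who is frozen before touching any state.
    freeze_winner = loser.provisional and not winner.provisional
    freeze_loser = winner.provisional and not loser.provisional
    if not freeze_winner:
        winner.rating += delta
    if not freeze_loser:
        loser.rating -= delta
    winner.games += 1
    loser.games += 1


gaia = Player("Gaia")                               # brand new, provisional
veteran = Player("Veteran", rating=1700.0, games=50)
record_game(winner=gaia, loser=veteran)
print(gaia.rating, veteran.rating)  # Gaia moves, the veteran does not
```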
Skewed rating results: 2/25/2011 05:37:57


crafty35a 
Level 3
Actually, I think I just thought of a nice addition to the system I describe above:
- Player exits the provisional period (completes X number of games)
- At this point, you have a good estimate of his true rating. So now you can adjust his past opponents' ratings accordingly.
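
The two steps above could be sketched like this, again on plain Elo. `settle_provisional`, the shape of the history list, K = 32, and using each opponent's current rating at settlement time are assumptions for illustration only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def settle_provisional(final_rating, history, ratings, k=32.0):
    """Once a player leaves the provisional period, score each of their
    past opponents as if the game had been played against final_rating.

    history: list of (opponent_name, opponent_score) pairs recorded while
             the player was provisional (1.0 = the opponent won).
    ratings: dict of opponent_name -> rating, updated in place.
    """
    for opponent, opponent_score in history:
        expected = expected_score(ratings[opponent], final_rating)
        ratings[opponent] += k * (opponent_score - expected)


# Hypothetical case: the newcomer finished the provisional period at 1900
# and had beaten a 1700-rated veteran along the way.
ratings = {"veteran": 1700.0}
settle_provisional(1900.0, [("veteran", 0.0)], ratings)
print(round(ratings["veteran"], 2))
```

The veteran is charged for a loss to a 1900-rated player rather than to a 1500-rated one, which is a much smaller hit.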
Skewed rating results: 2/25/2011 05:50:24


Perrin3088 
Level 49
I think crafty's been reading my posts ;)
Skewed rating results: 2/25/2011 21:17:54


Math Wolf 
Level 64
I'm not very acquainted with normal ELO, nor do I know more about Bayeselo than what is written on their site.

What I do know for sure, is that the 'hack' (I'd rather call it solution) I propose doesn't turn the Bayeselo into an ELO, it is still a Bayeselo, just with 4 extra, fictional games for every player.
The main difference, as far as I understand, is that Bayeselo keeps track of your previous opponents while ELO does not. Not coincidentally, Bayesian statistics are very strong in dealing with ever-changing data. It is known that the prior distribution is the Achilles heel of Bayesian statistics, and it is therefore no coincidence either that this is exactly what causes the main (and only?) problems here.
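
To illustrate how fictional games act as a prior, here is a minimal sketch that fits a plain Bradley-Terry model (not Bayeselo's actual model or code) with a simple iterative update. Every player gets `prior_games` virtual games against a fixed 1500-rated anchor, split evenly into wins and losses; the even split, the function name, and the iteration count are assumptions.

```python
import math
from collections import Counter


def bt_ratings(games, prior_games=4, iters=200):
    """games: list of (winner, loser) pairs. Returns Elo-scale ratings.

    Each player also plays prior_games virtual games against an anchor
    of fixed strength 1.0 (rating 1500), half won and half lost.
    """
    players = sorted({p for game in games for p in game})
    wins = Counter(winner for winner, _ in games)
    gamma = {p: 1.0 for p in players}  # strength on the Bradley-Terry scale
    for _ in range(iters):
        # Virtual games against the anchor contribute to every denominator.
        denom = {p: prior_games / (gamma[p] + 1.0) for p in players}
        for winner, loser in games:
            share = 1.0 / (gamma[winner] + gamma[loser])
            denom[winner] += share
            denom[loser] += share
        gamma = {p: (wins[p] + prior_games / 2.0) / denom[p] for p in players}
    return {p: 1500.0 + 400.0 * math.log10(gamma[p]) for p in players}


# One real game: w beats b. The virtual games keep both ratings near 1500.
r = bt_ratings([("w", "b")])
print({name: round(value) for name, value in r.items()})
```

Without the virtual games, a player who has only won (or only lost) has no finite maximum-likelihood rating in this model, which is the kind of instability a prior is there to damp.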

My personal view is that Bayeselo is better than normal ELO and should be preferred.
The only possible improvement over the current Bayeselo, other than changing the prior, would be to reweight the results as a function of time, not with a discrete cut-off as is done now (a game counts fully for 3 months and not at all afterwards), but with a continuously decreasing function of time. Technically, I'm sure this is possible, but it would most likely slow down the algorithm considerably.
The result would then be more accurate as more recent results have a higher weight than results slightly longer ago, and so on.
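
A small sketch of the difference between the two weighting schemes. The exponential form is just one possible continuously decreasing function, and the 90-day window and 90-day decay constant are assumptions for illustration.

```python
import math


def cutoff_weight(age_days: float, window_days: float = 90.0) -> float:
    """Discrete cut-off: full weight inside the window, none after it."""
    return 1.0 if age_days <= window_days else 0.0


def decay_weight(age_days: float, tau_days: float = 90.0) -> float:
    """Continuous alternative: weight shrinks smoothly with age."""
    return math.exp(-age_days / tau_days)


for age in (1, 45, 89, 91, 180):
    print(f"{age:>3} days old: cutoff={cutoff_weight(age):.2f}  decay={decay_weight(age):.2f}")
```

With the cut-off, a game that is 89 days old counts fully and one that is 91 days old counts for nothing; with the smooth weight the two are treated almost identically.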
Skewed rating results: 2/25/2011 21:39:08


crafty35a 
Level 3
MathWolf, right, your solution does not turn Bayeselo into Elo. I was only implying that it will make it behave more similarly in the early phases of rating.

As for your last paragraph on how Bayeselo could be improved, you may be interested in this link: http://remi.coulom.free.fr/WHR/

It is a paper describing a new rating system, designed by the same person as the Bayeselo system currently in use for Warlight. The paper is titled "Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength" (in my opinion this is just about the same as saying "A Bayesian Rating System for Humans, not Machines"). Unfortunately, I don't think there is currently a downloadable application for this newer rating system, but maybe I will email the author to ask. I'd be glad to take a stab at programming a small application to run the calculations with this algorithm, but my math skills are frankly not sharp nowadays. I'll have to read through the paper and see how explicitly the system is spelled out, to determine how feasible that would be.
Skewed rating results: 2/26/2011 10:08:24


Math Wolf 
Level 64
That's a great paper crafty35a, it gives exactly the solution for the possible improvement I was discussing.
As it is an extension of bayeselo, I think the author surely has (or should have) good working code on it, just not put into a downloadable application yet.
Most likely, if WL would want to use it, he'd provide the code as it is a direct application of his work and he may use the resulting data of WL then in further research / publications.
I read in that paper that they used only one fictional win and loss as prior, but I didn't immediately see the reasoning behind it. It may have been chosen arbitrarily.

The only thing I don't agree with in the paper is that they recalculate the ranking with only one iteration every time a player plays a game. It would be more cost-effective and accurate to do a full iterative process (up to 20 iterations should certainly be enough for this kind of problem) every 2 hours, as I think is also done now on WL.
Skewed rating results: 2/26/2011 18:16:36


crafty35a 
Level 3
I contacted Mr. Coulom to inquire about the availability of a downloadable tool to calculate WHR (Whole-History Rating). Here is his response:

"Hi,

I have no publicly available version of WHR, sorry.

I agree that WHR is more appropriate than bayeselo for rating players whose strength varies in time.

I know the Arimaa community implemented WHR for their rating system. Maybe they can share their code.

Rémi"


I had previously found a thread discussing the implementation of WHR for Arimaa. Unfortunately, someone else asked the user who created the code whether the source was available, and was refused. I will probably try to contact him anyway, to see if there is any possibility of acquiring the tool (perhaps a compiled version, rather than the actual source).

The only other option I see would be to custom code a tool. I would be more than happy to do this, but my math skills are so bad that it will probably take me ages just to understand what needs to be done. The coding itself shouldn't be a problem once I wrap my head around the calculation. If there are any math guys out there who would be willing to help explain things to me, I will try to write a small application to output ratings. MathWolf, I would ask you to do so but I know you alluded to being busy in another thread, so consider this an open invitation to the mathematically inclined WL players.
Skewed rating results: 2/26/2011 18:46:44


Math Wolf 
Level 64
I think if you can get the complete Bayeselo code, adding the history part shouldn't be very difficult, although it keeps surprising me how often easy-looking things can result in weeks of coding.

From what I understand, the only part that needs to be added is the weight function, which is very simple in form (exponentially decreasing with a chosen parameter, in the paper 400 days). If this isn't too difficult, I can spend some time on it for sure, as long as I don't have to do the coding myself (I can quite easily read code in most languages, but I'm only skilled in writing statistical code).
The mathematical part shouldn't pose problems for me and even if it does, I know enough people who can help out with that.

So if you're interested in doing this, feel free to contact me, crafty. I won't share my email address here, but I'll give the hint that I have a certain yahoo address, that should be enough. :-)