Warzone


Posts 31 - 36 of 36 <<Prev 1 2

Skewed rating results: 2011-02-25 05:50:24
Perrin3088

Level 49

I think crafty's been reading my posts ;)

Skewed rating results: 2011-02-25 21:17:54

Math Wolf

Level 64

I'm not very well acquainted with normal Elo, nor do I know more about Bayeselo than what is written on their site.

What I do know for sure is that the 'hack' (I'd rather call it a solution) I propose doesn't turn Bayeselo into Elo; it is still Bayeselo, just with 4 extra, fictional games for every player.
The main difference, as far as I understand, is that Bayeselo keeps track of your previous opponents while Elo does not. Not coincidentally, Bayesian statistics are very strong in dealing with ever-changing data. It is known that the prior distribution is the Achilles heel of Bayesian statistics, and it is therefore no coincidence either that this is exactly what causes the main (and only?) problems here.
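To illustrate why a handful of fictional games tames early ratings, here is a minimal sketch. It is not the Bayeselo code: it only converts a win/loss record against a single equal-strength reference opponent into an Elo-style gap. The function name is hypothetical, and splitting the 4 fictional games into 2 wins and 2 losses is an assumption made for the example.

```python
import math

def elo_gap_from_record(wins, losses, fictional_wins=2.0, fictional_losses=2.0):
    """Elo-style rating gap implied by a win/loss record.

    The fictional games act as a prior. The 2 + 2 split is an assumption
    for illustration, not something taken from Bayeselo itself.
    """
    w = wins + fictional_wins
    l = losses + fictional_losses
    if w == 0 and l == 0:
        return 0.0          # no information at all
    if l == 0:
        return math.inf     # undefeated with no prior: unbounded rating
    if w == 0:
        return -math.inf    # winless with no prior: unbounded in the other direction
    # Logistic Elo model: a win/loss ratio of 10 corresponds to 400 points.
    return 400.0 * math.log10(w / l)

print(elo_gap_from_record(2, 0))          # roughly +120 with the prior
print(elo_gap_from_record(2, 0, 0, 0))    # inf without it
```

With the prior, a new player who starts 2-0 sits about 120 points above the reference instead of jumping to an unbounded rating, and the prior's influence shrinks as real games accumulate.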

My personal view is that Bayeselo is better than normal Elo and should be preferred.
The only possible improvement over the current Bayeselo, other than changing the prior, would be to reweight the results as a function of time, not with a discrete cut-off as is done now (a game counts fully for 3 months and not at all afterwards), but with a continuously decreasing function of time. Technically, I'm sure this is possible, but it would most likely slow down the algorithm considerably.
The result would then be more accurate, as more recent results would carry a higher weight than slightly older ones, and so on.
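The difference between a discrete cut-off and a continuously decreasing weight can be sketched as follows. This is only an illustration on a weighted win fraction, not on Bayeselo itself. The 90-day figures are assumptions ("3 months" rounded to 90 days, and a hypothetical decay constant of the same size), and all function names are made up for the example.

```python
import math

CUTOFF_DAYS = 90.0  # "3 months", approximated as 90 days (assumption)
DECAY_DAYS = 90.0   # hypothetical time constant for the smooth alternative

def cutoff_weight(age_days):
    """Current scheme as described: full weight, then nothing."""
    return 1.0 if age_days <= CUTOFF_DAYS else 0.0

def decay_weight(age_days):
    """Continuously decreasing alternative: exponential decay with age."""
    return math.exp(-age_days / DECAY_DAYS)

def weighted_score(games, weight):
    """Weighted win fraction; games is a list of (age_days, result) pairs
    with result 1.0 for a win and 0.0 for a loss."""
    total = sum(weight(age) for age, _ in games)
    if total == 0:
        return None  # every game has aged out
    return sum(weight(age) * result for age, result in games) / total

games = [(1, 1.0), (30, 1.0), (89, 0.0), (91, 0.0)]
print(weighted_score(games, cutoff_weight))  # the 91-day-old loss is ignored entirely
print(weighted_score(games, decay_weight))   # every game counts, older ones less
```

Under the cut-off, the losses at 89 and 91 days are treated completely differently even though they are two days apart; under the decaying weight they count almost equally.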

Skewed rating results: 2011-02-25 21:39:08

crafty35a

Level 3

MathWolf, right, your solution does not turn Bayeselo into Elo; I was only implying that it will make it behave more like Elo in the early phases of rating.

Regarding your last paragraph, on how Bayeselo could be improved, you may be interested in this link: http://remi.coulom.free.fr/WHR/

It is a paper describing a new rating system, designed by the same person as the Bayeselo system currently in use for Warlight. The paper is titled "Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength" (in my opinion this is just about the same as saying "A Bayesian Rating System for Humans, not Machines"). Unfortunately, I don't think there is currently a downloadable application for this newer rating system, but maybe I will email the author to ask. I'd be glad to take a stab at programming a small application to run the calculations with this algorithm, but my math skills are frankly not sharp nowadays. I'll have to read through the paper and see how explicitly the system is spelled out, to determine how feasible that would be.

Skewed rating results: 2011-02-26 10:08:24

Math Wolf

Level 64

That's a great paper, crafty35a; it gives exactly the solution for the possible improvement I was discussing.
As it is an extension of Bayeselo, I think the author surely has (or should have) good working code for it, just not yet packaged as a downloadable application.
Most likely, if WL wanted to use it, he'd provide the code, as it is a direct application of his work and he could then use the resulting WL data in further research / publications.
I read in that paper that they used only one fictional win and one fictional loss as the prior, but I didn't immediately see the reasoning behind it. It may have been chosen arbitrarily.

The only thing I don't agree with in the paper is that they recalculate the ranking with only one iteration every time a player plays a game. It would be more cost-effective and accurate to run a full iterative process (a maximum of 20 iterations should certainly be enough for this kind of problem) every 2 hours, as is also done now on WL, I think.
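As a rough picture of what a full iterative process with an iteration cap looks like, the sketch below fits a plain Bradley-Terry model with the classic minorization-maximization update. This is a swapped-in, simpler technique, not the update used in the WHR paper and not the Bayeselo code. The function name, the tolerance, and the sample win matrix are assumptions, and it presumes every player has at least one win and one loss (which a fictional-game prior would guarantee).

```python
def fit_bradley_terry(wins, max_iters=20, tol=1e-6):
    """Fit Bradley-Terry strengths by repeated full passes over all players.

    wins[i][j] is the number of times player i beat player j (diagonal 0).
    Returns (strengths, iterations_used); strengths are scaled to sum to n.
    """
    n = len(wins)
    gamma = [1.0] * n
    for iteration in range(1, max_iters + 1):
        new = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (gamma[i] + gamma[j])
                for j in range(n)
                if j != i
            )
            # Keep the old value for a player with no games at all.
            new.append(total_wins / denom if denom > 0 else gamma[i])
        scale = n / sum(new)  # fix the overall scale, which is arbitrary
        new = [g * scale for g in new]
        change = max(abs(a - b) for a, b in zip(new, gamma))
        gamma = new
        if change < tol:
            break  # converged before hitting the iteration cap
    return gamma, iteration

strengths, used = fit_bradley_terry([[0, 4, 3], [1, 0, 3], [2, 2, 0]])
print(strengths, used)
```

Running such a batch on a schedule trades a little staleness between runs for ratings that have actually settled, rather than nudging them one step per game.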

Skewed rating results: 2011-02-26 18:16:36

crafty35a

Level 3

I contacted Mr. Coulom to inquire about the availability of a downloadable tool to calculate WHR (Whole-History Rating). Here is his response:

"Hi,

I have no publicly available version of WHR, sorry.

I agree that WHR is more appropriate than bayeselo for rating players whose strength varies in time.

I know the Arimaa community implemented WHR for their rating system. Maybe they can share their code.

Rémi"

I had previously found a thread discussing the implementation of WHR for Arimaa. Unfortunately, when someone else asked the user who wrote the code whether the source was available, the request was refused. I will probably try to contact him anyway, to see if there is any possibility of acquiring the tool (perhaps a compiled version, rather than the actual source).

The only other option I see would be to custom-code a tool. I would be more than happy to do this, but my math skills are so bad that it will probably take me ages just to understand what needs to be done. The coding itself shouldn't be a problem once I wrap my head around the calculation. If there are any math guys out there who would be willing to help explain things to me, I will try to write a small application to output ratings. MathWolf, I would ask you to do so, but I know you alluded to being busy in another thread, so consider this an open invitation to the mathematically inclined WL players.

Skewed rating results: 2011-02-26 18:46:44

Math Wolf

Level 64

I think that if you can get the complete Bayeselo code, adding the history part shouldn't be very difficult, although it keeps surprising me how often easy-looking things can result in weeks of coding.

From what I understand, the only part that needs to be added is the weighting function, which is very simple in form (exponentially decreasing with a chosen parameter, in the paper 400 days). If this isn't too difficult, I can spend some time on it for sure, as long as I don't have to do the coding myself (I can quite easily read code in most languages, but I'm only skilled in writing statistical code).
The mathematical part shouldn't pose problems for me and even if it does, I know enough people who can help out with that.

So if you're interested in doing this, feel free to contact me, crafty. I won't share my email address here, but I'll give the hint that I have a certain yahoo address, that should be enough. :-)

