You get paired up with the lowest-ranked guys. If you win all your games, the question is whether the ladder sees you as an unstoppable killing machine or as an average guy who managed to beat the AI 100 times in a row. I'm not as into math as Farah, but I looked over the whitepaper of the Bayesian ELO algorithm a while ago and remember that it claims to work better than the ordinary ELO algorithm by not seeing you as that unstoppable killing machine in this case.

That is interesting. The claim that it works better than the ordinary ELO algorithm by not seeing you as that unstoppable killing machine is easily falsified on a small scale. Regular ELO takes your game result and calculates a new rating. The amount your rating can rise or fall in a single game is capped by the so-called K-factor, usually 32 or 16. Example:

Assume a base rating of 1500 and a K-factor of 32. You beat the best player on the ladder (for argument's sake, he's rated 3000). You get a 32-point increase, since that is the maximum. Your new rating is 1532.
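For anyone who wants to check the arithmetic, here is a minimal sketch of the standard ELO update, assuming the usual logistic formula with a 400-point scale and K = 32. The function names are mine and are not taken from any ladder code.

```python
def expected_score(rating, opponent_rating):
    """Win probability the ELO model gives `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating, opponent_rating, score, k=32):
    """New rating after one game; score is 1 for a win, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent_rating))


# A 1500 player beats a 3000 player: the gain is just under the K-factor cap.
new_rating = elo_update(1500, 3000, score=1)
print(round(new_rating))  # 1532
```

The gain can never exceed K, no matter how strong the beaten opponent is, which is the cap the example above relies on.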

Bayesian ELO tries to estimate your rating while also assigning it two variance factors. This means the amount of rating points you can get or lose after a game is technically unlimited: your rating may go up or down indefinitely, but the variance factor tries to correct for that. Example:

Assume no base rating, as you haven't completed any games. BayesELO will give you one after your first game. You beat the best player on the ladder (again, for argument's sake, he's rated 3000). You get assigned a rating of ~3200, depending on the rest of the ladder.

Continuing that argument, it means that regular ELO will severely underrate new good players, while BayesELO will severely overrate new good players.

But, at the bigger scale, BayesELO seems to do a rather good job. It's not necessarily better, but here's a rough example:

You win 10 games against a player who was initially rated 1500.

Regular ELO:

You gain 115 points. Your new rating is 1615.

Your opponent loses 115 points. His new rating is 1385.

This 230-point difference means the system believes you to have a 79% win-chance against this opponent.
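A rough way to reproduce the regular ELO side is to apply the update game by game. This sketch assumes K = 32, both players starting at 1500, and ratings updated after every single game. Those are my assumptions, and the exact gain depends on them.

```python
def expected_score(rating, opponent_rating):
    """Win probability the ELO model gives `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def play_streak(wins, start=1500.0, k=32):
    """Apply `wins` consecutive wins between two players who both start at `start`."""
    you, opponent = start, start
    for _ in range(wins):
        # Each win is worth less than the previous one as the gap grows.
        delta = k * (1.0 - expected_score(you, opponent))
        you += delta
        opponent -= delta
    return you, opponent


you, opponent = play_streak(10)
print(round(you), round(opponent))
```

Under these assumptions the gain comes out at roughly 110 points, so in the same ballpark as the rough figure above rather than an exact match.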

Bayesian ELO:

You gain 180 points. Your new rating is 1680.

Your opponent loses 180 points. His new rating is 1320.

This 360-point difference means the system believes you to have an 88.8% win-chance against this opponent.
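The win-chance figures follow from the rating difference alone. A small sketch, assuming the standard logistic ELO curve with a 400-point scale (the function name is just for illustration):

```python
def win_chance(rating_diff):
    """Expected win probability for the higher-rated player, given the rating gap."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


for diff in (230, 360):
    print(diff, f"{win_chance(diff):.1%}")
# 230 79.0%
# 360 88.8%
```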

The biggest flaw of using Bayesian ELO on the ladder is not that it's a bad rating system. It's the way it's implemented on Warlight. When someone gets ranked after 20 games, you have no idea whether the system has a decent amount of certainty in that player's rating. This is why people who do 'runs' get rewarded. A small number of games consisting mostly of wins gets them a rating that is most likely not trustworthy. An easy way to fix this is to rank a player once their variance is below a certain value (and the BayesELO program gives you this information for free), instead of after a certain number of games. That was probably one of the better things about the TrueSkill algorithm on the RT-ladder.
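To make the suggested fix concrete, here is a hypothetical sketch of a variance-based ranking gate. The `Estimate` fields, the 150-point threshold, and the sample players are all made up for illustration. This is not how Warlight or the BayesELO program actually stores its output.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    name: str
    rating: float
    plus: float   # upper error margin on the rating (hypothetical field)
    minus: float  # lower error margin on the rating (hypothetical field)
    games: int


def is_ranked(est, max_margin=150.0):
    """Rank a player once both error margins are tight enough, ignoring game count."""
    return max(est.plus, est.minus) <= max_margin


# Made-up data: a short winning 'run' versus a long, settled history.
players = [
    Estimate("runner", 2100, 310, 240, 20),
    Estimate("regular", 1850, 90, 95, 120),
]
ranked = [p.name for p in players if is_ranked(p)]
print(ranked)  # ['regular']
```

With a gate like this, a 20-game run with wide error margins stays unranked until the system is more certain about it.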

TL;DR:

Regular ELO gets you to a 'truer' rating more slowly; Bayesian ELO gets you to a rating faster, but that rating may or may not be very volatile.

Edited 10/15/2019 17:10:14