Warzone


Posts 1 - 20 of 82

What makes a template strategic?: 2015-06-16 09:55:31

l4v.r0v

Level 59

Here are a few interesting questions:

How do you prove whether 0% luck makes for more strategic games than 16% luck?

How will we know when we've finally found a good non-Europe based template for 3v3? And is EU 4x5 0% WR just as good as EU 4x4 0% SR?

Is Rise of Rome too big to be a good 1v1 map? Is it a good 2v2 map then?

Are Poon Squad's settings really that bad?

How do I explain to someone that Guiroma 1v1 is actually not a "really weird and bad template" but in fact a solid and well-tested strategic 1v1 template?

If you've ever thought about any of these questions, then my ramblings here might be of interest to you.

I basically ran into this central question- how do I find out what's a good strategic template and what isn't?- when expanding the templates used in CORP Strategic League (our internal ladder system, which now has 56 templates in rotation- so it's a little unwieldy). We all have some qualitative notions of what makes a template good strategically- reasonably low luck, balanced map, just the right size, no weird card situations, etc.- but I've always wondered what quantitative methods could be used to overcome the biases in our qualitative judgements. After all, if Ares 3v3 really turns out to be just as good as the gold standard of EU 3v3, how do we resolve the debate? We obviously have some entrenched opinions. Similarly, we've got a tendency toward conformity when it comes to template design- most people don't really stray too far from Strategic 1v1-like settings.

So how do we do this with just numbers?

Well, first let's consider the extreme examples of "bad" templates:

- Template A (Lottery): A significantly better player and I play each other in a 1v1. We have an equal chance of winning. (In a good strategic template, the better player should be much more likely to win)

- Template B: A really good team plays a really bad team in a 2v2. The worse team has a significantly higher chance of winning. (Again, the better team should have a higher chance).

- Template C: One slightly better player plays a slightly worse player in a 1v1. The slightly better player is almost certain to win- the other player, even though he's just a tad worse, is probably not going to win more than 2% of the time. (The chance of winning should be commensurate to the difference in skill between the two players- while Templates A and B didn't accurately reflect the difference by increasing the worse player's odds of winning, this template fails to accurately reflect the difference in skill level by significantly decreasing the worse player's odds of winning).

So, looking at this, we kind of get an idea of what a good strategic template looks like:

A strategic template is a template that accurately reflects the difference in skill level between the two players.

Where can we go with this core assumption? Well, the main idea here is that there needs to be some way to convert between a relative measurement of skill level and the probability that a player wins.

... And that's where Elo ratings come in.

So, Elo, if you're not familiar with it, is just a system where each player has a rating based on their history- with wins against tougher players counting for more points (i.e., a player who's played one game and beat master of desaster will have a higher Elo rating than a player who's played one game and beat someone ranked outside the top 100 of the ladder). Your Elo gain/loss from a game is a function of the difference in ratings- which is supposed to predict how often you'll win. Elo assumed a normal distribution when coming up with the system, which (at least in chess) is inaccurate- however, his system still allows us to get some basic quantitative data, which should at least be good enough to compare templates.

A difference in Elo ratings can be converted to an overdog win % using the formula:

Probability that player with higher rating wins = 1 - 1 / (1 + 10 ** (EloDifference/400))

Conversely, you can estimate the rating difference between two players using the formula:

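EloDifference = -400 * Log((1/Win Percentage - 1), 10)

(with Win Percentage written as a fraction, e.g. 0.70 for 70%, and Log(x, 10) meaning the base-10 logarithm of x)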

Source: http://www.3dkingdoms.com/chess/elo.htm

As you can see by playing around on that site, this would mean that a player who wins 70% of games against their opponent should have an Elo rating that's 147 points higher. Conversely, a player whose rating is 147 points higher than that of their opponent should win about 70% of games.
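For anyone who wants to reproduce those numbers without the website, the two formulas translate directly into Python. This is a minimal sketch; the function names are my own illustrative choices, not part of any Warlight or Elo tool.

```python
import math


def elo_to_win_prob(elo_difference: float) -> float:
    """Probability that the higher-rated player wins, given the rating gap.

    Same as 1 - 1 / (1 + 10 ** (d / 400)), rewritten as 1 / (1 + 10 ** (-d / 400)).
    """
    return 1.0 / (1.0 + 10 ** (-elo_difference / 400))


def win_prob_to_elo(win_probability: float) -> float:
    """Rating gap implied by a win probability strictly between 0 and 1."""
    if not 0.0 < win_probability < 1.0:
        raise ValueError("win probability must be strictly between 0 and 1")
    return -400 * math.log10(1 / win_probability - 1)


print(round(win_prob_to_elo(0.70)))     # 147
print(round(elo_to_win_prob(147), 2))   # 0.7
```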

Where did I go with this, then?

Well, since you can convert Elo differences to win probabilities, and since you can observe actual win rates, here's the data I played around with:

- the % of time that the "overdog" (better-rated player) wins a game

- the average rating difference between the "overdog" and "underdog" on a template

I used the rating difference to come up with an expected % of time that the overdog should have won, and compared it to actual results. This is, of course, just one of many analyses that can be performed- I liked it because it's simple.

So, this relies on the following major assumptions:

- Elo ratings accurately reflect the relative strength of players in terms of how likely they are to win a head-to-head matchup.

- A good Warlight template should have win probabilities that are very close to those predicted by Elo.

Also, there's some risk in using the Elo ratings- if you're getting them based on games only played on the template that's being tested, they're going to be a little bit "off" since they would be tainted by the inaccuracies in the template itself- i.e., a template that makes upsets more likely to happen is probably also going to cause you to underestimate how good your overdogs are and overestimate how good your underdogs are. Conversely, if you get data from games played on multiple templates, then you're making the huge assumption that someone can be "good" across a wide range of templates and that the Elo rating you're using accurately reflects their skill across the entire range- which is a risky, albeit useful, assumption. On top of that, games played on templates being tested are still going to be similarly "tainted." However, once you buy into these assumptions, you can start getting cool-ish data:

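You can simply subtract the actual overdog win rate from the expected overdog win rate to figure out a template's "bias"- a rating of how likely it makes upsets to happen. A positive difference means upsets happen more often than Elo predicts (bias toward the underdog), a negative one means they happen less often (bias toward the overdog), and the size of the difference is the bias strength.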
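As a rough sketch of this calculation over per-game records, assuming each game is stored as two ratings plus the winner (the `LadderGame` layout below is hypothetical, not the format the ladder pages use), the summary could be computed like this. Like the tables in this post, it derives the expected rate from the average rating difference rather than averaging per-game predictions; those two approaches can give slightly different numbers.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LadderGame:
    # Hypothetical record layout for one finished game.
    rating_a: float
    rating_b: float
    a_won: bool


def elo_to_win_prob(elo_difference: float) -> float:
    """Elo-predicted probability that the higher-rated side wins."""
    return 1.0 / (1.0 + 10 ** (-elo_difference / 400))


def template_bias(games: Iterable[LadderGame]) -> dict:
    """Compare the overdog's actual win rate with the Elo-predicted rate."""
    n = overdog_wins = 0
    overdog_total = underdog_total = 0.0
    for g in games:
        if g.rating_a == 0 or g.rating_b == 0:
            continue  # expired ratings show up as 0; skip them, as in the post
        if g.rating_a == g.rating_b:
            continue  # assumption: games with no overdog are skipped
        a_is_overdog = g.rating_a > g.rating_b
        n += 1
        overdog_total += max(g.rating_a, g.rating_b)
        underdog_total += min(g.rating_a, g.rating_b)
        if g.a_won == a_is_overdog:
            overdog_wins += 1
    if n == 0:
        raise ValueError("no usable games")
    average_difference = (overdog_total - underdog_total) / n
    expected = elo_to_win_prob(average_difference)
    actual = overdog_wins / n
    bias = expected - actual  # > 0 means more upsets than Elo predicts
    if bias > 0:
        direction = "underdog"
    elif bias < 0:
        direction = "overdog"
    else:
        direction = "none"  # my own label for an exact tie
    return {
        "total games": n,
        "overdog wins": overdog_wins,
        "average difference": average_difference,
        "overdog expected win rate": expected,
        "overdog actual win rate": actual,
        "bias direction": direction,
        "bias strength": abs(bias),
    }


summary = template_bias([
    LadderGame(1700, 1500, True),   # overdog wins
    LadderGame(1500, 1700, True),   # upset
    LadderGame(1600, 1400, True),   # overdog wins
    LadderGame(0, 1500, False),     # expired rating, skipped
])
print(summary["bias direction"], round(summary["bias strength"], 3))  # underdog 0.093
```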

First, I ran this on the CORP Strategic League templates (you can find the data in the "Templates" spreadsheet at http://www.tinyurl.com/csldata). However, CSL only has 88 finished games- and the average player has only played 2.3 games so far. And, well, with a small enough dataset, you can disprove gravity or evolution. So no go.

So I decided to just test this out on the 1v1 and 2v2 ladders (all completed games as of 12:55 AM EDT on 6/16/2015). Given the focus on proving templates, I was also going to check out the Real-Time and Seasonal ladders, but will deal with that later as I'm not sure about the usefulness and reliability of that data (given the higher boot/surrender rates on those- you can also see a lot more upsets if you just look at that data).

Also, this is all based on the assumption that I can use the Bayeselo ratings in a way that's more or less similar to how I would use regular Elo ratings. I don't know enough about the theory behind Coulom's Bayeselo system to be certain of this, but eh, this was interesting so I did it anyway.

Here's what I got from the ladders (I got win/loss and rating data from all games, ignoring games where one or both of the players' ratings were expired and set to 0):

1v1 Ladder - Strategic ME 1v1 template

total games: 41732 
overdog wins: 27091 
total overdog score: 70866841 
total underdog score: 62354453
average overdog score: 1698.14
average underdog score: 1494.16
average difference: 203.98
overdog expected win rate: .76
overdog actual win rate: .65
bias direction: underdog
bias strength: .11

2v2 Ladder - Final Earth 2v2 template

total games: 2388 
overdog wins: 1788 
total overdog score: 3902977
total underdog score: 3353609
average overdog score: 1634.41
average underdog score: 1404.36
average difference: 230.05
overdog expected win rate: .79
overdog actual win rate: .75
bias direction: underdog
bias strength: .04
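The derived lines of each table can be recomputed from the raw totals. In this sketch, rounding to two decimals at each step is an inference from the published numbers rather than something stated in the post; it matters for rows like Season II, where unrounded values would give a bias strength of .01 instead of .02.

```python
def table_rows(total_games, overdog_wins, total_overdog_score, total_underdog_score):
    """Recompute the derived lines of a ladder table from its raw totals."""
    # Round at each step, which appears to match how the tables were produced.
    avg_over = round(total_overdog_score / total_games, 2)
    avg_under = round(total_underdog_score / total_games, 2)
    difference = round(avg_over - avg_under, 2)
    expected = round(1.0 / (1.0 + 10 ** (-difference / 400)), 2)
    actual = round(overdog_wins / total_games, 2)
    if actual < expected:
        direction = "underdog"
    elif actual > expected:
        direction = "overdog"
    else:
        direction = "none"
    return {
        "average overdog score": avg_over,
        "average underdog score": avg_under,
        "average difference": difference,
        "overdog expected win rate": expected,
        "overdog actual win rate": actual,
        "bias direction": direction,
        "bias strength": round(abs(expected - actual), 2),
    }


print(table_rows(41732, 27091, 70866841, 62354453))  # 1v1 ladder totals above
print(table_rows(2388, 1788, 3902977, 3353609))      # 2v2 ladder totals above
```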

So, as you can see from this, upsets are much more likely to happen on the 1v1 ladder than on the 2v2 ladder. I speculate that this might occur due to some players not playing a whole lot of games and being rated lower than they actually are, but given the size of the dataset maybe the 1v1 template actually just makes upsets more likely. Keep in mind that this data is better understood in relative terms- the 2v2 template might not be biased in favor of the underdog- it could just be a flaw in my assumptions or in the dataset, but it's probably less likely to yield upsets (in Elo-based terms) than the 1v1 ladder template, which I'd say is useful data.

Finally, here's an idea for how you can use this to test a new template:

1. Host a Round Robin with that template. Don't invite players who are likely to get booted and ruin some of your data.

2. Great. With 20 players, you now have 190 games' worth of data. That's 19 games per player- more than enough for reliable Elo ratings.

3. Use Elostat or Bayeselo to give Elo ratings to each player.

4. Analyze the game data the way I did- average rating difference, overdog win %, expected win %, and the difference. I'd love to see more data on this if you'd like to share.

5. Now the bias strength datapoint gives you a simplified quantitative reflection of how strategic the template is.
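Steps 1 to 3 can be sketched in Python. EloStat and BayesElo are separate programs and are not reproduced here; as a stand-in, the hypothetical `fit_ratings` below fits logistic Elo ratings by gradient ascent with an assumed Gaussian prior (a swapped-in technique; the prior width, step size, and iteration count are illustrative choices). `round_robin` confirms the 190-game count for 20 players.

```python
import math
from itertools import combinations

K = math.log(10) / 400  # slope factor of the logistic Elo curve per rating point


def expected_score(r_a: float, r_b: float) -> float:
    """Elo-predicted probability that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def round_robin(players):
    """Every pairing exactly once: n * (n - 1) / 2 games."""
    return list(combinations(players, 2))


def fit_ratings(players, results, prior_sd=300.0, lr=5000.0, iterations=2000):
    """Maximum a posteriori logistic Elo ratings by plain gradient ascent.

    `results` is a list of (winner, loser) pairs. The Gaussian prior
    (mean 0, sd `prior_sd`) keeps an unbeaten player's rating finite.
    Ratings come out centred on 0; add an offset such as 1500 for display.
    The step size was chosen with small round robins (around 20 players)
    in mind; much larger or denser pools may need a smaller step.
    """
    ratings = {p: 0.0 for p in players}
    prior_precision = 1.0 / prior_sd ** 2
    for _ in range(iterations):
        # Gradient of the log-prior...
        grad = {p: -prior_precision * ratings[p] for p in players}
        # ...plus the gradient of each game's log-likelihood.
        for winner, loser in results:
            surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
            grad[winner] += K * surprise
            grad[loser] -= K * surprise
        for p in players:
            ratings[p] += lr * grad[p]
    return ratings


games = round_robin(range(20))
print(len(games))  # 190
```

The fitted ratings can then go through the same overdog/underdog comparison used for the ladders in step 4.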

Also, if someone wants to run this analysis on the Real-Time Ladder (template by template) for me, it'd be much appreciated.

Edited 6/17/2015 04:04:17

What makes a template strategic?: 2015-06-16 10:15:27
ps

Level 61

it's not rocket science, less luck requires more strategy to win.

What makes a template strategic?: 2015-06-16 10:17:58

l4v.r0v

Level 59

Probably, but the luck modifier isn't the only form of luck in the game. There's lots of luck involved elsewhere- map setup, for example.

And perhaps 16% luck better measures strategic ability than 0% luck because it's tougher to reason with. There are some questions I'd just rather answer with experimental data.

Seasonal Ladder Data (skipped Season X as it doesn't really lend itself to this type of analysis- which is only good for XvX setups; also remember that "total games" is just the number of games actually analyzed- some may have been skipped if the players' ratings were detected to be expired/useless by my algorithm):

1v1 Ladder

total games: 41732 
overdog wins: 27091 
total overdog score: 70866841 
total underdog score: 62354453
average overdog score: 1698.14
average underdog score: 1494.16
average difference: 203.98
overdog expected win rate: .76
overdog actual win rate: .65
bias direction: underdog
bias strength: .11

2v2 Ladder

total games: 2388 
overdog wins: 1788 
total overdog score: 3902977
total underdog score: 3353609
average overdog score: 1634.41
average underdog score: 1404.36
average difference: 230.05
overdog expected win rate: .79
overdog actual win rate: .75
bias direction: underdog
bias strength: .04

Season I Ladder

total games: 1010
overdog wins: 832
total overdog score: 2251143
total underdog score: 1964035
average overdog score: 2228.85
average underdog score: 1944.59
average difference: 284.26
overdog expected win rate: .84
overdog actual win rate: .82
bias direction: underdog
bias strength: .02

Season II Ladder

total games: 1208
overdog wins: 949
total overdog score: 2910646
total underdog score: 2653096
average overdog score: 2409.48
average underdog score: 2196.27
average difference: 213.21
overdog expected win rate: .77
overdog actual win rate: .79
bias direction: overdog
bias strength: .02

Season III Ladder

total games: 1111
overdog wins: 889
total overdog score: 2700249
total underdog score: 2349449
average overdog score: 2430.47
average underdog score: 2114.72
average difference: 315.75
overdog expected win rate: .86
overdog actual win rate: .80
bias direction: underdog
bias strength: .06

Season IV Ladder

total games: 874
overdog wins: 673
total overdog score: 2080745
total underdog score: 1826165
average overdog score: 2380.72
average underdog score: 2089.43
average difference: 291.29
overdog expected win rate: .84
overdog actual win rate: .77
bias direction: underdog
bias strength: .07

Season V Ladder

total games: 984
overdog wins: 752
total overdog score: 2371634
total underdog score: 2076194
average overdog score: 2410.20
average underdog score: 2109.95
average difference: 300.25
overdog expected win rate: .85
overdog actual win rate: .76
bias direction: underdog
bias strength: .09

Season VI Ladder

total games: 1183
overdog wins: 861
total overdog score: 2822119
total underdog score: 2522758
average overdog score: 2385.56
average underdog score: 2132.51
average difference: 253.05
overdog expected win rate: .81
overdog actual win rate: .73
bias direction: underdog
bias strength: .08

Season VII Ladder

total games: 1152
overdog wins: 886
total overdog score: 2761343
total underdog score: 2429385
average overdog score: 2397.00
average underdog score: 2108.84
average difference: 288.16
overdog expected win rate: .84
overdog actual win rate: .77
bias direction: underdog
bias strength: .07

Season VIII Ladder

total games: 1334
overdog wins: 1031
total overdog score: 3208341
total underdog score: 2844664
average overdog score: 2405.05
average underdog score: 2132.43
average difference: 272.62
overdog expected win rate: .83
overdog actual win rate: .77
bias direction: underdog
bias strength: .06

Season IX Ladder

total games: 1338
overdog wins: 1013
total overdog score: 3869846
total underdog score: 3431056
average overdog score: 2892.26
average underdog score: 2564.32
average difference: 327.94
overdog expected win rate: .87
overdog actual win rate: .76
bias direction: underdog
bias strength: .11

Season XI Ladder

total games: 1927
overdog wins: 1498
total overdog score: 5591611
total underdog score: 4909740
average overdog score: 2901.72
average underdog score: 2547.87
average difference: 353.85
overdog expected win rate: .88
overdog actual win rate: .78
bias direction: underdog
bias strength: .10

Season XII Ladder

total games: 1958
overdog wins: 1474
total overdog score: 5624570
total underdog score: 4961283
average overdog score: 2872.61
average underdog score: 2533.85
average difference: 338.76
overdog expected win rate: .88
overdog actual win rate: .75
bias direction: underdog
bias strength: .13

Season XIII Ladder

total games: 2156
overdog wins: 1636
total overdog score: 6218291
total underdog score: 5559368
average overdog score: 2884.18
average underdog score: 2578.56
average difference: 305.62
overdog expected win rate: .85
overdog actual win rate: .76
bias direction: underdog
bias strength: .09

Season XIV Ladder

total games: 2542
overdog wins: 1965
total overdog score: 7363815
total underdog score: 6522201
average overdog score: 2896.86
average underdog score: 2565.78
average difference: 331.08
overdog expected win rate: .87
overdog actual win rate: .77
bias direction: underdog
bias strength: .10

Season XV Ladder

total games: 1989
overdog wins: 1558
total overdog score: 5737591
total underdog score: 4937719
average overdog score: 2884.71
average underdog score: 2482.51
average difference: 402.20
overdog expected win rate: .91
overdog actual win rate: .78
bias direction: underdog
bias strength: .13

Season XVI Ladder

total games: 2213
overdog wins: 1710
total overdog score: 6392656
total underdog score: 5666937
average overdog score: 2888.68
average underdog score: 2560.75
average difference: 327.93
overdog expected win rate: .87
overdog actual win rate: .77
bias direction: underdog
bias strength: .10

Season XVII Ladder

total games: 2634
overdog wins: 2015
total overdog score: 7644882
total underdog score: 6748859
average overdog score: 2902.38
average underdog score: 2562.21
average difference: 340.17
overdog expected win rate: .88
overdog actual win rate: .76
bias direction: underdog
bias strength: .12

Season XVIII Ladder

total games: 2642
overdog wins: 2033
total overdog score: 7649613
total underdog score: 6768832
average overdog score: 2895.39
average underdog score: 2562.01
average difference: 333.38
overdog expected win rate: .87
overdog actual win rate: .77
bias direction: underdog
bias strength: .10

Season XIX Ladder (data from 6:20 AM EDT 6/16/2015)

total games: 2423
overdog wins: 1830
total overdog score: 6983105
total underdog score: 6159369
average overdog score: 2882.00
average underdog score: 2542.04
average difference: 339.96
overdog expected win rate: .88
overdog actual win rate: .76
bias direction: underdog
bias strength: .12

I'm very tempted to say this data confirms my suspicions about the Seasonal Ladder's data being unreliable due to a high ratio of underrated players (due to boots/surrenders/loss of interest). It is interesting to see the growing level of underdog bias as the seasons advanced, however- possibly due to a greater number of players leading to a greater ratio of underrated players whose wins were considered upsets even though they really weren't- or just boots fudging the data.

Season I is interesting since it's the 1v1 ladder template and yet the net upset rate (underdog bias) is so different, probably due to the way each ladder operates causing different levels of statistical fudging (from underrated/overrated players).
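The trend across seasons can be summarized as a single number: the least-squares slope of signed bias against season number. This sketch takes the bias values straight from the tables above, counting overdog bias (Season II) as negative; Season X is absent, as in the post.

```python
# Signed bias from the tables above: positive = underdog bias, negative = overdog bias.
season_bias = {
    1: 0.02, 2: -0.02, 3: 0.06, 4: 0.07, 5: 0.09, 6: 0.08, 7: 0.07, 8: 0.06,
    9: 0.11, 11: 0.10, 12: 0.13, 13: 0.09, 14: 0.10, 15: 0.13, 16: 0.10,
    17: 0.12, 18: 0.10, 19: 0.12,
}


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


slope = least_squares_slope(list(season_bias), list(season_bias.values()))
print(round(slope, 4))  # about 0.005 more underdog bias per season
```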

Edited 6/16/2015 13:07:14

What makes a template strategic?: 2015-06-16 12:45:40

smileyleg

Level 61

I wouldn't say luck and strategy are absolutely inversely related.

Like many, I think 0% WR requires more strategy than 0% SR because of the calculated risk taking. The problem is that sometimes the luck can really be the primary difference in the game. When one player completes a bonus because 2 or 3 of his 3v2 attacks succeeded and the other fails because his only 3v2 attack failed, that can be huge.

With 16%, where you get the really rare failures like 7v4, some of the results just feel too arbitrary.

What makes a template strategic?: 2015-06-17 02:09:11

Benjamin628

Level 60

"""A strategic template is a template that accurately reflects the difference in skill level between the two players."""

I disagree. For example, I have played Forbidden Knowledge about 10 times on 1v1 Ladder Settings. He won 9 of them. To win a game, you only need to be a little better than your opponent. I would not say I am a little better than Forbidden Knowledge 1/10 times. If you only saw the game where I beat him, with that logic you would conclude I am a better player than him (which is obviously not the case).

As smileyleg said, luck and strategy are not antonyms. Skill is knowing a 3v2 is not guaranteed, so you use it in the right place.

Edited 6/17/2015 02:16:04

What makes a template strategic?: 2015-06-17 02:48:11

l4v.r0v

Level 59

"""I disagree. For example, I have played Forbidden Knowledge about 10 times on 1v1 Ladder Settings. He won 9 of them. To win a game, you only need to be a little better than your opponent. I would not say I am a little better than Forbidden Knowledge 1/10 times. If you only saw the game where I beat him, with that logic you would conclude I am a better player than him (which is obviously not the case)."""

Notice that I phrased all of this in terms of probability, not results. A better player should be more likely to beat a worse player in each game (at least at the very beginning, before specific actions have been taken) but obviously won't win every time. That's why I analyzed tens of thousands of games on the 1v1 ladder and thousands of games on the other ladders. It's also why I recommend large sample sizes for testing: the outcomes of larger samples should more closely reflect the actual win/loss probabilities. I.e., if you go into each game against FK with a 10% chance of winning, over 1000 games you should win ~100 and he should win ~900, which lets me work backward from those results (in a large sample) to estimate your win/loss probability in each game.
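A quick simulation shows why a single result says little while a large sample says a lot. This sketch assumes each game is an independent draw with a fixed 10% win chance, the hypothetical from the paragraph above; the seed only makes the run repeatable.

```python
import math
import random


def simulate_wins(p_win: float, games: int, seed: int = 0) -> int:
    """Count wins in `games` independent games, each won with probability p_win."""
    rng = random.Random(seed)
    return sum(rng.random() < p_win for _ in range(games))


p_win, games = 0.10, 1000
wins = simulate_wins(p_win, games)
spread = math.sqrt(games * p_win * (1 - p_win))  # binomial standard deviation
print(wins, games - wins, round(spread, 1))  # wins lands near 100; spread is about 9.5
```

With a spread of about 9.5 wins, the observed win rate over 1000 games is typically within a percentage point or two of the true 10%, while a single game can only ever show 0% or 100%.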

So by analyzing overdog/underdog win/loss rates across thousands of games, I'm able to estimate the overdog/underdog win/loss probabilities on the template itself and then compare it to Elo's theoretical predictions for how often an overdog should be winning in perfectly strategic conditions. It's by comparing those numbers- not just a single game in which you beat FK- that I performed the analysis.

Edited 6/17/2015 02:49:44

What makes a template strategic?: 2015-06-17 02:56:40
Benjamin628

Level 60

I guess you are right lmao :P

"""And, well, with a small enough dataset, you can disprove gravity or evolution. So no go."""

How so? Also send me a mail, I'm interested in CSL.

What makes a template strategic?: 2015-06-17 02:58:07

JSA

Level 60

I find this thread very interesting. I think the thing to note with the seasons is that as the seasons have gone on, a higher percentage of players have quit in the middle of the season.
In the early seasons, it was more about having a new template on a ladder than about winning. As time has gone on, you will see that most high level players quit early in the season after they lose a couple games and have no chance of winning the season. Because of this, I would expect later seasons to be more inconsistent in terms of stats.

However, this has happened in all seasons to some degree and will give the advantage to the underdog. There are also cases in the 1v1 and 2v2 ladders where the higher rated player loses because of boot, therefore giving the underdog a greater chance. If there is any easy way to find out the number of boot losses in both the 1v1 and 2v2 ladder, you could analyze the games with that difference in mind. However, I assume there is no easy way to do this, so we must assume the underdog will have a slight advantage (a bias strength of somewhere between 0 and .05 to the underdog). If I had to guess an exact number, I'd estimate that boots give the underdog a .015 bias strength.

Some may be surprised that the 1v1 ladder is not considered as "strategic" as the 2v2 ladder. However, this does not surprise me since I believe team games to have a bigger gap in skill than in 1v1 games. Therefore, there should be more upsets in 1v1 games than team games.

I am interested to try this analysis on some tournaments and see if it is a viable way to decide the strategic value of a template. Even if it is not an accurate way to rate strategic templates, I like the idea of using some kind of formula to determine the strategic value of templates.

What makes a template strategic?: 2015-06-17 03:10:43

l4v.r0v

Level 59

"""How so?"""

Well, both of them are predictive theories, so they can be tested against real-world numeric outcomes- for example, you can analyze gravity in terms of the rate at which an object accelerates as it moves closer to another object (and see if that corresponds to the values predicted by Newton's equations). Similarly, you can analyze evolutionary outcomes in terms of the Hardy-Weinberg laws.

For the first one, you could use a very small dataset to increase the probability that your numbers aren't going to be very close to what's predicted by Newton's formulas.

For the second one, you could use a very small dataset to increase the probability that your population won't look like it's evolving despite exiting Hardy-Weinberg equilibrium.

But of course both of those examples were exaggerations.

"""I find this thread very interesting. I think the thing to note with the seasons is that as the seasons have gone on, a higher percentage of players have quit in the middle of the season."""

That probably explains it. It's also why I was apprehensive about looking at the Real-time and Seasonal ladders. For me, if this analysis works, it's best used in a Round Robin where no one gets booted or surrenders while they're clearly winning due to vacation/etc.-related reasons.

"""If there is any easy way to find out the number of boot losses in both the 1v1 and 2v2 ladder, you could analyze the games with that difference in mind. However, I assume there is no easy way to do this, so we must assume the underdog will have a slight advantage (a bias strength of somewhere between 0 and .05 to the underdog). If I had to guess an exact number, I'd estimate that boots give the underdog a .015 bias strength."""

So you can get specific game data and weed out losses-by-boot (and actually analyze turn-by-turn moves), but for that you need API access (i.e., a Warlight membership). I just did all of this by making HTTP GET requests to the ladder results pages and then parsing the HTML I got back- pretty janky, but it worked. I've been trying to figure out how to get more specific game data (the template, how the game ended) using a method that doesn't require the API, but I'm not quite there yet. Maybe I'll figure it out soon.

"""However, this does not surprise me since I believe team games to have a bigger gap in skill than in 1v1 games. Therefore, there should be more upsets in 1v1 games than team games."""

I agree with that reason. Moreover, I think that more players per team would correspond to fewer upsets- since now you can't just get a few lucky moves that allow you to overcome a significantly better player and win the game. It's tougher to beat two players on luck than it is to beat one.

"""I am interested to try this analysis on some tournaments and see if it is a viable way to decide the strategic value of a template. Even if it is not an accurate way to rate strategic templates, I like the idea of using some kind of formula to determine the strategic value of templates."""

If you could link me to some completed round robins where players didn't get booted (especially multiple ones with similar player pools), I'll happily analyze the templates using this method. I've been dying to get my hands on round robins like that, especially groups of round robins with similar player pools, and I've got a hunch that elite players like you probably have a few of those lying around. :)

At the very least, I find it fun to figure out whether a template affects the probability of an upset in a significant/meaningful way.

What makes a template strategic?: 2015-06-17 03:14:09
JSA

Level 60

I'll mail you; I should have plenty of round robin tournaments to analyze.

What makes a template strategic?: 2015-06-17 03:26:10
Nex

Level 60

JSA working with knyte is scary....

What makes a template strategic?: 2015-06-17 03:34:18
l4v.r0v

Level 59

"""JSA working with knyte is scary...."""

Yeah... I'm kind of weirded out that this thread only got attention from (people I would consider) good players. I was expecting someone on my blacklist to pop in and tell me to do something better with my time. >_<

What makes a template strategic?: 2015-06-17 03:55:08
Thomas 633

Level 56

Nope, actually quite impressed with your dedication.

EDIT: Just checked, yes I am on your BL.

Edited 6/17/2015 03:55:32

What makes a template strategic?: 2015-06-17 05:39:23

Deadman

Level 64

After a lot of procrastination, I have finally read your thread ;)
I'll be paying closer attention to your posts henceforth :)

I would advise you to break up your posts into smaller pieces so that they are easier to read, but maybe I nitpick :P

@JSA
"""Some may be surprised that the 1v1 ladder is not considered as "strategic" as the 2v2 ladder. However, this does not surprise me since I believe team games to have a bigger gap in skill than in 1v1 games. Therefore, there should be more upsets in 1v1 games than team games."""

While I agree with team games requiring more skill than a 1v1, the current 2v2 ladder is heavily influenced by luck due to lack of intel (only 2 starts!) and 16% WR. So I would definitely consider the current 1v1 ladder to be more "strategic" than the current 2v2 ladder.

@ps
"""it's not rocket science, less luck requires more strategy to win."""

I do not agree with that statement entirely. Many players would consider 0% WR to be more strategic than 0% SR (even though it is less deterministic). It is the classic Risk vs Reward problem, and I would argue a stronger player would strategize better than a weaker player. However, it is important to analyze this over a significant number of games and not in isolation (where there may be an upset).
With that being said obviously 0%WR is more strategic than 75%WR.

What makes a template strategic?: 2015-06-17 05:51:42

Deadman

Level 64

@knyte

"""
I used the rating difference to come up with an expected % of time that the overdog should have won, and compared it to actual results.

So, this relies on the following major assumptions:
-Elo ratings accurately reflect the relative strength of players in terms of how likely they are to win a head-to-head matchup.
-A good Warlight template should have win probabilities that are very close to those predicted by Elo

You can simply subtract the actual overdog win rate from the expected overdog win rate to figure out a template's "bias"- a rating of how likely it makes upsets to happen.
"""

You are performing this analysis over a large sample, but that sample may still not be reflective of the truth. Would it be a fair statement to say that, if you had an infinite number of games on a template, the expected win rate and the observed win rate would converge to be the same value?

That is, the difference that you see, may not actually be bias, but just an inaccuracy due to lack of samples?

For example, say I tossed an unbiased coin 10,000 times (a large sample) and got 4950 heads and 5050 tails. Does this mean that the probability of heads is 49.5%? Or does it mean that I just haven't observed enough samples? If I had made 100,000 observations, it would be closer to a 50-50 split.

What makes a template strategic?: 2015-06-17 06:44:21

TeddyFSB

Level 60

A highly strategic game means that luck has a relatively smaller effect on the outcome. This will simply be reflected in a wider distribution of Elo ratings across the population. In chess the best player is around 2800 and the worst around 200, while in a lottery everyone would oscillate around 1500.

So just look at the width of final rating distribution for each ladder and that should give you what you are looking for.
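TeddyFSB's suggestion comes down to comparing the spread of final ratings between ladders. A minimal sketch using the population standard deviation, with made-up ratings for illustration only; spreads are only comparable between ladders that use the same rating system and have similar player pools.

```python
import statistics


def rating_spread(ratings):
    """Population standard deviation of a ladder's final ratings."""
    return statistics.pstdev(ratings)


# Made-up final ratings, for illustration only.
luck_heavy_ladder = [1460, 1480, 1500, 1510, 1550]
skill_heavy_ladder = [900, 1250, 1500, 1800, 2300]

print(round(rating_spread(luck_heavy_ladder)), round(rating_spread(skill_heavy_ladder)))
```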

What makes a template strategic?: 2015-06-17 07:05:43

l4v.r0v

Level 59

@MOTD: That would account for some of the variation but probably not most or all of it. It also wouldn't account for the consistency between the Seasonal Ladders. I forgot my Stats class material but generally with a sample size of 1000, you can expect your data to very closely reflect expected outcomes if it behaves the way it should, theoretically.

The thing is that we don't know what to ground our expectations on- we don't have a system that lets us figure out expected long-term win/loss results on the template itself, only Elo's general model for perfectly strategic scenarios. So, in the end, I end up trying to derive some approximation of the long-term win/loss results on the template itself and compare it to Elo's expected outcomes for a perfectly strategic scenario. There will be some inaccuracy, of course, as I'm comparing experimental data to theoretical data instead of theoretical to theoretical, but it shouldn't account for the entirety of the bias rating unless every template is perfectly strategic and all these samples are almost consistently inaccurate in the same direction (very improbable).

@TeddyFSB: That's also true. I'm wondering, though, if it only works for lottery-type situations- what about situations where a slightly better player (say, the best and second-best players in 1v1 games) wins all of the time? I think there's a limitation in scenarios where the overdog's win chances are dramatically higher than they should be (so, the other direction). But are those scenarios even unstrategic to begin with- if anything, they'd weigh skill very heavily? I guess I'm really just measuring the probability of upsets (relative to Elo-based predictions) here.

The reason I kind of skipped over that analysis is because I was initially doing this just for CORP Strategic League as a way to weed out bad templates there. Since there's only one 1v1 ladder- not multiple ones on different templates, I wouldn't have been able to perform that analysis there although I have been tracking Elo distributions to make sure the overall ladder isn't luck-based.

In any case, I'll use that analysis to go over tournaments as well. I honestly can't believe I missed that connection.

What makes a template strategic?: 2015-06-17 07:16:14

Corvus5
Level 58

@ master of the dead

"""For example, say I tossed an unbiased coin 10,000 times (a large sample) and got 4950 heads and 5050 tails. Does this mean that the probability of heads is 49.5%? Or does it mean that I just haven't observed enough samples? If I had made 100,000 observations, it would be closer to a 50-50 split."""

There are ways to measure that error, e.g. a "binomial test".
Let's make a simple calculation based on your numbers:
the expected value for tails in your example is 5000 and the standard deviation is 50.
That means the number of tails falls inside the interval [4950, 5050] with a cumulative probability of about 68.3%,
inside [4900, 5100] with about 95.4%,
and inside [4850, 5150] with about 99.7%.
So if you land outside these intervals (especially the last one), it becomes more and more likely that your coin is biased, since such a big deviation from the expected value is very improbable for a fair coin.
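Corvus5's calculation is the normal approximation to the binomial. A minimal sketch in Python (the function name is illustrative; SciPy's `scipy.stats.binomtest` offers an exact test if SciPy is available):

```python
import math


def binomial_z_test(successes: int, trials: int, p: float = 0.5):
    """Normal-approximation test of `successes` out of `trials` against probability p.

    Returns (z, two-sided p-value).
    """
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))  # 50 for 10,000 fair coin flips
    z = (successes - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal
    return z, p_value


z, p_value = binomial_z_test(5050, 10_000)
print(round(z, 2), round(p_value, 3))  # 1.0 0.317

# Share of a normal distribution within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(math.erf(k / math.sqrt(2)), 3))
```

A z of 1.0 (two-sided p of about 0.32) means 5050 tails out of 10,000 is entirely ordinary for a fair coin.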

Edited 6/17/2015 07:29:59

What makes a template strategic?: 2015-06-17 07:32:55
l4v.r0v

Level 59

^ Thanks for that. I was thinking along those lines to come up with some sort of quantitative analysis, but I have no idea what the standard deviation here would be, so I can't do any sort of t-test-type stuff here.
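On the standard deviation question: if each game is treated as an independent trial whose win chance is the Elo prediction for that game's rating gap, the overdog win count follows a Poisson binomial distribution with variance sum p(1 - p). The sketch below assumes ratings are fixed and known, which sidesteps the objection that ratings fitted on the same games are themselves biased; `overdog_z_score` is a hypothetical helper, not something from the post.

```python
import math


def elo_to_win_prob(elo_difference: float) -> float:
    """Elo-predicted probability that the higher-rated side wins."""
    return 1.0 / (1.0 + 10 ** (-elo_difference / 400))


def overdog_z_score(rating_gaps, overdog_wins: int) -> float:
    """How many standard deviations the observed overdog wins sit from Elo's prediction.

    Each game is an independent trial won by the overdog with the Elo-predicted
    probability for its rating gap, so the win count has mean sum(p) and
    variance sum(p * (1 - p)). Ratings are treated as fixed, known values.
    """
    probs = [elo_to_win_prob(gap) for gap in rating_gaps]
    mean = sum(probs)
    sd = math.sqrt(sum(p * (1 - p) for p in probs))
    return (overdog_wins - mean) / sd


# Approximation: every 1v1 ladder game set to the average gap of 203.98,
# since per-game gaps are not listed in the post.
z = overdog_z_score([203.98] * 41732, 27091)
print(round(z))  # about -55
```

Under these assumptions a z-score around -55 is far outside what chance alone would produce, so the 1v1 gap is not sampling noise; whether it reflects the template or the ratings themselves is a separate question.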

What makes a template strategic?: 2015-06-17 07:39:01

Corvus5
Level 58

@ Knyte
2 important things:
1) Elo is only a good way of estimating player strengths if people don't improve over time. If they do, you have to find a time-based scheme for excluding old records so they stop mattering.
2) Beware the statistics biting you in the back!!! You can't use the data to derive your Elo ratings and then use those Elo ratings to determine the bias for that same data. The only bias you get is the accumulated rounding errors in your calculations, since Elo ratings are built to reproduce exactly these probabilities.
So the only option is to have a fixed set of people with relatively fixed strengths play 2 tournaments: the first one to determine their Elo ratings, the second one to measure the bias. The problem here is that it's nearly impossible for people not to learn from a previous experience against a player.
So however you turn it around, it's not working.....
Elo (sadly!) is a good method for rating AIs and mathematicians..... not much more

