
What makes a template strategic?: 6/16/2015 09:55:31


knyte 
Level 58
Here are a few interesting questions:

How do you prove whether 0% luck makes for more strategic games than 16% luck?

How will we know when we've finally found a good non-Europe based template for 3v3? And is EU 4x5 0% WR just as good as EU 4x4 0% SR?

Is Rise of Rome too big to be a good 1v1 map? Is it a good 2v2 map then?

Are Poon Squad's settings really that bad?

How do I explain to someone that Guiroma 1v1 is actually not a "really weird and bad template" but in fact a solid and well-tested strategic 1v1 template?

If you've ever thought about any of these questions, then my ramblings here might be of interest to you.

I basically ran into this central question- how do I find out what's a good strategic template and what isn't?- when expanding the templates used in CORP Strategic League (our internal ladder system, which now has 56 templates in rotation- so it's a little unwieldy). We all have some qualitative notions of what makes a template good strategically- reasonably low luck, balanced map, just the right size, no weird card situations, etc.- but I've always wondered what quantitative methods could be used to overcome the biases in our qualitative judgements. After all, if Ares 3v3 really turns out to be just as good as the gold standard of EU 3v3, how do we resolve the debate? We obviously have some entrenched opinions. Similarly, we've got a tendency toward conformity when it comes to template design- most people don't really stray too far from Strategic 1v1-like settings.

So how do we do this with just numbers?

Well, first let's consider the extreme examples of "bad" templates:

- Template A (Lottery): A significantly better player and I play each other in a 1v1. We have an equal chance of winning. (In a good strategic template, the better player should be much more likely to win)

- Template B: A really good team plays a really bad team in a 2v2. The worse team has a significantly higher chance of winning. (Again, the better team should have a higher chance).

- Template C: One slightly better player plays a slightly worse player in a 1v1. The slightly better player is almost certain to win- the other player, even though he's just a tad worse, is probably not going to win more than 2% of the time. (The chance of winning should be commensurate to the difference in skill between the two players- while Templates A and B didn't accurately reflect the difference by increasing the worse player's odds of winning, this template fails to accurately reflect the difference in skill level by significantly decreasing the worse player's odds of winning).

So, looking at this, we kind of get an idea of what a good strategic template looks like:

A strategic template is a template that accurately reflects the difference in skill level between the two players.

Where can we go with this core assumption? Well, the main idea here is that there needs to be some way to go between a relative measurement of skill level and the probability that a player wins.

... And that's where Elo ratings come in.

So, Elo, if you're not familiar with it, is just a system where each player has a rating based on their history- with wins against tougher players counting for more points (i.e., a player who's played one game and beat master of desaster will have a higher Elo rating than a player who's played one game and beat someone ranked outside the top 100 of the ladder). Your Elo gain/loss from a game is a function of the difference in ratings- which is supposed to predict the % of times that you'll win. Elo assumed a normal distribution when coming up with the system, which (at least in chess) is inaccurate- however, his system still allows us to get some basic quantitative data, which should at least be good enough to compare templates.

A difference in Elo ratings can be converted to an overdog win % using the formula:

Probability that player with higher rating wins = 1 - 1 / (1 + 10 ** (EloDifference/400))


Conversely, you can estimate the rating difference between two players using the formula:

EloDifference = -400 * Log((1/Win Percentage - 1), 10)


Source: http://www.3dkingdoms.com/chess/elo.htm

As you can see by playing around on that site, this would mean that a player who wins 70% of games against their opponent should have an Elo rating that's 147 points higher. Conversely, a player whose rating is 147 points higher than that of their opponent should win about 70% of games.
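
A minimal Python sketch of the two formulas above, for anyone who wants to check the numbers without the site. The function names are placeholders chosen for illustration, not anything from Warlight or the linked page.

```python
import math

def expected_win_probability(elo_difference):
    """Chance that the higher-rated player wins, given the rating gap."""
    return 1 - 1 / (1 + 10 ** (elo_difference / 400))

def elo_difference_from_win_rate(win_rate):
    """Rating gap implied by a win rate strictly between 0 and 1."""
    return -400 * math.log10(1 / win_rate - 1)

print(round(expected_win_probability(147), 2))    # 0.7
print(round(elo_difference_from_win_rate(0.70)))  # 147
```

The two functions are inverses of each other, so feeding the output of one into the other should return the starting value (up to rounding).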

Where did I go with this, then?

Well, since you can convert Elo differences to win probabilities, and since you have actual win probabilities, here's the data I played around with:

- the % of time that the "overdog" (better-rated player) wins a game

- the average rating difference between the "overdog" and "underdog" on a template

I used the rating difference to come up with an expected % of time that the overdog should have won, and compared it to actual results. This is, of course, just one of many analyses that can be performed- I liked it because it's simple.

So, this relies on the following major assumptions:

- Elo ratings accurately reflect the relative strength of players in terms of how likely they are to win a head-to-head matchup.

- A good Warlight template should have win probabilities that are very close to those predicted by Elo.

Also, there's some risk in using the Elo ratings- if you're getting them based on games only played on the template that's being tested, they're going to be a little bit "off" since they would be tainted by the inaccuracies in the template itself- i.e., a template that makes upsets more likely to happen is probably also going to cause you to underestimate how good your overdogs are and overestimate how good your underdogs are. Conversely, if you get data from games played on multiple templates, then you're making the huge assumption that someone can be "good" across a wide range of templates and that the Elo rating you're using accurately reflects their skill across the entire range- which is a risky, albeit useful, assumption. On top of that, games played on templates being tested are still going to be similarly "tainted." However, once you buy into these assumptions, you can start getting cool-ish data:

You can simply subtract the actual overdog win rate from the expected overdog win rate to figure out a template's "bias"- a rating of how likely it makes upsets to happen.

First, I ran this on the CORP Strategic League templates (you can find the data in the "Templates" spreadsheet at http://www.tinyurl.com/csldata). However, CSL only has 88 finished games- and the average player has only played 2.3 games so far. And, well, with a small enough dataset, you can disprove gravity or evolution. So no go.

So I decided to just test this out on the 1v1 and 2v2 ladders (all completed games as of 12:55 AM EDT on 6/16/2015). Given the focus on proving templates, I was also going to check out the Real-Time and Seasonal ladders, but will deal with that later as I'm not sure about the usefulness and reliability of that data (given the higher boot/surrender rates on those- you can also see a lot more upsets if you just look at that data).

Also, this is all based on the assumption that I can use the Bayeselo ratings in a way that's more or less similar to how I would use regular Elo ratings. I don't know enough about the theory behind Coulon's Bayeselo system to be certain of this, but eh this was interesting so I did it anyway.

Here's what I got from the ladders (I got win/loss and rating data from all games, ignoring games where one or both of the players' ratings were expired and set to 0):

1v1 Ladder - Strategic ME 1v1 template
total games: 41732 
overdog wins: 27091 
total overdog score: 70866841 
total underdog score: 62354453
average overdog score: 1698.14
average underdog score: 1494.16
average difference: 203.98
overdog expected win rate: .76
overdog actual win rate: .65
bias direction: underdog
bias strength: .11 

2v2 Ladder - Final Earth 2v2 template
total games: 2388 
overdog wins: 1788 
total overdog score: 3902977
total underdog score: 3353609
average overdog score: 1634.41
average underdog score: 1404.36
average difference: 230.05
overdog expected win rate: .79
overdog actual win rate: .75
bias direction: underdog
bias strength: .04
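
The figures above can be reproduced from the four totals alone. The sketch below is one way to do it in Python; `template_bias` is a hypothetical name, and rounding both rates to two decimals before subtracting is an assumption that happens to match the posted numbers, not a confirmed detail of the original script.

```python
def template_bias(total_games, overdog_wins, total_overdog_score, total_underdog_score):
    """Return (expected, actual, direction, strength) from ladder totals."""
    average_difference = (total_overdog_score - total_underdog_score) / total_games
    expected = 1 - 1 / (1 + 10 ** (average_difference / 400))
    actual = overdog_wins / total_games
    # Assumption: round first, so the gap matches the two-decimal figures listed.
    gap = round(expected, 2) - round(actual, 2)
    direction = "underdog" if gap > 0 else "overdog" if gap < 0 else "none"
    return round(expected, 2), round(actual, 2), direction, round(abs(gap), 2)

print(template_bias(41732, 27091, 70866841, 62354453))  # 1v1 ladder totals
print(template_bias(2388, 1788, 3902977, 3353609))      # 2v2 ladder totals
```

Note that this plugs the average rating difference into the Elo formula, which is not quite the same as averaging the expected win rate game by game, so treat the output as a rough comparison between templates.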


So, as you can see from this, upsets are much more likely to happen on the 1v1 ladder than on the 2v2 ladder. I speculate that this might occur due to some players not playing a whole lot of games and being rated lower than they actually are, but given the size of the dataset maybe the 1v1 template actually just makes upsets more likely. Keep in mind that this data is better understood in relative terms- the 2v2 template might not be biased in favor of the underdog- it could just be a flaw in my assumptions or in the dataset, but it's probably less likely to yield upsets (in Elo-based terms) than the 1v1 ladder template, which I'd say is useful data.

Finally, here's an idea for how you can use this to test a new template:

1. Host a Round Robin with that template. Don't invite players that are going to get booted and ruin some of your data.

2. Great. Now you have your data. With 20 players, for example, that's 190 games (20 × 19 / 2), or 19 games/player- more than enough for reliable Elo ratings.

3. Use Elostat or Bayeselo to give Elo ratings to each player.

4. Analyze the game data the way I did- average rating difference, overdog win %, expected win %, and the difference. I'd love to see more data on this if you'd like to share.

5. Now you have a simplified quantitative reflection of how strategic the template is in the bias strength datapoint.
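
Step 4 above could be sketched roughly as follows, assuming you already have a rating per player (from Elostat or Bayeselo) and a list of finished games. The data layout, the function name, and the sample ratings are all made up for illustration.

```python
def summarize_games(games, ratings):
    """games: list of (winner, loser) names; ratings: name -> Elo rating."""
    total = overdog_wins = 0
    difference_sum = 0.0
    for winner, loser in games:
        if ratings[winner] == ratings[loser]:
            continue  # no overdog to speak of, so skip the game
        total += 1
        difference_sum += abs(ratings[winner] - ratings[loser])
        if ratings[winner] > ratings[loser]:
            overdog_wins += 1
    average_difference = difference_sum / total
    expected = 1 - 1 / (1 + 10 ** (average_difference / 400))
    actual = overdog_wins / total
    # Positive bias means more upsets than the Elo formula predicts.
    return {"games": total, "average difference": average_difference,
            "expected": expected, "actual": actual, "bias": expected - actual}

# Hypothetical four-player round robin (6 games) with made-up ratings.
ratings = {"A": 1700, "B": 1600, "C": 1500, "D": 1400}
games = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "B"), ("B", "D"), ("C", "D")]
print(summarize_games(games, ratings))
```

Six games is far too few to say anything about a template; the sample is only there to show the shape of the input.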

Also, if someone wants to run this analysis on the Real-Time Ladder (template by template) for me, it'd be much appreciated.

Edited 6/17/2015 04:04:17
What makes a template strategic?: 6/16/2015 10:15:27


ps 
Level 60
it's not rocket science, less luck requires more strategy to win.
What makes a template strategic?: 6/16/2015 10:17:58


knyte 
Level 58
Probably, but the luck modifier isn't the only form of luck in the game. There's lots of luck involved elsewhere- map setup, for example.

And perhaps 16% luck better measures strategic ability than 0% luck because it's tougher to reason with. There's some questions I just prefer using experimental data for.

Seasonal Ladder Data (skipped Season X as it doesn't really lend itself to this type of analysis- which is only good for XvX setups; also remember that "total games" is just the number of games actually analyzed- some may have been skipped if the players' ratings were detected to be expired/useless by my algorithm):


1v1 Ladder
total games: 41732 
overdog wins: 27091 
total overdog score: 70866841 
total underdog score: 62354453
average overdog score: 1698.14
average underdog score: 1494.16
average difference: 203.98
overdog expected win rate: .76
overdog actual win rate: .65
bias direction: underdog
bias strength: .11 

2v2 Ladder
total games: 2388 
overdog wins: 1788 
total overdog score: 3902977
total underdog score: 3353609
average overdog score: 1634.41
average underdog score: 1404.36
average difference: 230.05
overdog expected win rate: .79
overdog actual win rate: .75
bias direction: underdog
bias strength: .04

Season I Ladder
total games: 1010
overdog wins: 832
total overdog score: 2251143
total underdog score: 1964035
average overdog score: 2228.85
average underdog score: 1944.59
average difference: 284.26
overdog expected win rate: .84
overdog actual win rate: .82
bias direction: underdog
bias strength: .02

Season II Ladder
total games: 1208
overdog wins: 949
total overdog score: 2910646
total underdog score: 2653096
average overdog score: 2409.48
average underdog score: 2196.27
average difference: 213.21
overdog expected win rate: .77
overdog actual win rate: .79
bias direction: overdog
bias strength: .02

Season III Ladder
total games: 1111
overdog wins: 889
total overdog score: 2700249
total underdog score: 2349449
average overdog score: 2430.47
average underdog score: 2114.72
average difference: 315.75
overdog expected win rate: .86
overdog actual win rate: .80
bias direction: underdog
bias strength: .06

Season IV Ladder
total games: 874
overdog wins: 673
total overdog score: 2080745
total underdog score: 1826165
average overdog score: 2380.72
average underdog score: 2089.43
average difference: 291.29
overdog expected win rate: .84
overdog actual win rate: .77
bias direction: underdog
bias strength: .07

Season V Ladder
total games: 984
overdog wins: 752
total overdog score: 2371634
total underdog score: 2076194
average overdog score: 2410.20
average underdog score: 2109.95
average difference: 300.25
overdog expected win rate: .85
overdog actual win rate: .76
bias direction: underdog
bias strength: .09

Season VI Ladder
total games: 1183
overdog wins: 861
total overdog score: 2822119
total underdog score: 2522758
average overdog score: 2385.56
average underdog score: 2132.51
average difference: 253.05
overdog expected win rate: .81
overdog actual win rate: .73
bias direction: underdog
bias strength: .08

Season VII Ladder
total games: 1152
overdog wins: 886
total overdog score: 2761343
total underdog score: 2429385
average overdog score: 2397.00
average underdog score: 2108.84
average difference: 288.16
overdog expected win rate: .84
overdog actual win rate: .77
bias direction: underdog
bias strength: .07

Season VIII Ladder
total games: 1334
overdog wins: 1031
total overdog score: 3208341
total underdog score: 2844664
average overdog score: 2405.05
average underdog score: 2132.43
average difference: 272.62
overdog expected win rate: .83
overdog actual win rate: .77
bias direction: underdog
bias strength: .06

Season IX Ladder
total games: 1338
overdog wins: 1013
total overdog score: 3869846
total underdog score: 3431056
average overdog score: 2892.26
average underdog score: 2564.32
average difference: 327.94
overdog expected win rate: .87
overdog actual win rate: .76
bias direction: underdog
bias strength: .11

Season XI Ladder
total games: 1927
overdog wins: 1498
total overdog score: 5591611
total underdog score: 4909740
average overdog score: 2901.72
average underdog score: 2547.87
average difference: 353.85
overdog expected win rate: .88
overdog actual win rate: .78
bias direction: underdog
bias strength: .10

Season XII Ladder
total games: 1958
overdog wins: 1474
total overdog score: 5624570
total underdog score: 4961283
average overdog score: 2872.61
average underdog score: 2533.85
average difference: 338.76
overdog expected win rate: .88
overdog actual win rate: .75
bias direction: underdog
bias strength: .13

Season XIII Ladder
total games: 2156
overdog wins: 1636
total overdog score: 6218291
total underdog score: 5559368
average overdog score: 2884.18
average underdog score: 2578.56
average difference: 305.62
overdog expected win rate: .85
overdog actual win rate: .76
bias direction: underdog
bias strength: .09

Season XIV Ladder
total games: 2542
overdog wins: 1965
total overdog score: 7363815
total underdog score: 6522201
average overdog score: 2896.86
average underdog score: 2565.78
average difference: 331.08
overdog expected win rate: .87
overdog actual win rate: .77
bias direction: underdog
bias strength: .10

Season XV Ladder
total games: 1989
overdog wins: 1558
total overdog score: 5737591
total underdog score: 4937719
average overdog score: 2884.71
average underdog score: 2482.51
average difference: 402.20
overdog expected win rate: .91
overdog actual win rate: .78
bias direction: underdog
bias strength: .13

Season XVI Ladder
total games: 2213
overdog wins: 1710
total overdog score: 6392656
total underdog score: 5666937
average overdog score: 2888.68
average underdog score: 2560.75
average difference: 327.93
overdog expected win rate: .87
overdog actual win rate: .77
bias direction: underdog
bias strength: .10

Season XVII Ladder
total games: 2634
overdog wins: 2015
total overdog score: 7644882
total underdog score: 6748859
average overdog score: 2902.38
average underdog score: 2562.21
average difference: 340.17
overdog expected win rate: .88
overdog actual win rate: .76
bias direction: underdog
bias strength: .12

Season XVIII Ladder
total games: 2642
overdog wins: 2033
total overdog score: 7649613
total underdog score: 6768832
average overdog score: 2895.39
average underdog score: 2562.01
average difference: 333.38
overdog expected win rate: .87
overdog actual win rate: .77
bias direction: underdog
bias strength: .10

Season XIX Ladder (data from 6:20 AM EDT 6/16/2015)
total games: 2423
overdog wins: 1830
total overdog score: 6983105
total underdog score: 6159369
average overdog score: 2882.00
average underdog score: 2542.04
average difference: 339.96
overdog expected win rate: .88
overdog actual win rate: .76
bias direction: underdog
bias strength: .12

I'm very tempted to say this data confirms my suspicions about the Seasonal Ladder's data being unreliable due to a high ratio of underrated players (due to boots/surrenders/loss of interest). It is interesting to see the growing level of underdog bias as the seasons advanced, however- possibly due to a greater number of players leading to a greater ratio of underrated players whose wins were considered upsets even though they really weren't- or just boots fudging the data.

Season I is interesting since it's the 1v1 ladder template and yet the net upset rate (underdog bias) is so different, probably due to the way each ladder operates causing different levels of statistical fudging (from underrated/overrated players).

Edited 6/16/2015 13:07:14
What makes a template strategic?: 6/16/2015 12:45:40

smileyleg 
Level 61
I wouldn't say luck and strategy are absolutely inversely related.

Like many, I think 0% WR requires more strategy than 0% SR because of the calculated risk taking. The problem is sometimes the luck can really be the primary difference in the game. When one player completes a bonus where 2 or 3 of his 3v2 attacks succeeded and the other fails because his only 3v2 attack failed, that can be huge.

With 16%, where you have the really rare fails like 7v4, some of the results just feel too arbitrary.
What makes a template strategic?: 6/17/2015 02:09:11


Benjamin628 
Level 59
A strategic template is a template that accurately reflects the difference in skill level between the two players.

I disagree. For example, I have played Forbidden Knowledge about ten times on 1v1 Ladder Settings. He won 9 of them. To win a game you need to only be a little better than your opponent. I would not say I am a little better than Forbidden Knowledge 1/10 times. If you only saw the game where I beat him, with that logic you would conclude I am a better player than him (which is obviously not the case).

As smileyleg said, luck and strategy are not antonyms. Skill is knowing a 3v2 is not guaranteed, so you use it in the right place.

Edited 6/17/2015 02:16:04
What makes a template strategic?: 6/17/2015 02:48:11


knyte 
Level 58
I disagree. For example, I have played Forbidden Knowledge about ten times on 1v1 Ladder Settings. He won 9 of them. To win a game you need to only be a little better than your opponent. I would not say I am a little better than Forbidden Knowledge 1/10 times. If you only saw the game where I beat him, with that logic you would conclude I am a better player than him (which is obviously not the case).


Notice that I phrased all of this in terms of probability, not results. A better player should be more likely to beat a worse player in each game (at least at the very beginning, before specific actions have been taken) but obviously won't win each time. That's why I analyzed tens of thousands of games on the 1v1 ladder and thousands of games on the other ladders. That's also why I recommend large sample sizes for test purposes, because the outcomes of larger samples should theoretically reflect actual win/loss probabilities. I.e., if you go into each game against FK with a 10% chance of winning, over 1000 games you should win ~100 and he should win ~900- allowing me to go back from those results (in a large sample set) to calculate your win/loss probabilities in each game based on experimental data.

So by analyzing overdog/underdog win/loss rates across thousands of games, I'm able to estimate the overdog/underdog win/loss probabilities on the template itself and then compare it to Elo's theoretical predictions for how often an overdog should be winning in perfectly strategic conditions. It's by comparing those numbers- not just a single game in which you beat FK- that I performed the analysis.
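
The sample-size point can be sketched with a quick simulation. This is an illustration only: it assumes every game is an independent draw at a fixed 10% win chance, and `simulate_wins` is a made-up helper name.

```python
import random

def simulate_wins(win_probability, games, seed=0):
    """Count wins over a number of independent games at a fixed win chance."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.random() < win_probability for _ in range(games))

# The observed win rate should drift toward 0.10 as the number of games grows.
for games in (10, 1000, 100000):
    wins = simulate_wins(0.10, games)
    print(games, wins, wins / games)
```

With only 10 games the observed rate can land almost anywhere; with 100,000 it should sit very close to 10%, which is why a single win over FK says little while thousands of ladder games say a lot.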

Edited 6/17/2015 02:49:44
What makes a template strategic?: 6/17/2015 02:56:40


Benjamin628 
Level 59
I guess you are right lmao :P

And, well, with a small enough dataset, you can disprove gravity or evolution. So no go.

How so?

Also send me a mail, I'm interested in CSL.
What makes a template strategic?: 6/17/2015 02:58:07

JSA 
Level 59
I find this thread very interesting. I think the thing to note with the seasons is that as the seasons have gone on, a higher percentage of players have quit in the middle of the season.
In the early seasons, it was more about having a new template on a ladder than about winning. As time has gone on, you will see that most high level players quit early in the season after they lose a couple games and have no chance of winning the season. Because of this, I would expect later seasons to be more inconsistent in terms of stats.

However, this has happened in all seasons to some degree and will give the advantage to the underdog. There are also cases in the 1v1 and 2v2 ladders where the higher rated player loses because of boot, therefore giving the underdog a greater chance. If there is any easy way to find out the number of boot losses in both the 1v1 and 2v2 ladder, you could analyze the games with that difference in mind. However, I assume there is no easy way to do this, so we must assume the underdog will have a slight advantage (a bias strength of somewhere between 0 and .05 to the underdog). If I had to guess an exact number, I'd estimate that boots give the underdog a .015 bias strength.

Some may be surprised that the 1v1 ladder is not considered as "strategic" as the 2v2 ladder. However, this does not surprise me since I believe team games to have a bigger gap in skill than in 1v1 games. Therefore, there should be more upsets in 1v1 games than team games.

I am interested to try this analysis on some tournaments and see if it is a viable way to decide the strategic value of a template. Even if it is not an accurate way to rate strategic templates, I like the idea of using some kind of formula to determine the strategic value of templates.
What makes a template strategic?: 6/17/2015 03:10:43


knyte 
Level 58
How so?


Well, both of them are predictive theories so they can be tested against real-world numeric outcomes- for example, you can analyze gravity in terms of the rate at which an object accelerates as it's moving closer to another object (and see if that corresponds to the values predicted by Newton's equations). Similarly, you can analyze evolutionary outcomes in terms of the Hardy-Weinberg laws.

For the first one, you could use a very small dataset to increase the probability that your numbers aren't going to be very close to what's predicted by Newton's formulas.

For the second one, you could use a very small dataset to increase the probability that your population won't look like it's evolving despite exiting Hardy-Weinberg equilibrium.

But of course both of those examples were exaggerations.

I find this thread very interesting. I think the thing to note with the seasons is that as the seasons have gone on, a higher percentage of players have quit in the middle of the season.


That probably explains it. It's also why I was apprehensive about looking at the Real-time and Seasonal ladders. For me, if this analysis works, it's best used in a Round Robin where no one gets booted or surrenders while they're clearly winning due to vacation/etc.-related reasons.

If there is any easy way to find out the number of boot losses in both the 1v1 and 2v2 ladder, you could analyze the games with that difference in mind. However, I assume there is no easy way to do this, so we must assume the underdog will have a slight advantage (a bias strength of somewhere between 0 and .05 to the underdog). If I had to guess an exact number, I'd estimate that boots give the underdog a .015 bias strength.


So you can get specific game data and weed out losses-by-boot (and actually analyze turn-by-turn moves) but for that you need API access (i.e., a Warlight membership). I just did all of this by making HTTP GET requests on the ladder results pages and then analyzing the results I got in the form of HTML- pretty janky, but it worked. I've been trying to figure out how to get more specific game data (the template, how the game ended) using a method that doesn't require me to use the API, but I'm not quite there yet. Maybe I'll figure it out soon.

However, this does not surprise me since I believe team games to have a bigger gap in skill than in 1v1 games. Therefore, there should be more upsets in 1v1 games than team games.


I agree with that reason. Moreover, I think that more players per team would correspond to fewer upsets- since now you can't just get a few lucky moves that allow you to overcome a significantly better player and win the game. It's tougher to beat two players on luck than it is to beat one.

I am interested to try this analysis on some tournaments and see if it is a viable way to decide the strategic value of a template. Even if it is not an accurate way to rate strategic templates, I like the idea of using some kind of formula to determine the strategic value of templates.


If you could link me to some completed round robins where players didn't get booted (especially multiple ones with similar player pools), I'll happily analyze the templates using this method. I've been dying to get my hands on round robins like that, especially groups of round robins with similar player pools, and I've got a hunch that elite players like you probably have a few of those lying around. :)

At the very least, I find it fun to figure out whether a template affects the probability of an upset in a significant/meaningful way.
What makes a template strategic?: 6/17/2015 03:14:09

JSA 
Level 59
I'll mail you; I should have plenty of round robin tournaments to analyze.
What makes a template strategic?: 6/17/2015 03:26:10


Nex
Level 60
JSA working with knyte is scary....
What makes a template strategic?: 6/17/2015 03:34:18


knyte 
Level 58
JSA working with knyte is scary....


Yeah... I'm kind of weirded out that this thread only got attention from (people I would consider) good players.

I was expecting someone on my blacklist to pop in and tell me to do something better with my time. >_<
What makes a template strategic?: 6/17/2015 03:55:08


Thomas 633
Level 56
Nope actually quite impressed with your dedication.
EDIT: Just checked, yes I am on your BL.

Edited 6/17/2015 03:55:32
What makes a template strategic?: 6/17/2015 05:39:23


Master of the Dead 
Level 63
After a lot of procrastination, I have finally read your thread ;)
I'll be paying more close attention to your posts henceforth :)

I would advise you to break up your posts into small pieces so that they are easier to read, but maybe I nitpick :P




@JSA
"""Some may be surprised that the 1v1 ladder is not considered as "strategic" as the 2v2 ladder. However, this does not surprise me since I believe team games to have a bigger gap in skill than in 1v1 games. Therefore, there should be more upsets in 1v1 games than team games."""

While I agree with team games requiring more skill than a 1v1, the current 2v2 ladder is heavily influenced by luck due to lack of intel (only 2 starts!) and 16% WR. So I would definitely consider the current 1v1 ladder to be more "strategic" than the current 2v2 ladder.



@ps
"""it's not rocket science, less luck requires more strategy to win."""

I do not agree with that statement entirely. Many players would consider 0% WR to be more strategic than 0% SR (even though it is less deterministic). It is the classic Risk vs Reward problem and I would argue a stronger player would strategize better than a weaker player. However, it is important to analyze this over a significant number of games and not in isolation (where there may be an upset).
With that being said, obviously 0% WR is more strategic than 75% WR.
What makes a template strategic?: 6/17/2015 05:51:42


Master of the Dead 
Level 63
@knyte

"""
I used the rating difference to come up with an expected % of time that the overdog should have won, and compared it to actual results.

So, this relies on the following major assumptions:
-Elo ratings accurately reflect the relative strength of players in terms of how likely they are to win a head-to-head matchup.
-A good Warlight template should have win probabilities that are very close to those predicted by Elo

You can simply subtract the actual overdog win rate from the expected overdog win rate to figure out a template's "bias"- a rating of how likely it makes upsets to happen.
"""



You are performing this analysis over a large sample, but that sample may still not be reflective of the truth. Would it be a fair statement to say that, if you had an infinite number of games on a template, the expected win rate and the observed win rate would converge to be the same value?

That is, the difference that you see, may not actually be bias, but just an inaccuracy due to lack of samples?


For example: say I tossed an unbiased coin 10,000 times (large sample). I get 4950 heads and 5050 tails. Does this mean that the probability of heads is 49.5%? Or does this mean that I just haven't observed enough samples? If I had made 100,000 observations it would be closer to a 50-50 split.
What makes a template strategic?: 6/17/2015 06:44:21

TeddyFSB 
Level 60
High strateginess of a game means that luck has a relatively smaller effect on the outcome. This will simply be reflected in a wider distribution of Elo scores in a population. In chess the best player is 2800 and the worst player is 200, while in a lottery everyone will oscillate around 1500.

So just look at the width of final rating distribution for each ladder and that should give you what you are looking for.
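
One simple way to put a number on that width is the standard deviation of the final ratings. The sketch below uses made-up ratings and a hypothetical helper name; comparing spreads across ladders also assumes the ratings come from the same rating system and a broadly similar player pool.

```python
import statistics

def rating_spread(ratings):
    """Population standard deviation of a list of final ratings."""
    return statistics.pstdev(ratings)

# Made-up ratings, for illustration only.
lottery_like = [1480, 1495, 1500, 1505, 1520]
skill_heavy = [900, 1200, 1500, 1800, 2100]
print(rating_spread(lottery_like), rating_spread(skill_heavy))
```

A ladder where results are mostly luck should show a spread closer to the first list, with everyone bunched near the starting rating.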
What makes a template strategic?: 6/17/2015 07:05:43


knyte 
Level 58
@MOTD: That would account for some of the variation but probably not most or all of it. It also wouldn't account for the consistency between the Seasonal Ladders. I forgot my Stats class material but generally with a sample size of 1000, you can expect your data to very closely reflect expected outcomes if it behaves the way it should, theoretically.

The thing is that we don't know what to ground our expectations on- we don't have a system that lets us figure out expected long-term win/loss results on the template itself, only Elo's general model for perfectly strategic scenarios. So, in the end, I end up trying to derive some approximation of the long-term win/loss results on the template itself and compare it to Elo's expected outcomes for a perfectly strategic scenario. There will be some inaccuracy, of course, as I'm comparing experimental data to theoretical data instead of theoretical to theoretical, but it shouldn't make up for the entirety of the bias rating unless every template is perfectly strategic and all these samples are almost consistently inaccurate in the same direction (very improbable).

@TeddyFSB: That's also true. I'm wondering, though, if it only works for lottery-type situations- what about situations where a slightly better player (say, the best and second-best players in 1v1 games) wins all of the time? I think there's a limitation in scenarios where the overdog's win chances are dramatically higher than they should be (so, the other direction). But are those scenarios even unstrategic to begin with- if anything, they'd weigh skill very heavily? I guess I'm really just measuring the probability of upsets (relative to Elo-based predictions) here.

The reason I kind of skipped over that analysis is because I was initially doing this just for CORP Strategic League as a way to weed out bad templates there. Since there's only one 1v1 ladder- not multiple ones on different templates, I wouldn't have been able to perform that analysis there although I have been tracking Elo distributions to make sure the overall ladder isn't luck-based.

In any case, I'll use that analysis to go over tournaments as well. I honestly can't believe I missed that connection.
What makes a template strategic?: 6/17/2015 07:16:14

Corvus5
Level 58
@ master of the dead
For ex- Say I tossed an unbiased coin 10,000 times(large sample). I get 4950 heads and 5050 tails. Does this mean that the probability of heads is 49.5%? Or does this mean that I just haven't observed enough samples. If I had made 100,000 observations it would be closer to 50-50 split.

There are ways to measure that error, e.g. a binomial test.
Let's do a simple calculation based on your numbers.
The expected number of tails in your example is 5000 and the standard deviation is 50.
That means the interval [4950, 5050] tails has a cumulative probability of about 68.3%,
the interval [4900, 5100] tails has a cumulative probability of about 95.4%,
and the interval [4850, 5150] tails has a cumulative probability of about 99.7% (all via the normal approximation).
So if you land outside these intervals (especially the last one), you have more and more reason to suspect your coin was biased, since it's very improbable to get such a big deviation from the expected value.
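Here's the same calculation as a small Python sketch, in case anyone wants to plug in their own numbers. It uses sd = sqrt(n*p*(1-p)) with the normal approximation, plus an exact binomial sum for comparison; the function names are only illustrative:

```python
import math


def binomial_sd(n, p):
    """Standard deviation of the number of successes in n trials."""
    return math.sqrt(n * p * (1 - p))


def normal_within(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


def exact_within(n, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, 1/2), bounds inclusive."""
    favourable = sum(math.comb(n, x) for x in range(lo, hi + 1))
    return favourable / 2 ** n


n, p = 10_000, 0.5
mean = n * p
sd = binomial_sd(n, p)
z = (5050 - mean) / sd  # how many sd the observed 5050 tails is from the mean
print(f"mean={mean:.0f}, sd={sd:.0f}, z={z:.1f}")

for k in (1, 2, 3):
    lo, hi = int(mean - k * sd), int(mean + k * sd)
    print(f"[{lo}, {hi}]: normal ~{normal_within(k):.3f}, "
          f"exact {exact_within(n, lo, hi):.3f}")
```

The exact values come out slightly higher than the normal ones because the interval endpoints are included in the sum.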

Edited 6/17/2015 07:29:59
What makes a template strategic?: 6/17/2015 07:32:55


knyte 
Level 58
^ Thanks for that. I was thinking along those lines to come up with some sort of quantitative analysis but I have no idea what the standard deviation here would be so I can't do any sort of t-test-type stuff here.
What makes a template strategic?: 6/17/2015 07:39:01

Corvus5
Level 58
@ Knyte
Two important things:
1) Elo is only a good way to model player strength if people don't improve over time. If they do, you have to find a time-based cutoff after which old records stop counting.
2) Beware the statistics biting you in the back!!! You can't use the data to derive your Elo ratings and then use those ratings to determine the bias for that same data. The only bias you'd get is the accumulated rounding error in your calculations, since Elo ratings are built to reproduce exactly these probabilities.
Sooo the only option is to have a fixed set of people with relatively fixed strengths play two tourneys: the first to determine their Elo ratings, the second to get the bias. The problem is that it's nearly impossible for people not to learn from previous experience against a player.
So however you turn it around, it's not working.....
Elo (sadly!) is a good method for rating AIs and mathematicians..... not much more
What makes a template strategic?: 6/17/2015 07:41:39

Corvus5
Level 58
standard deviation=sqrt(n*p*(1-p))
What makes a template strategic?: 6/17/2015 07:45:07


knyte 
Level 58
Darn. :( Thanks.

Alright, so here's the question I began with:

- How do I determine the impact a template has on the likelihood of upsets?

I could simply measure the % of games that ended in an upset, but that misses something: a template with 10 games where a 1490-rated player beats a 1510-rated player each time (100% upsets) wouldn't be as worrying as a template with 10 games where the first 5 have a 1490-rated player beating a 2000-rated player and the last 5 have the opposite (only 50% upsets).

How do I measure the rate of upsets in a way that also accounts for the relative strengths of the players involved- i.e., weights the upsets based on how unlikely they were to happen?
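One possible weighting, sketched in Python: score each game by how surprising its result was under Elo, i.e. -log2 of the probability Elo gave to the actual winner. This is only an assumption about what such a measure could look like, not the bias rating used elsewhere in this thread. It assumes the standard Elo expected-score formula with a 400-point scale, and the function names and games are made up:

```python
import math


def elo_expected(r_a, r_b):
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def upset_weight(winner_rating, loser_rating):
    """Surprise of a result in bits: -log2 of the winner's predicted chance."""
    return -math.log2(elo_expected(winner_rating, loser_rating))


def mean_upset_weight(games):
    """Average surprise over a list of (winner_rating, loser_rating) pairs."""
    return sum(upset_weight(w, l) for w, l in games) / len(games)


# The two hypothetical templates from the paragraph above.
close_games = [(1490, 1510)] * 10                      # 10 narrow upsets
wild_games = [(1490, 2000)] * 5 + [(2000, 1490)] * 5   # 5 huge upsets, 5 expected wins

print(f"close template: {mean_upset_weight(close_games):.2f} bits per game")
print(f"wild template:  {mean_upset_weight(wild_games):.2f} bits per game")
```

An even matchup costs exactly 1 bit whoever wins, so the ten narrow upsets score only a little above that, while the second template scores noticeably higher on average even though only half its games are upsets.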

EDIT: So Corvus and I are having a private convo about this- here's an idea:

Use players' games during a time period to come up with their (historical) Elo curve, then use that rating to figure out what sort of bias the template had during a given game.

So the central challenge here seems to be getting reliable Elo ratings for players to test with.

And actually, I think that tournaments could still be used since we'd be able to come up with players' Elo ratings (reliably) at a given point in time. But that still leaves us with the problem of the template affecting those Elo ratings to begin with.

Edited 6/17/2015 08:07:02
What makes a template strategic?: 6/17/2015 09:25:13


Master Ryiro 
Level 62
IMO there's nothing more strategic in 16% SR than there is in 0% SR and vice versa
it's just that you have to make a few adjustments in your attacks (IF NEEDED) to account for luck

with the new updates WarLight has made to the analyze attack column it's much easier to calculate now, and those who were unaware of how it worked earlier can access it with much more ease :)
What makes a template strategic?: 6/17/2015 09:34:34


Master Ryiro 
Level 62
so basically those who are ignorant of how luck works have a higher chance of losing in 16% SR than those (top players) who are not, which is an important thing to note because this somehow gets misunderstood as being part of strategy (which i don't like when pro players say it)
What makes a template strategic?: 6/17/2015 09:37:35


knyte 
Level 58
^ I agree. The game of Risk is about calculated risk. A small change in the luck factor only adds an equal challenge for both players. The only issue with luck and strategy is how luck modifier changes enable otherwise improbable things to happen (e.g., a player never succeeding at a 3v2 while their opponent always does).
What makes a template strategic?: 6/17/2015 10:11:15


ℳℛᐤƬrαńɋℰ✕
Level 57
Even hundreds of years later, opinions still differ on whether chess is a strategic or a psychological game. One should not delve into quantitative and qualitative analysis without first eliminating one's prejudices.

First of all, if you want to measure something, (in most cases) make sure you understand what you're measuring, unless you're trying to find something new and define it. In general, know what you measure and settle on a definition beforehand, so that the results can't later be reinterpreted or altered.

Skill, luck, strategy? Are they opposites, do they co-exist, does one consist of the others, or what?
Good or bad? Unless you give a value or a clear definition, it's merely your personal opinion and only dilutes your study/research/method etc.

WR/SR, and luck factor, blind factor?

There is probably no universal definition of strategy, but that doesn't mean we can't measure it or define it somehow. In general, and to simplify, I usually break it into two: Pure-static strategy and Comparative-dynamic strategy.
Pure-static Strategy - Chess-like gameplay, where all information is known and visible, with no random factor except who plays black or white; it falls under formal logic, which allows in-depth move analysis thanks to the open intel: every move has a clear and visible effect on the rest of the game and can be tested and judged. To sum up: all of your opponent's possible choices are known to you, as is the current state of things.
Comparative-dynamic Strategy - More of a general skill in how one operates under certain rules. For this particular game, I would define it roughly as having to take into consideration all the rules, variable factors and a changing environment in a closed system. (Closed system: a defined map, known not to be endless; variables: starting positions, either manual choice or random, and fog, which is the blind factor in general; rule-based variables: luck and WR.) To sum up: strategy lies in estimating your opponent's position (calculated risk) under blind-factor rules in a closed system. [The basic topic of discussion is which factors give a player a 100% winning advantage, and whether that holds independently of the opponent's choices to avert or prevent it; how much rule, luck or randomness is opposed to strategy - if any?]

To give an example: straight round would definitely be Pure strategy. Weighted random is more of the second kind, as its rule varies and applies differently to every attack and player. The same goes for fog, automatic starting positions, random wastelands and luck (although here lies the discussion about how much luck is too much and where to draw the line with a lottery - I hope general logic settles it for now). Of course, if we reduce strategy to mere choice, then one can claim that buying two lottery tickets maximises your chance of winning - which is just absurd to define as strategy.

I'm not offering a full definition or guideline; this is merely my opinion, although put together from various sources and models used on different platforms, with a bit of WarLight background added. The basic thing to keep in mind: if you start measuring something, define it, know what you're measuring and make it logical. Don't leave room for interpretation or misleading results. Mainly: what is strategy to you, or what kind of strategy are you measuring in particular?

Logic is a great tool once perfected. In strategy games and analysis one should know how to use, understand and define: the system (open or closed), game rules, variables, players' choices, and independent and dependent factors, and how they complement each other or play out in real terms.

Edited 6/17/2015 10:14:26
What makes a template strategic?: 6/17/2015 11:08:07


knyte 
Level 58
Returning to what Corvus said:

1. I've come to agree that the central issue is the unreliability of the Elo ratings. We can't truly know what an "upset" is (or how big an upset is) unless we have reliable Elo ratings for each player. He does have a great idea for getting good baseline Elo ratings, but we'll need some players who're interested in helping us out there.

2. That said, I think the bias ratings are still a lot more than just rounding error. Even when Elo ratings are tainted by a template, the rate of upsets is going to differ significantly. For example, a 2000-rated player on a lottery template (someone who just got lucky a bunch of times) is still going to lose 50% of games against a 1500-rated player on the same template, while a 2000-rated player on a purely strategic template is going to win a far greater share of games against a 1500-rated player on that template. So even with tainted ratings, we can still meaningfully measure the difference in the probability and magnitude of upsets in a way that's not just accounted for by the Elo ratings. Bad templates still have a greater (or lower) occurrence of upsets, even when measured in terms of their own tainted Elo ratings, simply because of how these templates work.

In other words: what we're actually quantifying is the accuracy of Elo ratings produced by a template. These Elo ratings are predictive- they make claims about what portion of head-to-head matchups two players are going to win in the long run. If the template is bad, it's going to lead to bad Elo ratings- ones that don't actually reflect how likely a player is to win against another player with a certain rating. And we can test that- by looking at the data I'm analyzing and figuring out the rate and magnitude of upsets.

It's like testing a hypothesis- how accurate are the predictions? The hypothesis in this case, of course, is that a template is purely strategic. And the prediction that hypothesis makes is that the outcomes on that template are going to closely reflect Elo-based predictions.

Good template -> reliable Elo ratings -> Elo ratings that are more consistent with actual overdog win rate -> lower bias rating

Bad template -> unreliable Elo ratings -> Elo ratings that are less consistent with actual overdog win rate -> higher bias rating
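
To make that chain concrete, here's a minimal Python sketch of the comparison: take a set of games, work out how often Elo says the overdog should have won, and compare that to how often the overdog actually won. It assumes the standard 400-point Elo formula, and the games and the name `overdog_calibration` are made up for illustration; it isn't necessarily how the bias rating itself is computed:

```python
def elo_expected(r_a, r_b):
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def overdog_calibration(games):
    """games: list of (rating_a, rating_b, a_won).

    Returns (expected, actual) overdog win rates, skipping equal-rated games.
    """
    expected_wins = 0.0
    actual_wins = 0
    counted = 0
    for r_a, r_b, a_won in games:
        if r_a == r_b:
            continue  # no overdog in an even matchup
        overdog_won = a_won if r_a > r_b else not a_won
        expected_wins += elo_expected(max(r_a, r_b), min(r_a, r_b))
        actual_wins += overdog_won
        counted += 1
    if counted == 0:
        raise ValueError("no games with a rating difference")
    return expected_wins / counted, actual_wins / counted


# Hypothetical data: 2000 vs 1500 on a lottery-like and a skill-like template.
lottery_games = [(2000, 1500, True)] * 5 + [(2000, 1500, False)] * 5
skill_games = [(2000, 1500, True)] * 19 + [(2000, 1500, False)] * 1

for name, games in (("lottery-like", lottery_games), ("skill-like", skill_games)):
    expected, actual = overdog_calibration(games)
    print(f"{name}: expected {expected:.3f}, actual {actual:.3f}, "
          f"gap {abs(expected - actual):.3f}")
```

A big gap between the two numbers means the ratings on that template are making bad predictions, which is the "higher bias" case above.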

As far as that "tainting" issue goes, I think it's actually still sound. Perhaps an analysis of a single-template ladder or tournament would actually be more sound based on this logic than an analysis of a single template on a multi-template ladder or tournament, because the Elo ratings in that case would better reflect the impact of the template.

That said, the long-run analysis issue still persists- when we have a consistent Elo rating for each player rather than an Elo curve (which we actually do have in CSL, but right now that's got its own issues), we won't be able to accurately deal with games where the overdog lost because they were just not as good back then as they are now- for example, my first many games on the ladder would probably be considered upsets now since my rating has gone up significantly from then. But in reality, they weren't upsets- I was just a shit player (still am, but used to be even worse back then). There's two solutions to this- either we analyze fewer games or we assign new Elo ratings using a separate process that reflects a player's development. The second is more reliable.

Finally, I think Corvus's process is going to be the most sound solution- we pick 20 or so players, analyze them to get baseline Elo ratings after some realtime games played in certain conditions, and then we use them to test templates. Those baseline Elo ratings would probably be the best way to measure upsets, but there's still the assumption that players have a single skill level across all templates instead of varying skill levels unique to each template. If you find that assumption too big to make, then my original analysis would be the most valid (since all Elo ratings are based on a single template).
What makes a template strategic?: 6/17/2015 13:11:37


Nex
Level 60
I was expecting someone on my blacklist to pop in and tell me to do something better with my time. >_<


If you're hanging out with the right crowd, you won't get that kind of troll.


JSA's the closest thing Warlight has to a historian. If the two of you team up, then maybe you actually will disprove gravity
What makes a template strategic?: 6/17/2015 14:07:17


Master Ryiro 
Level 62
to answer your other questions knyte :-

is EU 4x5 0% WR just as good as EU 4x4 0% SR?
No! EU 4x5 0% WR simply sucks

Is Rise of Rome too big to be a good 1v1 map? Is it a good 2v2 map then?
it depends from person to person whether they like it or not. i personally like RoR 3v3, 4v4 and 5v5
i hate the idea of 1v1 but if it's a coin game, any time dude :)

Are Poon Squad's settings really that bad?
consider my 1st answer. now multiply it by 1000

How do I explain to someone that Guiroma 1v1 is actually not a "really weird and bad template" but in fact a solid and well-tested strategic 1v1 template?
don't. if they don't realize it after playing 4/5 games then don't even bother

Edited 6/17/2015 14:08:53
What makes a template strategic?: 6/17/2015 15:03:40


Nogals 
Level 58
can I get a link to the guiroma template?