After spending 2 hours coding, I came up with a better simulation method (though it is still very limited due to, among other things, what dry-clean-only just said above). The results are a little different. Note: because of the way the program is written, I can semi-manually update these simulations every so often if people are interested.
Details of the calculation below. (std=standard deviation, PI=prediction interval)
             mean   (std)    95% PI
1. Sninja  138.65  (13.19)  [112-164]
2. TJC     116.04  (11.32)  [ 93-138]
3. [WG]    108.52  (11.33)  [ 86-130]
4. Stats   104.38  (12.72)  [ 79-130]
5. Hydra    85.23  (11.44)  [ 63-108]
6. FCC      47.17  (11.09)  [ 27- 70]
Probability of each final placement:
             1       2       3       4       5       6
Sninja   83.67%  11.27%   3.74%   1.20%   0.12%   0.00%
TJC      10.17%  45.90%  28.47%  13.41%   2.03%   0.02%
WG        3.49%  24.28%  35.79%  30.02%   6.40%   0.02%
Stats     2.61%  17.52%  27.50%  38.90%  13.36%   0.11%
Hydra     0.06%   1.03%   4.49%  16.45%  76.36%   1.61%
FCC       0.00%   0.00%   0.01%   0.02%   1.73%  98.24%
I personally think the simulations do a good job of matching common sense, given the results we have so far and the strength of schedule. The only thing I'm a little wary about is the score of FCC. I think they may not even get to 27 points, the lower bound of the prediction interval. The reason the simulations are quite bullish on them is, of course, the 10 points from the prior. The program still gives FCC some benefit of the doubt, as it does not know that more boots are still coming.
Juicy math details: in this simulation, I calculated a score for each clan on a logit scale and used it to predict the outcomes of the remaining games. First, I again counted the points won and lost by each clan, again adding 10 points won and 10 points lost for each clan as a burn-in prior; more specifically, I assumed 2 points won and 2 points lost against each other clan. From these counts, I approximated the scores on the logit axis that best reflect the results, using the optim function in R. In more detail: for each of the 15 pairings, I took the observed log-odds of the A-B matchup, log( win%A / win%B ) (where win%B is 1-win%A), and the expected log-odds given the scores of the 6 clans, score(clan A)-score(clan B), and minimized the sum of squared differences between the two.
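For anyone curious what that fitting step looks like: here is a Python sketch (my actual code is in R and uses optim). The clan names and point counts are made-up placeholders, and plain gradient descent stands in for optim:

```python
import math

# Hypothetical head-to-head points: pairs[(A, B)] = (points A won, points B won),
# already including the 2+2 burn-in prior points per pairing.
pairs = {
    ("A", "B"): (7, 3),
    ("A", "C"): (8, 2),
    ("B", "C"): (6, 4),
}
clans = ["A", "B", "C"]
scores = {c: 0.0 for c in clans}

# Minimise sum over pairings of (log(winA/winB) - (score A - score B))^2
# by plain gradient descent (optim would do this in one call).
for _ in range(2000):
    grad = {c: 0.0 for c in clans}
    for (a, b), (wa, wb) in pairs.items():
        obs = math.log(wa / wb)                # observed log-odds
        resid = obs - (scores[a] - scores[b])  # minus expected log-odds
        grad[a] += -2 * resid
        grad[b] += 2 * resid
    for c in clans:
        scores[c] -= 0.05 * grad[c]

# Scores are only identified up to an additive constant; centre them at 0.
mean = sum(scores.values()) / len(clans)
scores = {c: s - mean for c, s in scores.items()}
print({c: round(s, 3) for c, s in scores.items()})
```

With a complete round-robin like this, the least-squares solution works out to each clan's average log-odds against the field.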
Then I made an educated guess for the variance of these estimates, making sure it was inversely proportional to the product of the number of points won (including the +10 from the start) and the number of points lost (again including this +10). (If I had more time, I could bootstrap this, but alas.) I chose the formula 10/sqrt(points won * points lost) for the standard deviation because it gives a standard deviation of 1 in the case of no information, which I found adequately wide for a non-informative prior. (A team with a score of +2 has about a 98% probability of beating a team with a score of -2.)
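As a quick sanity check on that heuristic (a sketch with made-up point counts, not real CL numbers):

```python
import math

def score_sd(points_won, points_lost):
    """Heuristic sd of a clan's logit score: 10 / sqrt(won * lost).
    Both counts include the +10 burn-in prior points."""
    return 10 / math.sqrt(points_won * points_lost)

# No information beyond the prior (10 won, 10 lost) gives sd = 1,
# the intended width of the non-informative prior.
print(score_sd(10, 10))  # 1.0

# More observed points shrink the uncertainty.
print(score_sd(40, 20))
```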
From these calculated scores and their standard deviations, a score was drawn for each clan in every simulation. The points in the remaining games were then calculated for each clan under the assumption that the clan's true level was equal to this simulated score. The win% for team A was simply the inverse of the log-odds calculation again, resulting in: exp(score A - score B)/(1+exp(score A - score B)). Because my input from before did not include the difference between 1v1s and team games, I was lazy again and simply assigned wins in groups of 3 points remaining (with a final group of 4 or 5 as appropriate). I don't think this meaningfully influences the results, though.
Finally, all these scores were summed and voila!
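Put together, the whole simulation loop can be sketched like this in Python (again only a sketch: the scores, standard deviations, and remaining fixtures below are invented placeholders, not the real CL numbers):

```python
import math
import random

random.seed(42)

# Hypothetical fitted logit scores and their sds: clan -> (score, sd).
clans = {"A": (1.2, 0.4), "B": (0.3, 0.5), "C": (-0.8, 0.6)}
# Remaining fixtures: (clan A, clan B, points still at stake).
fixtures = [("A", "B", 6), ("A", "C", 3), ("B", "C", 9)]

n_sims = 10000
totals = {c: [] for c in clans}

for _ in range(n_sims):
    # Draw a "true" level for each clan for this simulated season.
    level = {c: random.gauss(mu, sd) for c, (mu, sd) in clans.items()}
    pts = {c: 0 for c in clans}
    for a, b, remaining in fixtures:
        d = level[a] - level[b]
        p = math.exp(d) / (1 + math.exp(d))  # win% for clan a
        # Award points in groups of 3, with a final group of 4 or 5.
        while remaining > 0:
            chunk = remaining if remaining in (4, 5) else min(3, remaining)
            if random.random() < p:
                pts[a] += chunk
            else:
                pts[b] += chunk
            remaining -= chunk
    for c in clans:
        totals[c].append(pts[c])

# Summarise: mean simulated final score per clan.
for c in clans:
    print(c, round(sum(totals[c]) / n_sims, 2))
```

From the per-simulation totals you can read off the means, standard deviations, prediction intervals, and placement probabilities shown in the tables above.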
If anyone wants my R code to improve or add more simulations, feel free to send me a PM or ask here. Note that all CL results were manually input though.
Edited 5/21/2018 20:38:46