<< Back to Off-topic Forum   Search

Posts 1 - 5 of 5   
carykh's Iterated Prisoner's Dilemma Tournament: 5/20/2021 16:24:53


l4v.r0v 
Level 59
Report
TL;DR: Fork https://github.com/carykh/PrisonersDilemmaTournament, add new strategies in code/exampleStrats, run code/prisonersDilemma.py to see in code/results.txt how well you did. Submit your best strategy on https://docs.google.com/forms/d/e/1FAIpQLSevMcli8KSpOBMUTOdRuJwNBzkuGDghAfAagEP1akNhPBgXew/viewform by 05/26 10 PM UTC (that's 6 PM US-East, 3 PM US-West, and midnight 05/27 in Central European Time). carykh will run these strategies in a giant FFA with the sample ones and the top 3 will get $1000/$300/$100.
(see: https://htwins.net/prisoners-dilemma/)
The Prisoner's Dilemma is a thought experiment where two players can either cooperate ("C") or defect ("D") for some amount of points. E.g., if you and your mate are both getting interrogated by the police and either of you could sell the other out for a lighter sentence. The idea is that if both of you defect, you'll both end up worse off, but defecting is the rational move (highest expected value) for each of you. Here's the concrete points for this contest:

Both Cooperate (C/C): +3/+3
You Cooperate, Opponent Defects (C/D): 0/+5
You Defect, Opponent Cooperates (D/C): +5/0
Both Defect (D/D): +1/+1

Note that defecting gets you either +1 or +5 points (based on whether your opponent cooperates) while cooperating gets you either 0 or +3. So far it's uninteresting...

But what if you could play it over and over again as a non-zero-sum game? Then two-players who keep cooperating would rack up +3's while two rational sociopaths who keep defecting would rack up +1's. You can actually play this out and see how it works in Nicky Case's "Evolution of Trust" game: https://ncase.me/trust/

Long story short, the long-running and ridiculously dominant champion of Iterated Prisoner's Dilemma strategies is Tit for Tat (https://heritage.umich.edu/stories/the-prisoners-dilemma/): Cooperate, but if your opponent defects, defect back.

carykh, a Stanford undergrad, is now running a contest where you get to come up with a better Iterated Prisoner's Dilemma strategy than Tit for Tat! And your strategy gets to face off against some classic ones (from leading intellectuals of the past) as well as other players' strategies, and the top 3 win cash prizes.

You should enter!
Coming up with a better strategy than Tit for Tat might sound daunting, but, well it's easy to code and test new strategies (you just have to write a Python function that takes in a numpy "history" array of your and your opponent's past moves, as well as "memory" which you get to set and which can be anything you want it to be- you just pass it to yourself between function calls). And it's not that hard to come up with something that does better than Tit for Tat, although Tit for Tat will probably dominate the meta. You can find all of my progress so far in https://github.com/l4vr0v/PrisonersDilemmaTournament (code/strats/nprtt.py is my best one, while code/strats/npnntt.py is what I submitted to the contest). Here's a couple of insights from me to get started:

1. Being nice is important. Some strategies like joss (tit for tat but randomly defect 10% of the time) and detective try to cleverly initiate defection to take advantage of "always cooperate." This is risky because the game is non-zero-sum! There's a strategy called the grim trigger (aka the grudge strategy) which will always defect once you defect. You will turn the remainder of a game of freebie +3's into a cycle of +1's, and this alone will almost totally offset the gain you make by cheating "always cooperate." Throw in other strategies in the mix like tit for tat, which we'll undoubtedly see many variants of in the final submissions, and you can see that initiating defection is probably a dead end that sets you back.

2. De-escalate conflict. Suppose you're tit for tat up against joss (tit for tat but with a 10% chance of a random defection). Once joss defects, your mutual tit-for-tat behavior will lock you in a loop of D/D's (+1 each), costing you your prior C/C's (+3 each). You might "win" against joss by scoring higher than joss, but that doesn't matter- you want to maximize your own points, not the point difference! So trying to break the D/D loop is worth it because even D/D C/D C/C (+4) is an improvement over D/D D/D D/D (+3) and if you're able to restore a healthy C/C streak then that improvement will be vast!

While in the ordinary Prisoner's Dilemma defection has the highest expected value, in the Iterated Prisoner's Dilemma a cycle of cooperation racks up points that very quickly offset an occasional cheating event. In fact, my best strategies all got cheated pretty badly by joss, detective, and others, but they still scored more points than tit for tat in those matches so they came out ahead. (As for joss, detective, and other abusers getting more points out of those matches than my strategies, I refer you back to #1 - I don't think strategies that initiate defection are the frontrunners so giving them a leg up doesn't matter that much.)

Here's my breakdown of the problem:
You need a Response Rule to deal with defections. You can't initiate defections, but if your opponent does, you need some rule for how to respond. This could be based on their most recent turn (e.g., like tit for tat) and determine only your next turn, or you can do something more complicated that takes their whole history into account and/or plans ahead on your part instead of going turn-by-turn.

You need a Forgiveness Strategy to break defection loops. You need to determine when and how to offer an olive branch and see if your opponent responds, because even that risk of a C/D is outweighed by the potential massive rewards of returning to C/C. Of course, your opponent's behavior might not be deterministic (they could be random, flipping back and forth, or just always defecting) so this has a lot of room for strategizing. Forgiveness strategies is actually the entirety of where my solutions built their advantage over tit for tat.

The meta is very tit-for-tat dominated, so I suspect in the big FFA, most 1v1 matches will just be cycles of C/C that rack up +3 or close to +3 per decision on average. This means that even small improvements in how you handle non-nice opponents, who initiate defections, and rational opponents, who might back down from a defection cycle, can make a huge difference in your performance. Also, there's some really smart people on this site who know math and game theory far better than I do, so perhaps they can offer you insights that far exceed my paltry perspective above.

I hope many of you enter, I hope we're able to come up with something that ends tit-for-tat's streak of dominance, and I hope some of you brilliant people on this site are able to translate the strategic intuition and reasoning you built playing Warzone Idle Battle Royales or Warzone Classic into a strong Iterated Prisoner's Dilemma strategy. Good luck, everyone!

Edited 5/20/2021 16:28:55
carykh's Iterated Prisoner's Dilemma Tournament: 5/20/2021 17:36:11


l4v.r0v 
Level 59
Report
Also, I just want to illustrate that it doesn't take a lot of code to write even a fairly sophisticated Iterated Prisoner's Dilemma strategy, lest that's holding any of you back:

This is Joss, a strategy by a Swiss mathematician. It behaves like tit-for-tat (defect if opponent defects, otherwise cooperate) but also randomly defects another 10% of the time. Look at how simple the code is for that:
import random

# Variant of Tit For Tat that randomly defects to try to take advantage
# of overly forgiving opponents.

def strategy(history, memory):
    choice = 1
    if random.random() < 0.10 or (history.shape[1] >= 1 and history[1,-1] == 0):
    # Choose to defect randomly by 10% chance, OR if and only if the opponent just defected.
        choice = 0
    return choice, None
It takes 5 minutes to write a new strategy. If you don't know how to code, I'd be more than happy to help (without taking any credit). You can even share strategies on this thread or via DMs and I can convert them to code (eventually...) and tell you how well they do in my meta.
carykh's Iterated Prisoner's Dilemma Tournament: 5/20/2021 20:47:51

Orannis
Level 57
Report
can you explain this code for us python noobs
carykh's Iterated Prisoner's Dilemma Tournament: 5/21/2021 01:22:00


l4v.r0v 
Level 59
Report
Sure. Joss is a strategy that's tit for tat (i.e., your move is your opponent's last move) but with a 10% chance added in of defecting regardless of your opponent's move.

"strategy" is the main function implementing the strategy (this is forced by the repository; your solution has to be a "strategy" function). "history" is a numpy 2 dimensional array tracking your moves and your opponent's moves. It's basically an array of 2 arrays where each array is just the history of each player's moves.

So if you always cooperated and your opponent always defected, history looks like: [
[ 1, 1, 1, 1, 1, 1 ],
[ 0, 0, 0, 0, 0, 0 ],
]

"memory" is whatever you want it to be- it's one of your function's two outputs (the second one) and it just gets passed in the next time your function is called. So it's a way for you to track whatever you want between function calls. It's initially None.

You return "choice" (1 if you cooperate, 0 if you defect) and "memory" (which just goes back to you next move and doesn't change any behavior).

So this code sets choice to 0 if your opponent just defected last turn or if your 10% chance happened (random.random() picks a random number). history.shape[1] is just the length of history's 2nd dimension (history.shape[0] is for the 1st dimension), so the length of the inner arrays. That's just how many turns have gone on. history[1, -1] is your opponent's most recent move: the 1 selects the second array in history (your opponent's moves) and -1 in Python selects the last element from the list (their most recent move).
carykh's Iterated Prisoner's Dilemma Tournament: 5/21/2021 02:57:19

Orannis
Level 57
Report
Ok thanks, that all makes a lot of sense.
Posts 1 - 5 of 5