The Emergence of Cooperation

Motivating example: can cooperation emerge in a short time frame?

The repeated Prisoners Dilemma is a model of direct reciprocity.

\[\begin{split}A = \begin{pmatrix} 3 & 0\\ 5 & 1 \end{pmatrix}\end{split}\]
  1. The first action corresponds to acting in the interest of all individuals (this is often referred to as Cooperate).

  2. The second action corresponds to acting in ones own self interest (this is often referred to as Defect).

In a population of individuals, is it possible for self interest to not become the norm?

To answer this question we can consider the Iterated Prisoners Dilemma as a repeated game where individuals have two consecutive interactions. The incentive to initially acting selflessly is that the other individual will do the same in the second interaction.

Recalling Strategies in a repeated game the strategies in this case must map histories of play to actions. The possible histories of play are:

\[\mathcal{H} = \{(\emptyset, \emptyset), (r_1, c_1), (r_1, c_2), (r_2, c_1), (r_2, c_2)\}\]

Row strategies are thus of the form:

Cooperate unconditionally:

\[\begin{split}\begin{align} (\emptyset, \emptyset) &\to r_1\\ (r_1, c_1) &\to r_1\\ (r_1, c_2) &\to r_1\\ (r_2, c_1) &\to r_1\\ (r_2, c_2) &\to r_1\\ \end{align}\end{split}\]

Defect unconditionally:

\[\begin{split}\begin{align} (\emptyset, \emptyset) &\to r_2\\ (r_1, c_1) &\to r_2\\ (r_1, c_2) &\to r_2\\ (r_2, c_1) &\to r_2\\ (r_2, c_2) &\to r_2\\ \end{align}\end{split}\]

Start by cooperating and then repeat the action of the opponent:

\[\begin{split}\begin{align} (\emptyset, \emptyset) &\to r_1\\ (r_1, c_1) &\to r_1\\ (r_1, c_2) &\to r_2\\ (r_2, c_1) &\to r_1\\ (r_2, c_2) &\to r_2\\ \end{align}\end{split}\]

The strategy space when repeating the game twice corresponds to 32 different strategies. We can see how these 32 strategies interact in an evolutionary setting using replicator dynamics.

(Source code, png, hires.png, pdf)

../_images/cooperation-1.png

The legend shows the index in the strategy space of the strategies that have a final proportion larger than \(10 ^ {-2}\). The average utility plot gives us the answer to our question: the average per turn utility is 1 which implies that the strategies that survive the evolutionary process are the ones that act selfishly.

The immediate conclusion is somewhat disappointing: how can a society emerge in which individuals will do what is best for the collective?

This question can be better answered by considering a much larger strategy space corresponding to more repetitions of the prisoners dilemma.

The General form of the Prisoners Dilemma

The general form is:

\[\begin{split}A = \begin{pmatrix} R & S\\ T & P \end{pmatrix}\qquad B = \begin{pmatrix} R & T\\ S & P \end{pmatrix}\end{split}\]

with the following constraints:

\[T > R > P > S\qquad 2R > T + S\]
  • The first constraint ensures that the second action “Defect” dominates the first action “Cooperate”.

  • The second constraint ensures that a social dilemma arises: the sum of the utilities to both players is best when they both cooperate.

This game is a good model of agent (human, etc) interaction: a player can choose to take a slight loss of utility for the benefit of the other play and themselves.

Question

Under what conditions is the following game a Prisoners Dilemma:

\[\begin{split}A = \begin{pmatrix} 1 & -\mu \\ 1 + \mu & 0 \end{pmatrix}\qquad B = A ^ T\end{split}\]

As a single one shot game there is not much more to say about the Prisoner’s dilemma. It becomes fascinating when studied as a repeated game.

Axelrod’s tournaments

In 1980, Robert Axelrod (a political scientist) invited submissions to a computer tournament version of an iterated prisoners dilemma. This was described in a 1980 paper [Axelrod1980].

  • 14 strategies were submitted.

  • Round robin tournament with 200 stages including a 15th player who played uniformly randomly.

  • Some complicated strategies, including for example a strategy that used a \(\chi^2\) test to try and identify strategies that were acting randomly. You can read more about this tournament here: http://axelrod.readthedocs.io/en/stable/reference/overview_of_strategies.html#axelrod-s-first-tournament

  • The winner (average score) was in fact a straightforward strategy: Tit For Tat. This strategy starts by cooperating and then repeats the opponents previous move.

The 15 by 15 payoff matrix (rounded to 1 digit) that corresponds to this tournament is:

\[\begin{split}\begin{pmatrix} 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 2.6& 3. & 1.4& 1.1& 1.5& 2.2& 2.2\\ 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 1.2& 1.3& 1.1& 1.3& 2.8& 2.8\\ 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 2.8& 0.1& 2.2& 2.7& 2.7& 1.6& 1.5\\ 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3.1& 0.5& 2.1& 2.4& 2.4& 2. & 2.1\\ 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3.3& 1.3& 1.3& 1.3& 1.4& 2.8& 2.6\\ 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 2.6& 2.5& 1.4& 1.2& 1.7& 2.9& 2.9\\ 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 2. & 1.1& 1.2& 1.1& 1.3& 3. & 3. \\ 3. & 3. & 3. & 3. & 3. & 3. & 3. & 3. & 2. & 1. & 1.3& 1.1& 1.2& 3. & 2.9\\ 2.6& 2.8& 3.2& 2.7& 1.8& 2.6& 1.4& 1.4& 1.5& 3.1& 1.5& 1.4& 2.1& 2.7& 2.9\\ 3. & 1. & 4.9& 3.3& 1.4& 1.1& 1. & 1.2& 2.7& 1. & 2.2& 2.6& 1.2& 2.7& 2.6\\ 1.4& 1.3& 3.5& 2.8& 1.5& 1.4& 1.2& 1.3& 1.7& 3.5& 1.3& 1.1& 1.3& 2.4& 2.4\\ 1.2& 1.1& 3.2& 2.8& 1.4& 1.2& 1.1& 1.1& 1.5& 3.2& 1.2& 1.2& 1.3& 2.3& 2.3\\ 1.5& 1.3& 3.2& 2.8& 1.6& 1.7& 1.2& 1.1& 2.4& 1. & 1.3& 1.2& 1.4& 2.3& 2.4\\ 2.2& 0.9& 3.9& 2.8& 1.1& 0.8& 0.6& 0.6& 1.2& 1.2& 1.8& 2.1& 2. & 2.3& 2.3\\ 2.3& 0.9& 3.9& 2.7& 1.2& 0.7& 0.5& 0.7& 1. & 1.2& 1.8& 2.1& 2. & 2.3& 2.3 \end{pmatrix}\end{split}\]

We see that the first 8 strategies all cooperate with each other (getting a utility of 3).

These 15 strategies are a small subset of the strategy space for the iterated prisoners dilemma with \(T=200\) repetitions. As before, we can see how these 15 strategies interact in an evolutionary setting using replicator dynamics.

(Source code, png, hires.png, pdf)

../_images/cooperation-2.png

It is evident here that cooperation emerges from this strategy space.

The fact that Tit For Tat won garnered a lot of research (still ongoing) as it showed a mathematical model of how cooperative behaviour can emerge in complex situations. However, recent research has shown that Tit For Tat is not a universally strong strategy [Knight2018], [Harper2017], [Press2012].

Using Python

There is a Python library (axelrod) with over 200 strategies that can be used to reproduce this work [Knight2016]. You can read the documentation for it here: http://axelrod.readthedocs.io.