Evolutionarily Stable Strategies

The concept of an evolutionarily stable (ES) strategy is a general one, meaning, roughly, a strategy that, when used by an entire population, is resistant to invasion by new (mutant) strategies, i.e., is stable with respect to evolutionary changes. If you're not already familiar with the idea, this is probably so abstract as to be meaningless, so I'll spend the rest of this essay on a series of examples based on strategies for the iterated prisoner's dilemma.

As with so many other interesting ideas, the idea of ES strategies came to me via Douglas Hofstadter, specifically, through several related articles in Metamagical Themas, most notably The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation. That article, in turn, is based on The Evolution of Cooperation, which I really ought to track down and read sometime. Two other good sources of information are Dawkins' books The Selfish Gene and The Extended Phenotype.

The examples will all use the following version of the prisoner's dilemma payoff matrix. Each entry is the number of points the row player makes, with C for cooperate and D for defect.

           C   D
    C      3   0
    D      5   1
Thus, for example, if one player cooperates while the other defects, the defector gets five points, the other nothing.

Although there are lots of possible strategies, there are only three I want to talk about. The first two strategies are simple, always cooperate and always defect; I'll call them AC and AD. The third is called “tit for tat” (TT); it's the strategy in which you cooperate in the first round, then do whatever the other player did in the previous round. This is known to be a reasonably good strategy.

If I had to pick a strategy for myself, I think I'd use TT, but with a small probability of cooperating in response to a series of defections. Otherwise, there's no way for two identical players to recover from a random or accidental defection.

Now we can look at some examples. First, let's show that AC is not stable against invasion by AD. The way to do this is to imagine a large population of players all using strategy AC (ACers). How would a single ADer perform in this environment? Well, the ADer, encountering only ACers, makes five points per round, while the ACers, also encountering only other ACers, make three. Thus, in this environment, ADers do better, and will thrive. On the other hand, AD is stable against invasion by AC—the ADers all make one point per round, while an invading ACer makes nothing. Since ACers do worse than ADers, they will die out.

In fact, suppose we have any mixed population of AC and AD, such that a random player has probability p of being an ACer. If we like, we can describe this population as a vector in strategy space,
    p AC + (1−p) AD.

In this environment, ADers make p × 5 + (1−p) × 1 = 4p + 1 points per round, while ACers make only p × 3 + (1−p) × 0 = 3p. As before, the ADers always do better, and will come to dominate the ACers; the only stable combination of AC and AD is pure AD. (If one of these mixed combinations were stable, we'd have to call it an ES state rather than a strategy.)

It should come as no surprise that ADers always do better. Since none of the above environments contain strategies that change behavior based on past experience, the fact that the game is iterated is lost on everyone involved; as in the original prisoner's dilemma, defection is preferred.

So, what about a strategy that does change behavior, namely TT? Well, before we can analyze that, we have to look at something I glossed over, which is the question of how, exactly, the iterated game is played. Does each player pick another player at random? Does every player play against every other player? I think it turns out not to matter; all that matters is that the average number of rounds played by any two specific players is large enough that the game can be considered iterated. For definiteness, though, let's suppose that each player plays exactly N rounds against one other player chosen at random.

So, now let's see if AD is stable against TT. As before, the ADers, playing against each other, make one point per round, but what about the TTer, playing against an ADer? Well, in the first round, the TTer will cooperate and get nothing, but in the remaining rounds ve will defect and get one point per round, for a total of 1 × 0 + (N−1) × 1 = N−1 points, or 1 − 1/N points per round. Assuming N is large, this is essentially the same as what the ADers make, so this environment is neutral with respect to invasion by TTers.

It might seem that in computing the score of the ADers, one should take into account the possibility that an ADer might happen to play against the single TTer. This could be done, but all it does is add corrections of order 1/M, where M is the size of the population. Also, I think the approach I'm using is justifiable. What I'm really doing is considering a population of pure AD and taking derivatives in the AD and TT directions … or something like that.
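To make the arithmetic concrete, here's a minimal Python sketch of the three strategies and the iterated game; the function names are arbitrary, and the payoffs are the ones from the matrix above. It reproduces the mixed-population scores and the lone TTer's 1 − 1/N points per round.

    # A sketch of the three strategies and the iterated game.
    PAYOFF = {  # (my move, other player's move) -> my points for the round
        ('C', 'C'): 3, ('C', 'D'): 0,
        ('D', 'C'): 5, ('D', 'D'): 1,
    }

    def AC(seen):    # always cooperate
        return 'C'

    def AD(seen):    # always defect
        return 'D'

    def TT(seen):    # tit for tat: cooperate first, then copy the other player
        return seen[-1] if seen else 'C'

    def average_score(me, other, rounds):
        """My average points per round over an iterated game against 'other'."""
        seen_by_me, seen_by_other = [], []   # each player's record of the opponent
        total = 0
        for _ in range(rounds):
            my_move, other_move = me(seen_by_me), other(seen_by_other)
            total += PAYOFF[(my_move, other_move)]
            seen_by_me.append(other_move)
            seen_by_other.append(my_move)
        return total / rounds

    N = 1000
    p = 0.25   # fraction of ACers in a mixed AC/AD population

    # ADers make 4p + 1 points per round, ACers only 3p.
    print(p * average_score(AD, AC, N) + (1 - p) * average_score(AD, AD, N))   # 2.0
    print(p * average_score(AC, AC, N) + (1 - p) * average_score(AC, AD, N))   # 0.75

    # A lone TTer among ADers makes 1 - 1/N points per round, essentially 1.
    print(average_score(TT, AD, N))    # 0.999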
Anyway, since a population of pure AD is neutral to invasion by TT, we can suppose that by chance a few TTers have entered the system, putting the population in the following state:

    p TT + (1−p) AD.

How do the ADers and TTers fare now? Before answering this, it will help to do some preparation. First, here's the payoff matrix (in average points per round) for the iterated game, for the three strategies I've chosen to look at. As before, each entry is the number of points the row player makes.

           TT        AC   AD
    TT     3         3    1 − 1/N
    AC     3         3    0
    AD     1 + 4/N   5    1
You can see the original prisoner's dilemma payoff matrix in the lower right. Restricting to strategies TT and AD, and ignoring terms of order 1/N, we obtain the following matrix.

           TT   AD
    TT     3    1
    AD     1    1
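As a check, the whole matrix can be tabulated by simulation; the snippet below is a continuation of the earlier sketch (it assumes average_score and the strategy functions AC, AD, and TT defined there). With N = 1000 the 1/N corrections show up in the TT and AD rows.

    # Tabulate the iterated-game matrix (row player's average points per round),
    # reusing average_score, TT, AC, AD from the earlier sketch.
    strategies = [('TT', TT), ('AC', AC), ('AD', AD)]
    for row_name, row_strategy in strategies:
        row = [average_score(row_strategy, col_strategy, N)
               for _, col_strategy in strategies]
        print(row_name, row)

    # TT [3.0, 3.0, 0.999]      i.e. 1 - 1/N
    # AC [3.0, 3.0, 0.0]
    # AD [1.004, 5.0, 1.0]      i.e. 1 + 4/N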
This is a nice surprise! Just as defection dominated cooperation in the original prisoner's dilemma, now “tit for tat” dominates the strategy of always defecting. This is how cooperation can be rationally justified.

Returning to the mixed-population example, we see that the ADers make one point per round regardless, while the TTers make p × 3 + (1−p) × 1 = 1 + 2p. Thus, the environment is no longer neutral with respect to TTers; it now actively favors them over ADers. So, although AD is initially neutral to invasion by TT, it becomes unstable once small amounts of TT are actually present. In other words, even an environment consisting entirely of defectors can be invaded and converted by cooperators (TTers, not ACers). One can think of this as a model for how cooperation can evolve in nature, where defection is the natural initial state of affairs.

These facts about cooperation are all well and good, but what about the original point of this essay, ES strategies? Well, if you do the math, it turns out that TT is a genuine ES strategy. The TTers make three points per round, while an invading ADer would make only one. An invading ACer, it's true, would also make three points per round, so TT is neutral to invasion by AC, but it is really neutral, not unstably neutral as in the invasion of AD by TT.
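Here's a self-contained sketch of the same invasion test: drop a small fraction p of invaders into a population of residents and compare average points per round, using the large-N limit of the matrix above. (The table E, the function name, and the value of p are arbitrary choices.)

    # Invasion test: a small fraction p of invaders in a population of residents.
    # E[(row, column)] is the row player's average points per round in the
    # iterated game, with the 1/N terms dropped (large N).
    E = {
        ('TT', 'TT'): 3, ('TT', 'AC'): 3, ('TT', 'AD'): 1,
        ('AC', 'TT'): 3, ('AC', 'AC'): 3, ('AC', 'AD'): 0,
        ('AD', 'TT'): 1, ('AD', 'AC'): 5, ('AD', 'AD'): 1,
    }

    def invasion(resident, invader, p=0.01):
        """Average scores in a population of (1 - p) residents and p invaders."""
        resident_score = p * E[(resident, invader)] + (1 - p) * E[(resident, resident)]
        invader_score = p * E[(invader, invader)] + (1 - p) * E[(invader, resident)]
        return resident_score, invader_score

    for resident in ('AC', 'AD', 'TT'):
        for invader in ('AC', 'AD', 'TT'):
            if invader == resident:
                continue
            r, i = invasion(resident, invader)
            verdict = 'invader thrives' if i > r else ('neutral' if i == r else 'invader dies out')
            print(f'{invader} invading {resident}: resident {r:.2f}, invader {i:.2f} -> {verdict}')

    # AD invades AC; TT invades AD (because of the small fraction of TTers already
    # present); nothing does better than TT, and AC is only neutral against it.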
See Also

    Alleles and Loci
    Driving in Boston
    Evolution
    Indecision
    No Pure Strategy Is Stable
    Notes (Game Theory)
    Picking Up Trash
    Prisoner's Dilemma, The
    Shape of Strategy Space, The
    Thought on Stability, A

@ August (2000)