On semi-restricted Rock, Paper, Scissors

Spiro, Surya and Zeng (Electron. J. Combin. 2023; arXiv:2207.11272) recently studied a semi-restricted variant of the well-known game Rock, Paper, Scissors; in this variant the game is played for $3n$ rounds, but one of the two players is restricted and has to use each of the three moves exactly $n$ times. They find the optimal strategy, and they show that it results in an expected score for the unrestricted player of order $\Theta(\sqrt{n})$; they conjecture, based on numerical evidence, that the expectation is $\approx 1.46\sqrt{n}$. We analyse the resulting score further and show that its mean is $\sim c \sqrt{n}$ with $c=3\sqrt{3}/(2\sqrt{\pi})\doteq 1.466$, verifying the conjecture. We also find the asymptotic distribution of the score, and compute its variance.


Introduction
A semi-restricted variant of the well-known game Rock, Paper, Scissors (RPS) was recently studied by Spiro, Surya and Zeng [6]. In the standard version of RPS, two players simultaneously select one of the three choices rock, paper, scissors, where paper beats rock, scissors beats paper, and rock beats scissors; if both select the same, the result is a draw. The game is symmetric, so there is obviously no advantage to either of the players. It is easy to see that the optimal strategy for both players is to choose randomly, with equal probability for each choice (see further Section 2.2).
In the semi-restricted variant in [6], two players R (restricted) and N (normal) agree to play $3n$ rounds of RPS for some integer $n$, but R is restricted to choose rock, paper, and scissors exactly $n$ times each, while N plays without restriction. Clearly, the restriction is a disadvantage for R. (In particular, N will always win the last round, since R then has only one choice, and N knows which one.) How large is this disadvantage? More precisely, let $S_n$ be the final score of N, defined as the number of rounds won by N minus the number lost. We assume (as [6]) that the objective of both players is the expectation $E S_n$, which N wants as high as possible, while R wants the opposite. Semi-restricted RPS is a two-player zero-sum game, and thus by the theory of von Neumann [5], each player has an optimal randomized strategy; see further e.g. [4, Chapter 2]. We use $S^{\mathrm{op}}_n$ to denote the final score when both players use their optimal strategies. (This is a random variable, since the strategies are randomized.) The main result of [6] is that the unique optimal strategy for R is to play greedily, i.e., as if each round were the last; see further Section 2.2. (This is far from obvious, and rather surprising.) It is also shown in [6] that with optimal strategies, the expected gain satisfies $E S^{\mathrm{op}}_n = \Theta(\sqrt{n})$, and it is asked [6, Question 21] whether $E S^{\mathrm{op}}_n \sim c\sqrt{n}$ for some constant $c > 0$ as $n \to \infty$; [6] says further that numerical calculations for $n \le 100$ suggest that this might hold with $c \approx 1.46$.
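Although neither [6] nor the present note contains code, the greedy-versus-greedy game is easy to simulate, and a small Monte Carlo experiment illustrates the conjectured constant. The following Python sketch is ours (function names and sample sizes are arbitrary choices, not from the sources); it encodes the moves as 0 = rock, 1 = paper, 2 = scissors, so that move $(m+1) \bmod 3$ beats move $m$, and uses the single-round strategies described in Section 2.2.

```python
import math
import random

def greedy_rps_score(n, rng):
    """Play one 3n-round semi-restricted RPS game with both players greedy.

    Moves are 0 = rock, 1 = paper, 2 = scissors; move (m + 1) % 3 beats m.
    Returns the final score S_n of the unrestricted player N.
    """
    left = [n, n, n]   # R's remaining quota for each move
    score = 0
    for _ in range(3 * n):
        avail = [m for m in range(3) if left[m] > 0]
        if len(avail) == 3:
            # Both players randomize uniformly (Section 2.2(i)).
            r = rng.randrange(3)
            n_move = rng.randrange(3)
        elif len(avail) == 2:
            a, b = avail
            if (a + 1) % 3 != b:   # make sure b is the move that beats a
                a, b = b, a
            # R plays the dominated move with prob 1/3 (Section 2.2(ii));
            # N plays b with prob 2/3 and the move beating b with prob 1/3.
            r = a if rng.random() < 1 / 3 else b
            n_move = b if rng.random() < 2 / 3 else (b + 1) % 3
        else:
            r = avail[0]
            n_move = (r + 1) % 3   # N wins for sure (Section 2.2(iii))
        left[r] -= 1
        if n_move == (r + 1) % 3:
            score += 1
        elif r == (n_move + 1) % 3:
            score -= 1
    return score

rng = random.Random(12345)
n = 100
games = 2000
mean_score = sum(greedy_rps_score(n, rng) for _ in range(games)) / games
ratio = mean_score / math.sqrt(n)
```

With $n = 100$ and 2000 games, the estimated ratio $E S_n/\sqrt{n}$ should come out near 1.46, consistent with the numerical observations reported in [6].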
The main purpose of the present note is to verify this conjecture and to identify the constant: we show (Theorem 1.1) that $E S^{\mathrm{op}}_n \sim c\sqrt{n}$ with $c = 3\sqrt{3}/(2\sqrt{\pi}) \doteq 1.4658$.
The optimal strategy for R is thus the greedy strategy. Given that R uses this strategy, there are many strategies for N that give the optimal expectation $E S^{\mathrm{op}}_n$. One of them is the greedy strategy for N, but as pointed out to me by Sam Spiro [personal communication], the greedy strategy is not the optimal strategy for N; see Section 2.3. We let $S^{\mathrm{gr}}_n$ denote $S_n$ when both players play with their greedy strategies. This is also a random variable, and as just said, we have

$$E S^{\mathrm{gr}}_n = E S^{\mathrm{op}}_n. \qquad (1.2)$$
The random variable $S^{\mathrm{gr}}_n$ can be analysed asymptotically using standard tools from probability theory. This is done in Sections 3 and 4 and yields the asymptotics of $E S^{\mathrm{gr}}_n$; Theorem 1.1 then follows by (1.2). Moreover, our analysis also yields the asymptotic distribution of $S^{\mathrm{gr}}_n$, see Theorem 4.1. In Section 5 we give some partial results on the asymptotic distribution of $S_n$ if R uses the optimal (greedy) strategy and N uses a rather arbitrary strategy, including the case $S^{\mathrm{op}}_n$ when both play optimally. We leave as an open problem whether $S^{\mathrm{op}}_n$ and $S^{\mathrm{gr}}_n$ have the same asymptotic distribution. In Section 6, we discuss the probability that the disadvantaged player R nevertheless wins the game; we compute it for the case that both players play greedily, but leave the case of optimal play for the objective of maximizing the probability of winning as an open problem.
Acknowledgement. I am grateful to Sam Spiro for pointing out a serious error in a previous version, and for showing me Example 2.1. I also thank an anonymous referee for helpful comments.
The random variable $S(t)$ is the score of N after round $t = 1, \dots, 3n$, i.e., the number of rounds won by N so far minus the number of rounds won by R. As in the introduction, $S_n := S(3n)$ is the score at the end of the game. (Except for $S_n$, we do not show $n$ explicitly in the notation, although $S(t)$ and many variables introduced below depend on $n$.) If $X_n$ is a sequence of random variables, and $a_n$ a sequence of (positive) numbers, we write $X_n = O_p(a_n)$ if the family $\{X_n/a_n\}$ is bounded in probability (also called tight), i.e., if for every $\varepsilon > 0$ there exists $C$ such that $P(|X_n| > C a_n) < \varepsilon$ for all $n$. Furthermore, we write $X_n = O_{L^p}(a_n)$ (where $p > 0$ is a parameter) if the family $\{X_n/a_n\}$ is bounded in $L^p$, i.e., $\sup_n E|X_n/a_n|^p < \infty$. $N(0, \sigma^2)$ denotes the normal distribution with mean 0 and variance $\sigma^2 \ge 0$. More generally, if $\Sigma$ is a symmetric positive semidefinite $d \times d$ matrix, then $N(0, \Sigma)$ is the normal distribution with mean 0 and covariance matrix $\Sigma$; this is a distribution of a random vector in $\mathbb{R}^d$.

2.2. The greedy strategy. Recall that in any two-person zero-sum game, each player has an optimal strategy which in general is randomized; the different alternatives are selected with some probabilities chosen such that they maximize the minimum, over all strategies of the opponent, of the expected gain; see [5] and e.g. [4].
As said above, it was shown by Spiro, Surya and Zeng [6] that in semi-restricted RPS, the best strategy of R is to play greedily, i.e., to analyse each round separately and use the optimal strategy for the expected score in that round. (This is far from obvious, since the best play in one specific round may be punished by a lower expected score in later rounds; nevertheless, [6] shows that the expected later gains by any alternative strategy are offset by the immediate expected loss.) This optimal strategy for a single round is easy to find (as was done in [6]):

(i) If R still has all three choices available, then the optimal strategy is (obviously, by symmetry) to choose one of them randomly, with probability $1/3$ each. And the best strategy for N is the same. (This game was one of the examples in the original paper by von Neumann [5].) The outcome for N is $-1$, $0$, or $+1$ with probability $1/3$ each.

(ii) If R has only two choices available, say 1 (rock) and 2 (paper), then the game is described by the matrix in Figure 1. N should never play 1 (which in this case can lose but never win). A simple calculation shows [6] that the best strategy for R is to play 1 with probability $1/3$ and 2 with probability $2/3$; similarly N plays 2 with probability $2/3$ and 3 with probability $1/3$. The expected gain for N is $1/3$.

(iii) If R has only one choice, then R has to play that, and N obviously plays the next choice (mod 3) and is sure to win. The gain for N is 1.
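The small computation in case (ii) can be checked mechanically. The following Python sketch (ours, not from [6]) verifies with exact rational arithmetic that the stated mixed strategies guarantee the value $1/3$ for both players in the game of Figure 1.

```python
from fractions import Fraction as F

# Payoff to N; R is restricted to {rock, paper}; N may play rock, paper, scissors.
# Rows: R's move (rock, paper); columns: N's move (rock, paper, scissors).
A = [[0,  1, -1],
     [-1, 0,  1]]
r_mix = [F(1, 3), F(2, 3)]          # R: rock w.p. 1/3, paper w.p. 2/3
n_mix = [F(0), F(2, 3), F(1, 3)]    # N: paper w.p. 2/3, scissors w.p. 1/3

# N's mix guarantees at least the minimum over R's pure replies:
n_guarantee = min(sum(n_mix[j] * A[i][j] for j in range(3)) for i in range(2))
# R's mix concedes at most the maximum over N's pure replies:
r_concede = max(sum(r_mix[i] * A[i][j] for i in range(2)) for j in range(3))
```

Since `n_guarantee` and `r_concede` both equal $1/3$, the pair of mixed strategies is optimal and the value of the subgame is $1/3$, as stated.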
2.3. Strategies for N. Suppose that R plays optimally, i.e., greedily. Then R plays each time with a random move that depends only on the available moves, and thus on the history of the moves made by R. However, these moves are not affected by the moves made by N. Hence, the moves made by R will be the same regardless of the strategy chosen by N. It thus follows from the discussion above of the greedy strategy that the expected gain for N will be the same for any strategy of N that does not do anything stupid (here and in the sequel meaning making a move that cannot win); for example, as long as R is able to make all three moves, the expected gain of each round is 0 for any strategy of N. In particular, the expected gain for N when both players use their optimal strategies is the same as when both play greedily, which shows (1.2). Nevertheless, the greedy strategy is not optimal for N, since it may be worse if R chooses a different strategy, as shown by the following simple example.
Example 2.1. (Sam Spiro, personal communication.) Suppose that N plays with the greedy strategy described above. If R chooses to play (deterministically) $1, 2, 3, 1, 2, 3, \dots$ for all $3n$ rounds, then for all but the last two rounds, the greedy strategy by N makes him play randomly, with probability $1/3$ for each choice, and therefore the expected gain is 0 for each round. Hence the total gain $E S_n$ will in this case be only $1/3 + 1 = 4/3$ (from the last two rounds), while we know from von Neumann's theorem [5] that N has some strategy guaranteeing an expected gain of at least $E S^{\mathrm{op}}_n$ against every strategy of R. (Note that $E S^{\mathrm{op}}_n > 4/3$ at least for large $n$ by Theorem 1.1. In fact, it can easily be seen from (3.3) below that the inequality holds for every $n \ge 2$.) △

It seems likely that the optimal strategy of N is very complicated. See further Section 5.1.
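Example 2.1 is also easy to check by simulation. In the following Python sketch (ours; the parameters are arbitrary), R plays the deterministic cycle and N plays greedily; the average score should approach $4/3$, independently of $n$.

```python
import random

def greedy_n_vs_cyclic_r(n, rng):
    """N plays greedily; R (sub-optimally) cycles 0,1,2,0,1,2,...
    Moves are 0 = rock, 1 = paper, 2 = scissors; (m+1)%3 beats m.
    Returns N's final score."""
    left = [n, n, n]
    score = 0
    for t in range(3 * n):
        r = t % 3                       # R's deterministic cycle
        avail = [m for m in range(3) if left[m] > 0]
        if len(avail) == 3:
            n_move = rng.randrange(3)   # greedy N randomizes uniformly
        elif len(avail) == 2:
            a, b = avail
            if (a + 1) % 3 != b:        # make b the move that beats a
                a, b = b, a
            n_move = b if rng.random() < 2 / 3 else (b + 1) % 3
        else:
            n_move = (avail[0] + 1) % 3
        left[r] -= 1
        if n_move == (r + 1) % 3:
            score += 1
        elif r == (n_move + 1) % 3:
            score -= 1
    return score

rng = random.Random(1)
games = 20000
est = sum(greedy_n_vs_cyclic_r(20, rng) for _ in range(games)) / games
```

Until the last two rounds all three of R's moves remain available, so greedy N gains 0 in expectation each round; the final two rounds contribute $1/3 + 1$, so `est` should be close to $4/3$.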

Analysis for the greedy strategies
In this section we assume that R uses the greedy strategy, which is known to be optimal. For simplicity, we assume here that also N uses the greedy strategy. In fact, most of the analysis is valid for almost any strategy by N; we discuss the few but important differences in Section 5.
Let $N_{t,i}$ be the number of times that R plays $i$ during rounds $1, \dots, t$. The vector $N_t = (N_{t,i})_{i=1}^{3}$ then evolves as a random walk which changes character each time some $N_{t,i}$ hits $n$ and R thus cannot choose $i$ in the future. We let $T_j$, $j = 1, 2, 3$, be the first time that R has used up $j$ of the three choices; in particular, $T_3 = 3n$, when the game ends.
Since R uses the greedy strategy described above, $N_t$ evolves as follows, for $t = 0, \dots, 3n$, starting at $N_0 = (0,0,0)$:

I. A random walk $N_0, \dots, N_{T_1}$ with increments that are independent and uniformly chosen from $\{e_1, e_2, e_3\}$, until

II. A random walk $N_{T_1}, \dots, N_{T_2}$ with increments chosen independently and randomly from the remaining two choices by the strategy above; for example, if choice 1 is the first to be used up, then the increments are chosen as $e_2$ and $e_3$ with probabilities $1/3$ and $2/3$. This goes on until

III. A deterministic walk $N_{T_2}, \dots, N_{T_3}$ where all increments are $e_i$ for the only $i$ that still has $N_{t,i} < n$.

The expected gain for N is 0 for each step in phase I, $1/3$ for each step in phase II, and 1 for each step in phase III, so the expected score for N is
$$E S_n = \tfrac{1}{3}\, E(T_2 - T_1) + E(3n - T_2). \qquad (3.3)$$
We will analyse this more carefully below and also both bound and asymptotically describe the random fluctuations. We do this by analysing the constrained random walk $N_t$ and the stopping times $T_1$ and $T_2$ in some detail. A central role in the analysis is played by the (somewhat arbitrary) non-random time $T_0$ defined in (3.4), chosen so that $3n - T_0$ is of order $n^{2/3}$.

3.1. Phase I: until $T_1$. Let $(\xi_t)_{t=1}^{\infty}$ be an i.i.d. sequence of random vectors with the distribution $P(\xi_t = e_i) = 1/3$ for $i = 1, 2, 3$. We may assume that $N_t - N_{t-1} = \xi_t$ for $1 \le t \le T_1$; let $N'_t := \sum_{s \le t} \xi_s$, so that $N'_t = N_t$ for $t \le T_1$. (We may interpret $\xi_t$ and $N'_t$ as how R would have played if the restriction had not existed.) In particular, for $t \le T_1$ we have $N'_{t,i} = N_{t,i} \le n$ for all $i$, and for $t \ge T_1$ we have $\max_i N'_{t,i} \ge \max_i N_{T_1,i} = n$; thus $T_1$ is also the first time that $\max_i N'_{t,i}$ hits $n$. At time $T_0$, the central limit theorem shows that $N'_{T_0,i} = T_0/3 + O_p(n^{1/2})$ for each $i$. This is less than $n$ for each $i$ w.h.p. (with high probability, i.e., with probability $1 - o(1)$ as $n \to \infty$), and thus w.h.p. $T_1 > T_0$. More precisely, the Chernoff inequality (e.g. in the version in [2, Remark 2.5]) yields the bound (3.7) on $P(T_1 \le T_0)$; hence this probability decreases faster than any polynomial, which means that we can ignore the event $T_1 \le T_0$ also when calculating moments below (since the random variables we consider are all deterministically $O(n)$). Similarly, concentrating on the time after $T_0$: by classical results on moment convergence in the central limit theorem together with Doob's inequality (since $N'_{t,i} - N'_{T_0,i} - \tfrac{1}{3}(t - T_0)$ is a martingale), see for example [1, Theorem 7.5.1, Corollary 3.8.2, and Theorem 10.9.4], we have, for any $p > 1$,
$$\sup_{T_0 \le t \le 3n} \big| N'_{t,i} - N'_{T_0,i} - \tfrac{1}{3}(t - T_0) \big| = O_{L^p}\big(n^{1/3}\big), \qquad (3.8)$$
and consequently the same bound holds for every $p < \infty$. (The case $p \le 1$ follows from the case $p > 1$ by Lyapounov's inequality [1, Theorem 3.2.5].) We introduce some further notation. Let, for $i = 1, 2, 3$,
$$X_i := n^{-1/2}\big(N'_{T_0,i} - T_0/3\big). \qquad (3.10)$$
(If we ignore the minor technical difference between $N_{t,i}$ and $N'_{t,i}$, these thus measure the deviation from the expectation at time $T_0$ of the choices made by R.) Note for later use that
$$X_1 + X_2 + X_3 = 0. \qquad (3.11)$$
Furthermore, let $X_{\max} := \max_i X_i$. (As we will see in detail below, this largest deviation will give us a good estimate of the time $T_1$ when R runs out of one choice.) Condition on the event $T_1 > T_0$, which has probability $1 - o(1)$. Then $N'_{T_0} = N_{T_0}$. Moreover, we may take $t = T_1$ in (3.8) and obtain, using (3.11), $n = \max_i N'_{T_1,i} = T_1/3 + X_{\max}\, n^{1/2} + O_{L^p}(n^{1/3})$. Hence, recalling the definitions of $T_1$ and $X_{\max}$, and using (3.10),
$$T_1 = 3n - 3 X_{\max}\, n^{1/2} + O_{L^p}\big(n^{1/3}\big). \qquad (3.17)$$
This was derived conditioned on $T_1 > T_0$, but by (3.7) and the comment after it, (3.17) holds also unconditionally. Furthermore, for every $i \in \{1, 2, 3\}$, by (3.14) and (3.10), together with (3.17),
$$n - N_{T_1,i} = (X_{\max} - X_i)\, n^{1/2} + O_{L^p}\big(n^{1/3}\big).$$
Thus, at time $T_1$, when R runs out of one of the three choices, she has approximately $(X_{\max} - X_i)\, n^{1/2}$ left of each other choice $i$.
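As a sanity check on this description of $T_1$, one can simulate the unconstrained walk $N'_t$ and estimate $E(3n - T_1)/\sqrt{n}$. By the above this should be close to $3\,E X_{\max}$, and a computation along the lines of Section 4 gives $E X_{\max} = 3/(2\sqrt{\pi})$ (the expected maximum of three i.i.d. standard normal variables), so the limit is $9/(2\sqrt{\pi}) \approx 2.54$. The following Python sketch (ours; the sample sizes are arbitrary) performs this check.

```python
import math
import random

def deficit_at_T1(n, rng):
    """Run the unconstrained uniform walk until some coordinate hits n;
    return the normalized deficit (3n - T_1)/sqrt(n)."""
    counts = [0, 0, 0]
    t = 0
    while max(counts) < n:
        counts[rng.randrange(3)] += 1
        t += 1
    return (3 * n - t) / math.sqrt(n)

rng = random.Random(7)
n = 900
reps = 500
est = sum(deficit_at_T1(n, rng) for _ in range(reps)) / reps
# est should be near 9 / (2 * sqrt(pi)) ≈ 2.54, up to finite-n corrections
```

The finite-$n$ correction is of order $n^{1/3}/\sqrt{n}$ by (3.17), so the estimate is only expected to match the limit to within a few percent at this $n$.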
To find the score in Phase I, consider first the score at $T_0$, and condition again on $T_1 > T_0$. Then in each round up to $T_0$, R plays normally, and thus R and N each win with probability $1/3$, and draw otherwise; hence $\Delta S(t) := S(t) - S(t-1) \in \{+1, -1, 0\}$, each with probability $1/3$. Consequently, since $E\,\Delta S(t) = 0$ and $\operatorname{Var} \Delta S(t) = 2/3$, and $T_0 \sim 3n$, the central limit theorem shows that
$$n^{-1/2} S(T_0) \overset{d}{\longrightarrow} N(0, 2), \qquad (3.20)$$
together with all moments. Moreover, since also N is assumed to use the optimal strategy, which for these $t$ means playing uniformly at random, the score in each round is independent of the choices made by R, and thus of the vectors $N_t$. Consequently, $S(T_0)$ is independent of $(X_1, X_2, X_3)$. We conditioned here on $T_1 > T_0$, but in the unlikely event $T_1 \le T_0$, we may modify $S(T_0)$ (similarly as we defined $N'$ above) and define a sum $S'(T_0)$ that is independent of $(X_1, X_2, X_3)$ and satisfies $S'(T_0) = S(T_0)$ whenever $T_1 > T_0$, and thus, by (3.7), (rather coarsely)
$$S(T_0) = S'(T_0) + O_{L^p}\big(n^{1/3}\big). \qquad (3.21)$$
For $T_0 < t \le T_1$, we still have the same distribution of $\Delta S(t)$, and by the same argument as in (3.9), if we condition on $T_1 > T_0$, then $S(T_1) - S(T_0) = O_{L^p}\big(n^{1/3}\big)$.

3.2. Phase II: $T_1$ to $T_2$. Since the entire game is symmetric under cyclic permutations of the three choices rock, paper, scissors, we may for the next phase assume that R first uses up all $n$ rocks, i.e., that $N_{T_1,1} = n$. Note, however, that the game is not symmetric under odd permutations, so having made this assumption, choices 2 (paper) and 3 (scissors) play different roles, since 3 beats 2.
By the discussion of the greedy strategy in Section 2.2, for $t \in [T_1, T_2)$, R should play randomly and choose 2 or 3 with probabilities $1/3$ and $2/3$, respectively. We argue as in the preceding subsection (and therefore omit some details); we now let $(\eta_t)_{1}^{\infty}$ be an i.i.d. sequence of random vectors with $P(\eta_t = e_i) = p_i$ for $i = 1, 2, 3$, with $(p_1, p_2, p_3) = (0, \tfrac{1}{3}, \tfrac{2}{3})$, and we assume, as we may, that $N_t - N_{t-1} = \eta_t$ for $T_1 < t \le T_2$. If we again condition on $T_1 > T_0$, we obtain, by conditioning on $T_1$, arguing as in (3.9), and using $3n - T_1 < 3n - T_0 = O(n^{2/3})$,
$$N_{t,i} = N_{T_1,i} + p_i (t - T_1) + O_{L^p}\big(n^{1/3}\big), \quad \text{uniformly for } T_1 \le t \le T_2. \qquad (3.25)$$
By (3.7) again, this holds also unconditionally. We obtain from (3.24) and (3.25), taking $t = T_2$, for every $i$,
$$n - N_{T_2,i} = (X_{\max} - X_i)\, n^{1/2} - p_i (T_2 - T_1) + O_{L^p}\big(n^{1/3}\big). \qquad (3.27)$$
We have assumed $N_{T_1,1} = n$, and then $T_2$ is the first $t$ such that $N_{t,2} = n$ or $N_{t,3} = n$.
In particular, (3.27) implies
$$T_2 - T_1 = \min\big(3(X_{\max} - X_2),\, \tfrac{3}{2}(X_{\max} - X_3)\big)\, n^{1/2} + O_{L^p}\big(n^{1/3}\big),$$
which can be written
$$T_2 - T_1 = \min_{i=2,3}\, p_i^{-1} (X_{\max} - X_i)\, n^{1/2} + O_{L^p}\big(n^{1/3}\big). \qquad (3.32)$$
We repeat that this holds assuming that choice 1 is the first to be used up by R.
In this phase, the gain $\Delta S(t)$ of N has expectation $1/3$ in each round (and its absolute value is bounded by 1, so all moments are bounded); moreover, the gains in different rounds are i.i.d. Hence, similarly to (3.9) again, the central limit theorem with moment convergence together with Doob's inequality yields
$$S(T_2) - S(T_1) = \tfrac{1}{3}(T_2 - T_1) + O_{L^p}\big(n^{1/3}\big). \qquad (3.33)$$

3.3. Phase III: $T_2$ to $T_3$. This phase is deterministic, and not very fun to play (at least not for R): R has only one choice, and N wins every round. The total gain for N in this phase is thus, using (3.17) and recalling that $T_3 = 3n$,
$$S(T_3) - S(T_2) = 3n - T_2 = 3 X_{\max}\, n^{1/2} - (T_2 - T_1) + O_{L^p}\big(n^{1/3}\big), \qquad (3.34)$$
where furthermore $T_2 - T_1$ is given by (3.32) when choice 1 (rock) is the first to be used up by R. Summing the contributions of the three phases, we obtain
$$S_n = S(T_0) + \tfrac{1}{3}(T_2 - T_1) + (3n - T_2) + O_{L^p}\big(n^{1/3}\big). \qquad (3.35)$$
We develop (3.35) as follows.
Lemma 3.1. We have
$$\tfrac{1}{3}(T_2 - T_1) + (3n - T_2) = \max_{i}\big(2X_i + X_{i+2}\big)\, n^{1/2} + O_{L^p}\big(n^{1/3}\big),$$
with the indices taken modulo 3.

Proof. We may again, by symmetry, suppose that R first uses up 1. Typically, this is the case when $X_{\max} = X_1$, but it is possible that $X_1$ is not the maximum. (Then $N_{t,1}$ is not the largest at $t = T_0$, but $N_{t,1}$ overtakes the other two components and hits $n$ first.) In any case, $N_{T_2,1} = N_{T_1,1} = n$, and thus (3.27) yields, recalling $p_1 = 0$,
$$0 = n - N_{T_2,1} = (X_{\max} - X_1)\, n^{1/2} + O_{L^p}\big(n^{1/3}\big),$$
so that $X_{\max}$ may be replaced by $X_1$ in (3.32) and (3.34); combining these via (3.35), a simple calculation yields the stated formula.
Theorem 3.2. As $n \to \infty$, we have the convergence in distribution, together with all moments,
$$n^{-1/2} S_n \overset{d}{\longrightarrow} S^{\mathrm{gr}} := W + \max_i\big(2V_i + V_{i+2}\big), \qquad (3.41)$$
where $W, V_1, V_2, V_3$ are jointly normal with $W$ independent of $(V_1, V_2, V_3)$ and
$$W \in N(0, 2), \qquad (3.42)$$
$$(V_1, V_2, V_3) \in N(0, 3\Sigma). \qquad (3.43)$$

Proof. The random vectors $\xi_t$ in (3.5) are i.i.d. with $E \xi_t = \tfrac{1}{3}(1,1,1)^{\mathrm{tr}}$ and covariance matrix (regarding $\xi_t$ as a column vector)
$$\operatorname{Var}(\xi_t) := E\, \xi_t \xi_t^{\mathrm{tr}} - (E \xi_t)(E \xi_t)^{\mathrm{tr}} = \Sigma := \frac{1}{9}\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}. \qquad (3.44)$$
Since $T_0 \sim 3n$ by (3.4), the central limit theorem yields, recalling (3.11),
$$(X_1, X_2, X_3) \overset{d}{\longrightarrow} (V_1, V_2, V_3), \qquad (3.45)$$
which agrees with (3.43). Similarly, as noted in (3.20), $n^{-1/2} S(T_0) \overset{d}{\longrightarrow} W$. Furthermore, by (3.21) we may here replace $S(T_0)$ by the approximation $S'(T_0)$ which, as noted above, is independent of $(X_1, X_2, X_3)$. Hence the convergences hold jointly with independent limits, and it follows (e.g. using uniform integrability) that all moments converge also in (3.46) and (3.41).
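The covariance matrix $\Sigma$ in (3.44) can be verified mechanically. The following Python sketch (ours) computes the covariance of the uniformly random unit vector $\xi_t$ with exact rational arithmetic.

```python
from fractions import Fraction as F

# xi_t is uniform on the three unit vectors e1, e2, e3.
vecs = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
p = F(1, 3)

# Mean: (1/3, 1/3, 1/3).
mean = [sum(p * v[i] for v in vecs) for i in range(3)]
# Covariance: E[xi xi^tr] - (E xi)(E xi)^tr.
cov = [[sum(p * v[i] * v[j] for v in vecs) - mean[i] * mean[j]
        for j in range(3)] for i in range(3)]
# cov has 2/9 on the diagonal and -1/9 off the diagonal, i.e. Sigma of (3.44)
```

This confirms the entries $2/9$ and $-1/9$, so that $3\Sigma$ in (3.43) has diagonal entries $2/3$ and off-diagonal entries $-1/3$.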
In the following section, we give more convenient expressions for the limit $S^{\mathrm{gr}}$.

The distribution of the limit for greedy strategies
We give several alternative descriptions of the asymptotic distribution found in Theorem 3.2; using them, we then prove Theorem 1.1. See also Section 6 for another use of these descriptions.
Theorem 4.1. The limit $S^{\mathrm{gr}}$ in Theorem 3.2 can be described by any of the following equivalent formulas:

(i) We have
$$S^{\mathrm{gr}} = W + \max\{Z_1, Z_2, Z_3\}, \qquad (4.1)$$
where $W, Z_1, Z_2, Z_3$ are jointly normal with $W$ independent of $(Z_1, Z_2, Z_3)$ and
$$W \in N(0, 2), \qquad (4.2)$$
$$\operatorname{Var} Z_i = 2, \qquad \operatorname{Cov}(Z_i, Z_j) = -1 \quad (i \ne j). \qquad (4.3)$$

(ii) We have
$$S^{\mathrm{gr}} = W_1 + \sqrt{3}\, \max_i Z'_i, \qquad (4.4)$$
where $W_1, Z'_1, Z'_2, Z'_3$ are independent standard normal variables.

(iii) We have $S^{\mathrm{gr}} = \max\{U_1, U_2, U_3\}$, where $U_1, U_2, U_3$ are jointly normal variables as specified in (4.5).

(iv) We have
$$S^{\mathrm{gr}} = W + R \cos\Theta, \qquad (4.7)$$
where $W, R, \Theta$ are independent with $W \in N(0, 2)$ as in (4.2), $R$ has a Rayleigh distribution with density $\tfrac{1}{2} r e^{-r^2/4}$, $r > 0$, and $\Theta$ has a uniform distribution $U(0, \pi/3)$.
We will use the notation $Z_{\max} := \max\{Z_1, Z_2, Z_3\}$. Note also that (4.3) implies that $Z_1 + Z_2 + Z_3$ has variance 0, and thus the normal variables $Z_1, Z_2, Z_3$ in (4.1) satisfy $Z_1 + Z_2 + Z_3 = 0$ almost surely; thus $(Z_1, Z_2, Z_3)$ lives in a 2-dimensional space.
(ii): We may write $W = W_1 + \widetilde{W}$, where $W_1, \widetilde{W} \in N(0, 1)$, and $W_1$ and $\widetilde{W}$ are independent of each other and of $(Z_1, Z_2, Z_3)$. Define $Z'_i := (\widetilde{W} + Z_i)/\sqrt{3}$, $i = 1, 2, 3$. Then (4.1) yields (4.4), and it follows from (4.3) that the covariance matrix of $(Z'_1, Z'_2, Z'_3)$ is the identity matrix; thus the jointly normal variables $Z'_1, Z'_2, Z'_3$ are independent standard normal variables.

(iv): As said above, $Z_1 + Z_2 + Z_3 = 0$ almost surely, so $(Z_1, Z_2, Z_3)$ has really a 2-dimensional normal distribution. In fact, if $\zeta = (\zeta_1, \zeta_2)$ is a centered normal vector in $\mathbb{R}^2$ with $\operatorname{Var} \zeta_1 = \operatorname{Var} \zeta_2 = 2$ and $\operatorname{Cov}(\zeta_1, \zeta_2) = 0$, then we can construct $(Z_1, Z_2, Z_3)$ with the desired distribution (4.3) by
$$Z_i := \zeta \cdot f_i, \qquad (4.10)$$
where $f_1 := (1, 0)$, $f_2 := \big({-\tfrac{1}{2}}, \tfrac{\sqrt{3}}{2}\big)$, $f_3 := \big({-\tfrac{1}{2}}, -\tfrac{\sqrt{3}}{2}\big)$. We define $R := |\zeta|$ and $\widetilde\Theta := \arg(\zeta_1 + \mathrm{i}\zeta_2) \in [-\pi, \pi)$; thus
$$\zeta = (R \cos \widetilde\Theta,\, R \sin \widetilde\Theta), \qquad (4.11)$$
and it follows from (4.10) by simple calculations (which are made even simpler by identifying $\mathbb{R}^2$ and $\mathbb{C}$ and regarding $\zeta$ as a complex random variable) that
$$Z_i = R \cos\big(\widetilde\Theta - \tfrac{2\pi}{3}(i - 1)\big), \qquad i = 1, 2, 3. \qquad (4.12)$$
The normal distribution of $\zeta$ is rotationally symmetric, and thus, as is well known, $R$ and $\widetilde\Theta$ are independent, with $\widetilde\Theta$ uniformly distributed on $[-\pi, \pi)$; furthermore, $R$ has the Rayleigh distribution stated in the theorem. To find the distribution of $Z_{\max} := \max\{Z_1, Z_2, Z_3\}$, we may by symmetry condition on $Z_{\max} = Z_1$, which by (4.12) is equivalent to $\widetilde\Theta \in [-\pi/3, \pi/3]$, and since $\cos$ is an even function, we may further restrict to $\Theta := |\widetilde\Theta| \in [0, \pi/3]$; this yields (iv). In particular,
$$E S^{\mathrm{gr}} = E Z_{\max} = E R \cdot E \cos\Theta = \sqrt{\pi} \cdot \frac{3\sqrt{3}}{2\pi} = \frac{3\sqrt{3}}{2\sqrt{\pi}},$$
which also follows from (4.4), where the right-hand side contains the expectation of the maximum of three i.i.d. standard normal variables, which is known to be $3/(2\sqrt{\pi})$ [3]. △

Higher moments of $S^{\mathrm{gr}}$ can be computed in the same way.
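The equivalence of the representations can also be checked numerically. The following Python sketch (ours; the sample sizes are arbitrary) draws from representations (ii) and (iv) and compares the sample means with $c = 3\sqrt{3}/(2\sqrt{\pi}) \doteq 1.4658$.

```python
import math
import random

rng = random.Random(2024)
N = 200_000

def sample_ii():
    # Representation (ii): W_1 + sqrt(3) * max(Z'_1, Z'_2, Z'_3), all i.i.d. N(0,1)
    w1 = rng.gauss(0, 1)
    return w1 + math.sqrt(3) * max(rng.gauss(0, 1) for _ in range(3))

def sample_iv():
    # Representation (iv): W + R cos(Theta), with W ~ N(0,2),
    # R Rayleigh with density (r/2) exp(-r^2/4), Theta ~ U(0, pi/3).
    w = rng.gauss(0, math.sqrt(2))
    r = 2 * math.sqrt(-math.log(1 - rng.random()))   # inverse-CDF sampling of R
    theta = rng.random() * math.pi / 3
    return w + r * math.cos(theta)

m2 = sum(sample_ii() for _ in range(N)) / N
m4 = sum(sample_iv() for _ in range(N)) / N
c = 3 * math.sqrt(3) / (2 * math.sqrt(math.pi))      # = 1.4658...
```

Both sample means should agree with each other and with $c$ to within Monte Carlo error, illustrating Theorem 1.1 as well.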

Analysis when N does not play greedily
Assume as above that R uses the optimal strategy, i.e., the greedy strategy. In Section 3 we assumed that N uses the greedy strategy. More generally, suppose now that N uses any strategy that does not do anything stupid (a move that cannot win when R has only one or two choices). (This includes both the unknown optimal strategy for N and the greedy strategy, but also many others.) Then, as noted in Section 2.3, the expected gain for N is still $1/3$ in each round where R has two choices left, and 1 in each round where R has only one choice. Hence, (3.3) still holds. Moreover, the strategy of R is not affected by the moves made by N, and thus the random walk $N_0, \dots, N_{3n}$ and the variables $T_1, T_2, X_1, X_2, X_3, X_{\max}$ (and others) are the same as in Section 3. In particular, (3.45) still holds.
For the score of N, recall first that in Phase I, when R still has three choices, R plays each with the same probability. It follows that regardless of the strategy of N, the outcome $\Delta S(t)$ of each round has the same distribution as discussed in Section 3, i.e., $1$, $0$, or $-1$ with probability $1/3$ each; moreover, this is independent of the previous history, so the outcomes of different rounds in this phase are independent. Consequently, $S(T_1)$ has the same distribution as for the greedy strategy, and so has $S(T_0)$ if we condition on $T_1 > T_0$. It follows that (3.19) still holds, and so do (3.20)–(3.21). However, there is one important difference from the case of the greedy strategy in Section 3: there the score $S(T_0)$ is independent of $(X_1, X_2, X_3)$ (again conditioned on $T_1 > T_0$). This is no longer true in general, since the strategy of N may cause dependencies. We give a simple example showing that this actually may happen in Example 5.3.
In Phase II, R has two choices, and uses the greedy strategy described in Section 2.2(ii). We have assumed that the strategy of N is not stupid, and that leaves two choices for N. Both give an expected gain $E\,\Delta S(t) = 1/3$, but the distributions are different. The precise distribution of $S(T_2) - S(T_1)$ may therefore depend on the strategy of N, but if we define $M_i := S(T_1 + i) - S(T_1) - \tfrac{1}{3} i$, then the sequence $(M_{i \wedge (T_2 - T_1)})_{i \ge 1}$ (where we stop at $T_1 + i = T_2$) is, for any non-stupid strategy of N, a martingale with uniformly bounded increments, and Doob's inequality shows that (3.33) holds.
In Phase III, N has only one choice that is not stupid, so the strategy is the same as in Section 3, and (3.34) still holds.
It follows that (3.35) holds, and thus Lemma 3.1 holds, by the same proof as above. This leads to the following result.
Theorem 5.1. Suppose that R uses the optimal (i.e., greedy) strategy, and that N uses any non-stupid strategy (for example, his optimal strategy). If we decompose
$$n^{-1/2} S_n = n^{-1/2} S(T_0) + n^{-1/2}\big(S_n - S(T_0)\big), \qquad (5.1)$$
then the two terms individually converge in distribution to the limits $W$ and $Z_{\max}$ in (4.1); however, in general the two terms are dependent, so Theorems 3.2 and 4.1 need not hold.
Note that it does not follow from Theorem 5.1 that $n^{-1/2} S_n$ converges in distribution. By general principles, the convergence in distribution implies that each of the sequences $n^{-1/2} S(T_0)$ and $n^{-1/2}(S_n - S(T_0))$ is tight, and thus so is their sum $n^{-1/2} S_n$; this implies that there are subsequences that converge in distribution, but it is conceivable that different subsequences have different limits. (This can easily happen if the strategy explicitly depends on, for example, whether $n$ is even or odd, but it is not expected for "natural" strategies.)

Remark 5.2. In general, any (subsequential) limit in distribution $S$ can be written as $W + Z_{\max}$ with $W$ and $Z_{\max}$ as in Theorem 4.1, but possibly dependent. It follows from Minkowski's inequality and calculations as in (4.17)–(4.18) that, with the notation $[a \pm b] := [a - b, a + b]$,
$$(\operatorname{Var} S)^{1/2} \in \big[(\operatorname{Var} W)^{1/2} \pm (\operatorname{Var} Z_{\max})^{1/2}\big] = \big[\sqrt{2} \pm (\operatorname{Var} Z_{\max})^{1/2}\big]. \qquad (5.2)$$
Since we have moment convergence by the same arguments as before, it follows that, for any non-stupid strategy for N, $\liminf n^{-1} \operatorname{Var} S_n$ and $\limsup n^{-1} \operatorname{Var} S_n$ lie in the interval (5.3) obtained by squaring (5.2). Furthermore, (5.2) shows that $\operatorname{Var} S > 0$, so the limit distribution is non-degenerate. △

We give next a simple example showing that there are strategies for N for which $n^{-1/2} S_n$ has a limit in distribution that is different from $S^{\mathrm{gr}}$; we then discuss briefly the optimal strategy.

Example 5.3. Let the strategy of N be to always play rock as long as R has three choices, and then switch to the greedy strategy for the endgame. (This is obviously a risky strategy if R would guess it, but we assume that R is a mathematician and knows that the greedy strategy is proven to be optimal, and therefore sticks to it.) We do not claim that this is a clever strategy, but it is not stupid in the sense above; thus the results above hold for it. Moreover, in Phase I, N wins when R plays scissors, and loses when R plays paper; hence $S(t) = N_{t,3} - N_{t,2}$ for all $t \le T_1$. Consequently, assuming $T_1 > T_0$, we have
$$S(T_0) = N_{T_0,3} - N_{T_0,2} = (X_3 - X_2)\, n^{1/2}.$$
It follows that (3.41) still holds, with $(V_1, V_2, V_3)$ and $(Z_1, Z_2, Z_3)$ as before and with $W$ replaced by the limit of $X_3 - X_2$; thus (4.1)–(4.3) still hold, but $W$ and $Z_{\max}$ are no longer independent. To see that the dependence really matters and leads to a different limit distribution $S$ than for the greedy strategy, we compute, using symmetry and the representation in Theorem 4.1(iv), the mixed moments in (5.7)–(5.8). It follows that if $W' \in N(0, 2)$ is independent of $Z_{\max}$, then
$$E S^3 = E(W + Z_{\max})^3 > E(W' + Z_{\max})^3 = E (S^{\mathrm{gr}})^3. \qquad (5.9)$$
Hence the limit distribution $S$ differs from $S^{\mathrm{gr}}$ for the greedy strategy. △

5.1. On the optimal strategy for N. Consider now the unknown optimal strategy for N. We conjecture (Conjecture 5.4) that then $n^{-1/2} S_n$ converges in distribution to a limit
$$S^{\mathrm{op}} = W + Z_{\max},$$
where $W$ and $Z_{\max} := \max\{Z_1, Z_2, Z_3\}$ each are as in Theorem 4.1, but they now may be dependent.
Note that if this holds, then (5.2)–(5.3) hold for $S^{\mathrm{op}}$. The optimal strategy for N has to punish strategies for R like the one in Example 2.1. Intuitively, it therefore seems likely that if R plays greedily, then the optimal strategy of N will punish R in games where the times $T_1$ and $T_2$ in our analysis in Section 3 are unusually large (and conversely reward R when they are small; remember that the expectation is the same as if N plays greedily). It therefore seems likely that if both players play optimally, there is a negative correlation between the two terms in (5.1). However, even if this is correct, it is possible that the dependency vanishes asymptotically so that we have the same limit $S^{\mathrm{gr}}$ as in Theorem 4.1. We have no guess, and leave this as a problem.
Problem 5.5. If both players play optimally, does $n^{-1/2} S_n$ have the same asymptotic distribution $S^{\mathrm{gr}}$ as in Theorem 4.1 for greedy play? If not, is there an asymptotic distribution $S^{\mathrm{op}}$ (as conjectured above), and what is it?

The probability of winning for greedy play

Finally, we return to the case of both players using their greedy strategies and note that we may also calculate the asymptotic probability that R wins the game, in spite of her restriction, i.e., that the final score $S_n < 0$. (Recall that $S_n$ is the score for N.)

Theorem 6.1. If both players use their greedy strategies, then the probability that R wins has, as $n \to \infty$, the limit
$$\frac{3 \arccos(1/4) - \pi}{4\pi} \doteq 0.064677. \qquad (6.1)$$

Proof. By Theorem 4.1, we have $P(S_n < 0) \to P(S^{\mathrm{gr}} < 0)$ (since $S^{\mathrm{gr}}$ has a continuous distribution, e.g. by (4.1)). We compute this probability using Theorem 4.1(iii). By (4.5), we have
$$P\big(S^{\mathrm{gr}} < 0\big) = P\big(\max\{U_1, U_2, U_3\} < 0\big). \qquad (6.2)$$
We may, similarly to (4.10), construct the variables in (6.2) as in (6.3), where $\hat\zeta$ is a standard normal vector in $\mathbb{R}^3$, and $\hat f_1, \hat f_2, \hat f_3$ are three vectors in $\mathbb{R}^3$ satisfying (6.4). By (6.3), the condition (6.2) means that $\hat\zeta$ lies in the intersection of three open halfspaces $H_1, H_2, H_3$, which are bounded by hyperplanes orthogonal to $\hat f_1$, $\hat f_2$ and $\hat f_3$. The angle between any two of these vectors is, by (6.4), $\alpha := \arccos(-1/4)$. Hence, the interior angle between any two of the hyperplanes is $\beta := \pi - \alpha = \arccos(1/4)$, and thus the intersection of the unit sphere and $H_1 \cap H_2 \cap H_3$ is a spherical triangle $\Delta$ with all three angles $\beta$. Consequently, the area $|\Delta|$ of $\Delta$ is $3\beta - \pi$. The distribution of $\hat\zeta$ is rotationally symmetric, and thus we may project $\hat\zeta$ onto the unit sphere and find, recalling that the area of the sphere is $4\pi$, that the probability in (6.2) equals $|\Delta|/(4\pi) = (3\beta - \pi)/(4\pi)$, which is (6.1).

Theorem 6.1 assumes that the players use their greedy strategies; we know that this is optimal for R, and yields the same expectation for N as his optimal strategy, if their objectives are to maximize the expected gain; if they instead want to maximize the probability of winning (but do not care about how much they win or lose),
the optimal strategies are presumably different (see Example 6.3), and most likely much more complex; hence we do not know whether (6.1) holds or not in that case.

Problem 6.2. Suppose that both players want to maximize $P(\mathrm{win}) - P(\mathrm{lose})$. What is (asymptotically) the probability that R wins?

It is possible that the asymptotic answer is the same as in Theorem 6.1, although the probabilities for finite $n$ are different. (See Example 6.3.) It might seem likely that a strategy that gives one of the players a significantly lower expected score will also give a lower probability that this score is positive. However, Example 6.4 shows that strategies with the same expectation still might give different distributions of the score and therefore different probabilities of winning, so it seems that there is no simple solution to Problem 6.2.

Example 6.3. Here is a simple example showing that the greedy strategy is not the optimal strategy for R if the objective is to win, as in Problem 6.2. Let $n = 2$, and suppose that in the first four rounds, R has (by chance) chosen rock, paper, scissors, scissors, and that R won two of these while two were draws. Thus the score (for N) is $S(4) = -2$. Hence, N cannot win, but since he will win the last round, the game will be a draw if he wins round 5. Therefore, in round 5, the objective for R is to minimize the probability of losing (but a draw is as good as a win). In this round R plays the game in Figure 1; if she wants to minimize the probability of losing this round, the best strategy is to play rock or paper with equal probabilities, and not with the probabilities in Section 2.2 that minimize the expected loss. (The example can be extended to any $n \ge 2$ by assuming that R has played the three choices $n - 2$ times each in the first $3(n-2)$ rounds, and that each of these rounds was a draw; the play then continues as above.) △

Example 6.4. Suppose that R uses the greedy strategy above, but that N uses the strategy in Example 5.3.
Thus $S \ge 0$, with a point mass $P(S = 0) = 1/3$ (by symmetry). In this case, we cannot immediately find the limit of $P(S_n < 0)$, but if the strategy is perturbed a little, and N plays normally for the first $\varepsilon_n n$ rounds with $\varepsilon_n \to 0$ very slowly, it can be seen that $P(S_n < 0) \to \tfrac{1}{2} P(S = 0) = 1/6$. In this case, the new strategy for N is worse for him; it gives the same expected score but a lower probability that the score is positive (given that R plays greedily). However, it suggests that there also might be other strategies that instead increase the probability that N wins. △
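For reference, the limit in Theorem 6.1 can be evaluated directly from the closed form obtained in its proof. The following Python lines (ours) carry out this arithmetic.

```python
import math

# Closed form read off from the proof of Theorem 6.1:
beta = math.acos(1 / 4)          # interior angle of the spherical triangle Delta
area = 3 * beta - math.pi        # area |Delta| of the triangle on the unit sphere
p_r_wins = area / (4 * math.pi)  # = (3*arccos(1/4) - pi)/(4*pi)
```

Evaluating gives `p_r_wins` $\doteq 0.064677$, the decimal value stated in (6.1).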

Figure 1. Score matrix for N when R is restricted to {rock, paper}; rows show the move by R; columns the move by N:

            rock   paper   scissors
  rock        0     +1       -1
  paper      -1      0       +1
$$P\big(\max\{U_1, U_2, U_3\} < 0\big) = P\big(\hat\zeta \in H_1 \cap H_2 \cap H_3\big)$$