Opportunity costs in the game of best choice

The game of best choice, also known as the secretary problem, is a model for sequential decision making with many variations in the literature. Notably, the classical setup assumes that the sequence of candidate rankings is uniformly distributed over time and that there is no expense associated with the candidate interviews. Here, we weight each ranking permutation according to the position of the best candidate in order to model costs incurred from conducting interviews with candidates that are ultimately not hired. We compare our weighted model with the classical (uniform) model via a limiting process. It turns out that imposing even infinitesimal costs on the interviews results in a probability of success that is about 28%, as opposed to 1/e (about 37%) in the classical case.


INTRODUCTION
The game of best choice, or secretary problem, is a model for sequential decision making. In the simplest variant, an interviewer evaluates a pool of N candidates one by one. After each interview, the interviewer ranks the current candidate against all of the candidates interviewed so far, and decides whether to accept the current candidate (ending the game) or to reject the current candidate (in which case, the candidate cannot be recalled later). The goal of the game is to hire the best candidate out of N. It turns out that the optimal strategy for large N is to reject an initial set of N/e candidates and hire the next candidate who is better than all of them (or the last candidate if no subsequent candidate is better). The probability of hiring the best candidate out of N with this strategy also approaches 1/e. See [GM66] for an introduction to these results. Many other variations and some history have been given in [Fer89] and [Fre83].
We model interview orderings as permutations. A permutation $\pi$ of $N$ is expressed in one-line notation as $[\pi_1 \pi_2 \cdots \pi_N]$, where the $\pi_i$ consist of the elements $1, 2, \ldots, N$ (so each element appears exactly once). In the best choice game, $\pi_i$ is the true rank of the $i$th candidate interviewed, where rank $N$ is best and 1 is worst. What the player sees at each step, however, are relative rankings. For example, corresponding to the interview order $\pi = [2516374]$, the player sees the sequence of permutations 1, 12, 231, 2314, 24153, 241536, 2516374 and must use only this information to determine when to accept a candidate, thereby ending the game.
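To make the example concrete, the relative rankings seen by the player can be computed by relabelling each prefix of the interview order by relative rank. The following is a minimal Python sketch; the helper names `flatten` and `prefix_flattenings` are our own illustrative choices and do not come from the cited literature.

```python
def flatten(seq):
    """Return the permutation of 1..len(seq) with the same relative order as seq."""
    rank = {value: i + 1 for i, value in enumerate(sorted(seq))}
    return [rank[value] for value in seq]


def prefix_flattenings(perm):
    """Return the relative rankings shown to the player, one per interview."""
    return [flatten(perm[:i]) for i in range(1, len(perm) + 1)]


if __name__ == "__main__":
    # Interview order from the example in the text.
    for shown in prefix_flattenings([2, 5, 1, 6, 3, 7, 4]):
        print("".join(map(str, shown)))
```

Running the script on the interview order [2516374] prints the seven relative rankings listed in the example, one per line.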
Let $S_N$ be the set of all permutations of size $N$. Given some statistic $c : S_N \to \mathbb{N}$ and a positive real number $\theta$, we define a discrete probability distribution on $S_N$ via
\[ f(\pi) = \frac{\theta^{c(\pi)}}{\sum_{\pi \in S_N} \theta^{c(\pi)}}. \]
Given a sequence of $i$ distinct integers, we define its flattening to be the unique permutation of $\{1, 2, \ldots, i\}$ having the same relative order as the sequence. Given a permutation $\pi$, define the $i$th prefix flattening, denoted $\pi|_{[i]}$, to be the permutation obtained by flattening the sequence $\pi_1, \pi_2, \ldots, \pi_i$. In the weighted game of best choice, introduced in [Jon19], some $\pi \in S_N$ is chosen randomly, with probability $f(\pi)$, and each prefix flattening $\pi|_{[1]}, \pi|_{[2]}, \ldots$ is presented sequentially to the player. If the player stops at value $N$, they win; otherwise, they lose. We are interested in calculating the win probability, under optimal play, for finite $N$ as well as in the limit as $N \to \infty$.
Date: March 13, 2019.
In this note, we follow a suggestion by the first author to let $c(\pi)$ be the position of the largest element in $\pi$, indexed starting from 0; that is, $c(\pi) = \pi^{-1}(N) - 1$. Equivalently, this is the number of "wasted" interviews required before we can hire the best candidate. Setting $\theta < 1$ has the effect of imposing a multiplicative cost of $\theta$ on each wasted interview. For example, the best candidate being hired immediately will contribute $1 = \theta^0$ (before normalization) to the win probability, whereas each failed interview reduces the contribution of an eventually successful hire by a factor of $\theta$. This weighted model is relevant when the interviews themselves are costly, or if time spent interviewing detracts from the time spent working productively, such as when the position being filled is only for a limited term or requires a substantial training investment. Also, observe that when $\theta = 1$, we recover the uniform distribution on $S_N$, corresponding to the classical model.
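For small $N$ the weighted distribution can be tabulated directly. The sketch below assumes the weight of a permutation is $\theta^{c(\pi)}$ divided by the sum of these weights over all of $S_N$, with $c(\pi)$ the zero-based position of the largest entry; the function names are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def wasted_interviews(perm):
    """c(pi): zero-based position of the largest element of perm."""
    return perm.index(max(perm))


def weighted_distribution(n, theta):
    """Map each permutation of 1..n to theta^c(pi) divided by the total weight."""
    perms = list(permutations(range(1, n + 1)))
    weights = [theta ** wasted_interviews(p) for p in perms]
    total = sum(weights)
    return {p: w / total for p, w in zip(perms, weights)}


if __name__ == "__main__":
    # With theta < 1, orders that place the best candidate early are more likely.
    for perm, prob in sorted(weighted_distribution(3, 0.5).items()):
        print(perm, round(prob, 4))
```

Brute-force enumeration is only practical for small $N$, but it gives a convenient reference against which closed formulas can be checked.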
We obtain some interesting behavior vis-à-vis the classical model. The optimal strategy is still positional: we reject about $0.435/(1-\theta)$ initial candidates and select the next best candidate. As $N \to \infty$ and $\theta \to 1$, however, this strategy succeeds about 28% of the time even though we have a $1/e \approx 37\%$ success rate at $\theta = 1$. That is, the asymptotically optimal strategy does not vary continuously with the parameter $\theta$, which seems to limit the durability of any "policy advice" derived from the classical model (such as [SV99]). We found a similar discontinuity in the optimal strategy for the Mallows model in [Jon19], although the success probability there still approached $1/e$. In the present model, both the strategy and probability of success are discontinuous. Evidently, there is a "price" of about 8.6% in the asymptotic success rate for imposing any wasted interview penalty, no matter how small.
Although there is an established "full-information" version of the game in which the player observes values from a given distribution, it seems that only a few papers have considered nonuniform rank distributions for the secretary problem. Pfeifer [Pfe89] considers the case where interview ranks are independent but have cumulative distribution functions containing parameters determined by the interview positions. The paper [RF88] considers an explicit continuous probability distribution that allows for dependencies between nearby arrival ranks via a single parameter. Inspired by approximation theory, the paper [KKN15] studies some general properties of non-uniform rank distributions in the secretary problem. Our work also fits into a recent stream of asymptotic results for random permutations by researchers in algebraic combinatorics such as [MP14, CDE18, ABNP16].

THE MODEL
The left-to-right maxima in a permutation $\pi$ consist of elements $\pi_j$ that are larger in value than every element $\pi_i$ to the left (i.e. for $i < j$). In the game of best choice, it is never optimal to select a candidate that is not a left-to-right maximum. A positional strategy for the game of best choice is one in which the interviewer transitions from rejection to hiring based only on the position of the interview (as opposed to adjusting the transition based on the prefix flattenings that are encountered). More precisely, the interviewer may play the $r$-positional strategy on a permutation $\pi$ by rejecting candidates $\pi_1, \pi_2, \ldots, \pi_r$ and then accepting the next left-to-right maximum thereafter. We say that a particular interview rank order is $r$-winnable if transitioning from rejection to hiring after the $r$th interview captures the best candidate. For example, 574239618 is $r$-winnable for $r = 2, 3, 4$, and 5. It is straightforward to verify that a permutation $\pi$ is $r$-winnable precisely when position $r$ lies between the last two left-to-right maxima in $\pi$ (that is, at or after the position of the second-to-last one and strictly before the position of the last one).
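The definition of $r$-winnability can be checked mechanically by simulating the $r$-positional strategy. The following Python sketch uses function names of our own choosing and reproduces the example 574239618 from the text.

```python
def left_to_right_maxima_positions(perm):
    """Return the 1-based positions of the left-to-right maxima of perm."""
    positions, best = [], 0
    for position, value in enumerate(perm, start=1):
        if value > best:
            positions.append(position)
            best = value
    return positions


def is_r_winnable(perm, r):
    """Simulate the r-positional strategy: reject r candidates, then accept
    the next left-to-right maximum. Return True if that candidate is the best."""
    best_seen = max(perm[:r], default=0)
    for value in perm[r:]:
        if value > best_seen:
            # The first later entry exceeding everything rejected so far
            # is the next left-to-right maximum, so the game ends here.
            return value == len(perm)
    return False  # the best candidate was among the rejected ones


if __name__ == "__main__":
    perm = [5, 7, 4, 2, 3, 9, 6, 1, 8]
    print(left_to_right_maxima_positions(perm))
    print([r for r in range(len(perm)) if is_r_winnable(perm, r)])
```

For this permutation the left-to-right maxima sit at positions 1, 2, and 6, and the winnable values of $r$ are exactly those from the second-to-last maximum up to just before the last one.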
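It follows from the results in [Jon19, Section 3] that the optimal strategy in our game of best choice is positional, and we let
\[ W_N(r) = \sum_{r\text{-winnable } \pi \in S_N} \theta^{c(\pi)}. \]
Theorem 2.1. We have the recurrence
\[ W_N(r) = (N-1)\, W_{N-1}(r) + r\,(N-2)!\,\theta^{N-1} \]
for $N \geq 2$ and $0 \leq r < N$ (with $W_N(r) = 0$ when $r \geq N$, since then every candidate is rejected), with initial conditions $W_1(0) = 1$ and $W_1(r) = 0$ for all $r \geq 1$.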
Proof. There are two cases for the $r$-winnable permutations $\pi$ of $N$. If $N$ does not lie in the last position, then we may view the initial segment of $\pi$ uniquely as an $r$-winnable permutation of $N-1$ by flattening. Since there are $N-1$ possible values for the last position, this case contributes $(N-1)W_{N-1}(r)$ to $W_N(r)$. If $N$ lies in the last position, then $\pi$ will be winnable if and only if $N-1$ lies in one of the first $r$ positions of $\pi$. For each of these choices, we may permute the remaining entries in $(N-2)!$ ways, so these contribute $r(N-2)!\,\theta^{N-1}$ all together.
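Corollary 2.2. We have
\[ W_N(r) = r\,(N-1)! \sum_{i=r}^{N-1} \frac{\theta^i}{i} \]
for $0 < r < N$, and $W_N(0) = (N-1)!$.
Proof. This follows from Theorem 2.1 by induction.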
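As a sanity check, the total weight of the $r$-winnable permutations can be computed in three ways for small $N$. The sketch below assumes the recurrence reads $W_N(r) = (N-1)W_{N-1}(r) + r(N-2)!\,\theta^{N-1}$ for $r < N$ (with $W_N(r) = 0$ for $r \geq N$) and that the closed form is $r(N-1)!\sum_{i=r}^{N-1}\theta^i/i$ for $r > 0$ and $(N-1)!$ for $r = 0$; these readings and all function names are our assumptions.

```python
from itertools import permutations
from math import factorial, isclose


def weight_brute_force(n, r, theta):
    """Sum theta^c(pi) over all r-winnable permutations of 1..n by enumeration."""
    total = 0.0
    for perm in permutations(range(1, n + 1)):
        best_seen = max(perm[:r], default=0)
        for value in perm[r:]:
            if value > best_seen:  # next left-to-right maximum after rejecting r
                if value == n:
                    total += theta ** perm.index(n)
                break
    return total


def weight_recurrence(n, r, theta):
    """Evaluate the assumed recurrence, taking the weight to be 0 when r >= n."""
    if r >= n:
        return 0.0
    if n == 1:
        return 1.0
    return (n - 1) * weight_recurrence(n - 1, r, theta) + r * factorial(n - 2) * theta ** (n - 1)


def weight_closed_form(n, r, theta):
    """Evaluate the assumed closed form for 0 <= r < n."""
    if r == 0:
        return float(factorial(n - 1))
    return r * factorial(n - 1) * sum(theta ** i / i for i in range(r, n))


if __name__ == "__main__":
    n, r, theta = 5, 2, 0.8
    values = (weight_brute_force(n, r, theta),
              weight_recurrence(n, r, theta),
              weight_closed_form(n, r, theta))
    print(values, isclose(values[0], values[1]) and isclose(values[0], values[2]))
```

Agreement on small cases does not replace the induction argument, but it guards against indexing slips in the formulas.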
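Theorem 2.3. Fix some positive $\theta \neq 1$. The probability of winning the game of best choice using the strategy that rejects $r$ initial candidates is
\[ \frac{r(1-\theta)}{1-\theta^N} \sum_{i=r}^{N-1} \frac{\theta^i}{i} \]
if $r > 0$ and is $\frac{1-\theta}{1-\theta^N}$ if $r = 0$.
Proof. By definition, the probability of winning is
\[ \frac{W_N(r)}{\sum_{\pi \in S_N} \theta^{c(\pi)}} = \frac{W_N(r)}{(N-1)!\,(1 + \theta + \cdots + \theta^{N-1})} = \frac{(1-\theta)\, W_N(r)}{(N-1)!\,(1-\theta^N)}. \]
The result then follows from the previous corollary.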
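The win probability is easy to evaluate for finite $N$. The sketch below assumes the formula $\frac{r(1-\theta)}{1-\theta^N}\sum_{i=r}^{N-1}\theta^i/i$ for $r > 0$ and $\frac{1-\theta}{1-\theta^N}$ for $r = 0$; the names `win_probability` and `best_cutoff` are hypothetical.

```python
def win_probability(n, r, theta):
    """Win probability of the r-positional strategy for positive theta != 1 and 0 <= r < n."""
    if theta == 1:
        raise ValueError("the formula requires theta != 1")
    scale = (1 - theta) / (1 - theta ** n)
    if r == 0:
        return scale
    return r * scale * sum(theta ** i / i for i in range(r, n))


def best_cutoff(n, theta):
    """Return the number of initial rejections r that maximizes the win probability."""
    return max(range(n), key=lambda r: win_probability(n, r, theta))


if __name__ == "__main__":
    n = 50
    for theta in (0.8, 0.95, 0.99):
        r = best_cutoff(n, theta)
        print(theta, r, round(win_probability(n, r, theta), 4))
```

Taking $\theta$ very close to 1 in this function approximates the classical value $\frac{r}{N}\sum_{i=r}^{N-1}\frac{1}{i}$, which is one way to compare the two models numerically.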
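For $0 < \theta < 1$, letting $N \to \infty$ in Theorem 2.3 yields the limiting win probability
\[ P_r(\theta) = r(1-\theta) \sum_{i=r}^{\infty} \frac{\theta^i}{i}. \]
We obtain a curve for each nonnegative value of $r$ (interpreting $P_0$ as $1-\theta$), the first several of which we have plotted in Figure 1. For each value of $\theta$, one of the curves is maximal, yielding the optimal strategy and probability of success. For example, [...]
Proof. To see this, the derivative of $P_r$ with respect to $\theta$ is
\[ \frac{dP_r}{d\theta} = r\theta^{r-1} - r \sum_{i=r}^{\infty} \frac{\theta^i}{i}, \]
whereas the successive differences $P_{r-1} - P_r$ are
\[ P_{r-1} - P_r = (1-\theta)\Bigl( \theta^{r-1} - \sum_{i=r}^{\infty} \frac{\theta^i}{i} \Bigr). \]
Hence, $P_{r-1} - P_r = \frac{1-\theta}{r} \frac{dP_r}{d\theta}$, so the successive differences and derivatives have the same zeros.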
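The relation between successive differences and derivatives can be spot-checked numerically. This sketch assumes the limiting curves are $P_r(\theta) = r(1-\theta)\sum_{i \geq r}\theta^i/i$ for $0 < \theta < 1$ (with $P_0 = 1 - \theta$) and tests the identity $P_{r-1} - P_r = \frac{1-\theta}{r}\,\frac{dP_r}{d\theta}$ using a central finite difference; the function names and step size are our own choices.

```python
def tail_sum(r, theta, tol=1e-16):
    """Approximate sum_{i >= r} theta^i / i for 0 < theta < 1 and r >= 1."""
    total, i, power = 0.0, r, theta ** r
    while power / i > tol:
        total += power / i
        power *= theta
        i += 1
    return total


def p_limit(r, theta):
    """Limiting win probability P_r(theta), with P_0 interpreted as 1 - theta."""
    if r == 0:
        return 1.0 - theta
    return r * (1.0 - theta) * tail_sum(r, theta)


def dp_dtheta(r, theta, h=1e-6):
    """Central finite-difference estimate of dP_r/dtheta."""
    return (p_limit(r, theta + h) - p_limit(r, theta - h)) / (2 * h)


if __name__ == "__main__":
    for r in (1, 2, 5):
        for theta in (0.3, 0.6, 0.9):
            lhs = p_limit(r - 1, theta) - p_limit(r, theta)
            rhs = (1.0 - theta) / r * dp_dtheta(r, theta)
            print(r, theta, lhs, rhs)
```

In particular, each curve $P_r$ peaks exactly where it crosses its predecessor $P_{r-1}$, which is the feature visible in the plotted curves.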
Recall the exponential integral
\[ E_1(x) = \int_x^{\infty} \frac{e^{-t}}{t}\, dt, \]
which we view as a function of a positive real variable $x$ (see e.g. [OLBC10]). This is a standard special function implemented in many mathematical software systems. For our main result, we consider the maximum value attained by the related function $F(x) = xE_1(x)$ on $(0, \infty)$; see Figure 2 for a plot. Although there is no elementary form for this maximum, it occurs where $E_1(x) = e^{-x}$, so it can be estimated numerically to arbitrary precision. Let $\alpha$ and $\beta$ be defined by $F'(\alpha) = 0$ and $F(\alpha) = \beta$. Then, $\alpha \approx 0.43481821500399293$ and $\beta \approx 0.28149362995691674$.
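The constants $\alpha$ and $\beta$ can be estimated with the standard library alone. The sketch below is one possible approach, not necessarily the one used to produce the digits above: it evaluates $E_1$ by the classical series $E_1(x) = -\gamma - \ln x + \sum_{k \geq 1} (-1)^{k+1} x^k/(k \cdot k!)$, which is adequate for moderate $x$, and bisects $F'(x) = E_1(x) - e^{-x}$ on a bracket that we chose by inspection.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exp_integral_e1(x, terms=60):
    """E_1(x) for moderate x > 0 via its convergent power series."""
    total, power_over_factorial = 0.0, 1.0
    for k in range(1, terms + 1):
        power_over_factorial *= x / k  # now equals x^k / k!
        total += (-1) ** (k + 1) * power_over_factorial / k
    return -EULER_GAMMA - math.log(x) + total


def maximize_f(lo=0.1, hi=1.0, iterations=100):
    """Locate the maximum of F(x) = x * E_1(x) by bisecting F'(x) = E_1(x) - exp(-x).

    The bracket [lo, hi] is an assumption: F' is positive at 0.1 and negative at 1.0.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if exp_integral_e1(mid) - math.exp(-mid) > 0:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    return alpha, alpha * exp_integral_e1(alpha)


if __name__ == "__main__":
    alpha, beta = maximize_f()
    print(alpha, beta)
```

Since $E_1(\alpha) = e^{-\alpha}$ at the maximum, the value $\beta$ also equals $\alpha e^{-\alpha}$, which gives an independent check on the output.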
We are now in a position to give our main result.
Theorem 3.2. As $\theta$ approaches 1 from the left, the optimal strategy in our asymptotic weighted game of best choice approaches a positional strategy that rejects $\alpha/(1-\theta)$ initial candidates and selects the next candidate better than all of them. This strategy has a success probability of $\beta$.
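Proof. We would like to optimize $P_r(\theta) = r(1-\theta) \sum_{i=r}^{\infty} \frac{\theta^i}{i}$ for large $r$ and $\theta$ chosen appropriately close to 1. We estimate the series by viewing it as a left or right sum for the corresponding integrals:
\[ \int_r^{\infty} \frac{\theta^t}{t}\, dt \;\leq\; \sum_{i=r}^{\infty} \frac{\theta^i}{i} \;\leq\; \int_{r-1}^{\infty} \frac{\theta^t}{t}\, dt. \]
Hence, we may approximate $P_r(\theta)$ by $\widetilde{P}_r(\theta) = r(1-\theta) \int_r^{\infty} \frac{\theta^t}{t}\, dt$ with error less than
\[ r(1-\theta) \int_{r-1}^{r} \frac{\theta^t}{t}\, dt \;\leq\; r(1-\theta)\, \frac{\theta^{r-1}}{r-1} \;<\; 4\theta^r (1-\theta) \]
since the integrand is decreasing, $r - 1 > r/2$, and $\theta > 1/2$.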
Next, we change variables from $r$ to $c = (1-\theta)r$, and from $t$ to $u = (1-\theta)t$ in the integral. We obtain $du = (1-\theta)\, dt$ so
\[ \widetilde{P} = c \int_c^{\infty} \frac{\theta^{u/(1-\theta)}}{u}\, du \]
and our error estimate for $|P - \widetilde{P}|$ becomes $4\theta^{c/(1-\theta)}(1-\theta)$. Now, we are in a position to take the limit as $\theta \to 1$, using $\lim_{\theta \to 1} \theta^{1/(1-\theta)} = 1/e$. This forces $P \to \widetilde{P}$ by our error estimate, and
\[ \lim_{\theta \to 1} \widetilde{P} = c \int_c^{\infty} \frac{e^{-u}}{u}\, du = c\,E_1(c) = F(c). \]
Optimizing this function for $c \in (0, \infty)$ then determines the asymptotically optimal positional strategy (where we reject $r = \frac{c}{1-\theta}$ initial candidates) and probability of success.

We can also solve the model when $\theta > 1$. One interpretation here is that there is some "trend" in the candidate pool (e.g. due to changes in general economic conditions such as unemployment or interest rates) that is amplifying the probability of seeing the best candidate later. Once again, we find that including even an infinitesimal trend completely changes the optimal asymptotic strategy.
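The asymptotic statement for $\theta < 1$ can be illustrated numerically. The sketch below assumes the limiting win probability $P_r(\theta) = r(1-\theta)\sum_{i \geq r}\theta^i/i$ (with $P_0 = 1-\theta$), tabulates it for a range of $r$ at a value of $\theta$ close to 1, and compares the best cutoff and success rate with the constants $\alpha \approx 0.4348$ and $\beta \approx 0.2815$. The function name, the truncation tolerance, and the test value $\theta = 0.999$ are our own choices.

```python
def limiting_win_probabilities(theta, max_r, tol=1e-17):
    """Return [P_0, ..., P_max_r] for 0 < theta < 1 using one pass of tail sums."""
    # terms[i] = theta^i / i; index 0 is unused padding.
    terms = [0.0]
    power, i = theta, 1
    while i <= max_r or power / i > tol:
        terms.append(power / i)
        power *= theta
        i += 1
    # tails[r] = terms[r] + terms[r + 1] + ..., accumulated from the right.
    tails = [0.0] * (len(terms) + 1)
    for i in range(len(terms) - 1, 0, -1):
        tails[i] = tails[i + 1] + terms[i]
    return [1.0 - theta] + [r * (1.0 - theta) * tails[r] for r in range(1, max_r + 1)]


if __name__ == "__main__":
    theta = 0.999
    probs = limiting_win_probabilities(theta, 2000)
    best_r = max(range(len(probs)), key=probs.__getitem__)
    # Compare the rescaled cutoff with alpha and the success rate with beta.
    print(best_r, best_r * (1.0 - theta), probs[best_r])
```

Repeating the experiment with $\theta$ closer to 1 should bring the rescaled cutoff $(1-\theta)r$ and the success rate closer to $\alpha$ and $\beta$, in line with the error estimate $4\theta^{c/(1-\theta)}(1-\theta)$.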
Theorem 3.3. If $\theta > 1$, the probability of success for the strategy that initially rejects $r = N/\lambda$ candidates approaches $1/\lambda$, as $N \to \infty$. Hence, the asymptotic model does not depend on $\theta$.
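Proof. Recall Theorem 2.3; we claim
\[ \lim_{N \to \infty} \frac{r(1-\theta)}{1-\theta^N} \sum_{i=r}^{N-1} \frac{\theta^i}{i} = \frac{1}{\lambda} \quad \text{when } r = N/\lambda. \]
To see why, consider the "almost telescoping" sum
\[ r(1-\theta) \sum_{i=r}^{N-1} \frac{\theta^i}{i} = \theta^r - \frac{r\,\theta^N}{N-1} - r \sum_{i=r+1}^{N-1} \frac{\theta^i}{i(i-1)}. \]
If we divide by $-\theta^N$ and take the limit $N \to \infty$ term by term with $\theta > 1$, we find that only the leading (i.e. middle) term survives. Hence, our limit is
\[ \lim_{N \to \infty} \frac{r}{N-1} = \lim_{N \to \infty} \frac{N}{\lambda (N-1)} = \frac{1}{\lambda}. \]
Thus, for large $N$ we find that it is optimal to choose the last candidate (and win almost all of the time!), obtaining another discontinuity with the $\theta < 1$ and classical models.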
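The limit for $\theta > 1$ can also be observed numerically. The sketch below assumes the finite-$N$ win probability $\frac{r(1-\theta)}{1-\theta^N}\sum_{i=r}^{N-1}\theta^i/i$ and rewrites it as $\frac{r(\theta-1)}{1-\theta^{-N}}\sum_{i=r}^{N-1}\theta^{i-N}/i$ so that large powers of $\theta$ never overflow; the function name and the sample parameters are hypothetical.

```python
import math


def win_probability_trend(n, r, theta):
    """Win probability for theta > 1 and 1 <= r < n, computed in a rescaled form.

    Every power of theta is divided by theta^n, so only small numbers appear.
    """
    inverse = 1.0 / theta
    scaled_power = inverse  # theta^(i - n) at i = n - 1
    total = 0.0
    for i in range(n - 1, r - 1, -1):
        total += scaled_power / i
        scaled_power *= inverse
    tiny = math.exp(-n * math.log(theta))  # theta^(-n), negligible for large n
    return r * (theta - 1.0) * total / (1.0 - tiny)


if __name__ == "__main__":
    theta, n = 1.2, 3000
    for lam in (2, 4, 10):
        print(lam, win_probability_trend(n, n // lam, theta))
```

For fixed $\lambda$ the printed values should settle near $1/\lambda$ as $N$ grows, regardless of which $\theta > 1$ is used.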

ADDENDUM
Although we were unaware of it until after our work was accepted for publication, the paper [Ras75] solves a very similar problem to the one we are considering. More specifically, we compute in Theorem 2.3 the probability of winning the game of best choice under a non-uniform distribution whereas Rasmussen-Pliska compute in their Equation (2.6) the expected value of a random variable representing the non-uniform payoff for the game played on a uniform distribution. For any particular $N$ and $\theta$, these problems are dual to each other in the sense that their corresponding formulas are off by the multiplicative constant $\frac{\theta + \theta^2 + \theta^3 + \cdots + \theta^N}{N}$. Since our weights form a probability distribution, we believe our model facilitates a clearer comparison with the classical secretary problem.
When $0 < \theta < 1$, Rasmussen-Pliska obtain an asymptotic estimate for the optimal strategy that agrees with ours, using different methods. They also note that, for fixed $\theta < 1$, their expected payoff tends to 0 as $N$ tends to infinity (because their denominator does not scale with $\theta$). By contrast, our model has nonzero probabilities, as $N$ tends to infinity, given by the value of $P_r(\theta)$ where $r$ is optimal for the fixed $\theta$.