Diameter and Stationary Distribution of Random $r$-out Digraphs

Let $D(n,r)$ be a random $r$-out regular directed multigraph on the set of vertices $\{1,\ldots,n\}$. In this work, we establish that for every $r \ge 2$, there exists $\eta_r>0$ such that $\text{diam}(D(n,r))=(1+\eta_r+o(1))\log_r{n}$. Our techniques also allow us to bound some extremal quantities related to the stationary distribution of a simple random walk on $D(n,r)$. In particular, we determine the asymptotic behaviour of $\pi_{\max}$ and $\pi_{\min}$, the maximum and the minimum values of the stationary distribution. We show that with high probability $\pi_{\max} = n^{-1+o(1)}$ and $\pi_{\min}=n^{-(1+\eta_r)+o(1)}$. Our proof shows that the vertices $v$ with $\pi(v)$ close to $\pi_{\min}$ lie at the top of "narrow, slippery towers"; such vertices are also responsible for increasing the diameter from $(1+o(1))\log_r n$ to $(1+\eta_r+o(1))\log_r{n}$.


Introduction
Call a random directed graph D with vertices V(D) = {v_1, . . . , v_n} a random r-out digraph if each vertex in V(D) has out-degree r, and the nr heads of edges in E(D) are iid and uniformly distributed over V(D). We allow digraphs to have multiple edges and loops. It is useful to have a canonical construction: for each pair (i, j) ∈ [n] × [r], let L_{i,j} be a uniformly random element of [n], and write D(n, r) for the random r-out digraph with vertex set [n] and edge set {(i, L_{i,j}) : (i, j) ∈ [n] × [r]}. Note that whp D(n, r) is not strongly connected (for instance, whp some vertex has in-degree zero). In order to study cases where D is not necessarily strongly connected, we write D_0 = D_0(n, r) for the strongly connected component of D(n, r) with the largest number of vertices (if there is more than one such component, D_0 is the one whose smallest labelled vertex is minimal). Let λ_r = max{λ : 1 − λ = e^{−rλ}}, and let η_r = log r / log(1/(r(1 − λ_r))). Observe that λ_r → 1 and η_r → 0 as r → ∞.
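To make the canonical construction concrete, here is a small Python sketch (ours, not from the paper): it samples D(n, r) via the arrays L_{i,j}, computes the order of the largest strongly connected component with an iterative Kosaraju pass, and compares it against λ_r n, with λ_r obtained by fixed-point iteration. All function names are our own.

```python
import random
from math import exp

def random_r_out(n, r, seed=0):
    """Canonical construction of D(n, r): for each (i, j) in [n] x [r],
    the head L[i][j] is uniform on [n]; loops and multi-edges allowed."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(r)] for _ in range(n)]

def largest_scc_size(out_edges):
    """Order of the largest strongly connected component (iterative Kosaraju)."""
    n = len(out_edges)
    rev = [[] for _ in range(n)]
    for u, heads in enumerate(out_edges):
        for v in heads:
            rev[v].append(u)
    order, seen = [], [False] * n
    for s in range(n):                       # pass 1: record finishing order
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(out_edges[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(out_edges[v])))
                    break
            else:
                order.append(u)
                stack.pop()
    best, seen = 0, [False] * n
    for s in reversed(order):                # pass 2: flood on reversed graph
        if seen[s]:
            continue
        seen[s] = True
        comp, stack = 0, [s]
        while stack:
            u = stack.pop()
            comp += 1
            for v in rev[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        best = max(best, comp)
    return best

def lambda_r(r):
    """Largest solution of 1 - x = exp(-r x), via fixed-point iteration."""
    x = 1.0
    for _ in range(100):
        x = 1.0 - exp(-r * x)
    return x
```

For n = 2000 and r = 2, the ratio largest_scc_size(g)/n typically lands within a few percent of λ_2 ≈ 0.797, in line with Grusho's theorem quoted below.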
The main contribution of this paper is to determine diam(D(n, r)) in probability.
As a consequence of our analysis we can obtain estimates for the smallest and largest probability in the stationary distribution of a random walk on D(n, r). Before stating our result we give a few classical definitions that can be found in [26,30].
Given S ⊂ V(D), say that D[S] is attractive if for all v ∈ V(D) there is a directed path from v to S. It is easily seen that a digraph contains at most one attractive strongly connected component D[S]. Grusho [19] proved that, with high probability, D_0 is the unique attractive strongly connected component of D(n, r). Recently, Balle [5] showed that D_0 is ergodic whp.
It is well-known that if D has an attractive and ergodic strongly connected component D_0, then a simple random walk on D has a unique stationary distribution π = π_D with support in D_0. We write π_max(D) = max{π_D(v) : v ∈ V(D)} and π_min(D) = min{π_D(v) : v ∈ V(D)}. Unlike in the undirected case, the stationary distribution of a directed graph is not in general determined by the degree sequence, and one can find pathological examples where π_min(D) is exponentially small in the number of vertices of D. Moreover, having a large π_min is crucial to ensure that the mixing time of a random walk on D is small.

Footnotes: (1) A sequence of random variables X_n converges to X in probability if for every ε > 0, P(|X_n − X| > ε) → 0 as n → ∞. If X_n/Y_n → X in probability then we also write X_n = (X + o_p(1))Y_n. (2) Here and for the remainder of the paper, "with high probability", or whp, means with probability tending to 1 as n → ∞.
In the second part we will obtain estimates for π max (D(n, r)) and π min (D(n, r)).
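Concretely, when the walk has a unique stationary distribution (an attractive, ergodic component, as holds whp for D(n, r)), π can be approximated by power iteration on the transition kernel of the walk. The following sketch is our own illustration, not the method used in the paper.

```python
def stationary_distribution(out_edges, iters=500):
    """Approximate the stationary distribution of the simple random walk on an
    r-out regular multidigraph: each step follows one of the r out-edges of
    the current vertex, chosen uniformly.  Power iteration converges when the
    chain has a unique stationary distribution and is aperiodic."""
    n = len(out_edges)
    r = len(out_edges[0])
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for u, heads in enumerate(out_edges):
            w = pi[u] / r
            for v in heads:
                nxt[v] += w        # mass pi(u)/r flows along each out-edge
        pi = nxt
    return pi
```

On a small 2-out example such as out-edge lists [[1, 1], [2, 2], [0, 1]], the iteration converges rapidly to the exact stationary vector (0.2, 0.4, 0.4).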
One of our main motivations for the study of D(n, r) comes from the analysis of random Deterministic Finite Automata (DFA). DFA can be described using r-out regular directed graphs, where each of the r out-going arcs is labeled with a distinct symbol from an alphabet of size r. A possible model of random DFA with n states can be obtained by considering a random r-out directed graph D(n, r) and assigning random labels to the out-going arcs. This particular model has been used in the literature on automata theory to study properties of "generic" DFA and the average behavior of DFA algorithms (see Section 2.1 for a more detailed account of related work). In particular, the first analysis of the diameter of D(n, r) appeared in the classic book of Trakhtenbrot and Barzdin [37, Theorem 5.5]. The authors of [37] showed that for every r ≥ 2 there exists a constant C_r ≥ 1 such that with high probability diam(D) ≤ C_r log_r n. This upper bound has recently been used, together with a lower bound on π_min(D(n, r)), to establish the possibility of efficiently learning random DFA [1]. Thus, our results also have direct consequences for the complexity of learning random DFA.
Here we present some other graph-theoretic implications of Theorems 1 and 2: • The results obtained can be easily transferred to random simple r-out digraphs. Let D sim (n, r) be chosen uniformly at random from the set of directed simple graphs (no loops or multiple edges) with vertex set [n] such that each vertex has out-degree r.
The conditional distribution of D(n, r), given that it is simple, is precisely that of D_sim(n, r). Furthermore, it is not hard to show (see [27]) that P(D(n, r) is simple) = e^{−Θ(r^2)}.
In particular, this probability is bounded away from zero for fixed r, so any property that holds whp for D(n, r) also holds whp for D sim (n, r).
• It is not hard to deduce from our arguments that for all u, v ∈ V (D(n, r)), conditional on the event that v ∈ D 0 (n, r), we have dist D(n,r) (u, v) = (1 + o p (1)) log r n. This shows that the typical distance in D(n, r) is (1 + o p (1)) log r n. We leave the details to the interested reader.
• The random r-out, s-in digraph D(n, r, s) is defined similarly to D(n, r), but each vertex chooses s in-neighbours as well as r out-neighbours, all independently and uniformly at random; see [17]. In particular, D(n, r) d = D(n, r, 0). It may be interesting to consider the diameter and the stationary distribution of D(n, r, s) when s ≠ 0. One case follows from Theorem 1: since the diameter of a digraph is the same as the diameter of the digraph obtained by flipping the direction of all the edges, diam(D(n, 0, r)) d = diam(D(n, r, 0)). In contrast, studying the stationary distribution of D(n, 0, r) seems less interesting: typically there will be many vertices with no out-edges, where a simple random walk will eventually become stuck.
Outline. The paper is organized as follows. We start in Section 2 by discussing our motivation for addressing these problems and by putting our results in the context of other models for random (di)graphs. In Section 3 we introduce the notation that will be used throughout the paper and state some basic concentration inequalities and facts about branching processes. In Section 4.1 we finish the proof of the upper bound on the diameter of D(n, r) (Theorem 1) assuming some technical estimates. The breadth-first search procedure that will be used to explore the graph is described in Section 5. In Section 6 we study the behaviour of the in-neighbourhoods of D(n, r) by comparing them with Poisson Galton-Watson trees, while in Section 7 we study its out-neighbourhoods. In Section 8, we prove the technical estimates, completing the proof of the upper bound given in Section 4.1. The proof of the lower bound on the diameter of D(n, r) (Theorem 1) occupies Section 9. We conclude the paper by proving Theorem 2 in Section 10.

Related Work
In this section we survey related work, both arising from the literature on random DFA and from work on other models of random graphs. A DFA with state set V = {v_1, . . . , v_n} over an alphabet Σ = {σ_1, . . . , σ_r} is specified in part by a transition function L : [n] × [r] → V. We think of the pair (V, L) as specifying a directed multigraph D with vertices V and edges {(v_i, L(i, j)) : i ∈ [n], j ∈ [r]}; every vertex of D has out-degree r, and the r edges leaving a vertex v are labeled with distinct symbols from Σ. In addition, a DFA is equipped with a distinguished vertex s called the initial state, and with a binary labelling B : V(D) → {0, 1}; the vertices in B^{−1}({1}) are the accepting states of the DFA. The DFA is formally given by the tuple Q = (V, Σ, L, s, B).

Random Deterministic Finite Automata
Let Σ* denote the set of all finite strings with symbols in Σ. Words w = w_1 w_2 . . . w_t ∈ Σ* correspond to walks x_0(w), x_1(w), . . . , x_t(w) on V: x_0 = s and, for 1 ≤ i ≤ t, x_i is reached from x_{i−1} by following the edge with label w_i. We write Q(w) = x_t(w) for the final state of the walk. The DFA accepts the word w if B(Q(w)) = 1. The set L(Q) = {w ∈ Σ* : B(Q(w)) = 1} is the language recognized by the DFA. The languages recognized by DFA are precisely the regular languages.

the electronic journal of combinatorics 27(3) (2020), #P3.28
To see the connection with random out-regular graphs, observe that we may build a uniformly random DFA with n labelled states and an alphabet of size r as follows. Let D(n, r) be as in the first paragraph of the paper, using the random variables (L_{i,j} : (i, j) ∈ [n] × [r]). Then for (i, j) ∈ [n] × [r], let L(i, j) = L_{i,j}; equivalently, assign label σ_j to the edge (i, L_{i,j}). Choose the starting state s uniformly at random from [n], and choose B uniformly at random from the set of functions f : [n] → {0, 1}.
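The sampling procedure just described can be sketched in a few lines of Python (our illustration; the function names are ours): transitions are iid uniform, the start state is uniform, and B is a uniform binary labelling.

```python
import random

def random_dfa(n, r, seed=1):
    """Sample a uniformly random DFA with states [n] and alphabet [r]:
    L[i][j] is the state reached from state i on symbol j."""
    rng = random.Random(seed)
    L = [[rng.randrange(n) for _ in range(r)] for _ in range(n)]
    s = rng.randrange(n)                       # uniform initial state
    B = [rng.randrange(2) for _ in range(n)]   # uniform accepting labels
    return L, s, B

def final_state(L, s, word):
    """Q(w): follow the labelled edges from the start state."""
    x = s
    for symbol in word:
        x = L[x][symbol]
    return x

def accepts(L, s, B, word):
    """The DFA accepts w iff B(Q(w)) = 1."""
    return B[final_state(L, s, word)] == 1
```

Note that reading a uniformly random word of length m moves the state exactly as a simple random walk of length m on the underlying r-out digraph, which is the connection exploited in [1].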
DFA and regular languages play a crucial role in language theory and there is a vast literature on algorithms over DFA, ranging from minimization and property testing, to synthesis, learning and composition [34]. Next we describe the problem of learning random DFA from uniformly sampled examples, whose complexity is related to the diameter and stationary distribution of D(n, r) [1].
Learning regular languages from different sources of information is a prominent problem in computational learning theory [24], which is most often studied within the context of so-called grammatical inference problems [16]. An important problem in this area concerns the possibility of learning regular languages under the probably approximately correct (PAC) learning model introduced by Valiant [38]. Roughly speaking, this asks for an efficient algorithm such that, when supplied with a large enough sample containing iid strings drawn from some arbitrary probability distribution µ on Σ and labels indicating whether each string belongs to some hidden regular language, the algorithm outputs a representation of a regular language (e.g. a DFA) which is close to the hidden regular language in a sense that depends on the distribution which generated the sample strings. Several results from the 90's indicate that, in its full generality, PAC learning of DFA is hard due to complexity-theoretic as well as cryptographic reasons [23,32] (see also the recent strengthened result [13]). A natural question to ask in such a scenario is whether there exists a reasonable simplification of the problem for which a positive answer is possible. This requires one to come up with scenarios that rule out the worst-case problems arising from specially crafted regular languages and distributions over examples appearing in the proofs of the aforementioned lower bounds.
One possibility is to study the average case. This approach can be formalized by considering regular languages defined by random DFA. In particular, one can ask for an algorithm that with high probability (as the number of states in the DFA goes to infinity) can learn the regular language recognized by a random DFA. There exists evidence suggesting that such a relaxation might not be enough to achieve efficient learning in general: it was recently shown by Angluin et al. that generic instances of DFA (as well as decision trees and DNF formulas) are hard to learn from statistical queries when examples can be sampled from an arbitrary distribution [3]. Nevertheless, prior to Angluin et al.'s result it was shown that generic decision trees and generic DNF formulas can be efficiently learned when samples are drawn according to the uniform distribution [20,35].
In view of the panorama described in the previous paragraphs, a natural question to ask is whether random DFA can be efficiently learned when sample strings are drawn from the uniform distribution. More precisely, one would like to answer the following sorts of questions. Fix a uniformly random DFA Q with states [n] and alphabet [r]. Then fix m ∈ N and let (x_i, i ≥ 1) be iid words sampled uniformly at random from [r]^m.
1. Given the sequences (x_i, i ≥ 1) and (B(Q(x_i)), i ≥ 1), is it possible to construct a DFA Q̂ that recognizes the same language as Q with high probability?
2. Given the sequences (x_i, i ≥ 1), (Q(x_i), i ≥ 1) and (B(Q(x_i)), i ≥ 1), is it possible to construct a DFA Q̂ that recognizes the same language as Q with high probability?
In both cases, if the answer is yes then it is natural to ask for efficient algorithms (average-case running time polynomial in n, m, r, and any other parameters involved). The questions can be weakened by only requiring that Q̂ recognizes the same set of words of length m. A further weakening is to only require that P(Q̂(y) = Q(y)) > 1 − ε when y is uniformly distributed over [r]^m. The results in [1] establish that in order to answer the second question, it would be sufficient to understand several specific properties of a random walk on a randomly generated DFA. When a string is sampled from the uniform distribution over [r]^m and is labeled according to the state that it reaches, the label immediately corresponds to the final state of a simple random walk of length m over the DFA starting from the initial state. Thus, the analysis of the algorithm in [1] relies on bounds on the diameter, stationary distribution, and mixing time of random r-out regular digraphs.
Angluin and Chen base their analysis on previous results about the strongly connected component D_0 of D(n, r). The first such result is due to Grusho, who showed that the strongly connected component D_0 is attractive and that its order satisfies |V(D_0)| = (1 + o_p(1))λ_r · n [19] (we remark that λ_r n is also the asymptotic order of the giant component in the Erdős-Rényi random graph G(n, r/n)). The average-case analysis of algorithms for the minimization of Deterministic Finite Automata (DFA) has led to multiple rediscoveries of Grusho's result [7,11,12]. As mentioned above, the diameter of D(n, r) was first studied by Trakhtenbrot and Barzdin in [37], who showed that for every r ≥ 2, we have diam(D(n, r)) = O(log_r n), whp.
Several other properties of random DFA have been studied, both in learning theory and in other contexts, using the D(n, r) model. For example, first Korshunov's group, and later Nicaud's group, have studied the probability that random DFA exhibit particular structures, mainly motivated by the analysis of sample-and-reject algorithms for enumeration of subclasses of automata (see [28] and references therein). Motivated by worst-case hardness results for learning a DFA, Angluin and co-authors have used properties of random DFA to study the problem of learning a generic DFA [2,3]. The average-case complexity of DFA minimization algorithms has also received some attention recently [6,15]. Finally, a series of results have led to a solution of the long-standing Černý conjecture about synchronization of finite automata in the case of random DFA [8,29,36].

Diameter and stationary distribution of other random graph models
In this subsection we describe some previous results on the diameter and the stationary distribution of certain random graph models, and relate them to Theorems 1 and 2. This provides some intuition for the results we have obtained on the diameter of D(n, r). We consider the following models of random (di)graphs.
• For p ∈ [0, 1), G(n, p) is the random graph with vertex set [n] in which every edge is included independently with probability p.
• For d ∈ N, G(n, d) is the random d-regular simple graph with vertex set [n] chosen uniformly at random among all such graphs.
• For p ∈ [0, 1), D(n, p) is the random digraph with vertex set [n] in which every oriented edge is included independently with probability p.
For an undirected graph G = (V, E) and u, v ∈ V we write dist_G(u, v) for the minimum number of edges in a path from u to v, or set dist_G(u, v) = ∞ if there is no such path. The diameter of G is then defined just as in (1). Bollobás and Fernandez de la Vega [10] studied the diameter of G(n, d) and showed that for every integer r ≥ 2, we have diam(G(n, r + 1)) = (1 + o_p(1)) log_r n. (3)
The diameter of G(n, p) was recently studied by Riordan and Wormald [33], who showed that for every constant r > 0, we have diam(G(n, r/n)) = (1 + 2η_r + o_p(1)) log_r n. (4)
In fact, they proved a stronger result, showing convergence in distribution of the diameter after appropriate recentering and rescaling. Although in this paper we only determine the first-order asymptotic behaviour of diam(D(n, r)), it is possible that techniques similar to those presented in [33] could be used to determine its second-order term. The extra term 2η_r is essentially due to the existence of "remote" vertices in the giant component of G(n, r/n), whose neighbourhoods are exceptionally small up to distance about η_r log_r n.
Our result on the diameter of D(n, r) from Theorem 1 can be related to (3) and (4) in the following way. Given u, v ∈ [n], one way to determine dist D(n,r) (u, v) is to perform an outward breadth-first search (BFS) starting at u, to perform an inward BFS (i.e. following edges from head to tail) starting at v, and to stop at the first time the two searches uncover a common vertex. (See Section 5.1 for a careful definition of breadth-first search.) This technique was used by Bollobás and Fernandez de la Vega in [10]. Since the BFS explores vertices in order of distance, such a procedure is guaranteed to build a shortest path from u to v.
On the one hand, in the outward BFS of D(n, r) starting from u, every vertex has exactly r out-edges when explored. Similarly, in a BFS exploration of G(n, r + 1), when a vertex v is discovered via an edge from one of its neighbours, this leaves r edges to unveil when v is itself explored (unless v is discovered multiple times, which at least at the start of the BFS is unlikely). Thus, a BFS of G(n, r + 1) looks similar to an outward BFS of D(n, r).
On the other hand, in the inward BFS of D(n, r) starting from v (or at least near the start of the process) the number of in-edges arriving at a vertex are roughly distributed as a Binomial random variable with n trials and success probability r/n. Thus, a BFS of G(n, r/n) looks similar to an inward BFS of D(n, r).
The preceding paragraphs suggest that shortest paths in D(n, r) are in some sense hybrids of shortest paths in G(n, r + 1) and in G(n, r/n). This, together with (3) and (4), provides some intuition for the value of the diameter of D(n, r) from Theorem 1: it is the average of the limit values in those formulae.
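The two-sided search described above can be sketched in code (our illustration; the paper's use of it is analytic, not algorithmic). The forward search follows out-edges from u, the backward search follows edges from head to tail starting at v, the searches alternate by expanding the smaller frontier one full level at a time, and the process stops once no undiscovered path could improve on the best meeting point found.

```python
def dist_bidirectional(out_edges, u, v):
    """Shortest directed distance from u to v in an out-adjacency-list
    multidigraph, via an outward BFS from u run against an inward BFS from v.
    Returns float('inf') if there is no directed path."""
    if u == v:
        return 0
    n = len(out_edges)
    in_edges = [[] for _ in range(n)]
    for a, heads in enumerate(out_edges):
        for b in heads:
            in_edges[b].append(a)
    df, db = {u: 0}, {v: 0}          # exact BFS distances found so far
    frontier_f, frontier_b = [u], [v]
    lf = lb = 0                      # levels completed by each search
    best = float('inf')              # length of best u -> v path seen
    while frontier_f and frontier_b and lf + lb + 1 < best:
        if len(frontier_f) <= len(frontier_b):   # expand the smaller side
            lf += 1
            nxt = []
            for a in frontier_f:
                for b in out_edges[a]:
                    if b not in df:
                        df[b] = lf
                        nxt.append(b)
                    if b in db:      # searches meet: a genuine u -> v path
                        best = min(best, df[b] + db[b])
            frontier_f = nxt
        else:
            lb += 1
            nxt = []
            for b in frontier_b:
                for a in in_edges[b]:
                    if a not in db:
                        db[a] = lb
                        nxt.append(a)
                    if a in df:
                        best = min(best, df[a] + db[a])
            frontier_b = nxt
    return best
```

Since both searches explore vertices in order of distance, the loop can safely stop as soon as the completed levels satisfy lf + lb + 1 ≥ best: any still-undiscovered path would be at least that long.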
An undirected model which is closely related to D(n, r) is the random r-out graph, obtained by selecting a directed graph according to D(n, r) and then forgetting the directions of the edges. This model was introduced by Fenner and Frieze [17] and its diameter is whp (1 + o_p(1)) log_{2r} n (see [18, Exercise 16.5.3]).
There is interesting related work on distances in randomly edge-weighted graphs. We mention in particular the paper of Janson [21] on typical and extreme distances in randomly edge-weighted complete graphs, and the subsequent work by Bhamidi and van der Hofstad [9], which establishes distributional convergence for the diameter.
To conclude this section, we discuss the stationary distribution of a simple random walk in these other models. While in undirected graphs the stationary distribution (if it exists) is completely determined by the degrees of the vertices, this is not the case in directed graphs. Cooper and Frieze [14] give a very precise description of the stationary distribution of D(n, c/n) when c = c(n) > (1 + ε) log n, for any constant ε > 0, and use their result to compute the cover time of D(n, c/n). It is worth noticing that for such values of c, both the in-degrees and the out-degrees are of logarithmic order and concentrated around their expected values, which turns out to be very useful for the analysis. It seems harder to find an interesting question about the stationary distribution of D(n, c/n) when c = c(n) < (1 − ε) log n, since, as in random r-in regular digraphs, there are typically vertices with no out-edges.
Notation

For any two random variables X and Y, we use the notation X d = Y to denote that the corresponding probability distributions are equal. For random variables X, Y, we write X ⪯ Y, and say X is stochastically dominated by Y, if P(X ≥ t) ≤ P(Y ≥ t) for all t ∈ R. Given u ∈ [n] and an integer k ≥ 0, we write N_k^−(u) for the k-th in-neighbourhood of u and d_k^−(u) for its size.

Concentration Inequalities
We write Bin(N, p) to denote a Binomial random variable with N trials and success probability p, Po(r) to denote a Poisson random variable with parameter r, and Ber(p) to denote a Bernoulli random variable with success probability p.

We will use the following version of the Chernoff bound for large deviations, which can be found in [22]: for X = Bin(N, p) and any t ≥ 0,

P(X ≥ EX + t) ≤ exp(−t^2 / (2(EX + t/3)))    (5)

and

P(X ≤ EX − t) ≤ exp(−t^2 / (2 EX)).    (6)

We will also use Chebyshev's inequality: for any random variable X with finite variance and any t > 0, P(|X − EX| ≥ t) ≤ Var(X)/t^2.
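As a numeric sanity check (ours, not part of the paper), the following compares the exact Binomial upper tail with a standard Bernstein-flavoured Chernoff bound of the form P(X ≥ EX + t) ≤ exp(−t^2/(2(EX + t/3))), of the kind found in [22].

```python
from math import comb, exp

def binom_upper_tail(N, p, k):
    """Exact P(Bin(N, p) >= k), by direct summation of the pmf."""
    return sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(k, N + 1))

def chernoff_upper(N, p, t):
    """Chernoff-type upper-tail bound P(X >= EX + t) <= exp(-t^2/(2(EX + t/3)))."""
    mu = N * p
    return exp(-t * t / (2.0 * (mu + t / 3.0)))
```

For Bin(100, 0.1) (so EX = 10), the exact tail P(X ≥ 10 + t) sits comfortably below the bound for every t, as the inequality guarantees.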

Trees and branching processes
In any rooted tree, we view edges as oriented from child to parent. A plane tree is a rooted tree in which the children of each node have a left-to-right order. Given a plane tree T, there is a canonical labelling of V(T) by distinct elements of {∅} ∪ ⋃_{i≥1} N^i, as follows. The root has label ∅; its children are labelled from left to right as 1, . . . , k; more generally, the children of the node with label u are labelled from left to right as u1, . . . , uj. Conversely, given a rooted tree T with V(T) ⊂ N, we view T as a plane tree using the convention that the children of each vertex are listed from left to right in increasing order of index; if V(T) ⊂ N then we always use the ordering inherited from N. Thus, for a rooted tree T with V(T) ⊂ N, and a plane tree T′, we say T and T′ are isomorphic, and write T ≅ T′, if T and T′ are identical when viewed as plane trees.
Finally, fix a non-negative, integer-valued random variable ξ. A Galton-Watson tree with branching mechanism ξ is the random, potentially infinite family tree T ξ of a branching process started from a single individual, in which each individual reproduces independently according to ξ (i.e. the number of offspring of each individual has the distribution of ξ). The random tree T ξ is naturally viewed as a plane tree; see [25] for details and a careful construction. If ξ is Po(r) distributed we call T ξ a Poisson(r) Galton-Watson tree.
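The generation sizes of a Poisson(r) Galton-Watson tree are easy to simulate directly; the sketch below (ours) truncates very deep or very wide trees, and the fraction of simulated trees that survive approximates the survival probability λ_r from the Introduction.

```python
import random
from math import exp

def poisson_sample(r, rng):
    """Po(r) via Knuth's product-of-uniforms method (fine for small r)."""
    L, k, p = exp(-r), 0, 1.0
    while True:
        p *= rng.random()
        if p < L:
            return k
        k += 1

def pgw_generation_sizes(r, max_depth, rng, cap=10**6):
    """Generation sizes of a Poisson(r) Galton-Watson tree, truncated after
    max_depth generations or once a generation exceeds `cap` individuals
    (a supercritical population that large is essentially certain to survive)."""
    sizes = [1]
    while len(sizes) <= max_depth:
        z = sum(poisson_sample(r, rng) for _ in range(sizes[-1]))
        sizes.append(z)
        if z == 0 or z > cap:
            break
    return sizes
```

For r = 2, the empirical survival frequency over a few hundred trials is close to λ_2 ≈ 0.797, the largest solution of 1 − λ = e^{−2λ}.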

Proof of Theorem 1
In this section we describe our proof technique for Theorem 1, and prove it assuming some technical estimates. We prove such estimates in Sections 8 and 9.

Upper bound
This subsection sketches the proof of our upper bound on the diameter of D(n, r). For the remainder of this subsection, let D = D(n, r) and write d_k^−(v) = |N_k^−(v)|. In order to derive an upper bound on the diameter of D, we first show in Lemma 4 that for any fixed vertex v, whp the in-neighbourhood N_k^−(v) is either empty or large, for some k slightly larger than η_r log_r n. Then, in Proposition 5, we show that conditional on N_k^−(v) being large (and on a set of other technical conditions), the distance between any vertex u and N_k^−(v) is at most log_r n, whp. Putting both things together, we are able to prove the upper bound for diam(D).
Then for every ε ∈ (0, 1/10), there exists δ > 0 such that:

If every edge leaving v is a self-loop (i.e., all r out-edges of v point to v), then call v a loop vertex. Let E_SL be the event that D contains some loop vertex. Each vertex is independently a loop vertex with probability n^{−r}, so P(E_SL) ≤ n^{1−r}. Note that if r ≥ 3, the probability of a given vertex being a loop vertex is O(n^{−3}). This bound is small enough that it would allow union bounds over pairs of vertices, which would simplify some proofs. Since we aim to prove our result also in the case r = 2, we need to be a bit more careful in our computations.
the preceding bound holding uniformly over k and over all H satisfying the above conditions.
We prove Lemma 4 and Proposition 5 in Section 8. In the remainder of the section, we finish the proof of the upper bound from Theorem 1, assuming Lemma 4 and Proposition 5.

Lower bound
This subsection sketches the proof of our lower bound on the diameter of D(n, r). Together with the proof in Subsection 4.1, this concludes the proof of Theorem 1.
In order to derive a lower bound on the diameter of D, we first introduce the concept of flags: vertices v which have atypically small but non-empty in-neighbourhoods N_k^−(v) that induce a tree in D, for values of k slightly smaller than η_r log_r n (see Definition 6). In Lemma 7 we show that the probability that a given vertex is a flag is relatively large, and in Corollary 8 we show that whp there exist many flags. Moreover, there are no flags outside of D_0 (Lemma 9). Using the existence of flags in D_0, we conclude the lower bound for diam(D).
Let us now precisely define the notion of flag.
The condition that the in-neighbourhood induces a tree means that along any shortest path from N_{k_1}^−(v) to v, at each node w there are (r − 1) possible "wrong turns" that lead to [n] \ N_{k_1}^−(v). We will use this when bounding π_min in Section 10.
In order to find a vertex v ∈ [n] such that there exists u ∈ [n] with dist(u, v) ≥ (1 + η_r − ε) log_r n, we will look at F(ε), the set of ε-flags. The following lemma shows that the probability that a given vertex is an ε-flag is relatively large. Its proof uses the same ideas on Poisson branching processes as the proof of Lemma 4.
To ascertain that D(n, r) contains ε-flags whp, we proceed as follows. In Lemma 7 we showed that the expected number of ε-flags goes to infinity as n → ∞. Then, we obtain an upper bound on the probability that two given vertices are simultaneously ε-flags (Corollary 28). The previous inequality allows us to use a second moment argument to prove the following.

Proof. By Lemma 7 and linearity of expectation there is β > 0 such that E(|F|) = nP(1 ∈ F) ≥ n^β. Next, by (8), for n large we have Var(|F|) = o(E(|F|)^2). The result follows by Chebyshev's inequality.
Once we know that whp there are many -flags, we prove that they belong to the unique attractive component.
We use the previous results to conclude the proof of the lower bound.
Proof of Theorem 1 (Lower Bound). Fix ε > 0 and write k* = (η_r − ε/2) log_r n. Writing j_0 = inf{j : V(D_0) ⊂ N_{k*+j}^−(u)}, it follows that (r + ε/2)^{j_0} log_r^9 n ≥ n/2. Provided ε is chosen small enough, for n large this implies that j_0 ≥ (1 − ε/2) log_r n, so there is some node v ∈ V(D_0) with dist(v, u) ≥ k* + (1 − ε/2) log_r n = (η_r + 1 − ε) log_r n. Altogether, this yields: The first two probabilities were shown to tend to 0 in [19]. The third tends to 0 by Corollary 8, and the fourth by Lemma 9. Proposition 16, which provides an estimate on the growth of the in-neighbourhoods in D, shows that the last tends to 0. As ε > 0 was arbitrary, the lower bound on diam(D_0) follows; since diam(D) ≥ diam(D_0), so does the lower bound on diam(D).

Breadth-first search and conditioning
In this section we describe the breadth-first search (BFS) procedures, which are fundamental to our analysis, and use them to prove a handful of stochastic domination results for neighbourhood sizes in D(n, r).

Outward and inward breadth-first search
Fix a digraph D together with an ordering of its vertices V(D) as (v_1, . . . , v_n). The outward breadth-first search (oBFS) starting from node v ∈ V(D) is a deterministic process ((R_i^+, S_i^+), i ≥ 0), in which R_i^+ is the set of vertices that have been explored by step i and S_i^+ is the ordered list of vertices that have been discovered but not yet explored. Begin with R_0^+ = ∅ and S_0^+ = (v). Now fix i ≥ 0 and suppose (R_i^+, S_i^+) are already defined.
Step i of the process is defined as follows. If S_i^+ = (s_{i,1}, . . . , s_{i,j}) has positive length, then write u_i^+ = s_{i,1}, and list the out-neighbours of u_i^+ that lie outside R_i^+ ∪ S_i^+ in increasing order of index as w_{i,1}, . . . , w_{i,k}; it is possible that k = 0. Then set R_{i+1}^+ = R_i^+ ∪ {u_i^+} and S_{i+1}^+ = (s_{i,2}, . . . , s_{i,j}, w_{i,1}, . . . , w_{i,k}). In words, at step i, u_i^+ = s_{i,1} is explored, and w_{i,1}, . . . , w_{i,k} are discovered and added to the back of the queue for later exploration. If S_i^+ has zero length (i.e., S_i^+ = ()), then set (R_{i+1}^+, S_{i+1}^+) = (R_i^+, S_i^+). The inward breadth-first search (iBFS) process ((R_i^−, S_i^−), i ≥ 0) is defined in just the same manner, but exploring in-neighbourhoods rather than out-neighbourhoods to discover vertices. Observe that, using the notation from Section 3.3, we have
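The oBFS and iBFS procedures can be sketched in code as follows (our paraphrase, collapsing the (R_i, S_i) bookkeeping into a queue plus a distance map); discovered vertices are appended to the back of the queue in increasing order of index, as in the description above.

```python
from collections import deque

def obfs(out_edges, v):
    """Outward BFS from v: R is the set of explored vertices, S the queue of
    discovered-but-unexplored ones.  Returns (vertex, distance) pairs in
    order of exploration."""
    R, dist = set(), {v: 0}     # keys of dist == R union S
    S = deque([v])
    order = []
    while S:
        u = S.popleft()          # u_i = s_{i,1} is explored
        R.add(u)
        order.append((u, dist[u]))
        # discover unseen out-neighbours in increasing order of index
        for w in sorted(set(out_edges[u])):
            if w not in dist:
                dist[w] = dist[u] + 1
                S.append(w)      # added to the back of the queue
    return order

def ibfs(out_edges, v):
    """Inward BFS: identical, but edges are followed from head to tail."""
    n = len(out_edges)
    in_edges = [[] for _ in range(n)]
    for a, heads in enumerate(out_edges):
        for b in heads:
            in_edges[b].append(a)
    return obfs(in_edges, v)
```

On the 2-out example with out-edge lists [[1, 2], [2, 2], [0, 0]], obfs from vertex 0 explores 0, then 1 and 2 at distance 1, while ibfs from 0 reaches 2 at distance 1 and 1 at distance 2.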

Conditioning on neighbourhoods and the BFS exploration in D(n, r)
We next describe the effect of iBFS on the law of D(n, r). For the remainder of the section we write D = D(n, r) and fix v ∈ [n]. Informally, the point of this section may be summarized as follows: if vertex u_i^− is discovered at step j, then all we know about u_i^− is that it has an edge to u_j^− and no edges to u_k^− for k < j. We mostly focus on the iBFS process. First, in Lemma 10 we control the number of edges from a vertex to the set of vertices already discovered by the process. We use these results to bound the in-degree of an undiscovered vertex into an in-neighbourhood (Corollary 11), to control the one-step growth of in-neighbourhoods (Corollary 12), and to bound the out-degree of a discovered vertex to the undiscovered set (Corollary 13). In Lemma 14, we show that the iBFS process is unlikely to hit the vertices of a small, already exposed part of the digraph. Finally, we state Lemma 15, which is the analogue of Lemma 10 for the oBFS process.
We now state and prove some useful stochastic identities and inequalities which result from this.
Since 0 ≤ j < m and |R_m^−| = m, the second claim follows. The argument when w = v is similar but easier. Finally, the independence asserted by the lemma follows from the independence of the random variables (L_{w,p} : (w, p) ∈ [n] × [r]).
For the next corollary, recall that d_j^−(v) = |N_j^−(v)|. Corollary 12. For all j, q, p ∈ N_0, given that d_j^−(v) = q, Bin(r(n − p), q/(n − p + q)). Proof. By Corollary 11, the number of edges from w to N_j^− then has conditional law Bin(r, q/(n − p)), so is non-zero with probability 1 − (1 − q/(n − p))^r. To see this, observe that the former is the number of columns containing at least one 1 in an r × m matrix whose entries are iid Ber(x) random variables, while the latter is the law of the number of ones in such a matrix. The pigeonhole principle then yields the second claim of the lemma.
Recall that C_i^−(D, v) is the set of vertices discovered at step i of iBFS.
We omit the proof since it is very similar to those given above. The next lemma formalizes the intuitively clear picture that it is unlikely for the early stages of iBFS to encounter a fixed, small subgraph of D not containing the starting vertex.
Proof. Since |R_i^−| = i, we clearly have i_0 ≤ s + 1. Writing τ = inf{t : (R_t^− ∪ S_t^−) ∩ V(G) ≠ ∅}, the probability we aim to bound is thus at most the heads of such edges are uniformly distributed over [n] \ V(G). For i > 0, given the heads of these edges are uniformly distributed over

since, in this situation, the only way that (R
. Using this bound in the above sum, the result follows.
Finally, we require the following rather simple result for oBFS. Proof. We omit the proof of the first inequality, which parallels that of Lemma 10. For the second, note that |R_m^+ ∪ S_m^+| ≤ rm + 1 since D is r-out regular.

In-neighbourhoods: technical lemmas
In this section we gather a few basic estimates that describe the size and structure of in-neighbourhoods of vertices in D = D(n, r). Proposition 16 shows that in-neighbourhoods of D(n, r) cannot be too large. Lemma 17 controls the probability that the sequence (d_k^−(v), k ≥ 1) exhibits a large decrease in value for relatively small values of k. In the last part of the section, we focus on the probability that an in-neighbourhood neither expands exponentially nor dies out quickly. Lemma 18 shows that local in-neighbourhoods of a vertex are well-approximated by Poisson Galton-Watson trees. We use this lemma together with a result of Riordan and Wormald (Lemma 19) to control the probability that long and thin in-neighbourhoods exist, in Proposition 20.
Proposition 16. For all $\alpha > 0$, $\mathbf{P}\big(\exists\, v \in [n],\ \exists\, k \ge 0 : d^-_k(v) > (r+\alpha)^k \log_r^2 n\big) = O(n^{-4})$. Proof. Fix $v \in [n]$. We prove that $\mathbf{P}\big(\exists\, k \ge 0 : d^-_k(v) > (r+\alpha)^k \log_r^2 n\big) = O(n^{-5})$; a union bound over $v \in [n]$ then proves the proposition.
By Corollary 12, for every $p \ge q$ and every $a \ge 0$, the conditional probability that $d^-_j(v) \ge a$ is at most $\mathbf{P}\big(\mathrm{Bin}\big(r(n-p), \frac{q}{n-p+q}\big) \ge a\big)$. Note that $r(n-p) \cdot \frac{q}{n-p+q} \le rq$. Set $a = (r+\alpha)^j \log_r^2 n$, and observe that if $E_{j-1}$ occurs then $d^-_{j-1}(v) = q \le q_0 := (r+\alpha)^{j-1} \log_r^2 n$. Finally, for such $q$ we have $a = (r+\alpha) q_0 \ge (r+\alpha) q$, so the binomial probability above is exponentially small in $a$, where we used the Chernoff bound (5). Using this bound in (10) proves (9).
Lemma 17. Uniformly in $k \le 0.99 \log_r n$ and $\omega \ge \log_r^3 n$, we have $\mathbf{P}\big(\sum_{j=0}^{k} d^-_j(v) \ge \omega^2,\ d^-_k(v) < \omega\big) = O(n^{-3})$. Proof. Fix $k$ and $\omega$ as above, and $v \in [n]$. Let $\tau = \min\{j : d^-_j(v) \ge \omega^2/k\}$. Now fix $\alpha > 0$ small enough that $(r+\alpha)^k \log_r^2 n < n^{0.999}$ for $n$ large; this is possible by our choice of $k$. Also, for $\ell \in \mathbb{N}$ let $E_\ell = \{\forall i \le \ell,\ d^-_i(v) < (r+\alpha)^i \log_r^2 n\}$, and let $E = \bigcap_\ell E_\ell$. By Proposition 16, we have $\mathbf{P}(\bar E) = O(n^{-4})$, so for all $1 \le j \le k-1$ we may work on $E$ at the cost of an $O(n^{-4})$ error term. On $E$ we have $d^-_\ell(v) < (r+\alpha)^{\ell+1} \log_r^2 n \le (r+\alpha)^k \log_r^2 n < n^{0.999}$. Now fix $0 < q \le p \le n^{0.999}$ and consider the random variable $X$ distributed as $\mathrm{Bin}\big(r(n-p), \frac{q}{n-p+q}\big)$; the last inequality holds for $n$ large since $p \le n^{0.999}$. Now write $q_0 = \omega^2/k$ and $p_0 = (r+\alpha)^{\ell+1} \log_r n < n^{0.999}$. By Corollary 12 and the Chernoff bound (6), we obtain an upper bound of $e^{-q_0/(18r+2)}$ on the relevant conditional probability, in the last line using that $r \ge 2$. Finally, $q_0 = \omega^2/k \ge \log_r^2 n$, so $e^{-q_0/(18r+2)} = O(n^{-4})$. Combining the preceding inequality with (11), (12) and (13) yields the result. In the statement of Lemma 18 below, $T$ is a Galton-Watson branching tree whose offspring distribution is Poisson with parameter $r$.
Proof. Fix $k \in \mathbb{N}_0$ and a plane tree $T$ of height at most $k$, and write $t = |V(T)|$. Recall the canonical labelling of $V(T)$ with labels from $\{\emptyset\} \cup \bigcup_{i \ge 1} \mathbb{N}^i$ introduced in Section 3.3. Consider the iBFS procedure on $T$ started at its root $v$. To make sense of this, we must specify the order in which the children of a vertex $u$ are added to the set of discovered vertices; we use the left-to-right order. We let $s = |V(T_{k-1})|$ be the number of vertices of $T$ at distance at most $k-1$ from the root. In order to check whether $T_k(D, v)$ and $T$ are isomorphic, it suffices to perform $s$ steps of the iBFS exploration from $v$ in $D$; this yields a product formula, where the last line is due to the symmetry of the model. Writing $q_i = 1 + \sum_{j=0}^{i-1} (a_j - 1)$, Corollary 13 then gives the corresponding bound.
the electronic journal of combinatorics 27(3) (2020), #P3.28
Now let $\mathcal{T}$ be a Poisson($r$) Galton-Watson tree; write $\rho$ for the root of $\mathcal{T}$. Build $\mathcal{T}$ via iBFS starting from $\rho$. In this manner, we may couple $\mathcal{T}$ with a sequence $(\xi_i, i \ge 0)$ of iid $\mathrm{Po}(r)$ random variables so that for $0 \le i < |V(\mathcal{T})|$ we have $|C^-_i(\mathcal{T}, \rho)| = \xi_i$. Using that $1 + x \le e^x$, and since $i + q_i \le t \le n/2$, $\sum_{i=0}^{s-1} a_i^2 \le t^2$, and $s \le t$, the result follows.

Lemma 19 ([33], Lemma 2.1). Let $T$ be a Poisson($r$) Galton-Watson tree. There exist constants $c, C > 0$ such that for every $\omega \ge 2$ and $k \ge 1$ we have $\mathbf{P}\big(0 < |T_k| < \omega\big) \le C\,(r(1-\lambda_r))^{k - k_0}$, where $k_0 = \lceil \log_r \omega \rceil$.
Recall that the probability of survival of $T$, $\mathbf{P}\big(\forall k \ge 0,\ |T_k| > 0\big)$, lies in $(0, 1)$. Essentially, the preceding lemma states that, given that the branching process survives for the first $k$ generations, the probability that $|T_k| < \omega$ decays exponentially in $k$ (provided that $\omega$ is small enough with respect to $k$). The final and principal result of this section proves a corresponding bound with $d^-_k(v)$ in place of $|T_k|$.
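Numerically, $\lambda_r$ and $\eta_r$ are easy to compute, and the survival probability of the Poisson($r$) Galton-Watson tree can be estimated by simulation. The sketch below is our own illustration (function names are ours): it solves $1 - \lambda = e^{-r\lambda}$ by fixed-point iteration and uses the identity $\log_r(r(1-\lambda_r)) = -\eta_r^{-1}$, which appears later in the text, to compute $\eta_r$.

```python
import math
import random

def lambda_r(r, iters=200):
    """Largest root of 1 - lam = exp(-r*lam), by fixed-point iteration
    lam <- 1 - exp(-r*lam), started from lam = 1."""
    lam = 1.0
    for _ in range(iters):
        lam = 1.0 - math.exp(-r * lam)
    return lam

def eta_r(r):
    """eta_r, via the identity log_r(r*(1 - lambda_r)) = -1/eta_r."""
    lam = lambda_r(r)
    return -1.0 / (math.log(r * (1.0 - lam)) / math.log(r))

def _poisson(rng, mu):
    # Knuth's method; adequate for small mu.
    L = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < L:
            return k
        k += 1

def survival_fraction(r, gens=25, trials=3000, cap=500, seed=7):
    """Fraction of Poisson(r) Galton-Watson trees with a non-empty
    generation `gens`; populations above `cap` are treated as surviving."""
    rng = random.Random(seed)
    alive = 0
    for _ in range(trials):
        z = 1
        for _ in range(gens):
            if z == 0 or z >= cap:
                break
            z = sum(_poisson(rng, r) for _ in range(z))
        if z > 0:
            alive += 1
    return alive / trials
```

For $r = 2$ this gives $\lambda_2 \approx 0.797$, and the simulated survival fraction should be close to it, consistent with $\lambda_r$ being the survival probability of the supercritical Poisson($r$) branching process.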
Proposition 20. For every $v \in [n]$, $k \le 0.99 \log_r n$ and $\log_r^3 n \le \omega \le n^{1/6}$, $\mathbf{P}\big(0 < d^-_k(v) < \omega\big) = \Theta\big((r(1-\lambda_r))^{k-k_0}\big)$, where $k_0 = \lceil \log_r \omega \rceil$.
Proof. Write $Z_k = |T_k|$. We first prove an upper bound on $\mathbf{P}(0 < d^-_k(v) < \omega)$: by Lemma 18 we may compare this with $\mathbf{P}(0 < Z_k < \omega)$, where in the last inequality we used Lemma 17.
We now turn to the lower bound. A similar argument to that above gives a comparison in the other direction, and bounding the second probability is straightforward. First, fix $j < k$ and $a, b \in \mathbb{N}$. Given that $Z_j = a$, by the branching property, each of the $a$ subtrees of $T$ rooted at a node of $T_j$ survives independently with probability $p := \mathbf{P}(|T| = \infty)$. Note that $p > 0$ is independent of $n$. But $Z_k$ is at least the number of such subtrees which survive. In other words, letting $j_0 = \inf\{j : Z_j \ge \omega^2/(k+1)\}$, on the event $\{\sum_{j=0}^k Z_j \ge \omega^2\}$ we must have $j_0 \le k$. It follows from the preceding paragraph (conditioning on the value of $j_0 \le k$) that $\mathbf{P}\big(\sum_{j=0}^k Z_j \ge \omega^2,\ Z_k < \omega\big) \le \mathbf{P}\big(\mathrm{Bin}(\lceil \omega^2/(k+1) \rceil, p) < \omega\big) = O(n^{-3})$, the final inequality by a Chernoff bound since $\omega^2/(k+1) \ge \omega \log^2 n$. The proposition follows from Lemma 19.

Out-neighbourhoods: technical lemmas
Recall that $E_{SL}$ is the event that $D$ contains no loop vertices. As in the statement above, we describe the conditional law of $D$ given $H$. Next, independently for each $w \notin V(H)$, let $\tilde L_w = (\tilde L_{w,1}, \ldots, \tilde L_{w,r})$ be a vector chosen uniformly at random from the $\tilde n^r$ vectors $(s_1, \ldots, s_r) \in \big(([n] \setminus V(H)) \cup B_H\big)^r$, where $\tilde n = |([n] \setminus V(H)) \cup B_H|$. Then for each $i \in [r]$ add a directed edge from $w$ to $\tilde L_{w,i}$.
Finally, for $w \in V(H)$, let $t_w = r - |E_H(w, H)|$; this is the number of edges with tail $w$ and head not in $H$. Independently for each $w \in V(H)$, let $\tilde L_w = (\tilde L_{w,1}, \ldots, \tilde L_{w,t_w})$ be a vector chosen uniformly at random from $([n] \setminus V(H))^{t_w}$, and for each $i \in [t_w]$ add a directed edge from $w$ to $\tilde L_{w,i}$.
For the remainder of the section, fix $u \in [n]$, write $N^*_j$ for the set of vertices at out-distance exactly $j$ from $u$, and write $d^*_j = |N^*_j|$. We continue with a simple lemma.
In this case, since $r \ge 2$, it is a simple combinatorial exercise to check that if also $d^*_5 < 5$ then the subgraph of $D$ induced by the vertices at out-distance at most 5 from $u$ has at least two more edges than vertices. For any fixed digraph with at least two more edges than vertices and with no self-loops, it is easily seen that the $\mathbf{P}_H$-probability of its appearance is $O(n^{-2})$. (This is not true for digraphs with self-loops if $r = 2$: the probability that $v$ itself is a loop vertex is $O(n^{-2})$, and in this case $d^*_5 = 0$.) The number of isomorphism classes of digraphs with diameter at most 5 and maximum out-degree $r$ is bounded, and the result follows.
We next show that with high probability, each generation $N^*_j$ is approximately $r$ times larger than the previous one, until $j$ is nearly $(\log_r n)/2$.
Now fix $5 \le i < j$. Condition on $N^*_i$, and recall that the random variables $\{L_{w,m} : w \in N^*_i, m \in [r]\}$ are the heads of edges from vertices in $N^*_i$. Reveal the values of these random variables one at a time; say a conflict occurs if $L_{w,m} \in N^*_{\le i} \cup V(H)$ or $L_{w,m} = L_{w',m'}$ for a previously revealed $L_{w',m'}$. If $d^*_{i+1} \le r d^*_i - 4$ then at least 4 conflicts occur. Under $\mathbf{P}_H$, the random variables $L_{w,m}$ are independent and uniform over $([n] \setminus V(H)) \cup B_H$. When $L_{w,m}$ is revealed there are fewer than $r^{i+2} + |B_H|$ locations that can cause a conflict, since $|N^*_{\le i+1}| < r^{i+2}$, so the probability of a conflict is less than $(r^{i+2} + |B_H|)/\tilde n$. The set $\{L_{w,m} : w \in N^*_i, m \in [r]\}$ has size at most $r^{i+1}$, and $|B_H| \le \log^7 n$; it follows that the probability of at least 4 conflicts is suitably small, where in the last inequality we used that $\tilde n \ge n - \log^7 n$. For $j \le (\log_r n)/8$ we thus obtain the claimed bound. The third lemma of the section shows that out-neighbourhoods continue to grow rapidly until they reach size close to $n/\log n$.
Lemma 24. There is $C > 0$ such that for all $i$ with $r^i \le n/\log_r n - 2\log^7 n$ and all $a \ge \log_r^3 n$, $\mathbf{P}_H\big(d^*_{i+1} < r d^*_i \big(1 - \tfrac{2r^2}{\log_r n}\big) \,\big|\, d^*_i = a\big) \le e^{-C \log_r^2 n}$. Proof. Condition on $N^*_i$, and reveal the random variables $\{L_{w,m} : w \in N^*_i, m \in [r]\}$ one at a time as in the previous proof. When $L_{w,m}$ is revealed there are fewer than $r^{i+2} + |B_H|$ locations that can cause a conflict, so under $\mathbf{P}_H$ the probability of a conflict is at most $(r^{i+2} + |B_H|)/\tilde n$. If $d^*_{i+1} \le r d^*_i - t$ then at least $t$ conflicts occur, so we obtain a binomial tail bound on the conditional probability. Using that $r^i \le n/\log_r n - 2\log^7 n$, that $|B_H| \le \log^7 n$ and that $\tilde n \ge n - \log^7 n$, it is straightforward to verify that $a r (r^{i+2} + |B_H|)/\tilde n \le r^3 a/\log_r n$. A Chernoff bound then gives the claimed inequality for some constant $C' = C'(r)$; the latter inequality follows since $a \ge \log_r^3 n$. The following is an easy consequence of the preceding lemma, and concludes the section.
Corollary 25. Let $j^* = 3 \log_r \log_r n + 5$ and let $\ell^* = \log_r n - \log_r \log_r n - 1$. Then there are $c, C > 0$ such that $\mathbf{P}_H\big(d^*_{j^*} \ge \log_r^3 n,\ d^*_{\ell^*} < cn/\log_r n\big) \le e^{-C \log^2 n}$. Proof. Suppose that $d^*_{j^*} \ge \log_r^3 n$. Since $\ell^* - j^* = \log_r n - 4\log_r\log_r n - 6$ we also have $\big(1 - \tfrac{2r^2}{\log_r n}\big)^{\ell^*-j^*} r^{\ell^*-j^*} \log_r^3 n \ge \big(1 - \tfrac{2r^2}{\log_r n}\big)^{\log_r n} \cdot \frac{n}{r^6 \log_r^4 n} \cdot \log_r^3 n \ge \frac{e^{-2r^2}}{2r^6} \cdot \frac{n}{\log_r n}$ for $n$ large. With $c = \frac{e^{-2r^2}}{2r^6}$, it follows from the preceding inequalities that if $d^*_{j^*} \ge \log_r^3 n$ but $d^*_{\ell^*} < cn/\log_r n$ then there is $i \in [j^*, \ell^* - 1]$ such that $d^*_i \ge \log_r^3 n$ and $d^*_{i+1} < r d^*_i \big(1 - \tfrac{2r^2}{\log_r n}\big)$. By Lemma 24 and a union bound over $i$, the probability of this event is at most $e^{-C \log^2 n}$ for some constant $C$.
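The near-geometric growth of out-neighbourhoods described by Lemma 23 through Corollary 25 is visible in simulation. The sketch below is our own illustration (helper names are ours): it counts, for each generation, the new vertices discovered by oBFS; a "conflict" — a head landing on an already-seen vertex — is exactly what makes a generation fall short of $r$ times the previous one.

```python
import random

def sample_arcs(n, r, seed=None):
    """Canonical construction: L[i][j] is the head of the j-th arc out of i."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(r)] for _ in range(n)]

def out_generation_sizes(L, u, depth):
    """Number of new vertices discovered at each out-distance from u."""
    seen = {u}
    frontier = [u]
    sizes = [1]
    for _ in range(depth):
        nxt = []
        for w in frontier:
            for h in L[w]:
                if h not in seen:  # a repeated head is a "conflict", not counted
                    seen.add(h)
                    nxt.append(h)
        sizes.append(len(nxt))
        frontier = nxt
    return sizes
```

While the discovered set is much smaller than $n$, conflicts are rare, so each generation typically has size close to $r$ times the previous one; each generation is deterministically at most $r$ times the previous one.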

Upper bound on the Diameter
In this section we prove Lemma 4 and Proposition 5 from Subsection 4.1. Throughout the section, we fix u, v ∈ [n].
Recall that $k^* = (\eta_r + \epsilon) \log_r n$ and that $\ell^* = \log_r n - \log_r \log_r n - 1$.
Combining the two preceding bounds, the lemma follows.
The proof of Proposition 5 occupies the remainder of the section. Let $\tau = \min\{j \ge 1 : |N^+_j(u)| \ge n/\log n\}$. Proof. First, we decompose according to the possible explorations, where the sums are over graphs $F$ with $V(F) \cap V(H) = \emptyset$, such that $u \in V(F)$ and such that, for some $\ell \ge 0$, $V(F) = \bigcup_{j=0}^{\ell} N^+_j(F, u)$ and $\ell = \min\{i : |N^+_i(F, u)| \ge n/\log n\}$. We now bound the final probability. Under such conditioning, the out-edges from $N^+_\tau$ are uniformly distributed over $([n] \setminus V(H)) \cup B_H$. There are more than $r(n/\log n)$ such out-edges, and for the exploration to avoid $B_H$ at step $\tau + 1$, the heads of such edges must all avoid $B_H$; the probability of this is at most $\big(1 - \frac{\log^4 n}{n}\big)^{r(n/\log n)}$, the last inequality since $|B_H| \ge \log^4 n$ and $\tilde n < n$. Using that $1 - x \le e^{-x}$, this gives a bound of $e^{-r \log^3 n}$. The result follows.
Proof of Proposition 5. Recall that we set $\ell^* = \log_r n - \log_r \log_r n - 1$. Once $n$ is large enough that $\ell^* + 1 > 5$, we may split the probability to be bounded into two terms, and we focus on the latter probability. Now take $\epsilon \in (0, c)$, where $c$ is the constant from Corollary 25, and let $\tau$ be as in Lemma 26. Then by that lemma, the contribution from small $\tau$ is negligible. We bound the second probability by a further decomposition. The first term on the right is at most $e^{-C \log^2 n}$ by Corollary 25. We further divide the second term: if also $d^*_5 \ge 5$ then $5 < \sigma \le j^*$. Lemma 23 then implies that the first probability on the right is $O(n^{-3})$. By the definition of $E$ and by Lemma 22, the second probability is also $O(n^{-3})$. Combining all these bounds we obtain the claimed estimate, as required.
Proof of Lemma 7. We assume $n$ large throughout. Given a tree $T$, let $k_1(T) = \inf\{k : |T_k| \ge \log^4 n\}$, let $A(T)$ be the event that $|T_{k^*}|$ lies between a small constant multiple of $\log^3 n$ and $\log^3 n$, let $B(T)$ be the event that $\max_{i \le k^*} |T_i| > \log^6 n$, and let $C(T)$ be the event that $k_1 \le k^* + 5 \log_r \log n$ and $|T_{k_1}| \le \log^5 n$. (We may view a deterministic tree as a random tree in the same way as we may view a constant as a random variable, so it is reasonable to call $A(T)$, $B(T)$ and $C(T)$ events even if $T$ is deterministic.) We first bound the probability that $A$, $B$ and $C$ occur for a Poisson($r$) Galton-Watson tree $T$.
Let ω = log 3 n and k 0 = log r ω. By the Kesten-Stigum theorem (see e.g. [4, pp. 24-29]) r −k |T k | converges almost surely to an absolutely continuous random variable on (0, ∞). As it also converges in probability, for every fixed 0 < c 1 < c 2 , inf k P(c 1 r k |T k | c 2 r k ) > 0, the electronic journal of combinatorics 27(3) (2020), #P3. 28 where the infimum is taken over all k such that [c 1 r k , c 2 r k ] ∩ N is non-empty. Taking c 1 = and c 2 = 1, there exists c 0 > 0 such that By Lemma 19 with ω = 2 and k = k * − k 0 , there is a > 0 such that where we used that log r (r(1 − λ r )) = −η −1 r and that η r < 1. We conclude that Next, if B(T ) occurs then let i k * be minimal such that |T i | > log 6 n. In order for A(T ) to additionally occur the number of descendants of T i alive at time k * must be less than log 3 n. Writing p for the survival probability of a Poisson(r) branching process, it follows as in the proof of Proposition 20 that P(A(T ), B(T )) P(Bin(log 6 n, p) log 3 n) n −3 , the last inequality by a Chernoff bound.
To bound the probability that $A(T)$ occurs but $C(T)$ does not, let $N = N(T)$ be the number of vertices in $T_{k^*}$ with at least one descendant in $T_{k^* + 5 \log_r \log n}$; if $T_{k^*} = \emptyset$ then $N = 0$. If $C(T)$ does not occur then one of the following must occur.
(a) $N < \log^2 n$.
(b) $N \ge \log^2 n$ but $k_1 > k^* + 5 \log_r \log n$.
(c) $|T_{k_1}| > \log^5 n$.
If $A(T)$ occurs then $|T_{k^*}|$ is at least a constant multiple of $\log^3 n$, so by the branching property (i.e. the independence of the subtrees rooted at elements of $T_{k^*}$), we have $\mathbf{P}(A(T), N < \log^2 n) \le \mathbf{P}\big(\mathrm{Bin}(\Theta(\log^3 n), p) \le \log^2 n\big) < n^{-3}$ for large $n$, by a Chernoff bound. Next, to have $k_1 > k^* + 5 \log_r \log n$, every vertex in $T_{k^*}$ must have fewer than $\log^4 n$ descendants in $T_{k^* + 5 \log_r \log n}$, so by Lemma 19 and the branching property we have $\mathbf{P}(N \ge \log^2 n,\ k_1 > k^* + 5 \log_r \log n) \le \big(\mathbf{P}(|T_{5 \log_r \log n}| \in (0, \log^4 n))\big)^{\log^2 n} \le \big(C (r(1-\lambda_r))^{5 \log_r \log n - \log_r \log^4 n}\big)^{\log^2 n} \le \big(C (r(1-\lambda_r))^{\log \log n}\big)^{\log^2 n} \le n^{-3}$ for large $n$, the last inequality because $r(1-\lambda_r) < 1$. Finally, by the Markov property and the definition of $k_1$, writing $\mathrm{Po}(t)$ for a Poisson($t$) random variable, we have $\mathbf{P}(|T_{k_1}| > \log^5 n) \le \sup_{m < \log^4 n} \mathbf{P}\big(\mathrm{Po}(rm) > \log^5 n \mid \mathrm{Po}(rm) \ge \log^4 n\big) \le \mathbf{P}\big(\mathrm{Po}(r \log^4 n) > \log^5 n - \log^4 n\big)$.
The result follows.
The next lemma will be our key tool for controlling the joint probabilities of in-neighbourhoods of distinct vertices. Results of this type are standard for sparse random undirected graphs.
Proof. Recall that $T_i$ is the $i$-th generation of the tree $T$. Write $h$ and $h'$ for the respective heights of $T$ and $T'$, and $t$ and $t'$ for their respective sizes. In order that $D[N^-_h(u)] = T$, it is necessary and sufficient that the following events occur.
• For each $x \in V(T) \setminus \{u\}$, there is an edge from $x$ to $p_T(x)$ in $D$; call this event $A_1(u, T)$.
• There are no other edges within D[V (T )]; call this event A 2 (u, T ).
• There are no edges from $[n] \setminus V(T)$ to $V(T) \setminus T_h$; call this event $A_3(u, T)$.
Note that $A_3$ is independent of $A_1$ and $A_2$. We now consider two such events simultaneously; observe that if $T'$ has root $v$ and height $h'$, the analogous events for $(v, T')$ are defined in the same way. Given that $A_1(u, T)$ and $A_2(u, T)$ occur, there are precisely $1 + (r-1)t$ edges leaving $V(T)$, and the heads of these edges are uniformly distributed over $[n] \setminus V(T)$. The conditional probability that no such edge has head in $V(T) \setminus T_h$ is then explicit. Similar considerations for the edges leaving $V(T')$ and the edges with tail in $[n] \setminus (V(T) \cup V(T'))$ yield the corresponding identity.
Combined with (14) and (15), straightforward arguments give the result. The following corollary provides a proof of Equation (8).
We now turn to the case where $N^-_{k_1(u)}(u)$ and $N^-_{k_1(v)}(v)$ are disjoint; a similar computation applies. Together with (16) this completes the proof.
We finally prove Lemma 9.
Proof of Lemma 9. Fix $\epsilon > 0$ and write $F = F(\epsilon)$. If $D_0$ is attractive then every vertex $v$ with $\max_{u \in [n]} \mathrm{dist}(u, v) < \infty$ satisfies $v \in V(D_0)$. Since $D_0$ is attractive with high probability [19], in order to show that $\mathbf{P}(F \subset V(D_0)) = 1 - o(1)$ it suffices to show that with high probability, for all $v \in F$ and $u \in [n]$ we have $\mathrm{dist}(u, v) < \infty$.
It follows that the probability in question may be bounded by a sum over $v \in F$. We bound the inner sum by noting that a sum of probabilities of disjoint events is at most one. Now fix $T \in \mathcal{T}$ and write $h = h(T)$ for the height of $T$ (i.e., the greatest number of edges on a path ending at the root $v$). Observe that if $D[N^-_{k_1}(v)] = T$ then $k_1 = h(T)$; we thus have an equality of events. Using this bound, the result follows from the two preceding inequalities and the fact that $\mathbf{P}(\bar E_{SL}) = \Theta(n^{1-r}) = O(n^{-1})$.

The Stationary Distribution
In this section we prove Theorem 2. Recall that $D_0 = D_0(n, r)$ is the largest strongly connected component of $D$ and that with high probability $D_0$ is attractive [19] and ergodic [5]. Write $\pi_{\max} = \pi_{\max}(D)$ and $\pi_{\min} = \pi_{\min}(D)$. Also, write $X = (X_k, k \ge 0)$ for simple random walk on $D = D(n, r)$.
It is important to distinguish the randomness of the graph D from that of the walk X. For v ∈ V (D) = [n], write P v for the (random) probability measure under which X has the law of simple random walk on D with X 0 = v, and E v for the corresponding expectation operator. It is handy to have a concrete description of X under P v , as follows.
Recall that $D$ has edges $\{(i, L_{i,j}) : (i, j) \in [n] \times [r]\}$ (this is the "canonical construction" from the introduction). Let $(U_k, k \ge 0)$ be independent and uniformly distributed over $\{1, \ldots, r\}$. Then set $X_0 = v$ and for $k \ge 0$ let $X_{k+1} = L_{X_k, U_k}$. Suppose that the random walk follows an edge $e$ from $N^-_k(v)$ to its complement. After following the edge, the random walk's position has distance greater than $k$ from $v$. Since the distance to $v$ decreases by at most one in a single random walk step, this means that in order to reach $v$ after leaving $N^-_k(v)$, the random walk must pass through $N^-_k(v)$ again: it must restart at one of the maze entrances.
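The coupling $X_{k+1} = L_{X_k, U_k}$ is straightforward to implement. The sketch below is our own illustration (helper names are ours): it runs the walk on a small deterministic $2$-out digraph that is strongly connected, aperiodic and doubly stochastic (every vertex also has in-degree $2$), so its stationary distribution is uniform and the empirical occupation frequencies should be close to $1/n$.

```python
import random

def walk_occupation(L, v, steps, seed=0):
    """Run X_{k+1} = L[X_k][U_k] for `steps` steps; return visit frequencies."""
    rng = random.Random(seed)
    r = len(L[0])
    counts = [0] * len(L)
    x = v
    for _ in range(steps):
        x = L[x][rng.randrange(r)]  # U_k uniform over the r out-arcs
        counts[x] += 1
    return [c / steps for c in counts]

# A strongly connected, aperiodic 2-out example: arcs i -> i+1 and i -> i+2 (mod n).
n = 30
L = [[(i + 1) % n, (i + 2) % n] for i in range(n)]
freq = walk_occupation(L, 0, 200000, seed=1)
```

For this example each `freq[i]` should be close to $1/30 \approx 0.033$; on the random digraph $D(n, r)$ itself the occupation frequencies are far from uniform, which is the content of the bounds on $\pi_{\max}$ and $\pi_{\min}$ below.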

Bounding π max
With the preceding paragraph in mind, for any positive integer $h$ we say that the maze $N^-_k(v)$ is $h$-hard if every path in $D[N^-_k(v)]$ from a vertex of $N^-_k(v)$ to $v$ contains at least $h$ vertices $w$ with $|E(w, N^-_k(v))| = 1$. Perhaps more picturesquely: the maze is $h$-hard if no matter what entrance is chosen, along any potential path to the treasure there are at least $h$ locations where only a single direction stays within the maze; the other $r - 1$ possibilities deposit the searcher outside the maze walls. For $S \subset [n]$ let $\tau_S = \inf\{k \ge 0 : X_k \in S\}$ and let $\tau^+_S = \inf\{k > 0 : X_k \in S\}$.
Proof. If the maze is $h$-hard then from any $u \in N^-_k(v)$,
$\mathbf{P}_u\big(\tau_v < \tau_{[n] \setminus N^-_k(v)}\big) \le (1/r)^h. \quad (17)$
To see this, simply note that in order to have $\tau_v < \tau_{[n] \setminus N^-_k(v)}$, the walk must visit at least $h$ vertices $w \in N^-_k(v)$ with $|E(w, N^-_k(v))| = 1$. But for such a vertex $w$ we have $\mathbf{P}_w\big(X_1 \in N^-_k(v)\big) = 1/r$, and the inequality follows by the Markov property. Let $K$ be the number of visits to $N^-_k(v)$ before the walk visits $v$. Since the inequality (17) holds for all $u \in N^-_k(v)$, it follows that each excursion into $N^-_k(v)$ reaches $v$ with probability at most $(1/r)^h$, and the result follows. Proof. Choose $\alpha > 0$ small, and let $A$ be the event that for all $k \ge 0$ and all $v \in [n]$ we have $d^-_k(v) \le (r + \alpha)^k \log^2 n$. By Proposition 16, we have $\mathbf{P}(\bar A) = O(n^{-4})$. Assuming $\alpha$ is small enough with respect to $\delta$, on $A$ the relevant in-neighbourhoods have size at most $n^{1-\delta/2}$. Observe that there may be multiple edges from $w$ to a vertex $u \in N^+(w)$, so $Y(w)$ need not equal $|N^+(w)|$. To show that the probability that the maze is not $h$-hard is small, it thus suffices to show that with high probability no such path exists.
We would like to conclude as follows. Let $S$ be any path from the last generation of $T^-_{\ell^*}$ to $v$. The arguments of the preceding paragraphs suggest the bound $\mathbf{P}\big(B(S) \ge |S| - h \mid A\big) \le \mathbf{P}\big(\mathrm{Bin}(|S|, r n^{-\delta/2}) \ge |S| - h\big) \le 2^{|S|} (r n^{-\delta/2})^{|S| - h}$.
On $A$ we have $|T^-_{\ell^*}| \le n^{1-\delta/2}$, so there are less than $m \cdot r^t$ paths of length $t$ from the last generation of $T^-_{\ell^*}$ to $v$, where $m \le n^{1-\delta/2}$; now use the preceding inequality and a union bound over paths of length $t$ and over $t \le \ell^*$. To make the preceding argument rigorous, we need to deal with the fact that the set of paths from $T^-_{\ell^*}$ to $v$ is random (even conditional on $T^-_{\ell^*}$, as such paths may follow edges of $D[N^-_{\ell^*}(v)]$ which are not edges of $T^-_{\ell^*}$). To do so, condition on $T^-_{\ell^*}$, fix $w \in T^-_{\ell^*} = N^-_{\ell^*}(v)$ and a string $s = s_1 s_2 \cdots s_t \in [r]^t$ of length $|s| = t$. This string uniquely specifies a path $P = P(w, s) = (p_i(w, s), 0 \le i \le t)$ in $D$: at step $i$, follow the $s_i$-th edge leaving the current vertex. Formally, we let $p_0 = w$ and, for $1 \le i \le t$, let $p_i = L_{p_{i-1}, s_i}$.
By repeated conditioning, we obtain $\mathbf{P}\big(P(w, s) \text{ is a simple path in } D[N^-_{\ell^*}(v)],\ B(P(w, s)) \ge t - h \mid T_{\ell^*}, A\big) \le \mathbf{P}\big(\mathrm{Bin}(t, r n^{-\delta/2}) \ge t - h\big) \le 2^t (r n^{-\delta/2})^{t - h}$. Now let $A_w(t)$ be the event that there is a simple path $P$ of length $t$ starting from $w$ and staying within $D[N^-_{\ell^*}(v)]$ for which $B(P) \ge t - h$. All possible such paths are described by a string $s \in [r]^t$, so the preceding inequality and a union bound give a bound on $\mathbf{P}(A_w(t))$. Since $\ell^* - h \ge \delta \log n$, this yields the claimed estimate. Before proving our bounds on $\pi_{\max}$ we require one final result, which states that with high probability there is at least one escape route along each path from $N^-_{\log\log n}(v)$ to $v$, for all $v$. Note that the event $E_v$ is not the event that $D[N^-_{\log\log n}(v)]$ is 1-hard: in $E_v$ we require the vertex $w$ to send the searcher not merely outside of $D[N^-_{\log\log n}(v)]$ but out of the larger maze $D[N^-_{\ell^*}(v)]$. The proof of Lemma 32 follows the same lines as that of Proposition 31 but is simpler, and is omitted.
First, note that if $N^+(v) \setminus N^-_{\log\log n}(v) \neq \emptyset$ then $\mathbf{P}_v\big(X_1 \notin N^-_{\log\log n}(v)\big) \ge 1/r$.

Bounding π min
We bound π min from below using the following lemma.
Lemma 34. Let $D$ be any $r$-out regular digraph. If $D$ is ergodic and has diameter $\mathrm{diam}(D) \le d$, then $\pi_{\min} \ge \frac{1}{1 + d r^d}$.
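Lemma 34 can be sanity-checked numerically. The sketch below is our own illustration (helper names are ours): it computes the stationary distribution of a small ergodic $2$-out digraph by power iteration, computes its diameter by BFS from every vertex, and verifies $\pi_{\min} \ge 1/(1 + d r^d)$.

```python
def stationary_distribution(L, iters=500):
    """Power iteration for pi = pi P on the r-out digraph with arc lists L."""
    n, r = len(L), len(L[0])
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            for h in L[i]:
                nxt[h] += pi[i] / r  # each arc carries mass pi[i]/r
        pi = nxt
    return pi

def diameter(L):
    """Maximum directed distance over ordered pairs (BFS from each vertex)."""
    n = len(L)
    best = 0
    for s in range(n):
        dist = {s: 0}
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for h in L[u]:
                    if h not in dist:
                        dist[h] = dist[u] + 1
                        nxt.append(h)
            frontier = nxt
        assert len(dist) == n  # strong connectivity
        best = max(best, max(dist.values()))
    return best
```

As a usage example, the $2$-out digraph on $[7]$ with arcs $i \to i+1$ and $i \to i+3$ (mod $7$) is strongly connected and aperiodic; its stationary distribution is uniform (each in-degree is $2$), its diameter is $3$, and indeed $1/7 \ge 1/(1 + 3 \cdot 2^3) = 1/25$.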
Proof of Theorem 2. The theorem is now an immediate consequence of Theorems 33 and 35.