Upper tail bounds for Stars

For r ≥ 2, let X be the number of r-armed stars K_{1,r} in the binomial random graph G_{n,p}. We study the upper tail P(X ≥ (1+ε)E X), and establish exponential bounds which are best possible up to constant factors in the exponent (for the special case of stars K_{1,r} this solves a problem of Janson and Ruciński, and confirms a conjecture of DeMarco and Kahn). In contrast to the standard formulation of the upper tail problem, we do not restrict our attention to constant ε, but also allow for deviations of size ε ≥ n^{−α}.

In words, for stars the form of the upper tail changes around p ≈ n^{−1/r}, which is an intriguing phenomenon (that does not occur for cliques). In fact, a recent conjecture of DeMarco and Kahn [7,28] for general H suggests that in (2) the 'correct' exponent involves yet another term µ := E X_{K_{1,r}} = Θ_r(n^{r+1}p^r), see (3). However, despite some partial results [26,27,33], the quest for matching bounds in (1)-(2) remained open.

Main results
Our first basic result settles the upper tail problem for r-armed stars for constant ε, by closing the existing log(1/p) gap in the exponent for all p ∈ (0, 1]. In particular, (3) below confirms Conjecture 10.1 of DeMarco and Kahn [7] in the special case H = K_{1,r}. For subgraph counts this is the first example of a sharp upper tail estimate where, for constant ε, the form of the exponent undergoes multiple phases (i.e., has more than two different expressions for different ranges of p).
Our second result determines the correct dependence of the stars upper tail on ε, up to constant factors in the exponent (this contrasts with Theorem 1 above, where the implicit constants may depend on ε). In particular, (4) below solves Problem 6.1 of Janson and Ruciński [18] in the special case H = K_{1,r}. For subgraph counts this is the first example where, for p bounded away from one, the order of the large deviation rate function − log P(X ≥ (1 + ε)µ) is determined for ε = ε(n) of the form ε ≥ n^{−α} (the assumption Φ(ε) ≥ 1 is natural, since it ensures that we are dealing with exponentially small probabilities).
We now motivate the somewhat unusual form of the exponent in (4). First, normal approximation heuristics suggest that P(X ≥ (1 + ε)µ) ≈ e^{−Θ_r((εµ)²/σ²)} for very small ε, and this sub-Gaussian tail is consistent with the ϕ(ε)µ²/σ² term in (5), since ϕ(ε) = Θ(ε²) as ε → 0 (the function ϕ is well-known from Chernoff bounds). Second, in G_{n,p} we usually expect to have at least (1 − ε)µ copies of K_{1,r}, say, so enforcing 2εµ extra copies via m = Θ_r(max{(εµ)^{1/r}, (εµ)/n^{r−1}}) appropriately clustered edges F should be enough to give a total of (1 + ε)µ copies of K_{1,r}; this heuristic loosely suggests P(X ≥ (1 + ε)µ) ≥ Ω_r(1) · P(F ⊆ G_{n,p}) ≥ Ω_r(p^m) ≥ e^{−O_r(m log(1/p))}. Intuitively, Conjecture 4 predicts that the form of the upper tail is indeed determined by either sub-Gaussian or 'clustered' behaviour, and Theorem 2 confirms this for ε = ε(n) ≥ n^{−α}.

Our third result approaches the upper tail problem from a conceptually slightly different perspective, studying P(X ≥ µ + t) for general deviations t (this contrasts with Theorems 1 and 2 above, where we focus on the large deviations range t = εµ and then put restrictions on ε). For subgraph counts, inequality (6) below is the first example where, for moderately large edge-probabilities p, the order of − log P(X ≥ µ + t) is completely resolved for all exponentially small deviations (where t ≥ σ is the natural target assumption). We complement this result with inequality (7) below, which is the first example where the order of − log P(X ≥ µ + t) is resolved for nearly all deviations t where the 'clustered' behaviour determines the exponent (here t²/σ² ≥ M(t) log(e/p) is the natural target assumption for µ^{1−1/r} ≥ log n; see (5), Remark 3, and Conjecture 4).
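The asymptotics ϕ(ε) = Θ(ε²) used in the heuristic above is easy to confirm numerically; the following is a standalone Python sketch (the function name `phi` is ours), checking that ϕ(ε)/ε² tends to 1/2 as ε → 0.

```python
import math

def phi(x):
    # Chernoff function: phi(x) = (1 + x) * log(1 + x) - x
    return (1 + x) * math.log1p(x) - x

# phi(eps)/eps^2 tends to 1/2 as eps -> 0, i.e. phi(eps) = Theta(eps^2)
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(f"eps={eps:g}  phi(eps)/eps^2 = {phi(eps) / eps**2:.4f}")
```

The printed ratios approach 1/2, matching the Taylor expansion ϕ(x) = x²/2 − x³/6 + ….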

Some comments
The main focus of this paper is upper bounds on the upper tail P(X ≥ (1 + ε)µ). Developing [31,33], our high-level proof strategy is based on the idea that (after ignoring certain 'bad' events with negligible probabilities) we can use combinatorial arguments to find a 'well-behaved' subgraph G_0 ⊆ G_{n,p} in the sense that (i) the numbers of stars K_{1,r} in G_0 and G_{n,p} are approximately the same (differ by at most εµ/2, say), and (ii) the maximum degree of G_0 is 'not too large' (which intuitively helps for showing concentration of stars). Using modern Chernoff-like upper tail bounds, we then show that it is very unlikely to have a 'bad' subgraph G′ ⊆ G_{n,p} with 'not too large' maximum degree and 'many' stars (at least (1 + ε/2)µ many, say).
Putting things together, the punchline is that we can only have X ≥ (1 + ε)µ many stars if one of the discussed unlikely 'bad' events occurs, which (after some technical work) eventually gives the desired upper bounds on the upper tail P(X ≥ (1 + ε)µ); see Section 2 for more details.
Finally, let us briefly compare our upper tail results for stars with very recent results from the large deviation theory literature, which are spearheaded by Chatterjee, Dembo, Lubetzky, Varadhan, Zhao, and many others (see, e.g., [5,22,4,23,10,1,8]). For general H, these aim at determining the asymptotics of − log P(X_H ≥ (1 + ε)E X_H) for constant ε and large edge-probabilities of the form p = Θ(1) or p ≥ n^{−δ_H}. For stars H = K_{1,r}, inequality (4) from Theorem 2 is weaker in the sense that it only determines − log P(X_{K_{1,r}} ≥ (1 + ε)E X_{K_{1,r}}) up to constant factors, but it is stronger in the sense that it covers a much wider range of the parameters, including ε = ε(n) ≥ n^{−α} and all p = p(n) of interest. Obtaining such tail estimates with increased ranges of applicability is useful for combinatorial applications, where one is usually 'willing to give up a little bit on the tail', in particular on the 'inessential numerical constants' in the exponent (see [30,18]). Furthermore, estimates of the form (6)-(7) are also quite satisfactory from a concentration inequality perspective. Overall, we hope that our results stimulate more research into such estimates for other graphs H.

Organization
In Section 2 we prove the upper bounds on the upper tail from Theorems 1, 2, and 5 (and discuss a simple extension). The corresponding (fairly routine) lower bounds are then established in Appendix A.

Upper bounds on the upper tail
In this section we establish the upper bounds on the upper tail P(X ≥ (1 + ε)µ) from Theorems 1, 2, and 5. Our core argument has two strands. In the first, combinatorial part we iteratively decrease the maximum degree of the random graph G_{n,p} = G_J ⊇ · · · ⊇ G_0 by edge-deletion (the idea is to remove large stars K_{1,D_j} with D_j ≫ r from G_j) until the final graph G_0 has sufficiently low maximum degree, say at most D. This degree bound allows us to estimate the number of stars K_{1,r} in G_0 via a 'well-behaved' auxiliary random variable X_D. Taking into account the number of stars K_{1,r} which are removed when passing from G_{n,p} = G_J to G_0, this allows us (by means of a technical event T) to approximate the number X = X_{r,n,p} of copies of K_{1,r} in G_{n,p} using X_D and several further auxiliary random variables N_{D_j} (which intuitively bound the number of K_{1,D_j} in G_{n,p}). In the second, probabilistic part we then estimate the upper tails of these auxiliary variables using a concentration inequality of Warnke [31] and ad-hoc arguments (exploiting the careful definitions of the variables X_D and N_{D_j} given in Section 2.1). Putting things together, the core argument proceeds roughly as follows: by the combinatorial part, X ≥ (1 + ε)µ can only happen if at least one of the auxiliary variables X_D or N_{D_j} is 'large', and by the probabilistic part the probability of this 'bad' event is at most the desired 'correct' upper tail probability (for suitable choices of the degree constraint D and other parameters).
In Section 2.1 we first illustrate this argument in the simpler setup of Theorem 1, and in Section 2.2 we then extend it to the more precise tail estimates of Theorems 2 and 5. Finally, in Section 2.3 we briefly discuss a straightforward extension (to a certain sum of iid variables).

Core argument for Theorem 1
We start by introducing the main random variables and events for Theorem 1 (as we shall see, their careful definitions will facilitate the interplay between the combinatorial and probabilistic parts of our argument). For x ≥ 0, let X_x denote the maximum number of copies of K_{1,r} in any subgraph H ⊆ G_{n,p} with maximum degree at most x. For y > 0, let N_y denote the maximum size of any collection of edge-disjoint copies of K_{1,⌈y⌉} in G_{n,p}. For β, D, t > 0, let T = T(β, D, t) denote the 'technical' event that

N_{D_j} < βM/D_j for all j ∈ N, (8)

where we tacitly used the following convenient parametrization:

M := max{t^{1/r}, t/n^{r−1}} and D_j := 2^j D. (9)

(In this subsection we shall only use t = εµ; working with general t is convenient for the later extensions.) The following combinatorial lemma is at the heart of our argument, and it intuitively states that X ≈ X_D whenever the event T = T(β, D, t) holds. Its proof is inspired by ideas developed in [31,33], but contains several new ingredients. For example, instead of iteratively sparsifying an auxiliary hypergraph (which encodes the edge-sets of all stars K_{1,r} in G_{n,p}) we here iteratively sparsify the random graph G_{n,p} itself. Furthermore, in order to obtain the correct tail behaviour, in inequality (8) we need to work with M = max{t^{1/r}, t/n^{r−1}} instead of the simpler choice M = t^{1/r} suggested by [31] (we achieve this by adding an extra degree bound to the argument, bounding the initial maximum degree by M̄ = min{M, n} instead of just M).
The lower bound X ≥ X_D of Lemma 6 is trivial. For the upper bound the idea is to iteratively decrease the maximum degree of G_{n,p}, yielding G_{n,p} = G_J ⊇ · · · ⊇ G_0. By bounding the number of copies of K_{1,r} which are removed when passing from G_{j+1} to G_j, this eventually allows us to estimate the total number of copies of K_{1,r}.
Proof of Lemma 6. Define M̄ := min{M, n}, and let J be the smallest integer J ≥ 0 with D_J ≥ M̄. We set G_J = G_{n,p} and inductively construct G_J ⊇ · · · ⊇ G_0. Given G_{j+1} with 0 ≤ j ≤ J − 1, let C_{j+1} be a maximal collection of edge-disjoint stars K_{1,⌈D_j⌉} in G_{j+1}. We remove all edges from G_{j+1} which are incident to a centre vertex of some star in C_{j+1}, and denote the resulting graph by G_j. Writing ∆_j for the maximum degree of G_j, the maximality of C_{j+1} ensures ∆_j ≤ 2D_j (any vertex of degree at least ⌈D_j⌉ in G_j would yield a copy of K_{1,⌈D_j⌉} that is edge-disjoint from all stars in C_{j+1}, since every edge of a star is incident to its centre).

With G_J ⊇ · · · ⊇ G_0 in hand, we now count the total number of copies of K_{1,r} in G_{n,p} = G_J. Note that, given an edge e = {v_1, v_2} of G_{j+1} with 0 ≤ j < J, we can construct any K_{1,r} in G_{j+1} containing e by first selecting a centre vertex v_c ∈ {v_1, v_2} and then r − 1 additional neighbours of v_c. Hence any edge of G_{j+1} is contained in at most 2\binom{∆_{j+1}}{r−1} copies of K_{1,r}. Recalling the definition of N_{D_j}, note that, when passing from G_{j+1} to G_j, we remove at most N_{D_j}∆_{j+1} ≤ 2N_{D_j}D_j edges. So, since G_0 contains at most X_{D_0} = X_D copies of K_{1,r}, it follows that

X ≤ X_D + Σ_{0≤j<J} 2N_{D_j}D_j · 2\binom{∆_{j+1}}{r−1}.

Recalling D_j = 2^j D and r ≥ 2, using (8), max_{0≤j<J} D_j ≤ M̄, M̄ = min{M, n}, M = max{t^{1/r}, t/n^{r−1}} and β ≤ 1/32, we infer that the sum on the right-hand side is at most t/2, which completes the proof.
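One round of the edge-deletion step above can be made concrete in a toy Python sketch (this is an illustration, not the paper's formal argument; the name `sparsify_once` and all parameters are ours). It greedily selects a maximal collection of edge-disjoint stars K_{1,k} and deletes every edge incident to a chosen centre; by the maximality argument in the proof, no vertex of degree at least k survives the round.

```python
import itertools
import random

def sparsify_once(adj, k):
    """One degree-reduction round: greedily pick a maximal collection of
    edge-disjoint K_{1,k} stars, then delete all edges at chosen centres."""
    used = set()      # edges already claimed by some chosen star
    centres = []
    for v in sorted(adj):
        free = [u for u in adj[v] if frozenset((v, u)) not in used]
        if len(free) >= k:                # v can host a K_{1,k} on unused edges
            for u in free[:k]:
                used.add(frozenset((v, u)))
            centres.append(v)
    for v in centres:                     # remove every edge at a centre
        for u in list(adj[v]):
            adj[u].discard(v)
        adj[v].clear()
    return adj

# sample a small G(n, p) and run one round
random.seed(0)
n, p, k = 60, 0.2, 8
adj = {v: set() for v in range(n)}
for u, v in itertools.combinations(range(n), 2):
    if random.random() < p:
        adj[u].add(v); adj[v].add(u)

adj = sparsify_once(adj, k)
print("max degree after one round:", max(len(s) for s in adj.values()))
```

Any surviving vertex of degree at least k would have had k free edges at its turn (used edges are always incident to centres, whose edges are all deleted), so it would have been chosen as a centre, mirroring the maximality argument.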
Applying Lemma 6 with t = εµ, in the probabilistic part of the argument it remains to estimate P(X_D ≥ µ + εµ/2) and P(¬T(β, D, εµ)). We shall exploit the maximum degree constraint of X_D via the following upper tail inequality of Warnke [31], which extends classical Chernoff bounds to random variables with 'well-behaved dependencies' (and allows us to go beyond the method of typical bounded differences [32]).
Theorem 7 (Corollary of [31, Theorem 9]). Let (ξ_i)_{i∈S} be a finite family of independent random variables with ξ_i ∈ {0, 1}. Given a family I of subsets of S, consider the random variables Y_α := Π_{i∈α} ξ_i with α ∈ I, and suppose that Σ_{α∈I} E Y_α ≤ µ. Define Z_C := max_J Σ_{α∈J} Y_α, where the maximum is taken over all J ⊆ I with max_{β∈J} |{α ∈ J : α ∩ β ≠ ∅}| ≤ C. Then, for all C, t > 0,

P(Z_C ≥ µ + t) ≤ exp(−ϕ(t/µ) · µ/C).

The main observation is that, in every subgraph H ⊆ G_{n,p} with maximum degree at most D, any star K_{1,r} shares edges with O_r(D^{r−1}) other stars. For X_D this allows us to routinely apply Theorem 7 with Lipschitz-like parameter C = O_r(D^{r−1}), making inequality (13) plausible. For Theorem 1 the crux is that our choice of D will ensure µ/D^{r−1} = Θ_r(Φ), so (13) suggests that X_D ≤ µ + εµ/2 fails with probability at most e^{−Ω_{r,ε}(Φ)}.
Corollary 8. For all n ≥ 1, p ∈ (0, 1] and D, t > 0 the tail estimate (13) holds.

Proof. Note that there is a subgraph H ⊆ G_{n,p} with maximum degree at most ⌊D⌋ such that X_D = Σ_{α∈J} Y_α for J := K_{1,r}(H). Given β ∈ J, we construct all edge-intersecting stars α ∈ J as in the proof of Lemma 6, and infer the degree-based bound (14) on |{α ∈ J : α ∩ β ≠ ∅}|. It follows that X_D ≤ Z_C, where Z_C is defined as in Theorem 7 with I = K_{1,r}(K_n). It is well-known (and easy to check by calculus) that for x ≥ 0 we have ϕ(x/2) ≥ ϕ(x)/4 and (15). Putting things together, using Theorem 7 and (15) it follows that the probability of X_D ≥ µ + t/2 is at most the right-hand side of (13), which completes the proof of (13) by the choice of C (see (14) above).
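The calculus fact ϕ(x/2) ≥ ϕ(x)/4 invoked above is easy to confirm numerically; the following minimal Python sketch checks it on a wide grid (both sides agree to second order x²/8 as x → 0, so the inequality is tight in that limit).

```python
import math

def phi(x):
    # Chernoff function: phi(x) = (1 + x) * log(1 + x) - x
    return (1 + x) * math.log1p(x) - x

# verify phi(x/2) >= phi(x)/4 for x in (0, 100]; the margin is x^3/48 + O(x^4)
xs = [i * 0.01 for i in range(1, 10001)]
assert all(phi(x / 2) >= phi(x) / 4 for x in xs)
print("phi(x/2) >= phi(x)/4 holds for all grid points")
```

The inequality also follows by calculus: the difference vanishes at x = 0 and its derivative is (1/2)log(1 + x/2) − (1/4)log(1 + x) ≥ 0, since (1 + x/2)² ≥ 1 + x.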
We shall estimate P(¬T(β, D, εµ)) via a union bound argument and the following upper tail estimate for N_{D_j}. The technical assumption (17) intuitively ensures that vertices with degree at least D are unlikely. For Theorem 1 the crux is that our choice of D will also ensure np/(eD_j) ≤ p^{Ω(1)}, so applications of inequality (18) with x = βM/D_j suggest that T, and thus (8), fails with probability at most n · n^{−3}p^{Ω(M)} ≤ n^{−2}p^{Ω_ε(Φ)}, say.
Lemma 9. For all n ≥ 1, p ∈ (0, 1], and D > 0 satisfying (17), the following holds: for all x > 0 we have the tail estimate (18).

Proof. As \binom{m}{z} ≤ (me/z)^z for all integers m ≥ z ≥ 1, by exploiting the disjointness condition of N_{D_j} we infer (19). As the function x ↦ (e³np/x)^x is decreasing for x ≥ e²np, and (17) implies ⌈D_j⌉ ≥ D ≥ e³np, we deduce a corresponding estimate for each ⌈D_j⌉. Plugging this into (19) readily establishes inequality (18), since trivially N_{D_j} = 0 when D_j > n.
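The elementary bound \binom{m}{z} ≤ (me/z)^z that starts the proof (a standard consequence of z! ≥ (z/e)^z) can be verified by brute force; a small Python sketch:

```python
from math import comb, e

# check C(m, z) <= (m*e/z)^z for all integers 1 <= z <= m < 60
for m in range(1, 60):
    for z in range(1, m + 1):
        assert comb(m, z) <= (m * e / z) ** z
print("C(m, z) <= (me/z)^z verified for all 1 <= z <= m < 60")
```

The margin between the two sides is at least a factor of roughly √(2πz) by Stirling's formula, so floating-point rounding does not affect the comparison.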
For the proof of the upper bound of Theorem 1 it remains to pick a suitable D, i.e., one which satisfies the technical assumption (17) and yields the 'correct' exponent in (13) and in suitable applications of (18). For later reference, we record that there is a constant d = d(r) > 0 such that, for n ≥ n_0(r), inequality (20) holds. By Lemma 6, the upper tail of the number X = X_{r,n,p} of copies of K_{1,r} satisfies (21). Gearing up to bound P(¬T(β, D, εµ)) via Lemma 9, using e = p^γ e^s and inequality (20) together with the bound s^{1/(r−1)} ≤ s = 1 + log p^{−γ} ≤ p^{−γ} (as 1 + x ≤ e^x), it follows that (17) holds, where here and below we shall always tacitly assume n ≥ n_0(r, d) whenever necessary. Since the above calculation also gives D ≥ Anp^{1−γ}, together with D ≥ A it follows that (22) holds. Applying a union bound argument, using the estimates (18), D_j = 2^j D ≥ D, and (22), we obtain (23). Recalling (21) and the definition of M = M(εµ), by applying Corollary 8 with t := εµ it follows that there is a constant c = c(β, A, γ, r) > 0 and suitable parameters ζ, Π > 0 such that (24) holds. We find the above upper tail estimate very satisfactory, but in the literature it has become standard to suppress multiplicative factors such as 1 + n^{−2} in (24), which is straightforward when cζΠ ≥ 1 holds (rescaling the exponent cζΠ by a factor of 1/2, say). In the remaining case 1 > cζΠ, Markov's inequality gives the desired bound. Finally, noting s = log(e/p^γ) ≥ log(1/p^γ) = γ log(1/p) then establishes the upper bound in (3).

Extension of the argument to Theorem 2 and 5
We now extend the arguments from Section 2.1 to the upper bounds of Theorems 2 and 5. To obtain the sub-Gaussian decay ϕ(ε)µ²/σ² in the exponent of tail-inequality (13) for X_D, in view of the well-known variance estimate σ² = Θ_{r,ξ}((1 + (np)^{r−1})µ) from Remark 3, we here would like to pick D = Θ_{r,ξ}(1 + np) for some range of t = εµ. However, this choice causes a major problem: in the key estimate (22) we can no longer win an extra log-factor (via np/(eD) ≤ e^{−s}) when we bound the N_{D_j} variables using (18) from Lemma 9. Our strategy for overcoming this obstacle is to refine the technical event T = T(β, D, t), by enforcing different upper bounds on N_{D_j} when D_j = 2^j D is small (so that in the probabilistic arguments we automatically win an extra logarithmic factor, without destroying the combinatorial counting arguments from Lemma 6). Turning to the details, for γ, β, D, t > 0 let T⁺ = T⁺(γ, β, D, t) denote the 'technical' event that

N_{D_j} < βM/(sD_j) for all j ∈ N with D_j < min{M, n}/s^{1/(r−1)}, and N_{D_j} < βM/D_j for all remaining j ∈ N, (25)

where, in addition to the parameters M = max{t^{1/r}, t/n^{r−1}} and D_j = 2^j D from (9), we tacitly used s = s(γ) := log(e/p^γ).
We are now ready to prove the following slightly more general upper tail estimate for the number X = X_{r,n,p} of copies of K_{1,r} in G_{n,p}, which (as we shall see) implies the upper bounds in Theorems 2 and 5.
We now deduce the upper bounds of Theorems 2 and 5 from the upper tail inequality (28).
For Theorem 5 we shall simplify the form of the exponent in (28) via the following auxiliary result, writing a_n ≍ b_n instead of a_n = Θ(b_n) for typographic reasons (the assumption p ≥ n^{−9} in (ii) is ad-hoc).

Straightforward extension to a certain sum of iid variables
We close this section by recording that minor (and in fact simpler) variants of our proofs also apply to the following sum of independent random variables:

X := Σ_{i∈[n]} \binom{Y_i}{r}, where Y_1, …, Y_n are iid sums of n independent Bernoulli(p) indicators.

Indeed, in view of the structural similarities to the number of r-armed stars in G_{n,p} (which satisfies X_{n,r,p} = Σ_{v∈[n]} \binom{d_v}{r}, writing d_v for the degree of v), here we set X_x := Σ_{i∈[n]: Y_i ≤ ⌊x⌋} \binom{Y_i}{r}, and define N_x as the number of i ∈ [n] with Y_i ≥ ⌈x⌉. Now the proofs of Lemmas 6 and 10 carry over with minor changes: exploiting that there are no dependencies between the Y_i, using a simple dyadic decomposition we here obtain the analogous approximation of X by X_D. For the proof of Corollary 8 it suffices to show that X_D ≤ Z_C holds in the present setting. Since Y_i is a sum of n independent indicators ξ_{i,j}, we may write each \binom{Y_i}{r} as a sum of \binom{n}{r} dependent indicators (each of which is a product of r distinct independent variables ξ_{i,j}). Using the constraint Y_i ≤ ⌊D⌋, the analogous left-hand side of (14) is thus bounded by r\binom{⌊D⌋}{r−1} ≤ 2D^{r−1}, which in turn implies X_D ≤ Z_C, as desired. Since the proof of Lemma 9 also remains valid (as inequality (19) carries over), we thus arrive at the following result.
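The degree identity X_{n,r,p} = Σ_{v∈[n]} \binom{d_v}{r} that motivates this extension is easy to confirm by brute-force enumeration on a small sample of G_{n,p}; a Python sketch (the parameters are illustrative):

```python
import itertools
import random
from math import comb

# sample a small G(n, p)
random.seed(1)
n, p, r = 12, 0.4, 3
adj = {v: set() for v in range(n)}
for u, v in itertools.combinations(range(n), 2):
    if random.random() < p:
        adj[u].add(v); adj[v].add(u)

# direct count: a copy of K_{1,r} is a centre v plus an r-subset of its neighbours
direct = sum(1 for v in range(n)
             for _ in itertools.combinations(sorted(adj[v]), r))

# degree formula: sum over vertices of C(d_v, r)
via_degrees = sum(comb(len(adj[v]), r) for v in range(n))

assert direct == via_degrees
print("number of stars K_{1,r}:", direct)
```

The two counts agree because every copy of K_{1,r} is determined by its centre together with an r-element subset of that centre's neighbourhood.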
Perhaps surprisingly, we are not aware of any standard method or inequality (for sums of iid variables) which can routinely recover the upper tail bounds of Theorem 13. Here one technical difficulty seems to be that each summand \binom{Y_i}{r} has an upper tail that decays slower than exponentially (for r ≥ 2), which presumably is closely linked to the somewhat non-standard log(1/p) term in the exponent.

A Appendix: Lower bounds on the upper tail
In this appendix we establish fairly routine lower bounds on the upper tail P(X ≥ (1 + ε)µ) from Theorems 1, 2, and 5 (omitting some straightforward details). Following [31] we obtain our lower bounds via the following three events: that many copies of K_{1,r} 'cluster' on few edges (see Lemmas 14 and 16), that most copies of K_{1,r} arise disjointly (see Lemmas 15 and 17), and that G_{n,p} contains more edges than expected (see Lemma 18).

A.1 Basic argument for Theorem 1
For Theorem 1 we shall use two different lower bounds, and the first one is based on the idea that relatively few edges (which 'cluster' in an appropriate way) can create fairly many stars K_{1,r}. This is formalized by the following result, which implies P(X_{r,n,p} ≥ x) ≥ P(F ⊆ G_{n,p}) = p^{|E(F)|}, since F ⊆ G_{n,p} enforces X_{r,n,p} ≥ x.
Lemma 14 (Clustering). For every r ≥ 1 there is D = D(r) > 0 so that for all n ≥ 1 and 0 < x ≤ X_{r,n,1} there is F ⊆ K_n with |E(F)| ≤ D max{x^{1/r}, x/n^{r−1}, 1} edges such that F contains at least x copies of K_{1,r}.
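The bipartite construction behind Lemma 14 can be checked directly: a Python sketch counting the stars of K_{y,z} (the name `stars_in_Kyz` is ours, and the concrete parameters are illustrative rather than the exact constants of the proof).

```python
from math import comb

def stars_in_Kyz(y, z, r):
    # copies of K_{1,r} in the complete bipartite graph K_{y,z}:
    # the centre sits on one side and its r leaves on the other
    return z * comb(y, r) + y * comb(z, r)

# illustrative parameters in the spirit of case (i) of the proof:
# y ~ x^{1/r} and z ~ x / C(y, r) give at least x stars on few edges
r, x = 3, 10**6
y = int(x ** (1 / r))
z = -(-x // comb(y, r))          # ceiling division
assert stars_in_Kyz(y, z, r) >= x
print(f"y={y}, z={z}, edges={y * z}, stars={stars_in_Kyz(y, z, r)}")
```

Note that the edge count yz stays within a constant factor of x^{1/r}, matching the bound |E(F)| = O_r(max{x^{1/r}, x/n^{r−1}}) in this regime.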
Inspired by the proofs of Theorems 1.3 and 1.5 in [17], the idea is to use a complete bipartite graph F = K_{y,z} with y = Θ_r(min{x^{1/r}, n}) and z = Θ_r(x/y^r), which contains yz = Θ_r(x/y^{r−1}) = O_r(max{x^{1/r}, x/n^{r−1}}) edges and at least z\binom{y}{r} = Θ_r(zy^r) = Ω_r(x) copies of K_{1,r} (certain border cases require minor care).

Proof of Lemma 14. Let x_0 := 2(4r)^r, n_0 := (r+1)x_0, and D := n_0². If (i) x_0 ≤ x ≤ n^{r+1}/D and n ≥ n_0, then we let F := K_{y,z}, with y := ⌈min{x^{1/r}, n}/4⌉ and z := ⌈r^r x/y^r⌉. Note that F ⊆ K_n exists, since it is easy to check that 1 < y ≤ n/2 and 1 < z ≤ n/2, say (we leave the details to the reader). Furthermore, F contains at least z\binom{y}{r} ≥ z(y/r)^r ≥ x many copies of K_{1,r}, and has |E(F)| = yz ≤ 2r^r x/y^{r−1} ≤ D max{x^{1/r}, x/n^{r−1}} edges. If either (ii) 1 ≤ n < n_0 or (iii) x > n^{r+1}/D and n ≥ n_0, then we let F := K_n, which trivially contains X_{r,n,1} ≥ x copies of K_{1,r} and has |E(F)| < n² < max{n_0², Dx/n^{r−1}} = D max{1, x/n^{r−1}} edges. Finally, if (iv) x < x_0 and n ≥ n_0, then we let F := K_{n_0}, which contains at least n_0/(r + 1) = x_0 > x vertex-disjoint copies of K_{1,r} and has |E(F)| < n_0² = D edges, completing the proof.

The second lower bound is inspired by the fact that X = X_{n,r,p} is approximately Poisson for small p, in which case most copies of K_{1,r} arise disjointly. Indeed, the following standard result bounds P(X = m) from below by the probability that there are exactly m vertex-disjoint copies of K_{1,r} (see [7,26,31] for similar arguments), which for m = (1 + ε)µ will imply P(X ≥ m) ≥ e^{−O_{r,ε}(m)}; the precise form of (42) will be useful later on.
Lemma 15 (Disjoint approximation). Given r ≥ 2, there are n_0, b > 0 (depending only on r) such that, for all n ≥ n_0, 0 < p ≤ n^{−1−1/(r+1)} and integers m ∈ N satisfying 0 ≤ m ≤ 99 max{µ, n^{1/(r+1)}}, the estimate (42) holds.

Proof. Let K contain all copies of K_{1,r} in K_n. Define S_m as the collection of all m-element subsets of K in which all stars K_{1,r} are vertex-disjoint. Given C ⊆ S_m, define I_C as the event that all stars K_{1,r} of C are present, and define D_C as the event that none of the stars K_{1,r} in K \ C are present. Note that P(X = m) ≥ Σ_{C∈S_m} P(I_C ∩ D_C). Distinguishing the number of edges in which each star α ∈ K \ C overlaps with some star K_{1,r} from the vertex-disjoint collection C ∈ S_m, using Harris' inequality [12] and np = o(1), we routinely obtain a matching lower bound on P(D_C | I_C), where mnp = O(max{n^{r+2}p^{r+1}, n^{1+1/(r+1)}p}) = O(1). Furthermore, with (z − y)^y/y! ≤ \binom{z}{y} ≤ z^y/y!, 1 − x ≥ e^{−2x} and X_{n,r,1} = n\binom{n−1}{r} in mind, basic counting (and a short calculation) gives the required estimate for the remaining factor. This completes the proof of (42), since m² = O(max{n^{2(r+1)}p^{2r}, n^{2/(r+1)}}) = O(n^{2/(r+1)}) = o(n).
Combining the above two results, we now prove the lower bound of Theorem 1.

A.2 Refined arguments for Theorem 2 and 5
For Theorems 2 and 5 we shall refine the previous two lower bounds, and also introduce a new third lower bound. Each time some care is needed to obtain the 'correct' dependence on t = εµ in the exponent, and we start by refining the 'clustering'-based lower bound from Lemma 14 and (43).
In the case p = o(1) the basic proof idea is to obtain µ + t copies of K_{1,r} as follows: (i) we first use the clustering construction from Lemma 14 to plant 2t copies of K_{1,r}, and (ii) we then use Harris' inequality and a one-sided Chebyshev inequality to show that typically at least µ − t of the remaining X̄_{n,r,1} := X_{n,r,1} − 2t other copies of K_{1,r} are present (the crux is that the expected number of such copies is X̄_{n,r,1}p^r = µ − o(t), so having at least µ − t of them intuitively seems likely). For the resulting lower bound, step (i) with probability p^{O_r(max{t^{1/r}, t/n^{r−1}})} thus ought to give the main contribution, making (45) plausible. For technical reasons, in the actual argument we have to plant min{(β + 1)t, ⌈µ + t⌉} copies of K_{1,r} for a carefully chosen β > 0. By mimicking the proof of Theorem 21 in [31] we then easily arrive at (45) above; we leave the details to the reader.

We next refine the 'disjoint approximation'-based lower bound used in Lemma 15 and (44) for small p. The idea is that inequality (42) intuitively relates X = X_{n,r,p} to a binomial random variable with mean µ = X_{n,r,1} · p^r, which makes the following Chernoff-type lower bound for the upper tail plausible.
Noting the binomial-like form of inequality (42), it is routine to check that Lemma 15 indeed implies (46) above (e.g., by summing (42) as in the proof of Theorem 22 in [31]); we leave the details to the reader. Our third lower bound, for moderately large p, is based on the idea that a deviation in the number of edges should typically entail a deviation in the number of copies of K_{1,r} (in concrete words: if G_{n,p} has substantially more than \binom{n}{2}p edges, then we expect to have more copies of K_{1,r} than on average).

Lemma 18 (Deviation in number of edges: sub-Gaussian type lower bound). Given r ≥ 2 and ξ ∈ (0, 1), there are n_0, β, c > 0 (depending only on r, ξ) such that, setting Λ := µ(1 + (np)^{r−1}), for all n ≥ n_0, ξn^{−1} ≤ p ≤ 1 − ξ and σ ≤ t ≤ βµ we have

P(X ≥ µ + t) ≥ exp(−cϕ(t/µ)µ²/Λ) ≥ exp(−ct²/Λ).