Modular statistics for subgraph counts in sparse random graphs

Answering a question of Kolaitis and Kopparty, we show that, for a given integer $q>1$ and pairwise nonisomorphic connected graphs $G_1,\dots,G_k$, if $p=p(n)$ is such that $\Pr(G_{n,p}\supseteq G_i)\to 1$ for each $i$, then, with $\xi_i$ the number of copies of $G_i$ in $G_{n,p}$ (mod $q$), $(\xi_1,\dots,\xi_k)$ is asymptotically uniformly distributed on ${\bf Z}_q^k$.


Introduction
For graphs G, H write N(G, H) for the number of unlabeled copies of H in G (e.g. $N(K_r, K_s) = \binom{r}{s}$). We use both G_{n,p} and G(n, p) for the ordinary ("binomial" or "Erdős–Rényi") random graph.
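The count N(G, H) can be checked by brute force for tiny graphs; the following sketch (all names are ours, not from the paper) counts edge-preserving injections V(H) → V(G) and divides by |Aut(H)|, which is the standard way to pass from labeled to unlabeled copies.

```python
from itertools import combinations, permutations
from math import comb

def complete(r):
    """K_r as (vertex list, set of frozenset edges)."""
    verts = list(range(r))
    return verts, {frozenset(e) for e in combinations(verts, 2)}

def num_copies(G, H):
    """N(G, H): number of unlabeled copies of H in G, i.e. subgraphs of G
    isomorphic to H.  Counts edge-preserving injections V(H) -> V(G) and
    divides by |Aut(H)|.  Brute force: only sensible for very small graphs."""
    gv, ge = G
    hv, he = H
    he_t = [tuple(e) for e in he]

    def injections(tgt_verts, tgt_edges):
        count = 0
        for perm in permutations(tgt_verts, len(hv)):
            m = dict(zip(hv, perm))
            if all(frozenset((m[u], m[v])) in tgt_edges for u, v in he_t):
                count += 1
        return count

    # injections of H into itself are exactly the automorphisms of H
    return injections(gv, ge) // injections(hv, he)
```

For complete graphs this recovers $N(K_r, K_s) = \binom{r}{s}$, e.g. `num_copies(complete(5), complete(3))` gives `comb(5, 3)`.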
We are interested here in extending to nonconstant p the following beautiful result of Kolaitis and Kopparty [4].

Theorem 1. Fix an integer q > 1, p ∈ (0, 1) and pairwise nonisomorphic connected graphs G_1, . . . , G_k, each with at least two vertices, and let ξ_i be N(G_{n,p}, G_i) (mod q). Then the distribution of ξ = (ξ_1, . . . , ξ_k) is e^{−Ω(n)}-close to uniform on Z_q^k. In particular, for each a ∈ Z_q^k, Pr(ξ = a) → q^{−k} as n → ∞.
(Recall two distributions are ε-close if their statistical (a.k.a. variation) distance is at most ε.) Theorem 1 was motivated by an application to 0-1 laws for first order logic with a parity quantifier or, more generally, a quantifier that allows counting modulo q; see Section 3 for a little more on this.
A natural question raised in [4] (and communicated to the authors by S.K.) asks to what extent Theorem 1 remains true if p is allowed to tend to zero as n grows, e.g. if p = n^{−α} for some fixed α > 0. Our purpose here is to answer this question.

We need a little notation. For a graph H with at least one edge, let m(H) = max{e_{H'}/v_{H'} : H' ⊆ H, v_{H'} > 0}, where v_K and e_K denote the numbers of vertices and edges of K. Recall (see e.g. [2]) that n^{−1/m(H)} is a threshold function for containment of H; that is, the probability that G_{n,p} (p = p(n)) contains a copy of H tends to 0 if pn^{1/m(H)} → 0 and to 1 if pn^{1/m(H)} → ∞. Given a collection G of graphs, set m(G) = max{m(G) : G ∈ G}, p_G(n) = n^{−1/m(G)} and Φ_G(n, p) = min{n^{v_K} p^{e_K} : K ⊆ G ∈ G, e_K > 0}.

Theorem 2. Let q, G_1, . . . , G_k and ξ = (ξ_1, . . . , ξ_k) be as in Theorem 1 and G = {G_1, . . . , G_k}. If p = ω(p_G(n)), then the distribution of ξ is exp[−Ω(Φ_G(n, p))]-close to uniform on Z_q^k.
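The quantities m(H) and Φ_H(n, p) can be computed by exhaustive search over vertex subsets; for a fixed subset, taking all induced edges maximizes e/v (for m) and, since p < 1, minimizes n^v p^e (for Φ). A small illustrative sketch (function names ours):

```python
from fractions import Fraction
from itertools import combinations

def max_density(verts, edges):
    """m(H) = max{ e(H')/v(H') : H' a subgraph of H, v(H') > 0 }.
    Brute force over vertex subsets; tiny graphs only."""
    best = Fraction(0)
    for k in range(1, len(verts) + 1):
        for sub in combinations(verts, k):
            s = set(sub)
            e = sum(1 for u, v in edges if u in s and v in s)
            best = max(best, Fraction(e, k))
    return best

def phi(verts, edges, n, p):
    """Phi_H(n, p) = min{ n^{v_K} p^{e_K} : K a subgraph of H, e_K > 0 }."""
    vals = []
    for k in range(2, len(verts) + 1):
        for sub in combinations(verts, k):
            s = set(sub)
            e = sum(1 for u, v in edges if u in s and v in s)
            if e > 0:
                vals.append(n ** k * p ** e)
    return min(vals)
```

For instance m(K_3) = 1 (so the triangle threshold is n^{−1}) and m(K_4) = 3/2 (threshold n^{−2/3}), recovering the familiar values.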
(Of course the constant in the exponent depends on q and G.) For the special case G = {K_3}, a somewhat weaker version of Theorem 2, with exp[−Ω(Φ_G(n, p))] replaced by a bound polynomial in n and p, was shown by Noga Alon [3].
We should also note here an immediate consequence of Theorem 2, which again answers a question from [4].

Corollary 3. With q, G_1, . . . , G_k and ξ as in Theorem 2, if p = p(n) is such that Pr(G_{n,p} ⊇ G_i) → 1 for each i, then ξ is asymptotically uniformly distributed on Z_q^k.
This is of interest partly for its possible relevance to proving a modular convergence law (again see Section 3) for p = n^{−α} with α irrational (cf. [5, Theorem 6], which says that for such p a 0-1 law holds for any first order property); but we also have, again from [4]: "Even the behavior of subgraph frequencies mod 2 in this setting [i.e. with p as in Corollary 3] seems quite intriguing."

The proof of Theorem 2, given in the next section, is similar to that of Theorem 1 in [4]. In truth, we just add one little idea to the machinery of [4]; nonetheless, as the proof answers a rather basic question, and was apparently not quite trivial to find, it seems worth recording.

Proof
We will need the following two facts, the first of which, from [4], generalizes a result of Babai, Nisan and Szegedy [1].
Lemma 4. Fix an integer q > 1, a positive integer d and p ∈ (0, 1). Let F be a collection of subsets of [N], let Q(z_1, . . . , z_N) = Σ_{A∈F} c_A Π_{i∈A} z_i be a polynomial with integer coefficients, and let E = {E_1, . . . , E_m} ⊆ F. Suppose that
• |A| ≤ d for all A ∈ F,
• the E_j's are pairwise disjoint,
• c_{E_j} ≢ 0 (mod q) for all j,
• c_A ≡ 0 (mod q) whenever A ∈ F properly contains some E_j.
Let z = (z_1, . . . , z_N) be the random variable where, independently for each i, Pr(z_i = 1) = p and Pr(z_i = 0) = 1 − p. Then for ω ∈ C a primitive q-th root of unity,

(1) |E ω^{Q(z)}| = e^{−Ω(m)}.

(We again observe that the implied constant depends on q, p and d.)

Lemma 5 ("Vazirani XOR Lemma"). Let q > 1 be an integer and ω ∈ C a primitive q-th root of unity. Let ξ = (ξ_1, . . . , ξ_l) be a random variable taking values in Z_q^l. Suppose that for every nonzero c = (c_1, . . . , c_l) ∈ Z_q^l, |E ω^{c_1ξ_1+···+c_lξ_l}| ≤ ε. Then the distribution of ξ is (q^l ε)-close to uniform on Z_q^l.
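The content of Lemma 5 can be checked numerically on a small example: compute the largest character bias ε over nonzero c and compare with the statistical distance from uniform. A sketch (names and the sample distribution are ours):

```python
import cmath
from itertools import product

def tv_from_uniform(dist, q, l):
    """Statistical (variation) distance between dist, a dict on Z_q^l,
    and the uniform distribution on Z_q^l."""
    u = 1.0 / q ** l
    return 0.5 * sum(abs(dist.get(a, 0.0) - u)
                     for a in product(range(q), repeat=l))

def max_char_bias(dist, q, l):
    """max over nonzero c in Z_q^l of |E omega^{c . xi}|, omega = e^{2 pi i/q}."""
    omega = cmath.exp(2j * cmath.pi / q)
    return max(
        abs(sum(pr * omega ** sum(ci * ai for ci, ai in zip(c, a))
                for a, pr in dist.items()))
        for c in product(range(q), repeat=l) if any(c)
    )

# a slightly non-uniform distribution on Z_3^2
q, l = 3, 2
dist = {a: 1.0 / 9 for a in product(range(q), repeat=l)}
dist[(0, 0)] += 0.05
dist[(1, 1)] -= 0.05
```

The lemma asserts that the distance to uniform is at most q^l times the worst bias, which the example below confirms; for the exactly uniform distribution all biases vanish.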
Proof of Theorem 2. Letting e run over edges of K_n, the argument of [4] expresses each c_i ξ_i in the natural way as a polynomial in the indicators z_e := 1_{e∈G(n,p)} (e ∈ E(K_n)), namely c_i ξ_i = c_i Σ_H Π_{e∈E(H)} z_e, with H running over copies of G_i in K_n, and for the E of Lemma 4 uses Ω(n) vertex-disjoint copies of some largest G_i among those with c_i ≠ 0. The problem with this in the present situation is the (hidden) dependence of the bound in (1) on p.
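The polynomial representation of a subgraph count is easy to verify directly; for concreteness the sketch below (names ours) does it for triangles, building one monomial (a set of three edge-variables) per 3-subset of the vertices and evaluating mod q at a 0/1 edge assignment.

```python
from itertools import combinations

def triangle_monomials(n):
    """N(G, K_3) as a multilinear polynomial in the edge indicators z_e:
    one monomial (a frozenset of three edges) per 3-subset of [n]."""
    return [frozenset(frozenset(e) for e in combinations(t, 2))
            for t in combinations(range(n), 3)]

def eval_mod(monomials, present_edges, q):
    """Evaluate the sum of monomials mod q at a 0/1 edge assignment,
    given as the set of edges with z_e = 1."""
    return sum(1 for mono in monomials if mono <= present_edges) % q

# example graph: K_5 minus the edge {0,1}, which has 10 - 3 = 7 triangles
edges = {frozenset(e) for e in combinations(range(5), 2)} - {frozenset((0, 1))}
```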
We get around this difficulty by choosing our random graph in two steps, so that when we come to apply Lemma 4 we are back to constant p. For simplicity we now write Φ for Φ_G(n, p), G′ for G(n, 2p) and G for the random subgraph of G′ in which each edge is present, independently of other choices, with probability 1/2; in particular, our ξ_i's are functions of G (= G(n, p)).
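The two-step exposure is straightforward to simulate; in the sketch below (names ours), each edge of K_n survives both rounds with probability (2p)(1/2) = p, independently across edges, so the second-round graph has exactly the law of G(n, p).

```python
import random

def two_step_gnp(n, p, rng):
    """Sample G' = G(n, 2p), then G by keeping each edge of G' independently
    with probability 1/2.  Since (2p)(1/2) = p and all choices are made
    independently, G has the law of G(n, p).  (Requires p <= 1/2, which is
    the only interesting range here.)"""
    gprime = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 2 * p:
                gprime.add(frozenset((u, v)))
    g = {e for e in gprime if rng.random() < 0.5}
    return gprime, g
```

By construction G ⊆ G′ always holds, which is the structural point the proof exploits: conditioning on G′ leaves a p = 1/2 product measure on its edges.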
Given G ′ , we will apply Lemma 4 with variables z e = 1 {e∈G} (e ∈ G ′ ), F the collection of copies of G 1 , . . . , G k in G ′ , and E ⊆ F a large collection of vertex-disjoint copies of an appropriate G i ; so first of all we need existence of such an E. For a given ε, let D = D ε be the event that G ′ contains, for each i, a collection of r := εΦ vertex-disjoint copies of G i . Proposition 6. There is a fixed ε > 0 (depending on G) for which (2) Proof. Though we don't know a reference, this is presumably not new and the ideas needed to prove it may all be found in [2]; so we just briefly indicate what's involved. Fix i ∈ [k] and write H for G i . Let Y be the maximum size of a collection of disjoint copies of H in G ′ . It is enough to show that the median of Y is Ω(Φ); (2) then follows via an inequality of Talagrand ([7] or [2, Theorem 2.29]) as in the argument for the edge-disjoint analogue of Proposition 6 given on page 77 of [2].
In view of Proposition 6 it is enough to show that for any G′ satisfying D, the conditional distribution of ξ given {G′ = G′} is exp[−Ω(Φ)]-close to uniform on Z_q^k. Given such a G′ and 0 ≠ c ∈ Z_q^k, take F_i to consist of all copies of G_i in G′ (i ∈ [k]) and F = ∪{F_i : c_i ≠ 0}.
Fix, in addition, some i_0 ∈ [k] with c_{i_0} ≠ 0 and |G_{i_0}| = max{|G_i| : c_i ≠ 0} =: d, and some E ⊆ F consisting of r vertex-disjoint copies of G_{i_0} (which exists since G′ satisfies D). Set Q = Σ_i c_i ξ_i, regarded as a polynomial in the variables z_e, where z_e = 1_{e∈G} for e ∈ G′. We then need to say that Q, F and E (with q, d and p = 1/2) satisfy the requirements of Lemma 4. But the first three of these are immediate and the fourth follows from the connectivity of the G_i's: a copy of some G_i with c_i ≠ 0 whose edge set properly contained (the edge set of) a member of E would be a connected graph strictly larger than G_{i_0} among the G_i's with c_i ≠ 0, contradicting the choice of i_0. Lemma 4 thus bounds |E[ω^{Q(z)}]| by e^{−Ω(r)} = e^{−Ω(Φ)}; and then (since this was for any c ≠ 0) Lemma 5 says that, as desired, the conditional distribution of ξ given {G′ = G′} is exp[−Ω(Φ)]-close to uniform on Z_q^k.
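In the special case where Q consists only of m disjoint degree-2 monomials (q = 2, p = 1/2), the expectation in Lemma 4 factors and the bias is exactly 2^{−m}, which makes the e^{−Ω(m)} decay concrete. A brute-force check (names ours):

```python
from itertools import product

def bias(m):
    """|E (-1)^{Q(z)}| for Q = z_1 z_2 + z_3 z_4 + ... + z_{2m-1} z_{2m},
    with the z_i iid uniform on {0,1}: brute force over all 2^{2m}
    assignments.  Each factor E(-1)^{z z'} equals 1/2, so the bias is 2^{-m},
    exponentially small in the number of disjoint monomials."""
    total = 0
    for z in product((0, 1), repeat=2 * m):
        exponent = sum(z[2 * j] * z[2 * j + 1] for j in range(m))
        total += (-1) ** exponent
    return abs(total) / 2 ** (2 * m)
```

Of course Lemma 4 is much stronger: it tolerates arbitrary further monomials, as long as none properly contains one of the disjoint E_j's.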

Discussion
As mentioned earlier, Theorem 1 is a key ingredient in the proof of the Kolaitis–Kopparty "modular convergence law" for first order logic with a parity quantifier, or, more generally, a quantifier that allows counting mod q. This law says, briefly, that, for fixed p and n → ∞, the probability of a given sentence in the system under consideration tends to a limit that depends only on the congruence class of n mod q. (See also [6] for an in-depth discussion of 0-1 laws for random graphs.)

As suggested in [4], it would be interesting to understand to what extent such a law holds in the sparse setting. Theorem 2 gets about halfway to this goal (for p in its range); but the other half, an assertion like Theorem 2.3 of [4] to the effect that all relevant information is contained in the subgraph frequencies, seems to require something new, since the quantifier elimination process underlying that step depends critically on properties of G(n, p) that hold for constant p but fail when p tends to zero.
In closing we just mention that it would be interesting to find a proof of Theorem 2 that proceeds from first principles and does not depend on the "generalized inner product" polynomials underlying Lemma 4.