Thresholds and expectation-thresholds of monotone properties with small minterms

Let $N$ be a finite set, let $p \in (0,1)$, and let $N_p$ denote a random binomial subset of $N$, where every element of $N$ is taken to belong to the subset independently with probability $p$. This defines a product measure $\mu_p$ on the power set of $N$: for $\mathcal{A} \subseteq 2^N$, $\mu_p(\mathcal{A}) := \Pr[N_p \in \mathcal{A}]$. In this paper we study upward-closed families $\mathcal{A}$ for which all minimal sets in $\mathcal{A}$ have size at most $k$, for some positive integer $k$. We prove that for such a family $\mu_p(\mathcal{A}) / p^k$ is a decreasing function of $p$, which implies a uniform bound on the coarseness of the thresholds of such families. We also prove a structure theorem which enables us to identify in $\mathcal{A}$ either a substantial subfamily $\mathcal{A}_0$ for which the first moment method gives a good approximation of its measure, or a subfamily which can be well approximated by a family with all minimal sets of size strictly smaller than $k$. Finally, we relate the (fractional) expectation threshold and the probability threshold of such a family, using duality of linear programming. This is related to the threshold conjecture of Kahn and Kalai.


Introduction
One of the fundamental phenomena in random graph theory is that of thresholds of monotone properties. This dates back to the seminal papers of Erdős and Rényi [5, 6], who defined the notion of a threshold and discovered that for many interesting graph properties the probability of the property appearing in the random binomial graph G(n, p), for large n, behaves much like a step function of the edge probability p, increasing from 0 to 1 abruptly as p is varied slightly. The study of thresholds of random structures in general, and in random graphs specifically, has been a thriving area ever since, and thousands of papers have covered related problems. Bollobás and Thomason [4] showed that every monotone property of sets has a threshold function and, using the Kruskal-Katona theorem, gave an optimal quantification of such thresholds. In [8] it was observed that the KKL theorem [11] and its extension in [3] imply sharp thresholds for properties which are symmetric under the action of a group on the elements of the ground set, in particular for graph properties.
For most interesting families of graph properties the threshold function p(n) tends to zero as n tends to infinity. In this case it is of interest to study the sharpness of the threshold. Fixing a graph property A and a parameter ε, one may ask what is the width of the interval of values of p in which the probability of G(n, p) having property A climbs from ε to 1 − ε. The scale on which this width is measured is the value of p for which the probability of G(n, p) ∈ A is, say, 1/2. For a sequence of properties A_n of graphs on n vertices, we will say that the threshold is sharp if the ratio between the width of the threshold interval and the critical p tends to 0. We will shortly give a more precise definition of sharp thresholds in a more general setting.
In his Ph.D. thesis, the first author [7] gave a necessary condition for a monotone graph property to have a coarse threshold. Roughly speaking, if a property does not have a sharp threshold, it must be well approximable by a local property (e.g. containing a triangle), as opposed to properties that are global (e.g. connectivity) and cannot be well approximated by the property of containing a subgraph from a fixed given list. In the appendix to [7], Bourgain proved a similar statement, with a slightly weaker conclusion, in a much more general setting, without the assumption of symmetry. In a recent paper, Hatami [9] gives a common generalization of these two results.
Returning to the question of thresholds of local properties: the appearance of any fixed subgraph in G(n, p) has a coarse threshold, and this is well understood; see Bollobás' paper [1] for a complete description. Roughly speaking, if a fixed graph H is strictly balanced then the number of copies of H in G(n, p) will be approximately Poisson, and the governing parameter, the expectation of this random variable, will be of order n^{|V(H)|} p^{|E(H)|}, which varies smoothly with p: when p is multiplied by a constant c, the expectation changes by a factor of c^{|E(H)|}. When H is not balanced the situation is only slightly more complicated, and the appearance of copies of H in G(n, p) can be understood by studying the appearances of the densest subgraphs of H. Intuitively, if H has a subgraph H′ which is much denser than H, then every copy of H′ that appears in G(n, p) is extremely likely to be nested in many copies of H. Kahn and Kalai [10] have a far-reaching conjecture as to the generalization of this to families of graphs whose size is not fixed (such as Hamilton cycles). In a nutshell, they conjecture that for such families there is at most a logarithmic gap between the threshold probability for the appearance of a graph from the family and the probability at which the expectation is constant (once again, taking into account the densest subgraphs).
The basic question which led to the writing of this paper was: how specific is this behavior to graphs? The proofs of this behavior use the symmetry of graphs very strongly, yet it seemed possible that something similar should hold also for properties of random binomial subsets of a ground set without any symmetry assumptions. This would imply a converse to the main theorems of [7] and its appendix: not only does a non-sharp threshold imply that the property in question has local nature, but also any property determined by small minimal sets has a non-sharp threshold.
We will prove in this paper that this indeed is the case.

Setting and main results
Let [n] denote the set {1, . . . , n}, and let [n]_p denote a random subset of [n], where each element is chosen independently with probability p. A family of sets A is called monotone if whenever A ∈ A and A ⊂ B, then B ∈ A. For such a family, a set in A which is minimal with respect to inclusion is called a minterm. For a family A of subsets of [n] and p ∈ [0, 1] we define µ(A, p) to be the probability that [n]_p ∈ A. Note that if A is monotone then this function is monotone in p. We will also use the notation µ_p to denote the measure µ(·, p).
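For small ground sets the measure µ(A, p) can be computed exactly by enumeration. The following Python sketch (the toy family and ground-set size are our own choices, purely for illustration) verifies the monotonicity of µ(A, ·) in p:

```python
from itertools import combinations

def mu(minterms, n, p):
    """Exact mu(A, p): probability that a p-random subset of {0, ..., n-1}
    contains at least one minterm, by brute-force enumeration of all subsets."""
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            if any(M <= set(S) for M in minterms):
                total += p ** r * (1 - p) ** (n - r)
    return total

# toy monotone family on a 6-element ground set, given by its minterms
minterms = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4})]
vals = [mu(minterms, 6, i / 10) for i in range(11)]
assert all(a <= b for a, b in zip(vals, vals[1:]))  # mu(A, p) is monotone in p
```

Since the three minterms here are disjoint, independence gives the closed form µ(A, p) = 1 − (1 − p²)²(1 − p), which the enumeration reproduces.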
For a fixed non-trivial monotone family A and any x ∈ (0, 1) we define p_x to be the unique number such that µ(A, p_x) = x. For 0 < ε < 1/2 define

δ_ε(A) = (p_{1/2} − p_ε)/p_{1/2}.

The numerator is the length of the threshold interval in which the probability of A climbs from ε to 1/2. The denominator, p_{1/2}, supplies the correct yardstick with which to measure this length. The slower δ_ε(A) tends to 1 as ε tends to 0, the sharper the threshold is (in other words, the threshold interval is small); the faster it tends to 1, the coarser the threshold is. (Note that it would also make sense to study the interval [p_ε, p_{1−ε}]; however, our choice gives a neater normalization, bounding δ_ε(A) between 0 and 1.)

Theorem 2.1. Let A be a monotone family with all minterms of size at most k. Then µ(A, p)/p^k is a decreasing function of p.

The simple derivation of this theorem from the Margulis-Russo lemma was pointed out to us by Oliver Riordan. Note that the theorem is tight, e.g., for a family with a single minterm. We present the proof of Theorem 2.1 in Section 3 below. An upper bound on δ_ε will follow from a different approach, which we present in Section 5. To present the results of Section 5 we first need a definition.
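The monotonicity of µ_p(A)/p^k can be checked numerically on small examples. A minimal Python sketch (the toy family and grid of values of p are our own) verifying that the ratio is non-increasing:

```python
from itertools import combinations

def mu(minterms, n, p):
    """Exact mu(A, p) for the monotone family generated by the given minterms."""
    return sum(p ** r * (1 - p) ** (n - r)
               for r in range(n + 1)
               for S in combinations(range(n), r)
               if any(M <= set(S) for M in minterms))

minterms = [frozenset({0, 1}), frozenset({2, 3, 4})]  # minterm sizes at most k = 3
n, k = 5, 3
ps = [i / 50 for i in range(1, 50)]
ratios = [mu(minterms, n, p) / p ** k for p in ps]
# mu_p(A) / p^k should be non-increasing in p when all minterms have size <= k
assert all(r1 >= r2 - 1e-9 for r1, r2 in zip(ratios, ratios[1:]))
```

For this family the two minterms are disjoint, so µ_p(A) = p² + p³ − p⁵ and the ratio equals 1/p + 1 − p², which is indeed decreasing.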
Definition. For a monotone family A of subsets of [n] and q ∈ [0, 1], the fractional expectation is

E*_q(A) = min Σ_B β(B) q^{|B|},

where the minimum is taken over all functions β : 2^[n] → [0, ∞) such that Σ_{B ⊆ A} β(B) ≥ 1 for every A ∈ A. If β is a function for which the minimum in the definition of the fractional expectation is achieved, then E*_q(A) is the expectation of the function g(A) = Σ_B β(B) 1_{B⊆A}, which is non-negative and assumes values greater than or equal to 1 on all A in A; hence, by Markov's inequality, this too gives a (better) upper bound on the measure, namely µ_q(A) ≤ E*_q(A). The main result of Section 5 is that for monotone families with minterms of bounded size this bound is not too far off the mark.
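E*_q(A) is the value of an explicit linear program, so for a toy family it can be computed directly. The sketch below (our own illustration, using scipy's LP solver) also checks the Markov-type bound µ_q(A) ≤ E*_q(A); since the constraints at the minterms imply the constraints at all larger sets, only the former are imposed.

```python
from itertools import combinations
from scipy.optimize import linprog

n, q = 5, 0.3
subsets = [frozenset(S) for r in range(n + 1) for S in combinations(range(n), r)]
minterms = [frozenset({0, 1}), frozenset({1, 2}), frozenset({3, 4})]

# minimize sum_B beta(B) q^|B|  subject to  sum_{B <= M} beta(B) >= 1 per minterm M
c = [q ** len(B) for B in subsets]
A_ub = [[-1.0 if B <= M else 0.0 for B in subsets] for M in minterms]
res = linprog(c, A_ub=A_ub, b_ub=[-1.0] * len(minterms),
              bounds=(0, None), method="highs")
E_star = res.fun

# beta = indicator of the minterms is feasible, so E*_q is at most the first moment
assert E_star <= sum(q ** len(M) for M in minterms) + 1e-9

# exact mu_q(A); the Markov bound gives mu_q(A) <= E*_q(A)
mu_q = sum(q ** len(S) * (1 - q) ** (n - len(S))
           for S in subsets if any(M <= S for M in minterms))
assert mu_q <= E_star + 1e-9
```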
Theorem 2.4. Let A be a monotone family with all minterms of size at most k. Then for any p ∈ (0, 1) and any α > 0,

µ_p(A) ≥ E*_{αp}(A)/(1 + α)^k.

As a corollary we deduce a special case of the expectation-threshold conjecture of [10].
In Section 4 we approach the question of understanding the threshold behavior of a monotone family A ⊆ P([n]) via the parameter E_p[X_A], the expected number of minterms of A in a random set [n]_p. Whenever we have good control over the second moment of this random variable, the expectation gives us a good indication as to the probability µ_p(A). An example of this setting is, say, the family of all subgraphs of K_m that contain a copy of K_4, when p = Θ(m^{−2/3}). It is easy to verify that in this case the expected number of minterms (i.e. the expected number of copies of K_4 in G(m, p)) is Θ(1), whereas the variance is also of this order of magnitude, which enables us to get an effective lower bound on the measure of the family using the Paley-Zygmund bound

Pr[Z > 0] ≥ (E[Z])² / E[Z²],

which holds for any non-negative random variable Z. On the other hand, consider the example where the family consists of all subgraphs of K_m containing a copy of "K_4 with a tail", a graph consisting of K_4 with a fifth vertex connected to precisely one of the four. We again set p = m^{−2/3}, and a moment of thought shows that although this family is properly contained in the previous one, the measure of their symmetric difference is negligible, as any copy of K_4 that appears in G(m, m^{−2/3}) is overwhelmingly likely to have many tails. This is reflected by the fact that the expectation now is huge rather than constant. In this case one has to realize that the tail connected to the K_4 is a red herring, and proper analysis can, and should, focus on the previous family.
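The Paley-Zygmund step can be illustrated exactly on a small ground set. In the sketch below (a toy family of our own choosing, with X the minterm count) all the relevant moments are computed by enumeration:

```python
from itertools import combinations

n, p = 6, 0.25
# toy family: minterms are the first eight pairs of a 6-element ground set
minterms = [frozenset(M) for M in combinations(range(n), 2)][:8]

EX = EX2 = prob_pos = 0.0
for r in range(n + 1):
    for S in combinations(range(n), r):
        S = set(S)
        w = p ** len(S) * (1 - p) ** (n - len(S))   # weight of S under mu_p
        x = sum(1 for M in minterms if M <= S)      # number of minterms inside S
        EX += w * x
        EX2 += w * x * x
        prob_pos += w * (x > 0)

# Paley-Zygmund: Pr[X > 0] >= E[X]^2 / E[X^2] for any non-negative X
assert prob_pos >= EX ** 2 / EX2 - 1e-12
```

Here Pr[X > 0] is exactly µ_p(A), so the assertion is the first-to-second-moment lower bound on the measure of the family.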
Such examples are almost canonical in any introductory course on random graphs. Our main theorem in Section 4 guarantees that something similar to one of these two cases must hold in any family A defined by minterms of bounded size k: either there is a substantial subfamily B for which the first and second moments are well behaved, or there is a substantial subfamily B that may be approximated by a different family with minterms of size strictly smaller than k. This structure theorem then allows us to deduce a theorem quite similar to Theorem 2.1, with a slightly worse rate of decay (e.g. µ_{p/2}(A) ≥ µ_p(A)/(k8^k), as opposed to the bound µ_{p/2}(A) ≥ µ_p(A)/2^k which follows from Theorem 2.1).

The Margulis-Russo lemma, and proof of Theorem 2.1

For a monotone family A, the Margulis-Russo lemma ([13], [14]) relates the derivative of µ(A, p) with respect to p to the edge boundary of A. If A ∈ A, but (A \ a) is not in A, we say that a is a pivotal element of A and that there is a boundary edge "leaving A in the direction of a". Let Piv(A) denote the number of pivotal elements of A (which is necessarily 0 if A ∉ A). Let A be a random set chosen according to µ_p; then Piv(A) is a random variable, and its expectation is a measure of the size of the boundary of A.
As the lemma below shows, this is a parameter intimately correlated with the threshold behavior of A.

Lemma 3.1. Let A be a monotone family of subsets of [n]. Then dµ(A, p)/dp = E_{µ_p}[Piv(A)]/p.

This lemma, which is so simple to state and, as we shall see shortly, very easy to prove, is extremely useful; see, e.g., [8], [7], [2]. It is not surprising that this lemma is relevant when studying thresholds, as the expression on the right-hand side is clearly related to the ratio between the width of the threshold interval and the value of p within the interval (it is an approximation of its reciprocal). Both Russo and Margulis proved this lemma by induction on n, the size of the ground set from which A is chosen. For the sake of being self-contained we present a different proof, which is well-known folklore, perhaps due to Gil Kalai.
Proof of Lemma 3.1: Let A be a monotone family of subsets of [n], and for some fixed (p_1, p_2, . . . , p_n) consider the following product measure µ_{(p_1,p_2,...,p_n)} on P([n]): the measure of a set A is ∏_{i∈A} p_i ∏_{j∉A} (1 − p_j). For i ∈ [n] and a random set A chosen according to µ_{(p_1,p_2,...,p_n)}, let a_i denote the probability that (A ∈ A and (A \ {i}) ∉ A), and let b_i denote the probability that i is pivotal in A ∪ {i}, i.e. ((A ∪ {i}) ∈ A and (A \ {i}) ∉ A). This means that the probability that i is a pivotal element of A is a_i = p_i b_i. Recalling that A is monotone, for all i we have ∂µ_{(p_1,p_2,...,p_n)}(A)/∂p_i = b_i. We now let all of the p_i depend on a common parameter p in the following trivial way: p_i(p) = p, and note that the resulting measure is µ_p. The Margulis-Russo formula now follows from a simple application of the chain rule: dµ_p(A)/dp = Σ_i b_i = (1/p) Σ_i a_i = E_{µ_p}[Piv(A)]/p.
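The formula can be sanity-checked numerically: for a toy family (our own choice) the derivative of µ_p, approximated by a central difference, matches E[Piv(A)]/p.

```python
from itertools import combinations

n = 5
minterms = [frozenset({0, 1}), frozenset({1, 2, 3}), frozenset({4})]
subsets = [frozenset(S) for r in range(n + 1) for S in combinations(range(n), r)]
in_family = lambda S: any(M <= S for M in minterms)

def mu(p):
    return sum(p ** len(S) * (1 - p) ** (n - len(S))
               for S in subsets if in_family(S))

def expected_pivotals(p):
    # E[Piv(A)]: i in A is pivotal when A is in the family but A \ {i} is not
    return sum(sum(1 for i in S if not in_family(S - {i}))
               * p ** len(S) * (1 - p) ** (n - len(S))
               for S in subsets if in_family(S))

p, h = 0.4, 1e-6
derivative = (mu(p + h) - mu(p - h)) / (2 * h)   # d mu_p / dp, numerically
assert abs(derivative - expected_pivotals(p) / p) < 1e-5
```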
The fact that Theorem 2.1 follows immediately from the Margulis-Russo formula is yet another example of how useful this result is.
Proof of Theorem 2.1: Let A be a monotone family with all minterms of size at most k, and let A be a random set chosen according to µ_p. Note that A can never have more than k pivotal elements, since every pivotal element of A must belong to every minterm contained in A, and if A ∈ A then A contains a minterm of size at most k. Also, if A ∉ A then, by definition, there are no pivotal elements. Therefore

E_{µ_p}[Piv(A)] ≤ k µ_p(A).

Using this in conjunction with the Margulis-Russo formula gives dµ_p(A)/dp ≤ k µ_p(A)/p, and differentiating with respect to p gives

d/dp (µ_p(A)/p^k) = (p dµ_p(A)/dp − k µ_p(A)) / p^{k+1} ≤ 0.

A structure theorem for monotone families with small minterms.
We begin with some notation. Let A be a monotone family and let M(A) be the set of its minterms. Throughout this section we will assume that all minterms are of size at most k. Let X_A be the random variable that counts the number of minterms of A in [n]_p. For a set V ⊂ [n] we define the family of its m-supplements with respect to A to be the family of sets W of size m whose disjoint union with V forms a minterm, namely

N^m_A(V) := {W : |W| = m, W ∩ V = ∅, V ∪ W ∈ M(A)}.

We say that A is tame with respect to p if for every 1 ≤ m ≤ k − 1 and every V ⊂ [n] one has |N^m_A(V)| < p^{−m}. We say that B is a tame m-approximation of A at p if there is a subfamily A′ ⊆ A such that for all minterms B ∈ M(B) the set N^m_{A′}(B) has size at least p^{−m} and is tame.
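The definitions of supplements and tameness are concrete enough to compute directly. A small Python sketch (the "star" family below is our own toy example) follows them literally:

```python
from itertools import combinations

def supplements(minterms, V, m):
    """N^m(V): the sets W of size m, disjoint from V, with V union W a minterm."""
    V = frozenset(V)
    return {M - V for M in minterms if V <= M and len(M - V) == m}

def is_tame(minterms, n, k, p):
    # tame at p: |N^m(V)| < p^{-m} for every V and every 1 <= m <= k - 1
    for m in range(1, k):
        for r in range(n + 1):
            for V in combinations(range(n), r):
                if len(supplements(minterms, V, m)) >= p ** (-m):
                    return False
    return True

# star-like family: many size-2 minterms all through the element 0
star = [frozenset({0, i}) for i in range(1, 10)]
assert is_tame(star, 10, 2, 0.05)      # 9 supplements of {0}, and 9 < 1/0.05 = 20
assert not is_tame(star, 10, 2, 0.2)   # 9 >= 1/0.2 = 5, so not tame at p = 0.2
```

The star family illustrates why tameness depends on p: the single set {0} has nine 1-supplements, which is fine for small p but violates the bound once p^{−1} drops below 9.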
The definition of a tame family is useful because it implies that for such a family the first moment bound on the measure is not too far from the truth. This is captured by the following lemma.
Proof. We would like to use the Paley-Zygmund inequality to bound the probability from below. Let us calculate the numerator and the denominator separately. Denoting M(m) = |{M_i ∈ M(A) : |M_i| = m}|, the numerator is easy to compute. The denominator needs a bit more careful work. Remembering that M(A) = {M_i} and that X_A is the random variable counting the number of minterms, we define X_i to be the indicator of the event M_i ⊆ [n]_p and write X_A = Σ_i X_i. Expanding the second moment, and using that the X_i's are indicators, we are left with taking care of the second summand.
Note that A is a tame family; thus for any M_i and any 1 ≤ m ≤ k − 1 one has |N^m_A(M_i)| < p^{−m}. With this we are ready to do the calculation. We break the sum into sums corresponding to the different sizes of minterms and supplements.
Summing everything together and plugging the result into the Paley-Zygmund inequality, a simple simplification gives the bound k2^k, as needed. □

Part (1) below is weaker than what can be deduced using Theorem 2.1; nonetheless we include it with its proof in order to demonstrate the information one can deduce from the structural approach.
Proof. To see (1) we only need to note that the minterms of A are of size at most k; together with the inequality on the first moment, E_p[X_A] ≥ µ_p(A), and Lemma 4.1, we get the claim.
For (2), note that as any B ∈ M(B) is a subset of a minterm of A, the required inequality holds for any p. It is left to bound from below the measure of the family of supplements of any B ∈ B; since this family is tame, we can apply Lemma 4.1 and conclude. □

Now we are ready to present the structural result and its corollaries.

Theorem 4.3. Let A be a monotone family with all minterms of size at most k, and let p ∈ (0, 1). Then at least one of the following holds:

1. There exists a subfamily B ⊆ A, with µ_p(B) ≥ µ_p(A)/2, which is tame with respect to p/2.

2. There exist m between 1 and k − 1 and a family B which is a tame m-approximation of A at p/2, with µ_p(B) ≥ µ_p(A)/2^{m+1}.
By induction on the size of the minterms and applications of Theorem 4.3 and Corollary 4.2, we deduce the following corollary, whose proof we defer until after the proof of the theorem.

Corollary 4.4. Let A be a monotone family with all minterms of size at most k. Then µ_{p/2}(A) ≥ µ_p(A)/(k 2^{3k−1}).
Proof of Theorem 4.3: By repeated application of the corollary above one gets a chain of families A = A_1 ⊇ A_2 ⊇ · · · ⊇ A_{k−1} ⊇ A_k, and there are two possible options: either there is some m for which µ_p(A_m \ A_{m+1}) is large, or, if for all m we have that µ_p(A_m \ A_{m+1}) is small, then µ_p(A_k) is large.
Note that A_k is tame with respect to p/2, as we removed all subsets V ⊂ [n] that have many supplements (of any size). If µ_p(A_k) ≥ (1/2) µ_p(A), this immediately gives us the first case of the theorem.

Proof of Corollary 4.4: We use induction on the size of the minterms. For k = 1 we note that A is tame by definition, so we can directly apply Lemma 4.1, and together with the first moment we get the required inequality. Now assume we have proved the statement for all values smaller than k, and let us prove it for k. Theorem 4.3 gives us two options. If we have the first one, then there is a tame family B which is a subfamily of A with µ_p(B) ≥ (1/2) µ_p(A); together with Corollary 4.2, this yields a bound which is stronger than the required inequality.
If we are in the second case of the theorem, note that any B ∈ M(B) has supplements of size m, and so the size of each minterm in B is at most k − m. Thus we can use the induction hypothesis on B and get that µ_{p/2}(B) ≥ µ_p(B)/((k − m) 2^{3(k−m)−1}).
Recalling from Theorem 4.3 that µ_p(B) ≥ µ_p(A)/2^{m+1}, it is left to apply Corollary 4.2; a simple calculation then gives the required inequality. □

To get good control on µ(A) it makes sense to try and find such a function f which is as small as possible, and a function g for which the second moment is well behaved (say, not too much weight on the upset generated by any single set, a quantity that arises naturally when calculating the second moment). The trick will be to relate these two functions via LP duality. First, for q ∈ [0, 1], define

E*_q(A) = min Σ_B β(B) q^{|B|},

where the minimum is taken over all functions β : 2^[n] → [0, ∞) satisfying Σ_{B⊆A} β(B) ≥ 1 for every A ∈ A, and

L*_q(A) = max Σ_{A∈M(A)} ν(A),

where the maximum is taken over all functions ν : M(A) → [0, ∞) satisfying Σ_{A∈M(A), A⊇B} ν(A) ≤ q^{|B|} for every B ⊆ [n]. These two linear programs are dual to each other, so E*_q(A) = L*_q(A).

Now, for any p, q ∈ (0, 1) we let α = q/p and proceed to relate µ_p and E*_q. Let ν be a function achieving the maximum in the definition of L*_q(A) and define g accordingly; then (6) follows from (4), and (7) follows from the definition of ν and the fact that all minterms are of size at most k. We now use (7) and (5) in (2), together with the fact that L* = E*, to conclude.

A nice feature of Theorem 2.4 is that it gives sufficient control over the rate of change of µ_p (as a function of p) to give both lower and upper bounds. This is embodied in the following corollary, which also implies Theorem 2.2.
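The equality E*_q = L*_q can be observed directly on a toy instance by solving both linear programs (our own illustration, using scipy's solver; the programs are the primal/dual pair defining E*_q and L*_q):

```python
from itertools import combinations
from scipy.optimize import linprog

n, q = 5, 0.3
subsets = [frozenset(S) for r in range(n + 1) for S in combinations(range(n), r)]
minterms = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3, 4})]

# primal: E*_q = min sum_B beta(B) q^|B|  s.t.  sum_{B <= M} beta(B) >= 1 per minterm
c = [q ** len(B) for B in subsets]
A_primal = [[-1.0 if B <= M else 0.0 for B in subsets] for M in minterms]
primal = linprog(c, A_ub=A_primal, b_ub=[-1.0] * len(minterms),
                 bounds=(0, None), method="highs")

# dual: L*_q = max sum_M nu(M)  s.t.  sum_{M >= B} nu(M) <= q^|B| for every B
A_dual = [[1.0 if B <= M else 0.0 for M in minterms] for B in subsets]
dual = linprog([-1.0] * len(minterms), A_ub=A_dual,
               b_ub=[q ** len(B) for B in subsets],
               bounds=(0, None), method="highs")

E_star, L_star = primal.fun, -dual.fun
assert abs(E_star - L_star) < 1e-7   # LP duality: E*_q = L*_q
```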
Note that a suitable choice of b, together with a := 1/2, implies Theorem 2.2. Also, note that Theorem 2.1 yields the bound (a/b)^{1/k} ≤ p_a/p_b, so that when a/b is large this is almost as good. Proof: For the lower bound, observe that Theorem 2.4 implies that for any p and α

µ_{αp}(A)/(1 + α)^k ≤ µ_p(A).
Setting p := p b and α := p a /p b gives the required result.
For the upper bound it is useful to use the inverse function of E*. Let q_x be the value of q for which E*_q(A) = x. Theorem 2.4 implies that for any x and α,

p_x ≤ q_{x(1+α)^k}/α.
Also, it is easy to see that for y ≤ x it holds that q_x ≤ q_y · (x/y). Furthermore, for every x ≤ 1 we have q_x ≤ p_x. Putting these together gives the required bound. The function (1 + α)^k/α is minimized at α = 1/(k − 1); plugging this value of α into the above expression yields the result.
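The minimization in the last step is elementary calculus; for completeness, a short verification (the closing bound by ek is our own remark):

```latex
f(\alpha)=\frac{(1+\alpha)^k}{\alpha}, \qquad
f'(\alpha)=\frac{k\alpha(1+\alpha)^{k-1}-(1+\alpha)^k}{\alpha^2}
          =\frac{(1+\alpha)^{k-1}\bigl((k-1)\alpha-1\bigr)}{\alpha^2},
```

so on (0, ∞) the unique critical point is α = 1/(k − 1), which is a minimum, where f(1/(k − 1)) = (k/(k − 1))^{k−1} · k ≤ ek, using (1 + 1/(k − 1))^{k−1} ≤ e.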