The Normalized Matching Property in Random and Pseudorandom Bipartite Graphs

A simple generalization of Hall's condition in bipartite graphs, the Normalized Matching Property (NMP) in a graph $G(X,Y,E)$ with vertex partition $(X,Y)$ states that for any subset $S\subseteq X$, we have $\frac{|N(S)|}{|Y|}\ge\frac{|S|}{|X|}$. In this paper, we show the following results about the Normalized Matching Property in random and pseudorandom graphs. 1. We establish $p=\frac{\log n}{k}$ as a sharp threshold for having NMP in $\mathbb{G}(k,n,p)$, which is the graph with $|X|=k,|Y|=n$ (assuming $k\le n\leq \exp(o(k))$), and in which each pair $(x,y)\in X\times Y$ is an edge independently with probability $p$. This generalizes a classic result of Erd\H{o}s-R\'enyi on the $\frac{\log n}{n}$ threshold for having a perfect matching in $\mathbb{G}(n,n,p)$. 2. We also show that a pseudorandom bipartite graph, upon deletion of a vanishingly small fraction of vertices, admits NMP, provided it is not too sparse. More precisely, a bipartite graph $G(X,Y)$, with $k=|X|\le |Y|=n$, is said to be Thomason pseudorandom (following A. Thomason (Discrete Math., 1989)) with parameters $(p,\varepsilon)$ if each $x\in X$ has degree at least $pn$ and each pair of distinct $x, x'\in X$ has at most $(1+\varepsilon)p^2n$ common neighbors. We show that for any large enough $(p,\varepsilon)$-Thomason pseudorandom graph $G(X,Y)$, there are "tiny" subsets $\mathrm{Del}_X\subset X, \ \mathrm{Del}_Y\subset Y$ such that the subgraph $G(X\setminus \mathrm{Del}_X,Y\setminus \mathrm{Del}_Y)$ has NMP, provided $p \gg\tfrac{1}{k}$. En route, we prove an "almost" vertex decomposition theorem: every such Thomason pseudorandom graph admits, excluding a negligible portion of its vertex set, a partition into graphs that we call Euclidean trees. These are trees that have NMP, and which arise organically through the Euclidean GCD algorithm.


Introduction
Consider the following problems: 1. Suppose k ≤ n are positive integers. By a k × n star-array (or simply star-array), we mean a k × n array whose entries are symbols from the set {0, ⋆}. Given a k × n star-array, when is it possible to replace each ⋆ entry of the array by a non-negative integer such that in the resulting array all the row sums equal R and all the column sums equal C, for some integers R, C > 0?
2. Let q be a sufficiently large prime power and suppose X, Y ⊂ F q with |Y | = 10|X|, |X| ≥ q/100. Is it possible to label each element of Y with some element of X such that each element of X appears as a label exactly 10 times, and further, for each y ∈ Y labeled x, the sum x + y is a quadratic residue? More generally, one can ask the same question with a subgroup H ⊂ F * q instead of the set of quadratic residues.
In both the problems posed above, there is a natural bipartite graph G(X, Y, E) that captures the problem in its essence: given a star-array A, let X and Y denote the set of rows and columns of A respectively, and let a vertex x ∈ X be adjacent to y ∈ Y in G if and only if the (x, y) entry of A is a ⋆. For the second problem, consider the bipartite graph G(X, Y, E) where X, Y are the given sets, and the pair (x, y) is an edge in G if and only if x + y ∈ H.
In the rest of the paper, G(X, Y) shall denote a bipartite graph with vertex partition (X, Y); we shall drop the E in our notation for convenience. We say that G = G(X, Y) has the Normalized Matching Property (NMP for short) if: for any S ⊆ X, if we denote by N(S) its set of neighbors in Y, then |N(S)|/|Y| ≥ |S|/|X|. In particular, if |X| = |Y|, then this is the familiar Hall's condition for the existence of a perfect matching in G.
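As a concrete illustration, NMP can be checked directly from the definition on small graphs. The following minimal sketch (the function name and the example graphs are ours, not from the paper) tests |N(S)|/|Y| ≥ |S|/|X| for every S ⊆ X by cross-multiplication.

```python
from itertools import combinations

def has_nmp(adj, k, n):
    """Brute-force NMP check for a bipartite graph G(X, Y) with
    X = {0,...,k-1}, Y = {0,...,n-1}; adj[x] is the set of neighbours
    of x in Y.  NMP asks |N(S)|/n >= |S|/k for every S, which after
    cross-multiplying becomes k*|N(S)| >= n*|S|."""
    for r in range(1, k + 1):
        for S in combinations(range(k), r):
            NS = set().union(*(adj[x] for x in S))
            if k * len(NS) < n * r:
                return False
    return True

# K_{2,4} has NMP; a graph where one vertex of X sees only one y does not.
print(has_nmp({0: {0, 1, 2, 3}, 1: {0, 1, 2, 3}}, 2, 4))  # True
print(has_nmp({0: {0, 1, 2, 3}, 1: {0}}, 2, 4))           # False
```

The exponential enumeration over subsets is only meant for tiny examples; a polynomial-time test is described later via the reduction to perfect matchings.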
The following theorem of Kleitman [16] gives us an equivalent formulation of NMP in bipartite graphs: Theorem 1.1. The following statements are equivalent: • G with |X| = k, |Y | = n has NMP.
• For any independent set I in G, |I_X|/k + |I_Y|/n ≤ 1, where I_X := I ∩ X and I_Y := I ∩ Y.
• There exists a multiplicity function m : E → N_0 = N ∪ {0} such that the weighted degree ∑_{e ∋ x} m(e) is the same for all x ∈ X (resp. ∑_{e ∋ y} m(e) is the same for all y ∈ Y).
It is easy to see that the problems posed above simply ask if the associated bipartite graphs have NMP by virtue of the third part of Theorem 1.1.
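On small graphs, the equivalence of the first two characterizations in Theorem 1.1 can also be confirmed exhaustively. The sketch below (our own illustration; all names are ours) enumerates every bipartite graph with |X| = 2, |Y| = 3 and compares the NMP condition with the independent-set condition.

```python
from itertools import combinations, product

def has_nmp(adj, k, n):
    # First characterization: k*|N(S)| >= n*|S| for every nonempty S.
    return all(k * len(set().union(*(adj[x] for x in S))) >= n * len(S)
               for r in range(1, k + 1) for S in combinations(range(k), r))

def indep_condition(adj, k, n):
    # Second characterization: n*|I_X| + k*|I_Y| <= k*n for every
    # independent set I_X u I_Y (no edge between I_X and I_Y).
    subsets = lambda m: [c for r in range(m + 1) for c in combinations(range(m), r)]
    return all(n * len(A) + k * len(B) <= k * n
               for A in subsets(k) for B in subsets(n)
               if all(y not in adj[x] for x in A for y in B))

# Exhaustive check over all 2^6 = 64 bipartite graphs with |X|=2, |Y|=3.
k, n = 2, 3
for edges in product([False, True], repeat=k * n):
    adj = {x: {y for y in range(n) if edges[x * n + y]} for x in range(k)}
    assert has_nmp(adj, k, n) == indep_condition(adj, k, n)
print("the two characterizations agree on all 64 graphs")
```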
The Normalized Matching Property in bipartite graphs was introduced by Graham and Harper [11] and has subsequently been a focus of study in several papers (for instance [16,24]) and some monographs as well (for instance [4,7]). The notion also extends very naturally to finite ranked posets: for a ranked poset P, let L_i denote the set of all elements of P with rank i. Then we say that P has NMP if for each i, the bipartite graph of poset covering relations between L_i and L_{i+1} has NMP. NMP posets are objects of great interest, specifically in related decomposition problems (see [12,13,14,22,23] for some decomposition results). As a concrete instance, the Griggs conjecture, which states that any unimodal NMP poset admits a nested chain decomposition (see [14] or [25] for more details on the definitions), is still open, even for posets of rank 3, despite several attacks on the problem.
As it turns out, many interesting finite ranked posets arising from finite geometric structures have NMP. Indeed, the Boolean poset, the poset of affine flats in a finite projective n-dimensional space, and the poset of the subgroup lattice of abelian p-groups all have NMP (see [21,22,23] respectively), i.e., in each of these posets, the associated bipartite graphs on the sets of elements of successive ranks within these posets have NMP. As is the case with Hall's theorem for bipartite graphs, it is clear that graphs with "high density" are more likely to possess NMP. But in each of the instances listed above, the associated bipartite graphs are very sparse. This raises the following natural question: at what density does a typical bipartite graph have NMP?
To formulate the above question more precisely, we set up some asymptotic terminology and notation. Given functions f, g, we write f ≫ g (resp. f ≪ g) if f(n)/g(n) → ∞ (resp. f(n)/g(n) → 0) as n → ∞. We also write f = o(g) to denote that f ≪ g. We write f = O(g) (resp. f = Ω(g)) if there exists an absolute constant C > 0 and n_0 such that for all n ≥ n_0, |f(n)| ≤ C|g(n)| (resp. |f(n)| ≥ C|g(n)|). If the constant C involves a related parameter ε, then we write f = O_ε(g) (resp. f = Ω_ε(g)) to indicate the dependence of the implicit constant on the parameter ε.
To formalize the question posed above, we recall some standard terminology from the theory of random graphs. For a probability space (Ω, P) we say that an event E_n that depends on a parameter n occurs with high probability (abbreviated as whp) if P(E_n) → 1 as n → ∞. A graph property P is simply a collection of graphs, and a graph property is called monotone if whenever G ∈ P and G ⊂ H, then H ∈ P as well. The Erdős–Rényi random graph model G(n, p) introduced in [9] is the random graph where the vertex set is the set [n] := {1, . . . , n} and each pair {i, j} is an edge with probability p = p(n) independently. A monotone graph property P is said to have a threshold p_0 = p_0(n) if whenever p ≫ p_0 then G(n, p) has property P whp, and if p ≪ p_0 then whp G(n, p) does not have property P. A property P is said to have a sharp threshold p_0(n) if for every ε > 0, G(n, p) has property P whp when p ≥ (1 + ε)p_0, and whp does not have property P when p ≤ (1 − ε)p_0.
The seminal paper of Erdős and Rényi [9] established sharp thresholds for several very natural monotone graph properties. A theorem of Bollobás and Thomason [6] showed that every monotone graph property admits a threshold. However, not all graph properties admit sharp thresholds; for instance, the property "G(n, p) contains a cycle" admits a threshold which is sharp on one side but not the other (see [15] for more on sharp thresholds). In fact, the problem of determining sharp thresholds (if the graph property admits one) is a very popular motif in the theory of random graphs.
For bipartite graphs, Erdős and Rényi also introduced the random bipartite model G(n, n, p), where the vertex set is partitioned into two sets X, Y of size n each, and each pair {x, y} with x ∈ X, y ∈ Y is in G(n, n, p) independently with probability p. One of the first results in this model is that log n/n is a sharp threshold for the existence of a perfect matching in G(n, n, p) [10]. As observed earlier, if k = n, NMP is the same as Hall's condition for bipartite graphs, so it is natural to seek the threshold for NMP in a slightly more general model for bipartite random graphs, which is what the question previously posed seeks to do.
Suppose k ≤ n are positive integers, and let 0 ≤ p ≤ 1. Let G(k, n, p) denote the random bipartite graph with the vertex partition given by (X, Y) with |X| = k, |Y| = n, in which each pair (x, y) ∈ X × Y is an edge in G independently with probability p. Here both k and n should be thought of as parameters growing to infinity, with n being a function of k that always satisfies n ≥ k. Our first main result in this paper establishes a sharp threshold for NMP in the sense stated above:

Theorem 1.2. Suppose 0 < ε < 1 and k ≤ n ≤ exp(o(k)).
1. If p ≥ (1 + ε) log n/k, then whp G(k, n, p) has NMP.
2. If p ≤ (1 − ε) log n/k, then whp G(k, n, p) does not have NMP.

In other words, G(k, n, p) has a sharp threshold for NMP at p = log n/k.
Note that if n > exp(k), or equivalently, if log n > k, then the expression for our threshold exceeds one. Also, for each fixed p < 1, if C > 1 + log(1/(1 − p)) and n ≥ exp(Ck), then a simple computation shows that the probability that Y has at least one isolated vertex is bounded away from zero (this will be clear from the proof of Theorem 1.2; see Lemma 3.1). Hence, the range for n in the statement of the theorem is essentially the widest possible one if one seeks a sharp threshold.
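The first-moment computation behind this remark is easy to reproduce. The following sketch (with arbitrary illustrative choices of p, k, and C, all ours) checks that the expected number of isolated vertices in Y is enormous once n ≥ exp(Ck) with C > 1 + log(1/(1−p)).

```python
import math

# Each y in Y is isolated with probability (1 - p)^k, so the expected
# number of isolated vertices in Y is n * (1 - p)^k.  Once
# n >= exp(C*k) with C > 1 + log(1/(1-p)), this expectation is at
# least e^k, so it certainly does not vanish.
p, k = 0.5, 50
C = 1 + math.log(1 / (1 - p)) + 0.01   # any C > 1 + log(1/(1-p)) works
n = math.exp(C * k)
expected_isolated = n * (1 - p) ** k   # = exp(k * (C - log(1/(1-p))))
assert expected_isolated >= math.exp(k)
```

A large expectation alone does not yet bound the probability away from zero; the paper's Lemma 3.1 supplies the required concentration.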
Let us now return to the problems at the beginning of this section. To check if a given bipartite graph has NMP is computationally simple: form a bigger new bipartite graph G′(X′, Y′) with |X′| = |Y′| = nk, with X′ consisting of n copies of X, Y′ consisting of k copies of Y, and x′y′ being an edge in G′ if and only if xy is an edge in G. Then it is straightforward to see that G has NMP if and only if G′ admits a perfect matching. Hence either problem admits a computationally simple solution. But let us relax our requirement and seek an answer only in an approximate sense: for the first problem, is it possible to replace each ⋆ entry with a non-negative integer such that, with the exception of a negligible proportion of the rows/columns, the remaining rows and columns satisfy the aforementioned property? Or in the second problem, can we ignore a negligible proportion of elements from both sets X, Y, so that the desired property holds for the remaining elements? Since either of the originally posed problems is equivalent to asking if a given bipartite graph has NMP, this approximate version asks if a given bipartite graph "almost" has NMP in a certain sense that we shall formalize below.
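The replication argument is easy to carry out in code. The sketch below (our own illustration; the small example graphs are ours) builds the blown-up graph G′ and confirms that G has NMP exactly when G′ has a perfect matching, found here with a standard augmenting-path algorithm.

```python
from itertools import combinations

def has_nmp(adj, k, n):
    # Brute-force check of k*|N(S)| >= n*|S| for all nonempty S.
    return all(k * len(set().union(*(adj[x] for x in S))) >= n * len(S)
               for r in range(1, k + 1) for S in combinations(range(k), r))

def max_matching(adj, n_left):
    """Maximum matching size via Kuhn's augmenting-path algorithm."""
    match = {}  # right vertex -> its matched left vertex
    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False
    return sum(augment(u, set()) for u in range(n_left))

def blow_up(adj, k, n):
    """G'(X', Y'): n copies of each x, k copies of each y;
    (x, i)(y, j) is an edge iff xy is an edge of G."""
    return {x * n + i: {y * k + j for y in adj[x] for j in range(k)}
            for x in range(k) for i in range(n)}

for G in ({0: {0, 1, 2}, 1: {0, 2}}, {0: {0, 1, 2}, 1: {0}}):
    k, n = 2, 3
    Gp = blow_up(G, k, n)
    assert has_nmp(G, k, n) == (max_matching(Gp, k * n) == k * n)
```

Since maximum bipartite matching is polynomial-time, this gives the computationally simple NMP test mentioned above.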
The bipartite graph considered in the second problem (with the subsets of F q ) possesses certain regularity properties that are best described as "random-like" -as we shall soon see. Taking a cue from this, we impose the following reasonable hypotheses on bipartite graphs that we shall consider: If all the vertices of X have "almost" the same degree, and suppose that no two vertices of X have "too many" common neighbors in Y (so that there isn't a clustering of edges between some subsets of X and Y ), is there an affirmative answer to the approximate version for these problems?
To formulate this in more precise terms, we need the notion of a pseudorandom bipartite graph. The notion of pseudorandomness was first introduced by Thomason in the 80s [20] and pseudorandomness in graphs is a well-studied notion (see [18] for a definitive survey). One of the more popular and well-understood models for pseudorandomness in graphs is the notion of an (n, d, λ) graph (see [2]). An (n, d, λ) graph is a graph on n vertices which is d-regular and which satisfies the following property: if d = λ_1 ≥ λ_2 ≥ · · · ≥ λ_n are the eigenvalues of (the adjacency matrix of) G, then |λ_i| ≤ λ for all i > 1.
Pseudorandom graphs, as the name suggests, have some properties very reminiscent of random graphs, and the most well-known is the Expander Mixing Lemma (see [2]): for an (n, d, λ) graph G and any two subsets U, W ⊆ V(G),

|e(U, W) − (d/n)|U||W|| ≤ λ√(|U||W|),

where e(U, W) denotes the number of edges of the form uw with u ∈ U and w ∈ W.
As mentioned earlier, Thomason introduced a notion of pseudorandomness which is a little more general, and in particular, we shall, in this paper, confine our attention to the notion of pseudorandomness in bipartite graphs as proposed by Thomason in [21]. Definition 1.1. Suppose 0 < p < 1 and 0 ≤ ε < 1. A bipartite graph G with vertex classes X and Y of sizes k and n respectively, with k ≤ n, is called Thomason pseudorandom with parameters (p, ε) if every vertex in X has degree at least pn, and every pair of distinct vertices in X has at most (1 + ε)p^2 n neighbors in common.
At this juncture, a few remarks are in order. Thomason's original definition in [21] actually only considers bipartite graphs with |X| = |Y| = n. Secondly, Thomason's definition in [21] is more in line with the original notion of pseudorandomness in [20]: a graph G(X, Y) is pseudorandom with parameters (p, µ) for some µ ≥ 0, where the second condition states that every pair of vertices in X has at most p^2 n + µ common neighbors. The definition that we shall be using is a relaxation of the restriction that |X| = |Y|, but also a restriction to the more natural and intuitive case where µ ≤ εp^2 n.
Notions of pseudorandomness are usually "symmetric" or "global" in their definitions, as in the definition in [20] or in the definition of an (n, d, λ) graph. The present notion is at first glance somewhat asymmetric in the sense that the conditions imposed on the degrees and codegrees are only for the vertices of X. However, it is a simple exercise (which we shall not get into here) to show that these conditions also imply certain restrictions on the degrees and codegrees of the vertices of Y as a consequence of the following analogue of the expander-mixing lemma (restricted to our setup): Theorem 1.3 (Theorem 2 in [21]). Let G(X, Y) be a bipartite graph with |X| = k ≤ n = |Y|, which is Thomason pseudorandom with parameters (p, ε). Then for every subset A ⊆ X of size at least 1/p and every subset B ⊆ Y, with |A| = a and |B| = b, the number of edges e(A, B) between A and B is concentrated around pab, with an explicit error term. Again, we remark that Thomason's theorem in [21] is stated for pseudorandom bipartite graphs G(X, Y) with |X| = |Y| = n and parameters (p, µ). But a glance at the proof there immediately tells us that the same proof works in our general setup as well. The interesting point is that this asymmetric definition of pseudorandomness also yields the aforementioned theorem. A heuristic and somewhat simplistic explanation for this is that we are restricting ourselves to bipartite graphs, and it is precisely due to the bipartite structure of the graph that the arguments go through.
Another reason why we prefer to work with this notion of pseudorandomness is that it is combinatorial in its definition; it only considers the degrees of the vertices and codegrees of pairs of vertices of X, which is computationally easy to verify. In addition, it is a reasonably robust notion which also allows us to generate several non-trivial examples of Thomason pseudorandom graphs. While it is true that many notions of pseudorandomness do pass on to subgraphs, we did not find any concrete statement in the literature that established the same for this notion. So we took it upon ourselves to prove its robustness; see the lemma in the Appendix for a precise statement.
Pseudorandom graphs enjoy several very interesting properties. It is not hard to show that (n, d, λ) graphs with d − λ ≥ 2 are d-edge connected and as a simple consequence, it follows that for even n, (n, d, λ) graphs have a perfect matching [18]. In the more general context, it is conceivable that Thomason pseudorandom graphs admit "almost-perfect" matchings, i.e., admit a perfect matching on at least (1 − o(1))|V | vertices under not-too-restrictive conditions. The second result of our paper proves a more general version of this statement for NMP for Thomason pseudorandom graphs.
Before we formally state our result, we need the following definition. Definition 1.2 (NMP-Approximability). Suppose ε > 0. For functions f, g : R + → R + such that f (x), g(x) → 0 as x → 0, a bipartite graph G(X, Y ) is said to be (f, g, ε)-NMP approximable if there are subsets Del X ⊆ X and Del Y ⊆ Y such that: • The bipartite subgraph induced on the sets X \ Del X and Y \ Del Y has NMP.
We now state our second main result of the paper.
Theorem 1.4. Suppose 0 < ε < 1, and let ω : N → R^+ be a non-negative valued function that satisfies ω(k) → ∞ as k → ∞. There exists an integer k_0 = k_0(ε, ω) such that the following holds. Suppose p ≥ ω(k)/k, |X| = k, |Y| = n with k_0 < k ≤ n, and suppose G = G(X, Y) is a Thomason pseudorandom bipartite graph with parameters (p, ε). Then G is (f, g, ε)-NMP-approximable.

Note that in the statement of Theorem 1.4, the bounds f = g = O(x^{1/4} log(1/x)) work for all (k, n). The first part of the theorem is a stronger conclusion when n ≫ k. At the level of generality of the statement of Theorem 1.4, it may in fact be necessary to delete some vertices from the graph in order to achieve NMP. Indeed, the definition of a Thomason pseudorandom graph does not preclude the existence of isolated vertices; in fact, one could add a few isolated vertices to Y to get another pseudorandom graph with only slightly worse parameters! Also, on a less frivolous note, suppose n = O(k) and ω(k) ≪ log k, and consider G(k, n, p); a consequence of the proof of the second item of Theorem 1.2 (which appears later in the paper as Lemma 3.1) shows that there are isolated vertices in Y whp. Since G(k, n, p) is also Thomason pseudorandom whp, it follows that over this sparser regime for p (where Theorem 1.4 is applicable), the deletion of some vertices is indeed necessary to arrive at the conclusion of Theorem 1.4. Theorem 1.4 essentially says that if we have a not-too-sparse pseudorandom bipartite graph, i.e., a Thomason pseudorandom graph with p not too small, then we can remove a small fraction of vertices from both parts such that the graph induced by the remaining vertices has the Normalized Matching Property. The sense of how small these sets are is described using the notion of NMP-approximability defined above.
As we shall see, the proof actually establishes an "approximate decomposition" theorem: the vertex set of any Thomason pseudorandom bipartite graph almost admits a decomposition into copies of what we call a Euclidean tree, a small tree that arises canonically via the execution of the Euclidean algorithm. Furthermore, the entire process of obtaining Del_X and Del_Y is algorithmic (and efficient) in nature, and we consider this to be a major feature of our argument. After the publication of this article, we learned that this notion of Euclidean trees had been defined prior to our work in the context of graphic matroids¹ (see [26]). So we find it quite interesting to see it reappear in the context of a seemingly unrelated problem.
The rest of the paper is organised as follows. The next section gives some preliminaries and sets up terminology and tools that will be of use in the later sections. In Section 3 we prove Theorem 1.2, and in Section 4, we prove Theorem 1.4. The paper concludes with some remarks and open questions in Section 5, and an Appendix. As mentioned earlier, the lemma in the Appendix can serve as a generator of several examples of Thomason pseudorandom graphs for which Theorem 1.4 is applicable. The main reason for including the lemma is that most of the standard and well-studied examples of pseudorandom graphs that arise from algebraic structures/posets tend to have |X| = |Y|, and even in the cases where |X| ≠ |Y|, the corresponding bipartite graphs are much sparser than the ones we need in our hypothesis.

Preliminaries
For a vertex x, d(x) shall denote its degree, and for sets A ⊆ X, B ⊆ Y , e(A, B) shall denote the number of edges between A and B.
We shall repeatedly make use of the Chernoff bound:

Theorem 2.1 (see [15]). Suppose X ∼ Bin(n, p) is a binomial random variable and λ := E(X) = np. Then for t > 0,

P(|X − λ| ≥ t) ≤ 2 exp(−t^2/(2(λ + t/3))).

¹ We thank Attila Sali for bringing this to our attention.
A natural question that arises in the context of NMP is: if G(X, Y) has NMP, then does G(Y, X) also have NMP, i.e., is it true that for all T ⊆ Y, |N(T)|/|X| ≥ |T|/|Y|? This is not immediately obvious from the definition of NMP, but it is indeed the case, as can be immediately seen from the second characterization of Theorem 1.1, which is symmetric in X and Y.
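This symmetry can likewise be confirmed by exhaustive computation on small graphs; the following sketch (our own illustration, with our own function names) checks all bipartite graphs with |X| = 2, |Y| = 3.

```python
from itertools import combinations, product

def has_nmp(adj, k, n):
    # k*|N(S)| >= n*|S| for all nonempty S, by brute force.
    return all(k * len(set().union(*(adj[x] for x in S))) >= n * len(S)
               for r in range(1, k + 1) for S in combinations(range(k), r))

def transpose(adj, k, n):
    # The same graph viewed as G(Y, X).
    return {y: {x for x in range(k) if y in adj[x]} for y in range(n)}

k, n = 2, 3
for edges in product([False, True], repeat=k * n):
    adj = {x: {y for y in range(n) if edges[x * n + y]} for x in range(k)}
    assert has_nmp(adj, k, n) == has_nmp(transpose(adj, k, n), n, k)
print("NMP is symmetric in X and Y on all 64 graphs with |X|=2, |Y|=3")
```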
We begin with a simple proposition that will be instrumental in our proof of Theorem 1.2 in Section 3. For a graph G(X, Y) that does not have NMP, we say that a set of vertices S ⊆ X witnesses the violation of NMP for G(X, Y) if |N(S)|/n < |S|/k.

Lemma 2.1. Suppose G(X, Y) with |X| = k and |Y| = n does not have NMP. If T ⊆ Y witnesses the violation of NMP for G(Y, X), then X \ N(T) witnesses the violation of NMP for G(X, Y). Moreover, either there exists S ⊆ X with |S| ≤ k/2 that witnesses the violation of NMP for G(X, Y), or there exists T ⊆ Y with |T| < n/2 + n/k that witnesses the violation of NMP for G(Y, X).

Proof. For the first part, suppose |N(T)|/k < |T|/n. Then

|X \ N(T)|/k = 1 − |N(T)|/k > 1 − |T|/n = |Y \ T|/n ≥ |N(X \ N(T))|/n,

where we subtracted both sides from 1 and used the simple fact that N(X \ N(T)) ⊆ Y \ T in the final inequality. Now, to see the "moreover" part, as G does not have NMP, first let S be a minimal set that witnesses the violation of NMP for G(X, Y). By the minimality of S, we have |N(S)| ≥ (n/k)(|S| − 1). If |S| ≤ k/2, then we are through, so suppose that |S| > k/2. Let T = Y \ N(S). Then note that |T| < n/2 + n/k. But then by the argument above (which is symmetric in X and Y), T witnesses the violation of NMP for G(Y, X).
We also take note of a couple of facts from literature on random graphs that will be useful in the proof of Theorem 1.2. By d(x) (respectively d(y)) we mean the degree of vertex x into Y (respectively the degree of vertex y into X) in G(X, Y ) = G(k, n, p).
This follows from the following well-known result (see [5], Chapter 3, for instance): in G(n, n, p), if p = (log n + (r − 1) log log n + ω(n))/n for any function ω(n) that goes to infinity with n, then whp G(n, n, p) has minimum degree r, since the number of vertices of degree r is approximately Poisson. The same argument extends to G(k, n, p) as well. This is an easy consequence of the Chernoff bound (Theorem 2.1).
We now introduce an important ingredient that is vital to the proof of Theorem 1.4. Suppose ℓ, L are positive integers with gcd(ℓ, L) = 1. A tree will be called a left-right tree if the two color classes of its vertex set are labelled as "left" and "right" respectively. Since a connected bipartite graph admits a unique 2-coloring of its vertices, a left-right tree can be thought of as a tree with a label on each vertex denoting its color class.
The Euclidean (ℓ, L)-tree, which we shall denote by T_{ℓ,L}, is a left-right tree on ℓ + L vertices, with ℓ left vertices and L right vertices, that is defined recursively as follows. If ℓ = 1, T_{1,L} is simply a star on L + 1 vertices with one left vertex and L right vertices. If L = 1, then T_{ℓ,1} is the star on ℓ + 1 vertices with ℓ left vertices and one right vertex. If 1 < ℓ < L, then T_{ℓ,L} is obtained from T_{ℓ,L−ℓ} by adding ℓ new right vertices and a perfect matching between the ℓ left vertices and these new right vertices; the case ℓ > L > 1 is defined analogously with the roles of the left and right vertices reversed. The following lemma conveys why Euclidean trees are relevant to us. Lemma 2.2. Suppose T = T_{ℓ,L} is a Euclidean tree, and let X, Y denote its sets of left and right vertices respectively. Then T, as the bipartite graph T(X, Y), has NMP. Moreover, so does the graph obtained by taking several vertex-disjoint copies of T. Proof. First assume that ℓ < L. If ℓ = 1, then T is simply a star with L leaves, and clearly, T has NMP. Suppose by induction that Euclidean trees with fewer than ℓ + L vertices have NMP.
Let S ⊆ X be nonempty, and let N′(S) denote the set of neighbors of S among the right vertices of T_{ℓ,L−ℓ}. The ℓ new right vertices matched to the left vertices contribute |S| further neighbors, so |N(S)| ≥ |S| + |N′(S)|. But since T_{ℓ,L−ℓ} has NMP, we have |N′(S)| ≥ ((L − ℓ)/ℓ)|S|, so that |N(S)| ≥ |S| + ((L − ℓ)/ℓ)|S| = (L/ℓ)|S|, and that completes the proof. If ℓ > L, then the above argument works with ℓ and L swapped throughout, together with the fact that T(X, Y) has NMP if and only if T(Y, X) does. Finally, the observation that the vertex-disjoint union of copies of T has NMP follows immediately from the third (multiplicity function) characterization of NMP in Theorem 1.1.
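The construction and Lemma 2.2 can be sanity-checked computationally. The sketch below (our own illustration) builds T_{ℓ,L} following the recursion, for 1 < ℓ < L obtaining T_{ℓ,L} from T_{ℓ,L−ℓ} by matching the ℓ left vertices to ℓ new right vertices (with the roles swapped when ℓ > L), and then verifies that the result is a tree with NMP.

```python
from itertools import combinations

def euclidean_tree(l, L):
    """Adjacency (left -> set of rights) of the Euclidean (l, L)-tree,
    built by the recursion sketched above; requires gcd(l, L) == 1."""
    if l == 1:
        return {0: set(range(L))}          # the star T_{1,L}
    if L == 1:
        return {x: {0} for x in range(l)}  # the star T_{l,1}
    if l < L:
        adj = euclidean_tree(l, L - l)
        for x in range(l):                 # match the l left vertices
            adj[x].add(L - l + x)          # to l brand-new right vertices
        return adj
    t = euclidean_tree(L, l)               # l > L: build T_{L,l}, transpose
    adj = {x: set() for x in range(l)}
    for y, nbrs in t.items():
        for x in nbrs:
            adj[x].add(y)
    return adj

def has_nmp(adj, k, n):
    # Brute-force check of k*|N(S)| >= n*|S| for all nonempty S.
    return all(k * len(set().union(*(adj[x] for x in S))) >= n * len(S)
               for r in range(1, k + 1) for S in combinations(range(k), r))

adj = euclidean_tree(3, 7)
assert sum(len(v) for v in adj.values()) == 3 + 7 - 1  # a tree on l+L vertices
assert has_nmp(adj, 3, 7)                              # Lemma 2.2
assert has_nmp(euclidean_tree(7, 3), 7, 3)             # the swapped case
```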
We now describe what we call the "Euclidean (ℓ, L)-tree process", which details a realization of the graphs T_{ℓ,L} through a series of steps; this description, along with the corresponding terminology we build here, will be relevant in Section 4 in the proof of Theorem 1.4. This description also justifies why we call them Euclidean trees. Suppose ℓ < L. Consider the Euclidean algorithm on the pair (ℓ, L) as follows.

Figure 1: Construction of the Euclidean (3, 7)-tree. Each successive matching is shown in a different color.
If we set r_{m+1} = L, r_m = ℓ, and r_0 = 0, then we may write the successive divisions of the algorithm as r_{i+1} = q_i r_i + r_{i−1} for 1 ≤ i ≤ m, with 0 ≤ r_{i−1} < r_i and r_1 = gcd(ℓ, L) = 1. Here m is referred to as the complexity of the Euclidean algorithm for the parameters (ℓ, L).
The following fact is well-known (see for instance, [17], page 360).
Fact 2.4. The complexity of the Euclidean algorithm with input parameters ( , L) is at most 2.078 log L + 0.6723.
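In code, the remainder sequence r_{m+1} = L, r_m = ℓ, . . . , r_1, r_0 = 0 and the complexity m fall directly out of repeated division; the snippet below (our illustration) also checks the bound of Fact 2.4 on a small instance, reading log as the natural logarithm.

```python
import math

def remainder_sequence(l, L):
    """Remainders of the Euclidean algorithm on (l, L) with l < L and
    gcd(l, L) = 1: returns [r_{m+1}, r_m, ..., r_1, r_0] = [L, l, ..., 1, 0],
    so the complexity is m = len(sequence) - 2 division steps."""
    seq = [L, l]
    while seq[-1] != 0:
        seq.append(seq[-2] % seq[-1])
    return seq

seq = remainder_sequence(3, 7)
m = len(seq) - 2
print(seq, m)  # [7, 3, 1, 0] 2   (7 = 2*3 + 1, 3 = 3*1 + 0)
# Fact 2.4 (assuming the natural logarithm):
assert m <= 2.078 * math.log(7) + 0.6723
```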
We now describe T_{ℓ,L} as the evolution of an inductive sequence of trees through m stages (m as above), and in order to do that, we need some additional terminology. By an X q-fan, we mean the tree T_{1,q}, and by a Y q-fan, we mean T_{q,1}. By an X q-thrill of size r, we mean a union of r vertex-disjoint X q-fans, and a Y q-thrill is defined analogously. For a fixed graph F, an F-factor in a graph G is a spanning subgraph of G consisting of vertex-disjoint copies of F. As an example, an X q-thrill admits a factoring by X q-fans.
By definition, T_{ℓ,L} is inductively obtained through a sequence of edge-disjoint unions of matchings, until we finally terminate in a tree T_{q,1} or T_{1,q} for some q. We now invert this process.
Suppose m as described above in the Euclidean algorithm is even (the odd case is analogous). Let T_1 := T_{r_2, r_1} = T_{r_2, 1}. Having inductively defined T_{i−1} with left set X^{(i−1)}, right set Y^{(i−1)}, and edge set E_{i−1}, we define T_i as follows. If i is even, then the vertex set of T_i has left set X^{(i)} := {x_1, . . . , x_{r_i}}, right set Y^{(i)} := {y_1, . . . , y_{r_{i+1}}}, and the edges of T_i consist of the edges of T_{i−1} along with an additional X q_i-thrill of size r_i between the vertices of X^{(i−1)} and the vertices of Y^{(i)} \ Y^{(i−1)}. If i is odd, then T_i has left vertex set X^{(i)} := {x_1, . . . , x_{r_{i+1}}}, right vertex set Y^{(i)} := {y_1, . . . , y_{r_i}}, and the edges of T_i consist of the edges of T_{i−1} along with an additional Y q_i-thrill of size r_i between the vertices of X^{(i)} \ X^{(i−1)} and the vertices of Y^{(i−1)}. In simpler terms, it is the same construction, but with the roles of the left and right sets reversed as per the parity of i. The main point is that the graphs T_i are precisely the Euclidean trees T_{r_{i+1}, r_i} (or T_{r_i, r_{i+1}}, depending on the parity of i) along with isolated vertices. While the inductive definition of the Euclidean tree T_{ℓ,L} appends one additional matching at each step, the Euclidean tree process accelerates this by adding a q-thrill for an appropriate q. In particular, T_m is precisely T_{ℓ,L}, and as we shall see in Section 4, it is particularly handy to think of T_{ℓ,L} as the end result of this evolving process. Figure 2 gives an illustration of this evolution for a Euclidean tree.

We establish item 2 of Theorem 1.2 first, i.e., that if p is below the threshold then whp G does not have NMP. The proof is straightforward, as it simply shows the existence of an isolated vertex in Y whp.
for large n. This concludes the proof.
Lemma 3.1 establishes that the right threshold for having NMP in G must be at least as large as log n/k. The following is a heuristic argument that suggests that it is exactly log n/k. As mentioned in the Introduction, a classical result of Erdős–Rényi states that a sharp threshold for the existence of a perfect matching in a bipartite graph G(n, n) is p = log n/n. In our present situation, suppose k divides n. Replicate each vertex of X by a factor of n/k to obtain the set X′. Define the graph G′(X′, Y) as follows. If x′ ∈ X′ arises from the replication of the vertex x ∈ X, then x′y ∈ E(G′) if and only if xy ∈ E(G). It is a straightforward exercise to see that the original graph G(X, Y) has NMP if and only if G′(X′, Y) satisfies Hall's condition, or equivalently, G has NMP if and only if G′ has a perfect matching. If this new bipartite graph were to behave like G(n, n, p) (which it does not), then we would need p ∼ log n/n for the existence of a perfect matching. But since each vertex of X has been blown up into n/k copies, it is intuitive to expect that each vertex of G behaves like the union of all these n/k vertices bundled together, which suggests a threshold of (n/k) · (log n/n) = log n/k. While this argument is just a heuristic, it suggests what the correct threshold ought to be, as we next show is indeed the case by establishing the remaining (and main) item 1 of Theorem 1.2.
Here is an overview of the proof. Lemma 3.2 proves the theorem when n/k is large (i.e., grows to infinity with k), and this part of the proof only takes recourse to Theorem 1.1. The general case, however, is a little more delicate. The basic idea in the general case is to estimate the probability that there is a minimal set S that violates the NMP condition. In that sense, our strategy follows a line of argument à la Erdős–Rényi, but we need some additional ideas and a more careful analysis to carry it through to fruition.

Lemma 3.2. Suppose n = kω(k), where the function ω(k) ≥ 1 for all k ∈ N and satisfies ω(k) → ∞ as k → ∞. Let 0 < ε, δ < 1. Then there exists k_0 = k_0(ε, δ) such that for k ≥ k_0(ε, δ), if p ≥ (1 + ε) log n/k, then P[G(k, n, p) has NMP] ≥ 1 − δ.
Proof. Suppose G fails to have NMP. By Theorem 1.1, there exists an independent set I = I_X ∪ I_Y in G such that |I_X|/k + |I_Y|/n > 1. Thus, from the union bound, the probability that G does not have NMP is at most ∑_{ℓ=1}^{k} P_ℓ, where for 1 ≤ ℓ ≤ k,

P_ℓ := (k choose ℓ) · (n choose ⌊n(1 − ℓ/k)⌋ + 1) · (1 − p)^{ℓ(⌊n(1−ℓ/k)⌋+1)}. (1)

Here, P_ℓ is an upper bound on the probability that there is a set S ⊆ X of size ℓ and a set T ⊆ Y of size ⌊n(1 − ℓ/k)⌋ + 1 such that S ∪ T is an independent set. P_k is an upper bound on the probability that Y contains an isolated vertex.
We define ε′ := ε/2 and split P_ℓ into three cases according to whether ℓ is "small", "intermediate", or "large", and repeatedly make use of the well-known bounds 1 + x ≤ exp(x) for all x ∈ R and (N choose K) ≤ (eN/K)^K for all K ≤ N.
Small ℓ Case: 1 ≤ ℓ ≤ ε′k. Here, using the symmetry (n choose n(1 − ℓ/k)) = (n choose nℓ/k) followed by standard binomial coefficient bounds, (1) yields a chain of estimates; to derive (4), we use the bounds ⌊n(1 − ℓ/k)⌋ + 1 ≥ n(1 − ℓ/k), 1 + log(k/ℓ) ≤ 1 + log k ≤ (1 + ε/8) log k, and 1 + n/k ≤ (1 + ε/8)(n/k) for large enough k. This is where we crucially use our assumption that n/k → ∞ as k → ∞. (5) follows by using the trivial fact that log k ≤ log n and taking out the common factor (n/k) · log n. (6) is obtained by using ℓ ≥ 1, plugging in ε′ = ε/2, and working out that the expression in the square brackets in (5) is at most −ε/8 for small ε. Finally, since n/k > 16/ε for large enough k, it follows that P_ℓ < 1/n^2 in this case.
Intermediate ℓ Case: ε′k ≤ ℓ ≤ (1 − ε′)k. Using the same expression for the upper bound on P_ℓ as in the previous case, together with the observation that in this case (ℓ/k)(1 − ℓ/k) ≥ ε′(1 − ε′) and the trivial bound 1 + n/k ≤ 2n/k, we obtain an estimate whose last inequality follows, setting x = ℓ/k, from the fact that x log(1/x) < 0.5 for all 0 < x < 1. Hence, P_ℓ < n^{−εn/3}.
Large ℓ Case: (1 − ε′)k ≤ ℓ ≤ k. This case is completely analogous to the small ℓ case. First, observe that for large enough k (again using n/k → ∞ as k → ∞), we may bound P_ℓ as before, where in the last step we use the bound 1 + log(k/(k − ℓ)) ≤ 1 + log k ≤ (1 + ε/8) log k for large enough k. To explain the last step, the expression within the square brackets evaluates to (ε/512)(ε^2 + 280ε − 64), which is at most −199ε/12800 < −ε/128 when 0 < ε < 1/5. But n/k > 256/ε for sufficiently large k and n, since n/k → ∞. Thus, we have P_ℓ = o(1), and that completes the proof of the lemma.
Note that the argument in the intermediate case does not require k = o(n) and in fact shows the following (in light of Theorem 1.1, switching from the independent-set viewpoint to the violation-of-NMP viewpoint): Corollary 3.2. Given ε > 0, for any k ≤ n large enough, and vertex sets X and Y of sizes k and n respectively, the probability that there exists S ⊂ X with ε′k ≤ |S| ≤ (1 − ε′)k, for ε′ = ε/2, such that S witnesses a violation of NMP for G(X, Y) = G(k, n, p) is at most n^{−Ω_ε(n)}.
Interestingly, the proof of Lemma 3.2 actually works for all n ≥ k if one assumes p ≥ 10 log n/k in the hypothesis instead of the sharper assumption on p. This, combined with Lemma 3.1, already establishes that log n/k is a threshold for NMP. The additional ideas employed in the remainder of this section are essentially only required to show that log n/k is a sharp threshold.
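For small instances, the NMP condition in G(k, n, p) can be checked by brute force directly from the definition. The check is exponential in k, so this is only an illustrative sketch of the definition, not an ingredient of the proof:

```python
import itertools
import random

def has_nmp(k, n, adj):
    """adj[x] is the neighborhood (a set of Y-vertices) of x in X = range(k).
    Check |N(S)|/n >= |S|/k for every nonempty S, i.e. k*|N(S)| >= n*|S|."""
    for r in range(1, k + 1):
        for S in itertools.combinations(range(k), r):
            nbrs = set().union(*(adj[x] for x in S))
            if k * len(nbrs) < n * r:
                return False
    return True

def random_bipartite(k, n, p, rng):
    """Sample G(k, n, p): each pair (x, y) is an edge independently with prob p."""
    return [{y for y in range(n) if rng.random() < p} for _ in range(k)]

rng = random.Random(0)
# A complete bipartite graph trivially has NMP; the empty graph never does.
assert has_nmp(3, 6, random_bipartite(3, 6, 1.0, rng))
assert not has_nmp(3, 6, [set(), set(), set()])
```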
Proof of Theorem 1.2. In light of Lemma 3.2, it suffices to prove the theorem assuming n/k ≤ log n. Here, log n may be replaced by any slow-growing (but unbounded) function of k or n without much change to the rest of the argument, but we stick to log n for convenience. By Lemma 2.1, either there exists S ⊂ X with |S| ≤ k/2 that witnesses a violation of NMP for G(X, Y), or there exists T ⊂ Y with |T| < n/2 + n/k that witnesses the violation of NMP for G(Y, X) (of course, these cases need not be mutually exclusive; we merely use that, combined, they exhaust the event that NMP is violated). The proof naturally splits into cases (labelled X and Y respectively) according to whether the set witnessing the violation is a subset of X or of Y. We shall show that either case occurs with low probability by exploiting certain properties of the minimal witness.
Case X: Define ℓ_min to be the constant 18/ε if 1 ≤ n/k < 2, and (ε log n)/2 if 2 ≤ n/k ≤ log n. In light of Facts 2.2 (for r = 36/ε if 1 ≤ n/k < 2) and 2.3, it follows that any minimal S ⊂ X that witnesses the violation of NMP for G(X, Y) must have size |S| ≥ kδ(G)/n ≥ ℓ_min whp, where δ(G) denotes the minimum degree of the vertices in X. The choice of the peculiar constant r = 36/ε will become clear later.
Suppose S ⊂ X is such a minimal witness with ℓ_min ≤ |S| = ℓ ≤ ε′k, where ε′ = ε/2. We first claim that every U ⊂ N(S) of size n/k witnesses at least 2 neighbors (as a set) in S. Indeed, suppose there is a subset U of n/k vertices in N(S) which are the neighbors of only one vertex x in S. Then by the minimality of S, it follows that the set S′ = S \ {x} satisfies (n/k)|S| − n/k > |N(S′)| ≥ (n/k)(|S| − 1), which is a contradiction, and that proves the claim.
We divide Case X further into two subcases. First, we bound the probability that there exists S ⊂ X of size ℓ for which 4ℓn log n/k² < 1 (notice that this clearly implies ℓ ≤ ε′k) which witnesses a violation of NMP for G(X, Y). So fix a choice of S ⊂ X of size ℓ, and of T ⊂ Y (which will represent N(S)) of size equal to some integer in the interval [nℓ/k − n/k, nℓ/k). Fix a partition of T into sets of size n/k. By size considerations, the number t of such parts satisfies 2t ≥ ℓ − 3, and by the observation above, each such part admits at least two neighbors in S. We conclude that the probability that there exists S ⊂ X with |S| ≤ k²/(4n log n) which witnesses a violation of NMP for G(X, Y) is at most a sum Σ₁ of the displayed form. To see why, observe that there are C(k, ℓ) choices for S, and at most n/k values for |N(S)| (since S minimally witnesses a violation of NMP), each of which is at most nℓ/k. The probability that e(S, Y \ N(S)) = 0 is at most (1 − p)^{ℓn(1−ℓ/k)}, and finally, the last expression is a bound on the probability that each of the t blocks of vertices has at least 2 neighbors in S. The condition on ℓ that we have imposed in this subcase simply translates to the observation that the quantity in the right-most parenthesis, raised to the power t, is less than 1. So, using 2t ≥ ℓ − 3 and p ≤ 2 log n/k, we have Σ₁ ≤ (n/k) Σ_{ℓ ≥ ℓ_min} (4e log² n/n^{ε/3})^ℓ ≤ (k³/(32 log³ n)) · (4e log² n/n^{ε/3})^{ℓ_min} for n, k sufficiently large, where in the final step, we used the fact that an infinite geometric series is at most twice the first term when the common ratio is small enough. This expression is clearly o(1) when n/k ≥ 2 (and so ℓ_min = (ε log n)/2). Further, when 1 ≤ n/k < 2 (so that ℓ_min = 18/ε), it is at most (k³/(32 log³ n)) · (4e log² n/n^{ε/6})^{ℓ_min}, which is o(1) as well. For the subcase k²/(4n log n) ≤ ℓ ≤ ε′k, we simply bound (by a quantity we shall call Σ₂) the probability of a minimal S whose size is in this range by the probability that S ∪ N(S) is independent, and sum over the entire range of ℓ again.
First, observe that in this subcase, ek/ℓ ≤ 4en log n/k ≤ 4e log² n, and thus Σ₂ ≤ 4n log n · (4e log² n/n^{1+ε/6})^{n/k} · (4e log² n/n^{ε/6}) = o(1) as before, and we are through.
Case Y: There is a minimal witness T ⊂ Y with |T| = s ≤ n/2 + n/k that witnesses the violation of NMP for G(Y, X). This time though, since k ≤ n, it follows that |N(T)| ≤ ks/n, and that every x ∈ N(T) has at least 2 neighbors in T. Now, define s_min := 12/ε. As earlier, by Fact 2.2, the minimal T ⊂ Y that witnesses the violation of NMP for G(Y, X) must have size at least s_min whp. Again, we split this into two subcases: s_min ≤ s ≤ ε′n and s ≥ ε′n, where again ε′ = ε/2. Suppose s_min ≤ s ≤ ε′n. Analogous to how we divided Case X into two subcases, let us first assume that s ≤ k/(2 log n), which in particular lets us assume that sp < 1. Then the probability that such a witness exists of size in this range is at most the displayed quantity M₁. As n/k ≤ log n and k/n ≤ 1, we obtain M₁ ≤ (144k²/(ε² log² n)) Σ_{s ≥ s_min} (8e² log² n/n^{ε/2})^s < (144k²/(ε² log² n)) · (8e² log² n/n^{ε/2})^{s_min} (geometric series bound), where to derive (9), we use ⌊ks/n⌋ ≥ ks/n − 1 in the exponent and the cruder bound ⌊ks/n⌋ ≥ ks/2n elsewhere, which is applicable since, by assumption, ks/n ≥ |N(T)| ≥ s_min > 1. We also subsequently drop the restriction k/(2 log n) ≥ s ≥ s_min of the range in the sum, for convenience. Next, if k/(2 log n) ≤ s ≤ ε′n, then we simply bound the probability of there being a witness of size in this range by the probability that T ∪ N(T) is an independent set (i.e., the final parenthesis in the expression for M₁ above is dropped), and sum over this range of s again. The calculations (for the accordingly defined expression M₂) are very similar to those for Σ₂ in Case X and are omitted here.
Finally, if |T| > ε′n, then note that S = X \ N(T) has size (1 − ε′)k ≥ |S| ≥ ε′k and, by Lemma 2.1, S witnesses the violation of NMP for G(X, Y); this case is covered by Corollary 3.2.

Normalized Matching Property in Pseudorandom Graphs
In this section, we prove Theorem 1.4, which is restated below for convenience. Suppose 0 < p < 1 and 0 < ε < 1. Recall that a bipartite graph G(X, Y) with |X| = k ≤ n = |Y| is called Thomason pseudorandom with parameters (p, ε) if every vertex in X has degree at least pn, and every pair of vertices in X has at most (1 + ε)p²n neighbors in common. Theorem 1.4. Suppose 0 < ε < 1, and let ω : ℕ → ℝ⁺ be a non-negative valued function that satisfies ω(k) → ∞ as k → ∞. There exists an integer k₀ = k₀(ε, ω) such that the following holds. Suppose p ≥ ω(k)/k, |X| = k, |Y| = n with k₀ < k ≤ n, and suppose G = G(X, Y) is a Thomason pseudorandom bipartite graph with parameters (p, ε). Then G is (f, g, ε)-NMP-approximable with the stated bounds on f and g. In what follows, G = G(X, Y) is a Thomason pseudorandom graph with parameters (p, ε), where ε > 0 and p ≥ ω(k)/k for a function ω satisfying ω(k) → ∞ as k → ∞. As always, |X| = k ≤ n = |Y|, and n, k are sufficiently large (depending on the choice of ε and ω). As in the proof of Theorem 1.2, we split the task of proving NMP-approximability into two cases: the first, in which n is significantly larger than k, and the second, in which the two are comparable.
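The degree and codegree conditions defining Thomason pseudorandomness are straightforward to verify for an explicit graph; the following is a minimal sketch of the definition only, not of anything used in the proof:

```python
from itertools import combinations

def is_thomason(adj, n, p, eps):
    """adj[x]: the set of neighbors in Y (|Y| = n) of each x in X.
    Degrees must be >= p*n; codegrees at most (1 + eps) * p**2 * n."""
    if any(len(nbrs) < p * n for nbrs in adj):
        return False
    return all(len(adj[x] & adj[y]) <= (1 + eps) * p * p * n
               for x, y in combinations(range(len(adj)), 2))

# The complete bipartite graph is (1, 0)-Thomason: degrees n, codegrees n.
full = [set(range(8)) for _ in range(4)]
assert is_thomason(full, 8, 1.0, 0.0)
# Removing an edge breaks the degree condition for p = 1.
full[0].discard(0)
assert not is_thomason(full, 8, 1.0, 0.0)
```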
Here is a brief overview of the proof. Suppose that n/k = L/ℓ, where the latter is the representation in reduced form, i.e., gcd(ℓ, L) = 1 and ℓ, L ∈ ℕ. Our strategy of proof is to show that we can find small sets Del_X ⊂ X and Del_Y ⊂ Y such that G(X \ Del_X, Y \ Del_Y) admits a vertex decomposition into copies of the Euclidean tree T_{ℓ,L}. Since T_{ℓ,L} has NMP by Lemma 2.2, this establishes the NMP-approximability of G. An essential ingredient in the proof of both cases is Lemma 4.1 (which appears below), which basically states: if G(X, Y, E) satisfies, for every subset A ⊆ X of size at least 1/p and every subset B ⊆ Y, the bound |e(A, B) − p|A||B|| ≤ √(pn|A||B|(1 + εp|A|)), then all large enough subsets of X, Y admit an almost-partition into X-thrills or Y-thrills (as the case may be).
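To make the overview concrete, the sketch below builds a candidate Euclidean tree by alternately attaching q-fans to one side, mirroring the stages of the Euclidean algorithm, and then brute-forces the NMP condition. This is our reconstruction from the description in this section; the precise definition of T_{ℓ,L} in Section 2 may differ in details such as indexing:

```python
import itertools

def euclidean_tree(quotients):
    """Start from a single X-vertex; at each stage, every vertex on the
    'active' side receives q fresh leaves (a q-fan) on the opposite side,
    and the sides alternate. Returns (X, Y, adjacency of X)."""
    X, Y = [0], []
    adj = {0: set()}
    nxt, grow_y = 1, True
    for q in quotients:
        base = list(X) if grow_y else list(Y)
        for v in base:
            for _ in range(q):
                w, nxt = nxt, nxt + 1
                if grow_y:
                    Y.append(w)
                    adj[v].add(w)
                else:
                    X.append(w)
                    adj[w] = {v}
        grow_y = not grow_y
    return X, Y, adj

def has_nmp(X, Y, adj):
    """|N(S)|/|Y| >= |S|/|X| for every nonempty S, checked by brute force."""
    k, n = len(X), len(Y)
    return all(
        k * len(set().union(*(adj[x] for x in S))) >= n * len(S)
        for r in range(1, k + 1)
        for S in itertools.combinations(X, r)
    )

# Quotients [2, 1, 2] (the continued-fraction data of 8/3) yield |X| = 3, |Y| = 8.
X, Y, adj = euclidean_tree([2, 1, 2])
assert (len(X), len(Y)) == (3, 8) and has_nmp(X, Y, adj)
```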
The application of this lemma in the first case (n/k large) is straightforward, but in the second case, it does not apply directly. The principal issue in the second case emanates from the possibility that ℓ and L are still large in the reduced form; for instance, if n and k are coprime, then (ℓ, L) = (k, n) and Lemma 4.1 does not apply. To circumvent this difficulty, we pre-process the graph by deleting a small portion from both X and Y to get X′ and Y′, so that the reduced form (ℓ, L) for (|X′|, |Y′|) satisfies ℓ, L = O_ε(1). Lemma 4.1 then applies in a multi-step process that we describe in Lemma 4.2.
Lemma 4.1. Let ε > 0 and q ∈ ℕ be such that q = n/k or q = O_ε(1). Suppose G(X, Y, E) satisfies the conclusion of Theorem 1.3. Let U ⊆ X and V ⊆ Y, and define d₀ = 2εn. Then there exist subsets A ⊆ U, B ⊆ V satisfying the stated bounds, where |U| = u, |V| = v, |A| = a, and |B| = b. Proof. First, assume that v = qu. Let F be a maximal X q-thrill in G(U, V) and let F ∩ U = Ũ, i.e., let Ũ denote the set of all those vertices in U which belong to a q-fan in F. Similarly, let F ∩ V = Ṽ, and set A := U \ Ũ, B := V \ Ṽ. Since F is an X q-thrill, q(u − a) = v − b, which gives b = qa. Note that we may assume that a > 1/p, as otherwise the bounds on a and b hold trivially, since 1/p < d₀/q for either assumption on q.
Now, assume that u = qv. This case proceeds analogously to the previous one, with only minor changes at appropriate places. Let F now be a maximal Y q-thrill, and let Ũ = F ∩ U and Ṽ = F ∩ V. Define A and B as in the previous case. Then by the maximality of F, no vertex in B has more than q − 1 neighbors in A, implying e(A, B) < qb. Further, we have a = qb. By Theorem 1.3, assuming a > 1/p as earlier, we have qb > pab − √(pnab(1 + εpa)).
Upon plugging in a = qb and working out as before, we obtain a quadratic inequality which is identical to (15), except with b in place of a and qε in place of ε. Thus, it follows that b < 2/p + εn + 2/(εpq) ≤ 2/p + εn + 2/(εp) = d, and therefore a ≤ qd. This implies the claimed bounds in terms of d₀ as before.
A few remarks are in order.
1. Though we have slightly stronger bounds on a and b in the second case (when u = qv), we simply use the stated bounds for the sake of ease of calculations later.
2. When ε = 0 (for instance, in the pseudorandom graphs that arise from the point-hyperplane incidences of projective geometries), the calculations above in fact yield a < 1/p + n/(pq) when v = qu, and something analogous when u = qv. In particular, the sizes of the deleted parts are considerably smaller in this case.
3. If U ⊂ X′ ⊂ X and V ⊂ Y′ ⊂ Y, then the conclusions of Lemma 4.1 hold even for the graph G(X′, Y′) with the same parameters (p, ε), since the lemma applies directly to the pair (U, V) as subsets of (X, Y). This is of vital use in the way we apply the lemma in the proof of Theorem 1.4, part (b).
Proof of Theorem 1.4, part (a). Suppose n = qk + r, where q = ⌊n/k⌋ and r is an integer such that 0 ≤ r < k. Choose an arbitrary subset C_Y ⊂ Y of size r and define Y₁ = Y \ C_Y. Apply Lemma 4.1 to the sets U = X and V = Y₁ to obtain A ⊂ X and B ⊂ Y₁ such that G(X \ A, Y₁ \ B) is spanned by an X q-thrill and therefore has NMP (by Lemma 2.2). Define Del_X := A and Del_Y := C_Y ∪ B.

Proof of Lemma 4.2. Partition both X and Y arbitrarily into "blocks", each of size t = gcd(k, n). Let the blocks be denoted by X₁, …, X_ℓ and Y₁, …, Y_L respectively. We shall refer to the X_i blocks as left blocks and the Y_j blocks as right blocks. Let r_i, q_i be the remainders and quotients as defined in Section 2. We shall now replicate the Euclidean-(ℓ, L) process with the vertices replaced by these blocks, which we shall carry out in m stages, beginning with stage 1.
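The bookkeeping behind the block-level Euclidean-(ℓ, L) process can be sketched as follows: running the Euclidean algorithm on (L, ℓ) and reversing its quotients yields an ascending sequence obeying r_{i+1} = q_i·r_i + r_{i−1} and ending with (ℓ, L). The indexing conventions of Section 2 may differ slightly; this is an illustration only:

```python
def reversed_quotients(ell, L):
    """Quotients of the Euclidean algorithm on (L, ell), in reverse order.
    Assumes gcd(ell, L) = 1, as in the reduced form L/ell."""
    qs, a, b = [], L, ell
    while b:
        qs.append(a // b)
        a, b = b, a % b
    return qs[::-1]

def remainder_sequence(qs):
    """Ascending sequence with r_0 = 0, r_1 = 1 and r_{i+1} = q_i*r_i + r_{i-1}."""
    seq = [0, 1]
    for q in qs:
        seq.append(q * seq[-1] + seq[-2])
    return seq

# For (ell, L) = (3, 8): quotients [2, 1, 2], sequence 0, 1, 2, 3, 8,
# terminating with (r_m, r_{m+1}) = (3, 8) = (ell, L).
seq = remainder_sequence(reversed_quotients(3, 8))
assert seq[-2:] == [3, 8]
```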
In the rest of the proof of Lemma 4.2, we assume that m is even; the case of odd m is completely analogous. We also define the sets X^(i) and Y^(i) analogously to the sets X^(i) and Y^(i) in the definition of the Euclidean tree (see Section 2), with the definition depending on the parity of i. We also set X^(0) = Y^(0) = ∅.
We induct on m. At stage i, we apply Lemma 4.1 to appropriately defined sets U_i and V_i to obtain sets A_i ⊂ U_i and B_i ⊂ V_i such that G(U_i \ A_i, V_i \ B_i) is spanned by an X q_i-thrill or a Y q_i-thrill (depending on whether i is even or odd, respectively). In fact, it will turn out that U_i and V_i are large subsets of X^(i) and Y^(i) \ Y^(i−1) respectively when i is even (and something analogous holds when i is odd). We denote the sets of deleted vertices from X and Y at the end of stage i by D^X_i and D^Y_i respectively; these are obtained by modifying A_i and B_i suitably, with the help of D_X as defined in Section 2. By controlling the sizes of D^X_i and D^Y_i (which we denote by d^X_i and d^Y_i respectively), the lemma follows by plugging in i = m, because r_m = ℓ and r_{m+1} = L.
Let us get to the details now. For starters, we apply Lemma 4.1 to the "first" r₁ right blocks (recall that r₁ = 1) and the "first" r₂ left blocks; this establishes the base case of the induction. Suppose now that the inductive hypothesis holds for some i < m. We shall show that there exist subsets D^X_i ⊂ X and D^Y_i ⊂ Y such that G_i admits a T_i-factor and, furthermore, that d^X_i and d^Y_i satisfy the required bounds, which would establish the induction step.
By assumption, G_{i−1} admits a T_{i−1}-factor, i.e., G_{i−1} is spanned by vertex-disjoint copies of T_{i−1}. Define CORRUPT^X_i to be the set of all those vertices in X^(i−1) \ D^X_{i−1} which belong to one of the above copies of T_{i−1} that also contains at least one vertex from A_i. Obviously, A_i ⊆ CORRUPT^X_i. Similarly, we define CORRUPT^Y_i as the set of vertices in Y^(i−1) \ D^Y_{i−1} which belong to a copy of T_{i−1} that contains at least one vertex from A_i. We refer to such copies of T_{i−1} in G_{i−1} (that contain at least one vertex from A_i) as corrupt copies. Define CORRUPT_i := CORRUPT^X_i ∪ CORRUPT^Y_i as the set of those vertices of G_{i−1} that get "corrupted" due to the introduction of further deletions during stage i (i.e., the set A_i); in other words, CORRUPT_i is the set of vertices touched by the corrupt copies. See Figure 3 for an illustration of the induction step.

Figure 3: An illustration of the induction step in the proof of Theorem 1.4. The picture on the left depicts the copies of T_{i−1} that span G_{i−1}, colored blue. The picture on the right depicts what happens to each of these copies in the induction step: those which have a vertex in A_i (the topmost box in X^(i)) "corrupt" all the vertices that they contain (colored pink), and those which do not have a vertex in A_i "evolve" to T_i via an X q_i-thrill into Y^(i) \ Y^(i−1), shown in green.

Define D^X_i := D^X_{i−1} ∪ CORRUPT^X_i and D^Y_i := D^Y_{i−1} ∪ CORRUPT^Y_i ∪ B_i.
Note that every corrupt copy of T_{i−1} in G_{i−1} has r_i vertices in X and r_{i−1} vertices in Y. Therefore, we have bounds on |CORRUPT^X_i| and |CORRUPT^Y_i| in terms of |A_i|. Putting things together, we obtain the recurrences for d^X_i and d^Y_i, where in the final step, we use r_{i+1} − r_{i−1} = q_i r_i and the fact that 1 + r_{i−1} ≤ r_i < r_{i+1}.
We now prove that G_i admits a T_i-factor. Recall from the preliminaries that if T_{i−1} = T_{r_i, r_{i−1}} is the Euclidean tree with left vertices x₁, …, x_{r_i} and right vertices y₁, …, y_{r_{i−1}}, then T_i = T_{r_i, r_{i+1}} is constructed on the left vertices x₁, …, x_{r_i} and right vertices y₁, …, y_{r_{i+1}}, by adding to T_{i−1} an X q_i-thrill of size r_i between x₁, …, x_{r_i} and y_{r_{i−1}+1}, …, y_{r_{i+1}}. By Lemma 4.1, G(U_i \ A_i, V_i \ B_i) is spanned by an X q_i-thrill. This, along with the copies of T_{i−1} that span G_{i−1}, gives us the desired T_i-factoring of G_i.
The proof of the inductive step when i is odd, i.e., (2) ⇒ (b), is completely analogous (with X swapped with Y everywhere). The only small difference arises in the recurrences for d^X_i and d^Y_i, because of the slightly different bounds for |A_i| and |B_i| given by Lemma 4.1 in this case. In particular, by following the same line of argument as in the proof of (1) ⇒ (a), we obtain slightly different recurrences in this case. But then, by using the trivial bound q_i + r_{i−1} ≤ r_{i+1}, we obtain the desired estimates for d^X_i and d^Y_i. Thus, we have shown that there exist subsets D_X = D^X_m ⊂ X and D_Y = D^Y_m ⊂ Y such that G(X \ D_X, Y \ D_Y) admits a T_{ℓ,L}-factor and consequently has NMP. Furthermore, we have |D_X| ≤ ℓmd₀ and |D_Y| ≤ Lmd₀.
We are now in a position to prove Theorem 1.4 part (b).
Proof of Theorem 1.4, part (b). Suppose G is a Thomason pseudorandom bipartite graph with parameters (p, ε) and with vertex classes X and Y of sizes k and n respectively, where n/k ≤ 1/√ε.
Set α := ε^{3/4} and η := ε^{1/4}, and consider the interval I = [n(1 − α), n]. Since its length is αn, there is an integer N ∈ I such that N is a multiple of ⌊αn⌋. Also, since ηk ≥ αn, there is an integer K in the interval J = [k(1 − 2η), k(1 − η)] such that K is a multiple of ⌊αn⌋. With K and N as defined above (note that K ≤ N), simply pick subsets C_X ⊂ X of size k − K and C_Y ⊂ Y of size n − N arbitrarily, and define a new graph G′ = G(X \ C_X, Y \ C_Y). Observe that if L/ℓ is the representation in reduced form of N/K, then ℓ ≤ L ≤ N/⌊αn⌋ = O_ε(1). Hence, applying Lemma 4.2 to G′ yields sets D_X, D_Y such that G(X \ Del_X, Y \ Del_Y) has NMP, where Del_X = C_X ∪ D_X and Del_Y = C_Y ∪ D_Y. By Fact 2.4 and the trivial bounds K ≤ n and N ≤ n, we obtain the claimed bounds on |Del_X| and |Del_Y|, and that completes the proof.

We remark that Theorem 1.3 estimates the deviation of e(A, B) from what it would be if the graph were random, viz., p|A||B|. For (n, d, λ)-graphs, the analogue of this theorem is the expander mixing lemma, which provides precisely such an estimate.
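The choice of K and N can be made mechanically; the following sketch uses the rounding ⌊αn⌋ for the common multiple (the paper's exact rounding convention may differ) and illustrates why the interval lengths αn and ηk ≥ αn guarantee that suitable multiples exist:

```python
import math

def smallest_multiple_in(lo, hi, step):
    """Smallest multiple of step in [lo, hi], or None if the interval misses all."""
    m = math.ceil(lo / step) * step
    return m if m <= hi else None

def pick_K_N(k, n, eps):
    alpha, eta = eps ** 0.75, eps ** 0.25  # alpha = eps^(3/4), eta = eps^(1/4)
    step = math.floor(alpha * n)
    N = smallest_multiple_in(n * (1 - alpha), n, step)
    K = smallest_multiple_in(k * (1 - 2 * eta), k * (1 - eta), step)
    return K, N, step

K, N, step = pick_K_N(1000, 1000, 0.01)
assert K is not None and N is not None
assert K % step == 0 and N % step == 0 and K <= N
# N/K in lowest terms then has numerator at most N/step = O(1/alpha).
assert N // math.gcd(K, N) <= N // step
```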

Concluding Remarks
We illustrate this by returning to Problem 2 that was stated in the introduction. For ε > 0 and q a sufficiently large prime power, let H be a multiplicative subgroup of F_q^* of order at least q^{1/2+ε}. Consider the Sum-Cayley graph Γ_q(H), whose vertex set is F_q, and in which vertices x, y are adjacent if and only if x + y ∈ H. A result of Alon and Bourgain (see [1]) states that Γ_q(H) is a (q, |H|, q^{1/2})-graph, i.e., it is a regular graph on q vertices, with degree |H|, and every non-trivial eigenvalue of Γ_q(H) is at most q^{1/2} in absolute value. If G is the bipartite graph described in the introduction following the description of Problem 2, then it is not difficult to show that for any A ⊂ X, B ⊂ Y we have |e(A, B) − |A||B||H|/q| < √(q|A||B|) by using the expander mixing lemma. Then, via the argument in the proof of Lemma 4.1, we have: if X, Y ⊂ F_q with |Y| = 10|X| and |X| ≥ q/100, and H is a subgroup of F_q^* of size at least q^{1/2+ε}, then there exist A ⊂ X, B ⊂ Y with |A| ≤ O(q^{1−ε}) and |B| = 10|A| such that G(X \ A, Y \ B) has NMP. Consequently, every element of Y \ B can be labeled by some element of X \ A such that each label appears 10 times and, further, for each y ∈ Y labeled x, the sum x + y lies in H. This answers, in the affirmative, the approximate version of Problem 2. One could pose more general questions of the same kind, but without the additional constraint that |Y| is a multiple of |X|. For instance, suppose X, Y ⊂ F_q with |Y| = (3/2)|X| (say) and |X| ≥ Ω(q), and let H be a subgroup of F_q^* of size at least q^{1/2+ε}. Then one can similarly show that there exist subsets Del_X ⊂ X, Del_Y ⊂ Y with |Del_X| ≤ f(ε)|X| and |Del_Y| ≤ g(ε)|Y| such that if X′, Y′ are the remaining sets, then one may form a star-array A of dimension |X′| × |Y′| whose rows and columns are labeled by the elements of X′ and Y′ respectively, with the property that if the (x, y)-th entry of A is a star, then x + y ∈ H.
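The Sum-Cayley graph Γ_q(H) is easy to realize explicitly for small q. The sketch below takes q = 13 and H the subgroup of quadratic residues (order (q − 1)/2); the theorem of course requires q large and |H| ≥ q^{1/2+ε}, so this only illustrates the construction and the |H|-regularity:

```python
def sum_cayley(q, H):
    """Gamma_q(H): vertex set F_q, with x ~ y iff x + y (mod q) lies in H.
    Loops (2x in H) are kept, so every vertex has exactly |H| neighbors."""
    return {x: {(h - x) % q for h in H} for x in range(q)}

q = 13
H = {pow(g, 2, q) for g in range(1, q)}  # quadratic residues: a subgroup of F_q^*
G = sum_cayley(q, H)

assert len(H) == (q - 1) // 2
assert all(len(G[x]) == len(H) for x in G)      # |H|-regular
assert all(x in G[y] for x in G for y in G[x])  # adjacency is symmetric
```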
Furthermore, each row of A has precisely 3 stars, and each column has precisely 2 stars.
• For a bipartite graph G(X, Y) with |X| = |Y| that admits a perfect matching, the Max-Min Greedy Matching problem introduced in [8] goes as follows. Given permutations σ, π of the vertices of X and Y respectively, the vertices of X are processed according to σ, and each x ∈ X is matched to its earliest available neighbor in Y according to π. If M_G[σ, π] denotes the resulting greedy matching, determine ρ[G] := max_π min_σ |M_G[σ, π]|/|X|. This problem admits a natural generalization. Suppose G(X, Y) is a bipartite graph with |X| = k, |Y| = n, where k ≤ n, and suppose r = ⌊n/k⌋. As before, let σ, π be permutations of the vertices of X and Y respectively. We process the vertices of X according to σ, and each x ∈ X chooses its first r neighbors in Y (according to π) that have not already been chosen by some previous vertex of X. Let m_G[σ, π] denote the size of the resulting collection, and define ρ_r[G] analogously. Our proof of Lemma 4.1 can easily be adapted to establish the following: Suppose ε > 0, and let ω be a function such that ω(k) → ∞ as k → ∞. Then there exists k₀ = k₀(ε) such that whenever n ≥ k > k₀ and G(X, Y) is a (p, ε)-Thomason pseudorandom bipartite graph with |X| = k, |Y| = n, and p ≥ ω(k)/k, then ρ_r[G] ≥ 1 − O(ε).

• Our proof of Theorem 1.2, on closer examination, reveals that G(k, n, p) does not have NMP whp for p = (log n − ω(n))/k for any function ω that goes to infinity. However, to prove the existence of NMP with high probability, our proof cannot extend beyond p = (log n + O(√log n))/k. While it is possible to improve (using our methods) our result to prove that G(k, n, p) has NMP whp for p = (log n + f(n))/k for some f = o(log n), the question of whether there is a sharp threshold for NMP of the form p = (log n + ω(n))/k remains open.
• As remarked in the Introduction, our proof of Theorem 1.4 shows that f(x) = g(x) = O(x^{1/4} log(1/x)) works uniformly for all pairs (k, n). Is it possible to improve this to f(x) = g(x) = O(x) uniformly over all (k, n)?
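The Max-Min Greedy Matching procedure from the remarks above can be sketched directly; σ fixes the order in which X is processed and π the preference order on Y (shown here in the r = 1 case, on a toy graph illustrating why the min over σ matters):

```python
def greedy_matching(adj, sigma, pi):
    """Process X in the order sigma; match each x to its earliest still-available
    neighbor in Y according to the order pi. Returns the matching as a dict."""
    rank = {y: i for i, y in enumerate(pi)}
    used, match = set(), {}
    for x in sigma:
        avail = [y for y in adj[x] if y not in used]
        if avail:
            y = min(avail, key=rank.__getitem__)
            used.add(y)
            match[x] = y
    return match

# x0 has a single neighbor y0; x1 can use y0 or y1. A bad processing order
# lets x1 take y0 first and starves x0, illustrating the min over sigma.
adj = {"x0": {"y0"}, "x1": {"y0", "y1"}}
assert greedy_matching(adj, ["x1", "x0"], ["y0", "y1"]) == {"x1": "y0"}
assert greedy_matching(adj, ["x0", "x1"], ["y0", "y1"]) == {"x0": "y0", "x1": "y1"}
```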
One interesting consequence of the proof of the lemma is that if we seek η = poly(ε), then one has a randomized algorithm to choose a set T ⊂ Y and a related BAD(T) ⊂ X with |T| = D and |BAD(T)| ≤ ηk, such that deleting these sets from Y and X respectively results in another Thomason pseudorandom graph with only slightly worse parameters.
It is known (see [21]) that the bipartite graph arising from the point-hyperplane incidence structure of a projective geometry of dimension d over a finite field F_q is Thomason pseudorandom with parameters p = n^{−1/2}(1 + o(1)) and ε = 0. More generally, one can take the point-block incidence structure arising from a symmetric block design as the "seed" Thomason pseudorandom graph, which, upon application of the lemma above, gives several other examples of Thomason pseudorandom graphs with parameters relevant to Theorem 1.4.