On the number of matroids compared to the number of sparse paving matroids

It has been conjectured that sparse paving matroids will eventually predominate in any asymptotic enumeration of matroids, i.e. that $\lim_{n\to\infty} s_n/m_n = 1$, where $m_n$ denotes the number of matroids on $n$ elements, and $s_n$ the number of sparse paving matroids. In this paper, we show that $\lim_{n\to\infty} \frac{\log s_n}{\log m_n} = 1$. We prove this by arguing that each matroid on $n$ elements has a faithful description consisting of a stable set of a Johnson graph together with a (by comparison) vanishing amount of other information, and using that stable sets in these Johnson graphs correspond 1-1 to sparse paving matroids on $n$ elements.


Introduction
After matroids up to 8 elements were enumerated by Blackburn, Crapo, and Higgs [BCH73], it was noted that a substantial fraction of the matroids were paving matroids, that is, matroids $M$ whose circuits are all of cardinality at least $r(M)$. Crapo and Rota speculated that perhaps 'paving matroids will predominate in any enumeration of matroids' [CR70, p. 3.17]. Mayhew, Newman, Welsh and Whittle make the more precise conjecture in [MNWW11] that the asymptotic fraction of matroids on $n$ elements that are paving tends to 1 as $n$ tends to infinity. Their conjecture is equivalent to the seemingly stronger statement that
(1) $\lim_{n\to\infty} s_n/m_n = 1$.
Here $m_n$ denotes the number of matroids on a fixed ground set of $n$ elements, and $s_n$ is the number of sparse paving matroids (a matroid is sparse paving if both it and its dual are paving). Sparse paving matroids seem benign objects compared to matroids in general. For example, it is straightforward that a sparse paving matroid on sufficiently many elements is highly connected. The predominance of sparse paving matroids as in (1) would thus immediately imply the predominance of $k$-connected matroids, which is conjectured but remains an open problem. Similarly, it is relatively straightforward that asymptotically all sparse paving matroids have a fixed uniform matroid $U_{a,b}$ as a minor, but the analogous statement for general matroids is open. Further examples along these lines are easy to find, and hence the central interest of conjecture (1).
Combining the lower bound of Graham and Sloane [GS80, Theorem 1] (as pointed out in [MW13]) on $s_n$ and the upper bound of Bansal, Pendavingh and van der Pol [BPvdP14] on $m_n$, we have
(2) $\frac{1}{n}\binom{n}{\lfloor n/2\rfloor} \le \log s_n \le \log m_n \le \frac{2+o(1)}{n}\binom{n}{\lfloor n/2\rfloor}$ as $n \to \infty$.
Together these bounds do not suffice to prove (1), but merely imply $\log s_n \le \log m_n \le 2(1 + o(1)) \log s_n$ as $n \to \infty$, or equivalently, that $s_n \le m_n \le s_n^{2(1+o(1))}$. The sparse paving matroids showed a somewhat less benign side when we attempted to narrow the gap between the upper and the lower bound in (2). Being unable to improve either bound, we devised a way to directly compare the number of matroids to the number of sparse paving matroids. This enabled us to prove the main result of this paper, that $\log m_n \le (1 + o(1)) \log s_n$ as $n \to \infty$, or equivalently, $m_n = s_n^{1+o(1)}$. Our method is closely related to the one used in [BPvdP14] to prove the upper bound on $m_n$, which itself is an adaptation of a method to bound $s_n$.
We will briefly outline the method and describe how it differs from earlier work. Key to our method is an algorithm for producing a compressed description of any given matroid $M$ on $E$ of rank $r$. The compression algorithm considers the set of bases of $M$ as a subset of all the $r$-subsets of $E$, which are the vertices of the Johnson graph $J(E, r)$. We obtain a compact description of the matroid by starting from the full set of vertices $A$ of the Johnson graph $G = J(E, r)$ and iteratively taking away neighbourhoods of vertices from $A$ while describing the set of bases among these neighbourhoods. As long as there are vertices of high degree in $G[A]$ to pick, the rate at which we need to add information into our matroid description compares favourably to the decrease in the size of $A$. We argued that while $A$ is large there will be such vertices of high degree, and by the time $A$ contains no more than a certain $\alpha$-fraction of the vertices, the total amount of information stored so far will still be relatively modest. In our previous paper, we completed the description of the matroid by adding $\alpha\binom{n}{r}$ bits to describe the subset of bases among the remaining vertices of $A$, and this is what ultimately dominated the length of the matroid description we obtained. Hence, the cost of describing the bases among the final set $A$ was the bottleneck for producing a tighter upper bound on $m_n$.
In the present paper, we use the fact that $G[A]$ does not have vertices of high degree in the final stage to our advantage. Equivalently, most neighbours of a vertex $X \in A$ then lie outside $A$, so that for most of these neighbours it is known from the matroid description stored so far whether they are bases of the matroid or not. Exploiting this information and matroid structure, we find that the $\alpha\binom{n}{r}$ bits we used before can be replaced by a certain stable set $T \subseteq A$ to obtain a faithful description of the matroid. Since our matroid description now consists of a stable set in the Johnson graph together with a limited amount of further information, it becomes possible to compare the number of matroids directly to the number of stable sets in the Johnson graph. As stable sets in $J(E, r)$ are in 1-1 correspondence to sparse paving matroids, this implies our main result.
A further result in this paper is that there exists $\beta > 0$ such that asymptotically all matroids on $n$ elements have a rank between $n/2 - \beta\sqrt{n}$ and $n/2 + \beta\sqrt{n}$. This is related to a second conjecture from [MNWW11], that asymptotically all matroids on $n$ elements have a rank between $(n - 1)/2$ and $(n + 1)/2$. After giving preliminaries on graphs and matroids in Section 2, we present both results in Section 3. Finally, we discuss several remaining open problems related to our main results in Section 4.

Preliminaries
2.1. Graphs and stable sets. We only consider loopless, undirected graphs in this paper. If G is any graph, then we write ∆(G) for the maximum degree in G. Further, for any A ⊆ V (G), we write G[A] for the subgraph of G induced by the vertices in A. A set of vertices A ⊆ V (G) is stable if G[A] spans no edges. We write i(G) for the number of stable sets in G.
In [BPvdP14], Nikhil Bansal and the current authors proved the following result on the number of stable sets in regular graphs.
Theorem 1. Let $G$ be a $d$-regular graph on $N$ vertices with smallest eigenvalue $-\lambda$, with $d > 0$. Then
$i(G) \le \sum_{s=0}^{\lceil \sigma N\rceil} \binom{N}{s}\, 2^{\alpha N}$,
where $\alpha = \frac{\lambda}{d+\lambda}$ and $\sigma = \frac{\ln(d+1)}{d+\lambda}$.
The quantity $\alpha N$ is known as the Hoffman bound, which is an upper bound on the cardinality of a stable set in $G$. The spectrum of the Johnson graph is known: if $r \le n/2$, the eigenvalues of $J(n, r)$ are
(3) $(r - i)(n - r - i) - i$, $i = 0, 1, \ldots, r$.
Hence, the smallest eigenvalue of $J(n, r)$ is $-r$, and using Theorem 1 one may derive that
(4) $\log i(J(n, r)) \le \frac{2}{n}\binom{n}{\lfloor n/2\rfloor}(1 + o(1))$ as $n \to \infty$.
A corresponding lower bound is obtained from a construction by Graham and Sloane [GS80, Theorem 1], who show that
(5) $\log i(J(n, r)) \ge \frac{1}{n}\binom{n}{r}$.
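To get a feeling for the gap between (4) and (5), the two bounds can be evaluated exactly for small parameters. The following is a minimal sketch, not part of the original argument; it assumes that log denotes the base-2 logarithm and that $0 < r \le n/2$, and the function name `stable_set_bounds` is ours.

```python
from math import ceil, comb, log, log2

def stable_set_bounds(n, r):
    """Return (lower, upper) bounds on log2 i(J(n, r)), assuming 0 < r <= n/2."""
    N = comb(n, r)      # number of vertices of J(n, r)
    d = r * (n - r)     # degree of every vertex of J(n, r)
    lam = r             # minus the smallest eigenvalue of J(n, r)
    alpha = lam / (d + lam)
    sigma = log(d + 1) / (d + lam)
    # Theorem 1: i(G) <= sum_{s <= ceil(sigma N)} C(N, s) * 2^(alpha N).
    head = sum(comb(N, s) for s in range(ceil(sigma * N) + 1))
    upper = log2(head) + alpha * N
    # Graham and Sloane, as in (5): log2 i(J(n, r)) >= C(n, r) / n.
    lower = N / n
    return lower, upper

if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(n, stable_set_bounds(n, n // 2))
```

For such small $n$ the upper bound of Theorem 1 is far from the lower bound; the factor 2 in (2) only emerges asymptotically.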
If $X \in \binom{E}{r}$, then we will use the graph-theoretic term neighbourhood to denote the set
$N(X) := \{X - x + y : x \in X,\ y \in E \setminus X\}$.
Note that $N(X)$ is precisely the neighbourhood of $X$, seen as a vertex in the Johnson graph $J(E, r)$. The neighbourhood of $X$ has the structure of a Cartesian graph product of $K_r$ and $K_{n-r}$. In particular, we can distinguish 'rows'
(6) $R_X(x) := \{X - x + y : y \in E \setminus X\}$, $x \in X$,
and 'columns'
$C_X(y) := \{X - x + y : x \in X\}$, $y \in E \setminus X$,
that induce cliques in the neighbourhood of $X$, see Figure 1.
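The grid structure of the neighbourhood is easy to inspect computationally. The sketch below assumes the reading used later in the paper, namely that the row of $x \in X$ consists of the sets $X - x + y$ with $y \in E \setminus X$, and the column of $y \in E \setminus X$ of the sets $X - x + y$ with $x \in X$; the function names are ours.

```python
from itertools import combinations

def adjacent(X, Y):
    """Adjacency in the Johnson graph: r-sets that share exactly r - 1 elements."""
    return len(X) == len(Y) and len(X & Y) == len(X) - 1

def neighbourhood(X, E):
    """All r-sets X - x + y with x in X and y in E \\ X."""
    X = frozenset(X)
    return {(X - {x}) | {y} for x in X for y in set(E) - X}

def row(X, E, x):
    """Row of x in X: remove x, add any y outside X."""
    X = frozenset(X)
    return {(X - {x}) | {y} for y in set(E) - X}

def column(X, E, y):
    """Column of y in E \\ X: add y, remove any x in X."""
    X = frozenset(X)
    return {(X - {x}) | {y} for x in X}

def is_clique(sets):
    return all(adjacent(Y, Z) for Y, Z in combinations(sets, 2))
```

With $E = \{0, \ldots, 5\}$ and $X = \{0, 1, 2\}$, the neighbourhood has $r(n - r) = 9$ elements, split into three rows and three columns that are each cliques of size 3.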

Matroids.
A matroid is a pair $M = (E, \mathcal{B})$ such that $E$ is a finite set and $\mathcal{B}$ is a non-empty set of subsets of $E$ satisfying the base exchange axiom: for all $B, B' \in \mathcal{B}$ and $e \in B \setminus B'$, there exists $f \in B' \setminus B$ such that $B - e + f \in \mathcal{B}$. We assume familiarity with the basic definitions in matroid theory, of circuit, flat, rank, dual etc., for which we refer to Oxley's book [Oxl11]. While we mostly use the notation of [Oxl11], the following is nonstandard. We write $\mathbb{M}_{n,r}$ for the set of all matroids of rank $r$ with ground set $[n]$, and $\mathbb{M}_n$ for the set of matroids with ground set $[n]$, and we put $m_{n,r} := \#\mathbb{M}_{n,r}$ and $m_n := \#\mathbb{M}_n$.
A matroid $M$ is said to be paving if each circuit of $M$ has cardinality at least $r(M)$, and $M$ is sparse paving if both $M$ and its dual are paving. We define
$s_{n,r} := \#\{M \in \mathbb{M}_{n,r} : M \text{ is sparse paving}\}$, $s_n := \#\{M \in \mathbb{M}_n : M \text{ is sparse paving}\}$.
The following proposition is essentially established by Piff and Welsh [PW71]. A proof can be found in [BPvdP14, Lemma 8].
Proposition 2. Let $\mathcal{B} \subseteq \binom{E}{r}$. Then $\mathcal{B}$ is the collection of bases of a sparse paving matroid if and only if $\binom{E}{r} \setminus \mathcal{B}$ is a stable set in $J(E, r)$. Hence $s_{n,r} = i(J(n, r))$, which is bounded by (4).
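For very small parameters, Proposition 2 can be confirmed by exhaustive search. The sketch below is an illustration only and assumes $0 < r < n$. It tests the base exchange axiom directly and uses the characterisation that a rank-$r$ matroid is sparse paving exactly when every $(r-1)$-set is independent and every $(r+1)$-set is spanning; all function names are ours.

```python
from itertools import combinations

def r_sets(n, r):
    return [frozenset(c) for c in combinations(range(n), r)]

def is_stable(S):
    # Two r-sets are adjacent in J(n, r) when they share exactly r - 1 elements.
    return all(len(X & Y) != len(X) - 1 for X, Y in combinations(S, 2))

def is_matroid(bases):
    # Non-empty family of r-sets that satisfies the base exchange axiom.
    return bool(bases) and all(
        any((B1 - {e}) | {f} in bases for f in B2 - B1)
        for B1 in bases for B2 in bases for e in B1 - B2
    )

def is_sparse_paving(bases, n, r):
    # Paving: every (r-1)-set lies in a basis.
    # Dual paving: every (r+1)-set contains a basis.
    independent = all(any(I <= B for B in bases) for I in r_sets(n, r - 1))
    spanning = all(any(B <= Y for B in bases) for Y in r_sets(n, r + 1))
    return independent and spanning

def check_proposition(n, r):
    """Check the equivalence for every family of r-sets; return s_{n,r}."""
    V = r_sets(n, r)
    count = 0
    for mask in range(1 << len(V)):
        nonbases = [V[i] for i in range(len(V)) if mask >> i & 1]
        bases = set(V) - set(nonbases)
        sparse_paving = is_matroid(bases) and is_sparse_paving(bases, n, r)
        assert sparse_paving == is_stable(nonbases)
        count += sparse_paving
    return count
```

For example, $J(4, 2)$ is the octahedron, whose stable sets are the empty set, the 6 single vertices and the 3 antipodal pairs, and $J(5, 2)$ is the line graph of $K_5$, whose stable sets are the 26 matchings of $K_5$.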
Figure 1. The neighbourhood of X in the Johnson graph. The rows and columns, indexed by x ∈ X resp. y ∈ E \ X, form cliques.
2.4. The local structure of matroids in the Johnson graph. The following matroid lemma is elementary, but it has a central role in this paper.
The lemma implies that if r(X) = r − 1, then the set of bases of M in the neighborhood of X has a very simple structure that is completely determined by the circuit C and the cocircuit D: see Figure 2. It is this simplicity which enables us to make faithful descriptions of matroids which are nearly as concise as a description of a stable set in the Johnson graph. In the present paper, we use (9) in two ways. The first use was already implicit in our previous paper [BPvdP14], where we used local covers as a short certificate for (in-)dependence in the neighbourhood of a non-basis. We review the basic definition, and quote the lemma which we will again use in our present argument.
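The local structure can be checked on a concrete example. The sketch below assumes the reading of (9) that is used in the proof of Claim 10: if $r(X) = r - 1$, with $C$ the unique circuit in $X$ and $D$ the unique cocircuit disjoint from $X$, then $X - x + y$ is a basis if and only if $x \in C$ and $y \in D$. The rank-3 matroid represented by the six integer vectors below is a hypothetical example of ours, not taken from the paper.

```python
from itertools import combinations

def det3(a, b, c):
    """Determinant of the 3x3 integer matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# Hypothetical example: elements 0 and 1 are parallel, and 0, 2, 4 are collinear.
VECTORS = [(1, 0, 0), (2, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
E = frozenset(range(len(VECTORS)))
R = 3
BASES = {frozenset(T) for T in combinations(E, R)
         if det3(*(VECTORS[i] for i in T)) != 0}

def rank(S):
    """Rank of S: the largest intersection of S with a basis."""
    return max(len(S & B) for B in BASES)

def local_structure_holds():
    for T in combinations(E, R):
        X = frozenset(T)
        if rank(X) != R - 1:
            continue
        # Unique circuit in X: the elements whose removal leaves an independent set.
        C = {x for x in X if rank(X - {x}) == R - 1}
        # Unique cocircuit disjoint from X: the complement of the closure of X.
        D = {y for y in E - X if rank(X | {y}) == R}
        for x in X:
            for y in E - X:
                is_basis = (X - {x}) | {y} in BASES
                if is_basis != (x in C and y in D):
                    return False
    return True
```

In Figure 1 terms, the bases in $N(X)$ form the subgrid of rows indexed by $C$ and columns indexed by $D$.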
If $Y$ is a set in a matroid $M$, and $F \in \mathcal{F}(M)$ is a flat such that $|F \cap Y| > r_M(F)$, then $Y$ is necessarily dependent, and $(F, r_M(F))$ serves as a certificate for the dependence of $Y$. In this case, we say that $(F, r_M(F))$ covers $Y$. Note that $M$ can be reconstructed from its groundset, rank, and a collection of (flat, rank)-pairs such that each non-basis of $M$ is covered by at least one flat in the collection.¹ A local cover at $X \in \binom{E}{r}$ is a subset $Z_X \subseteq \{(F, r_M(F)) : F \in \mathcal{F}(M)\}$ with the property that each $Y \in N(X) \cup \{X\}$ is either independent, or covered by some $(F, r_M(F)) \in Z_X$. If $X$ is a non-basis, then the local cover can be surprisingly small, as the following lemma shows. The proof can be found in [BPvdP14], and also follows from Lemma 3.
Lemma 4 ([BPvdP14, Lemma 20]). Let $M$ be a rank-$r$ matroid on groundset $E$, and let $X \in \binom{E}{r}$ be dependent in $M$. Then there exists $Z_X \subseteq \mathcal{F}(M)$ with $|Z_X| \le 2$, that covers each non-basis $Y \in N(X) \cup \{X\}$.
¹ NB: In [BPvdP14], just the flat $F$ (rather than the pair $(F, r_M(F))$) was used as a certificate for dependence. However, as the rank of $F$ is necessary to reconstruct the matroid, it is more natural to use the pair.

The second use of (9) is new to this paper. The very restricted structure of N (X) ∩ B in the neighborhood of a dependent set X will allow us to recover the partition (N (X) \ B, N (X) ∩ B) from partial information (K \ B, K ∩ B) for certain K ⊆ N (X), so that a faithful matroid encoding can be even more sparse if we rely on a decoder which can infer from (9).
2.5. Binomial coefficients. We will use the following standard bounds on binomial coefficients.
We will also need the following result on binomial coefficients.
The lemma now follows from the inequality $1 - x \le e^{-x}$.
3. The number of matroids
3.1. A procedure for encoding non-bases. The bound in Theorem 1 is based on a procedure to construct a concise description of stable sets in a regular graph. Bounding the number of such concise descriptions immediately gives an upper bound on the number of stable sets in a regular graph. The procedure was adapted from a procedure described by Alon, Balogh, Morris, and Samotij in [ABMS14], who cite Kleitman and Winston [KW82] as the original source.
In [BPvdP14], this idea was combined with local covers to construct a concise description of matroids. In particular, it was shown that any matroid can be described by a stable set (in the Johnson graph) of non-bases S, a collection of flats covering all non-bases in S ∪ N (S), and a (relatively short) list of all the non-bases that are not yet covered. By bounding the number of possibilities for each of these sets, one obtains an upper bound on the number of matroids.
The bound that was obtained in [BPvdP14] is dominated by the list of non-bases that are not yet covered. This seems to be wasteful: if we can take into account more information about this list, we may be able to obtain stronger bounds. In the current section, we extend the encoding procedure to capture more information about this list of non-bases.
In what follows, we will assume that r ≤ n/2, and we will write G = J(E, r) as a shorthand.
We will further fix a linear ordering $\le_G$ on the vertices of $G$. By the canonical ordering on $A \subseteq V(G)$, we refer to the following procedure to order the set $A$ linearly. Let $v$ be the vertex with maximum degree in $G[A]$; if there are multiple such $v$, then we choose the one that is minimal with respect to $\le_G$. Call $v$ the first vertex in the canonical ordering, and apply iteratively to $A \setminus \{v\}$.
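The canonical ordering can be sketched in a few lines. In this illustration the fixed linear ordering on the vertices is assumed to be the lexicographic order on sorted element lists; any other fixed ordering can be passed via the `key` parameter, and the function names are ours.

```python
from itertools import combinations

def johnson_adjacent(X, Y):
    """Two r-sets are adjacent when they share exactly r - 1 elements."""
    return len(X & Y) == len(X) - 1

def canonical_order(A, key=sorted):
    """Order A by repeatedly extracting a maximum-degree vertex of G[A].

    Ties are broken by the fixed linear ordering, here given by `key`
    (lexicographic order on sorted elements by default, an assumption).
    """
    A = set(A)
    order = []
    while A:
        # Degrees are recomputed in the subgraph induced by the remaining set.
        deg = {v: sum(johnson_adjacent(v, w) for w in A) for v in A}
        best = max(deg.values())
        v = min((u for u in A if deg[u] == best), key=key)
        order.append(v)
        A.remove(v)
    return order

if __name__ == "__main__":
    V = [frozenset(c) for c in combinations(range(4), 2)]
    print([sorted(v) for v in canonical_order(V)])
```

Note that the degrees are taken in the induced subgraph on what remains of $A$, so the ordering of the tail depends on which vertices were extracted before.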

Procedure 1 The procedure for encoding matroids
Input: Matroid $M = (E, \mathcal{B})$ of rank $r$ on $n$ elements, $r \le n/2$
Output: $(S, Z, A, T)$
Set $G \leftarrow J(E, r)$
while $|A| > \alpha_{n,r} N$ or $\Delta(G[A]) \ge r$ do    ⊲ $\alpha_{n,r} N$ is the Hoffman bound
    Pick the first vertex $X$ in the canonical ordering of $A$
    if $X$ is dependent then

3.2. Analysis of the procedure. Throughout this section, we write
$N := \binom{n}{r}$, $d := r(n - r)$
for the number of vertices, resp. degree, of the Johnson graph $J(n, r)$. We will also abbreviate
(14) $\alpha_{n,r} = \frac{\lambda}{d + \lambda} = \frac{1}{n - r + 1}$
for the Hoffman bound, and
(15) $\sigma_{n,r} = \frac{\ln(d + 1)}{d + \lambda} = \frac{\ln(r(n - r) + 1)}{r(n - r + 1)}$.
The quantities $\alpha_{n,r}$ and $\sigma_{n,r}$ will play the same role as $\alpha$ and $\sigma$ in the statement of Theorem 1.
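For the reader's convenience, both closed forms can be checked by substituting the degree $d = r(n - r)$ of $J(n, r)$ and $\lambda = r$ (minus its smallest eigenvalue) into the definitions of $\alpha$ and $\sigma$ in Theorem 1. The last line records the estimate $\alpha_{n,r} \le 2/n$ for $r \le n/2$, which is used in Section 3.4.

```latex
\begin{align*}
\alpha_{n,r} &= \frac{\lambda}{d+\lambda} = \frac{r}{r(n-r)+r} = \frac{1}{n-r+1}, \\
\sigma_{n,r} &= \frac{\ln(d+1)}{d+\lambda} = \frac{\ln(r(n-r)+1)}{r(n-r)+r} = \frac{\ln(r(n-r)+1)}{r(n-r+1)}, \\
\alpha_{n,r} &= \frac{1}{n-r+1} \le \frac{1}{n/2+1} < \frac{2}{n} \qquad \text{for } r \le n/2 .
\end{align*}
```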
Proof. Note that the sets S, Z and A only change during execution of the while loop.
In each traversal, $|A|$ decreases, and the procedure does not stop before $|A| \le \alpha_{n,r} N$, thus proving the bound on $|A|$.
As $A$ only gets smaller in each traversal, execution of the while loop falls apart into two stages: during the first stage, $|A| > \alpha_{n,r} N$, while during the second stage, $|A| \le \alpha_{n,r} N$ and $\Delta(G[A]) \ge r$.
The first stage was analysed in [BPvdP14, Lemma 16], where it was shown that during this stage at most $\sigma_{n,r} N$ vertices are added to $S$. At the start of the second stage, $A$ contains at most $\alpha_{n,r} N$ vertices. Throughout this stage, each element that is added to $S$ has degree at least $r$ in $G[A]$, as it is the first vertex in the canonical ordering on $A$. So each time a vertex is added to $S$ during the second stage, at least $r + 1$ vertices are removed from $A$. Hence, during the second stage, at most $\frac{1}{r+1}\alpha_{n,r} N$ vertices are added to $S$. Combining the bounds on the number of elements added to $S$ during both stages, we obtain the bound on $|S|$.
The set Z is only extended when a vertex is added to S. Each time this happens, at most two new flats are introduced to Z, by Lemma 4, so |Z| ≤ 2|S|.
The following claim is obvious.
Claim 7. Upon termination of Stage 2, all vertices in A have at most r − 1 neighbours in A.

Claim 8. S ∪ T is a stable set in J(n, r).
Proof. First, S is a stable set, as each time a vertex is added to S, its neighbours are deleted from A, and hence will never be considered again. By the same argument, no element in A has a neighbour in S. By construction, the set T is stable, and as T ⊆ A, it follows that S ∪ T is stable.
The claim follows, as the order in which r-sets are considered in the main loop is deterministic, and depends only on the choice that is made in each traversal. These choices are the same in both instances. By construction of M′, we have X dependent (when encoding M) if and only if X ∈ S, which is equivalent to saying that X is dependent (when encoding M′). The final equivalence follows, since vertices in T come after vertices in S in the canonical ordering in each traversal.
Let $K$ be the set of non-bases of a matroid. Note that throughout the procedure,
(17) $K \subseteq S \cup N(S) \cup A$
is maintained as an invariant. If the encoding procedure indeed constructs a concise description of matroids, it should be possible to reconstruct $K$ from the output of the procedure. Non-bases in $S \cup N(S)$ are easily recognised, as they are covered by $Z$. On the other hand, recognising non-bases in $A$ is a bit more involved.
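The fact that a decoder can replay the main loop is easiest to see in a stripped-down setting. The sketch below is a simplification under stated assumptions, not the procedure of this paper: it only tracks $S$ and $A$ (the local covers $Z$ and the set $T$ are omitted), it assumes that a picked vertex is added to $S$ exactly when it lies in a given set $K$, that the neighbours of such a vertex are then removed from $A$, and that a picked vertex is itself always removed from $A$. All function and parameter names are ours.

```python
from itertools import combinations

def adjacent(X, Y):
    """Adjacency in the Johnson graph: r-sets sharing exactly r - 1 elements."""
    return len(X & Y) == len(X) - 1

def first_vertex(A):
    """First vertex in the canonical ordering of A, with its degree in G[A]."""
    deg = {v: sum(adjacent(v, w) for w in A) for v in A}
    best = max(deg.values())
    return min((v for v in A if deg[v] == best), key=sorted), best

def main_loop(vertices, size_bound, min_degree, goes_into_S):
    """Run the loop while |A| > size_bound or the maximum degree is >= min_degree."""
    A, S = set(vertices), []
    while A:
        X, degree = first_vertex(A)
        if len(A) <= size_bound and degree < min_degree:
            break
        if goes_into_S(X):
            S.append(X)
            A -= {Y for Y in A if adjacent(X, Y)}  # drop the neighbours of X
        A.discard(X)                               # X itself is never picked again
    return S, A

def encode(vertices, K, size_bound, min_degree):
    """Encoder: it knows K and puts a picked vertex into S when it lies in K."""
    return main_loop(vertices, size_bound, min_degree, lambda X: X in K)

def recover_A(vertices, S, size_bound, min_degree):
    """Decoder: it only knows S, yet replays exactly the same traversals."""
    S = set(S)
    return main_loop(vertices, size_bound, min_degree, lambda X: X in S)[1]
```

In this simplified model every vertex that leaves $A$ is either in $S \cup N(S)$ or was picked and found to lie outside $K$, which is the analogue of the invariant above.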
The following claim is the engine of the corresponding decoding procedure. It roughly states that if X ∈ A, then X being dependent is completely determined by (S, Z, T ).
Claim 10. Let $M$ be a matroid without loops and coloops, and let $(S, Z, A, T)$ be the output of the procedure on input $M$. Let $X \in A$. Then $X$ is dependent if and only if
(i) there exists $x \in X$ such that $R_X(x)$ is disjoint from $A$ and fully dependent; or
(ii) there exists $y \in E \setminus X$ such that $C_X(y)$ is disjoint from $A$ and fully dependent; or
(iii) $X \in T$; or
(iv) $X$ has a neighbour $Y = X - x + y$ in $T$, and there are $e \in Y$, $f \in E \setminus Y$ with the property that both $R_Y(e)$ and $C_Y(f)$ are disjoint from $A$, both contain a basis, and at least one of $Y - y + f$ or $Y - e + x$ is dependent.
Proof. To prove sufficiency, suppose that $X \in A$. If (i) holds, then $X$ must be dependent. For if $X$ were independent and each $Y \in R_X(x)$ dependent, then $x$ would be a coloop, which contradicts our assumption on $M$. Similarly, if (ii) holds, then $X$ must be dependent. For if it were independent and each $Y \in C_X(y)$ dependent, then $y$ would be a loop, again contradicting our assumption on $M$. If (iii) holds, then $X$ must be dependent, as by construction $T$ contains only non-bases. Finally, suppose that (iv) holds. Let $Y = X - x + y$ be an element of $T$ neighbouring $X$ satisfying the properties mentioned in (iv). As $Y$ is dependent, and has an independent neighbour, it must have rank $r - 1$. It follows that there is a unique circuit $C$ contained in $Y$, and a unique cocircuit $D$ disjoint from $Y$. As $C_Y(f)$ contains an independent set, we have $f \in D$, and since $R_Y(e)$ contains an independent set, we have $e \in C$. Note that $X = Y - y + x$ is independent if and only if $C$ is not contained in $X$ (so $y \in C$, or equivalently $Y - y + f$ is independent), and $X$ is not disjoint from $D$ (so $x \in D$, or equivalently $Y - e + x$ is independent). Taking the contrapositive, we find that $X$ is dependent if and only if at least one of $Y - y + f$ or $Y - e + x$ is dependent.
It remains to prove necessity. Let us assume that $X \in A$ is dependent. If neither (i) nor (ii) holds, then $X \in A'$. As $T$ is a maximal stable set in $G[A' \setminus \mathcal{B}]$, we have $A' \setminus \mathcal{B} \subseteq T \cup N(T)$, so either $X \in T$, or $X$ has a neighbour in $T$. In the former case, we have (iii), and we are done. So assume that $X$ has a neighbour in $T$. Call this neighbour $Y = X - x + y$. As $T \subseteq A'$, there must exist $e \in Y$ and $f \in E \setminus Y$, so that $R_Y(e)$ and $C_Y(f)$ are disjoint from $A$, and both contain an independent set. Now we use again that $X$ is dependent if and only if at least one of $Y - y + f$ or $Y - e + x$ is dependent.
The following lemma shows that the procedure can be used to construct an alternative description for a matroid, provided the matroid has no loops or coloops.
Proof. First, it follows from Claim 9 that the pair $(S \cup T, Z)$ actually contains the more detailed information $(S, Z, A, T)$. The matroid $M$ can be reconstructed from $n$, $r$, and $(S, Z, A, T)$ if for each $X \in \binom{[n]}{r}$ it can be decided whether $X$ is dependent or independent, based on the available information alone. Let $K = \binom{[n]}{r} \setminus \mathcal{B}$ be the set of non-bases in $M$. Recall that throughout the procedure, (17) is maintained as an invariant. By Claim 9, it can be verified from $S$ whether $X \in S \cup N(S) \cup A$. If it is not, then $X$ must be independent. So we can suppose that $X \in S \cup N(S) \cup A$.
If X ∈ S ∪ N (S), then X is dependent if and only if it is covered by some flat in Z.
Hence, for each r-set that is not in A, we can reconstruct whether it is dependent from (S, Z) alone. It remains to identify the non-bases in A. By Claim 10, we only need to verify, for each X ∈ A, if at least one of (i)-(iv) holds. Verification of each of these items depends only on (S, Z, T):
(i) By Claim 9, the set A is completely determined by S, so for each neighbour of X, checking if it belongs to A can be done if only S is available. If Y is a neighbour of X that is not in A, then either it is not in S ∪ N(S), in which case Y must be independent, or it is in S ∪ N(S), in which case dependency of Y depends only on (S, Z).
(ii) Similar.
(iii) X ∈ T obviously depends only on T.
(iv) For each neighbour Y of X, Y ∈ T depends only on T. By Claim 9, for each neighbour of Y, determining whether it is in A depends only on S. If the neighbour is not contained in A, then we can deduce from (S, Z) whether it is independent or not, by the previous part of this proof.
3.3. Bounding the number of matroids. Let us write $m'_n$ (resp. $m'_{n,r}$) for the number of matroids (resp. rank-$r$ matroids) on groundset $[n]$ that do not contain loops or coloops.
Proof. In view of Lemma 11, a matroid $M \in \mathbb{M}_{n,r}$ without loops or coloops can be described by a pair $(U, Z)$, in which $U$ is a stable set of $J(n, r)$, and $Z \subseteq \{(F, r_M(F)) : F \in \mathcal{F}(M)\}$. By Claim 6, we can assume that $|Z| \le 2\lceil \sigma_{n,r} N\rceil$. Note that $m'_{n,r}$ is at most the number of pairs $(U, Z)$, and a bound on the number of such pairs follows immediately from the following two observations. First, as $U$ is a stable set in $J(n, r)$, it can be chosen in at most $i(J(n, r)) = s_{n,r}$ ways. Second, as $Z$ is a subset of $2^{[n]} \times \{0, 1, \ldots, n\}$ of cardinality at most $2\lceil \sigma_{n,r} N\rceil$, it can be chosen in at most
$\sum_{k=0}^{2\lceil \sigma_{n,r} N\rceil} \binom{2^n (n+1)}{k}$
ways. By an application of (11) followed by (10) we can bound the sum of binomial coefficients by
$2\binom{2^n (n+1)}{2\lceil \sigma_{n,r}\binom{n}{r}\rceil} \le 2\left(\frac{e\, 2^n (n+1)^3}{4\binom{n}{r}}\right)^{2\lceil \sigma_{n,r}\binom{n}{r}\rceil}$,
which proves the second inequality.
The next lemma bounds the multiplicative factor in Lemma 12 uniformly in r.
The lemma follows as the expression in the logarithm is $n^{O(1)}$ by (12).
Proof. Upon combining the upper bound on $m'_{n,r}$ with the fact that $s_{n,r} \le s_n$, we find by Lemma 13 that
$m'_{n,r} \le s_n \exp_2\left(c\,\frac{\log^2 n}{n^2}\binom{n}{\lfloor n/2\rfloor}\right)$,
at least for $r \le n/2$. By duality, this same bound holds for $m'_{n,n-r}$. As $m'_n = \sum_r m'_{n,r}$, this immediately yields
$\log m'_n \le \log(s_n) + \log(n + 1) + c\,\frac{\log^2 n}{n^2}\binom{n}{\lfloor n/2\rfloor}$.
This immediately implies the following corollary.
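To see how slowly the correction term becomes negligible, one can compare it numerically with the Graham and Sloane lower bound $\frac{1}{n}\binom{n}{r}$ on $\log s_{n,r}$. The sketch below is a rough illustration under stated assumptions: log is taken to base 2, the number of choices for $Z$ is estimated by the standard bound $\sum_{i \le k}\binom{M}{i} \le (eM/k)^k$ with $M = 2^n(n+1)$ and $k = 2\sigma_{n,r} N$, the ceiling is ignored, and $r = \lfloor n/2\rfloor$. The function name is ours.

```python
from math import comb, e, log, log2

def overhead_ratio(n):
    """Estimate (bits spent on Z) / (lower bound on log2 s_{n,r}) for r = n // 2."""
    r = n // 2
    d, lam = r * (n - r), r
    sigma = log(d + 1) / (d + lam)
    log2_N = log2(comb(n, r))  # log2 accepts arbitrarily large integers
    # log2(e M / k) with M = 2^n (n + 1) and k = 2 sigma N, kept in the log
    # domain because N itself does not fit in a float for large n.
    bits_per_flat = log2(e) + n + log2(n + 1) - log2(2 * sigma) - log2_N
    # k * bits_per_flat divided by N / n, using k / N = 2 sigma.
    return 2 * sigma * n * bits_per_flat

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, overhead_ratio(n))
```

The ratio decays only like $\log^2 n / n$, so it drops below 1 only for $n$ in the thousands, which is in line with the purely asymptotic nature of the result.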
3.4. The rank of a typical matroid. It was shown in [LOSW13, Corollary 2.3] that for all ε > 0, the fraction of matroids having rank r in the range (1/2 − ε)n < r < (1/2 + ε)n tends to 1. In this section, we prove a slightly stronger statement.
Proof. By duality, it suffices to show that $r(M) \le n/2 - \beta\sqrt{n}$ for a vanishing fraction of matroids.
In view of (18), it suffices to restrict our attention to matroids without loops or coloops. In fact, we will show that for $\beta$ sufficiently large
The term $m'_{n,r}$ can be bounded by combining Lemma 12 with the upper bound on $s_{n,r} = i(J(n, r))$ provided by Theorem 1. As $m'_n \ge s_{n,\lfloor n/2\rfloor} \ge 2^{\frac{1}{n}\binom{n}{\lfloor n/2\rfloor}}$, it follows that
which, by Lemma 13 and the inequality $\alpha_{n,r} \le \frac{2}{n}$ is at most
where the final equality follows from Lemma 5 with $k = \beta\sqrt{n}$. For sufficiently large $\beta$, the right-hand side tends to 0, thus concluding the proof.
We would like to remark at this point that Theorem 16 could be proved using the bound on $m_{n,r}$ that was derived in the proof of [BPvdP14, Theorem 3]. But since that paper does not have a separate lemma which we can refer to here, we make use of Lemma 12 of the present paper.
We have tried to prove this conjecture by analysing the behaviour of (variants of) our compression algorithm on matroids without circuit-hyperplanes, so far without any success. We would like to encourage the reader to give it another try, since it does feel as if we are very close. The key to proving a sufficient bound seems to be the behaviour of the algorithm when picking $T \subseteq A$. A more intelligent decoder may be able to reconstruct the matroid from a much sparser set $T$. Note that the asymptotic upper bounds on $|S|$ and $|Z|$ do not get essentially worse if we insist that the algorithm continues its main loop while $\Delta(G[A]) \ge \varepsilon r$, for any fixed $\varepsilon > 0$.

4.2. The rank of a typical matroid. By Theorem 16, almost all matroids on $n$ elements have a rank between $n/2 - \beta\sqrt{n}$ and $n/2 + \beta\sqrt{n}$. At the heart of the argument lies the fact that for sufficiently large $k$ the ratio $m_{n,\lfloor n/2\rfloor-k}/m_{n,\lfloor n/2\rfloor}$ tends to 0, using that
$\frac{m_{n,\lfloor n/2\rfloor-k}}{m_{n,\lfloor n/2\rfloor}} \le \frac{s_{n,\lfloor n/2\rfloor-k}}{s_{n,\lfloor n/2\rfloor}} \cdot 2^{O\left(\frac{\log n}{n^2}\binom{n}{\lfloor n/2\rfloor-k}\right)}$.
We see no better way to bound the factor $s_{n,n/2-k}/s_{n,n/2}$ than by combining the lower bound of Knuth and the upper bound from [BPvdP14], and we ask if a direct comparison would perhaps be possible. It would be a good start if we could argue that asymptotically all sparse paving matroids on $n$ elements have a rank between $(n - 1)/2$ and $(n + 1)/2$. Presently we cannot even prove the unimodality of $s_{n,r}$, i.e. that $s_{n,r} < s_{n,r'}$ for all $0 < r < r' \le n/2$.

4.3. Comparing $\log s_{n,r}$ and $\log m_{n,r}$ for small $r$. By Theorem 16, most matroids have rank close to $n/2$. Consequently, in the derivation of our main result, Corollary 15, we are mainly interested in comparing $m_{n,r}$ to $s_{n,r}$ for $r \approx n/2$. In this regime, the upper bound
$\log m_{n,r} \le \log s_{n,r} + \log \sum_{k=0}^{2\lceil \sigma_{n,r} N\rceil} \binom{2^n (n+1)}{k}$
from Lemma 12 suffices to show that $(\log s_{n,r})/(\log m_{n,r}) \to 1$ as $n \to \infty$.
On the other hand, when r (or by duality n − r) is small, this upper bound will not suffice, as illustrated by the following result.
vertex, except that exactly one component is a clique of the form $\{X \in \binom{[n]}{r} : X \supseteq Y\}$ for some $Y \in \binom{E}{r-1}$. If asymptotically all matroids are sparse paving, then the number of such matroids compared to $s_{n,r}$ must tend to zero as $n$ tends to infinity (and say $r \approx n/2$).