Regular graphs with many triangles are structured

We compute the leading asymptotics of the logarithm of the number of d-regular graphs having at least a fixed positive fraction c of the maximum possible number of triangles, and provide a strong structural description of almost all such graphs. When d is constant, we show that such graphs typically consist of many disjoint (d + 1)-cliques and an almost triangle-free part. When d is allowed to grow with n, we show that such graphs typically consist of very dense sets of size d + o(d) together with an almost triangle-free part. This confirms a conjecture of Collet and Eckmann from 2002 and considerably strengthens their observation that the triangles cannot be totally scattered in typical instances of regular graphs with many triangles. Mathematics Subject Classifications: 05C80, 05C30, 05C75

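1 Introduction

The maximum number of triangles in a $d$-regular graph on $n$ nodes is $T_{\max} = \binom{d}{2}\frac{n}{3}$, since every node lies in at most $\binom{d}{2}$ triangles and every triangle is counted at three nodes; when $(d+1) \mid n$ this bound is attained by the disjoint union of $(d+1)$-cliques. In this paper we prove a strong, almost sure structural stability result for this extremal problem. Let $0 < c \le 1$ and let $G_{d,c}(n)$ denote the set of $d$-regular graphs on $n$ labeled nodes that contain at least $c \cdot T_{\max}$ triangles. We show that, for constant $d$ and large $n$, almost all graphs in $G_{d,c}(n)$ consist of a disjoint union of $(d+1)$-cliques and an almost triangle-free part.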
This result may not seem surprising at first, and especially for $c$ close to 1 it seems quite natural to expect. However, we would like to emphasize that it also holds for very small positive $c$. In that regime the required number of triangles could easily be arranged in such a way that no vertex has more than one triangle in its neighborhood. However, as it turns out, such graphs constitute only a vanishing fraction of the elements of $G_{d,c}(n)$.
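As a quick sanity check of the extremal bound, the following Python sketch (the helper names are our own, not from the paper) counts triangles in a disjoint union of $(d+1)$-cliques and compares the result with $T_{\max} = \binom{d}{2}\frac{n}{3}$.

```python
from itertools import combinations
from math import comb

def count_triangles(adj):
    """Count triangles of a simple graph given as {node: set of neighbours}."""
    found = 0
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found += 1
    return found // 3  # every triangle is seen once from each of its three corners

def disjoint_cliques(num_cliques, d):
    """Disjoint union of num_cliques copies of K_{d+1}; every node has degree d."""
    adj = {}
    for b in range(num_cliques):
        block = range(b * (d + 1), (b + 1) * (d + 1))
        for v in block:
            adj[v] = {u for u in block if u != v}
    return adj

def t_max(n, d):
    """binom(d, 2) * n / 3: each node lies in at most binom(d, 2) triangles."""
    return comb(d, 2) * n / 3

for d in (2, 3, 5):
    g = disjoint_cliques(4, d)
    print(d, len(g), count_triangles(g), t_max(len(g), d))
```

Every row shows the clique union meeting the bound with equality.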
The study of $G_{d,c}(n)$ was initiated by Collet and Eckmann [5], who proved where $G_d(n)$ is the set of all $d$-regular graphs on $n$ labeled nodes. They conjectured that this limit exists. Using a heuristic counting argument they concluded that the triangles should be clustered in a "typical" element of $G_{d,c}(n)$.
In this paper we prove their conjecture and find the correct limiting value. Furthermore, we prove that the observation that triangles are clustered in typical elements of $G_{d,c}(n)$ holds in a very strong sense: for almost all graphs in the class, almost all triangles are contained in $(d+1)$-cliques.

Probabilistic context
While our methods are purely combinatorial, the results are more conveniently stated in the language of random graphs. What is the probability that a random graph has many more triangles than expected? This is a typical question in the field of large deviations, the theory that studies the tail behavior of random variables or, stated differently, the behavior of random objects conditioned on a parameter being far from its expectation. For example, one of the earliest results of this flavor, Cramér's theorem, states that for i.i.d. random variables $X_1, X_2, \dots$ distributed as $X$ there exists a "rate function" $I(x)$, depending on the distribution of $X$, such that for $x > \mathbb{E}[X]$
$$P\left(\frac{1}{N}\sum_{i=1}^{N} X_i \ge x\right) \approx e^{-N \cdot I(x)}.$$
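For Bernoulli summands the rate function has a closed form, $I(x) = x\log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}$ for $X \sim \mathrm{Bernoulli}(p)$ and $p < x < 1$. The sketch below is our own illustration, not part of the paper: it computes the exact binomial upper tail in log space and shows $-\frac{1}{N}\log P$ approaching $I(x)$ as $N$ grows.

```python
from math import ceil, exp, lgamma, log

def bernoulli_rate(x, p):
    """Cramer rate function I(x) for i.i.d. Bernoulli(p) summands, 0 < x < 1."""
    return x * log(x / p) + (1 - x) * log((1 - x) / (1 - p))

def log_upper_tail(N, x, p):
    """Exact log P(X_1 + ... + X_N >= x N) for Bernoulli(p), via log-sum-exp."""
    k0 = ceil(x * N - 1e-9)  # small offset guards against floating point round-up
    terms = [lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)
             + k * log(p) + (N - k) * log(1 - p) for k in range(k0, N + 1)]
    m = max(terms)
    return m + log(sum(exp(t - m) for t in terms))

p, x = 0.5, 0.7
for N in (100, 1000, 10000):
    print(N, -log_upper_tail(N, x, p) / N, bernoulli_rate(x, p))
```

The gap between the two columns shrinks roughly like $\log N / N$, reflecting the subexponential prefactor hidden in "$\approx$".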

Upper tail interpretation
Let $G_d(n)$ be a uniformly random $d$-regular graph on $n$ labeled nodes. Our results can be related to the logarithmic scaling of the upper tail probability $P(T(G_d(n)) \ge cT_{\max})$, where $T(G)$ denotes the number of triangles in the graph $G$, by noting that $P(T(G_d(n)) \ge cT_{\max}) = |G_{d,c}(n)|/|G_d(n)|$.

In random graphs, the upper tail for triangles in $G(n,p)$ has long been studied for a constant factor of deviation from the mean [10]. More precisely, let $t(G(n,p))$ denote the triangle density in the Erdős-Rényi random graph, normalized so that $\mathbb{E}[t(G(n,p))] = p^3$. One would like to understand the asymptotic behavior of $r(n,p,\delta) = -\log P\big(t(G(n,p)) \ge (1+\delta)p^3\big)$. The dense case ($p$ constant) has been reduced to an analytic variational problem by Chatterjee and Varadhan [4] using methods from graph limits. However, the solution of this variational problem is only known in certain parameter ranges (see [17] for details). In the sparse regime ($p = o(1)$) the order of magnitude $r(n,p,\delta) \approx n^2p^2\log(1/p)$ has been determined in a long series of papers by many authors [2,7,9,11,15,22]. The variational methods were extended to (part of) the sparse regime in [3], and using this, Lubetzky and Zhao [18] found the exact asymptotics of $r(n,p,\delta)$ in the range $n^{-1/42}\log n \le p \ll 1$. Recently, Cook and Dembo [6] and Augeri [1] extended this to the range $n^{-1/2} \ll p \ll 1$, and Harel, Mousset and Samotij [8] to the whole range $n^{-1}\log n \ll p \ll 1$.

The case of the random regular graph $G_d(n)$ has been studied much less. Kim, Sudakov, and Vu [14] showed that the distribution of small subgraph counts in $G_d(n)$ is asymptotically Poisson in the sparse case, implying an asymptotic formula for the tail probability. In particular, for fixed $d$ the expected number of triangles in $G_d(n)$ stays bounded as $n \to \infty$. This means that the "standard" large deviations regime (exceeding the expected value by a constant factor) is not interesting in this case.
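To illustrate why the constant-factor regime is uninteresting for fixed $d$, here is a small simulation sketch. It samples uniform simple $d$-regular graphs with the configuration model plus rejection (our choice of sampler, practical only for small $d$) and averages the triangle count; the classical Poisson limit for short cycles in random regular graphs predicts a mean of $(d-1)^3/6$, independent of $n$.

```python
import random

def random_regular_graph(n, d, rng):
    """Uniform simple d-regular graph on nodes 0..n-1 (configuration model + rejection)."""
    if (n * d) % 2:
        raise ValueError("n * d must be even")
    while True:
        stubs = [v for v in range(n) for _ in range(d)]  # d half-edges per node
        rng.shuffle(stubs)                                # uniform pairing of half-edges
        adj = {v: set() for v in range(n)}
        simple = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a == b or b in adj[a]:                     # loop or multi-edge: reject
                simple = False
                break
            adj[a].add(b)
            adj[b].add(a)
        if simple:
            return adj

def count_triangles(adj):
    """Each triangle u < v < w is counted exactly once."""
    return sum(1 for u in adj for v in adj[u] if v > u
               for w in adj[u] & adj[v] if w > v)

rng = random.Random(2022)
d, samples = 3, 150
means = {}
for n in (50, 200, 800):
    total = sum(count_triangles(random_regular_graph(n, d, rng)) for _ in range(samples))
    means[n] = total / samples
    print(n, means[n], (d - 1) ** 3 / 6)
```

The empirical means hover around $4/3$ for every $n$, so a constant-factor excess corresponds to only a bounded number of extra triangles.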

Maximum entropy random graphs with triangles
Our motivation for analyzing the logarithmic size of $G_{d,c}(n)$ originates from a related problem of describing "typical" graphs with a specific set of constraints. In general, such constraints can be either local or global, e.g., restricting the degree of each node or prescribing the total number of triangles. In addition, they can be "hard" or "soft", e.g., each node must have degree $d$, or the expected degree of each node should be $d$.
To study typical graphs, one looks for the probability distribution $P$ on $G(n)$ satisfying these constraints that is as random as possible, in the sense that it maximizes the Shannon entropy $H(P) = -\sum_{G \in G(n)} P(G)\log P(G)$. Results of Krioukov [16] suggest that certain constraints on the number of triangles might result in maximum entropy solutions with some geometric component. Our results further point in this direction: we show that typical $d$-regular graphs with many triangles contain a large, highly structured part.
We only consider hard constraints that are a mix between local ($d$-regular) and global (at least $c \cdot T_{\max}$ triangles). Of course, the distributions satisfying such constraints are simply all probability distributions on the set $G_{d,c}(n)$. In this case it is well known that the uniform distribution $P_{d,c}$ on $G_{d,c}(n)$ maximizes the Shannon entropy, with value $H(P_{d,c}) = \log|G_{d,c}(n)|$. While in our setting the maximum entropy solution is clear, estimating its entropy, that is, the logarithmic size of $G_{d,c}(n)$, turns out to be an important first step in describing the structure of typical graphs with the given constraints on degrees and triangles.
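For very small parameters the maximum entropy picture can be checked by brute force. The sketch below is an illustration with hypothetical helper names: it enumerates all labeled 2-regular graphs on 6 nodes, keeps those with at least $c \cdot T_{\max}$ triangles for $c = 1$, and compares the entropy of the uniform distribution on that set, which equals $\log|G_{d,c}(n)|$, with the entropy of a skewed distribution on the same set.

```python
from itertools import combinations
from math import comb, log

def regular_graphs(n, d):
    """Brute force: every labeled simple d-regular graph on nodes 0..n-1 (tiny n only)."""
    all_edges = list(combinations(range(n), 2))
    for edges in combinations(all_edges, n * d // 2):
        degree = [0] * n
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        if all(x == d for x in degree):
            yield frozenset(edges)

def num_triangles(edges, n):
    return sum(1 for a, b, c in combinations(range(n), 3)
               if {(a, b), (a, c), (b, c)} <= edges)

def shannon_entropy(probs):
    return -sum(p * log(p) for p in probs if p > 0)

n, d, c = 6, 2, 1
t_max = comb(d, 2) * n / 3
graphs = list(regular_graphs(n, d))
constrained = [g for g in graphs if num_triangles(g, n) >= c * t_max]
size = len(constrained)
uniform = [1 / size] * size
skewed = [0.5] + [0.5 / (size - 1)] * (size - 1)
print(len(graphs), size, shannon_entropy(uniform), log(size), shannon_entropy(skewed))
```

Here the constrained set consists of the 10 ways to split 6 labeled nodes into two triangles, out of 70 labeled 2-regular graphs; any non-uniform distribution on it has strictly smaller entropy than $\log 10$.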

Results
Motivated by the question "can local constraints induce global (geometric) behavior?", we study the random $d$-regular graph $G_d(n)$ conditioned on having at least a positive fraction of the maximum possible number of triangles. (For fixed $d$ this just means linearly many triangles in $n$.) In terms of the previous section, our setting is related to the entropy maximization problem with local and global constraints, i.e., where each node must have degree exactly $d$ and must be incident to at least $t$ triangles on average.
Recall that $T_{\max} = T_{\max}(n,d) = \binom{d}{2}\frac{n}{3}$ denotes the maximum number of triangles an $n$-vertex $d$-regular graph can have. Let $G_{d,c}(n)$ denote the set of $d$-regular graphs on $n$ labeled nodes that contain at least $c \cdot T_{\max}$ triangles. We compute the leading asymptotics of $\log|G_{d,c}(n)|$ for fixed $c$, as $n \to \infty$, where $d$ is either a constant or can grow with $n$ as long as $\log d = o(\log n)$. We provide a structural description of a "typical" element of $G_{d,c}(n)$. We then extend these results to the case of $k$-cliques in $d$-regular graphs.

Number of d-regular graphs with many triangles
The dependence of $d$ on $n$ will be suppressed from the notation. We always assume $d = o(\sqrt{n})$. We will emphasize when constant $d$ is assumed.
Theorem 1. For any $0 < c < 1$ there is a $C > 0$ such that holds for any $n$ and any $d < \sqrt{n}/C$.
The term $\frac{dn}{2}\log\frac{n}{d}$ is related to $\log|G_d(n)|$, where $G_d(n)$ denotes the set of $d$-regular graphs on $n$ nodes. In particular, results of [19] imply that for $d = o(\sqrt{n})$ we have $\log|G_d(n)| = \frac{dn}{2}\log\frac{n}{d}\,\big(1 + O(1/\log(n/d))\big)$. The $C/\log(n/d)$ terms are $o(1)$ as long as $d = o(n)$. The $c\log d/\log(n/d)$ term on the right-hand side is only $o(1)$ if $\log d = o(\log n)$. Unfortunately, for $d$ polynomial in $n$ we do not get a sharp logarithmic rate.
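To get a feel for $\log|G_d(n)|$, the following sketch evaluates the McKay-Wormald asymptotic formula [19] in the form $\frac{(dn-1)!!}{(d!)^n}\exp\left(-\frac{d^2-1}{4} - \frac{d^3}{12n}\right)$. This form is our recollection of their result for $d = o(\sqrt{n})$, so treat the exact correction terms as an assumption. We compare it with exact counts of labeled cubic graphs (70 on 6 nodes, namely 10 labeled copies of $K_{3,3}$ and 60 labeled prisms; 19355 on 8 nodes; 11180820 on 10 nodes) and then look at the ratio with $\frac{dn}{2}\log\frac{n}{d}$.

```python
from math import exp, lgamma, log

def log_num_regular_mw(n, d):
    """log of the McKay-Wormald estimate for labeled d-regular graphs on n nodes:
    (dn-1)!! / (d!)^n * exp(-(d^2-1)/4 - d^3/(12n)) (assumed form, d = o(sqrt(n)))."""
    m = n * d
    log_double_fact = lgamma(m + 1) - lgamma(m // 2 + 1) - (m // 2) * log(2)  # (m-1)!!
    return log_double_fact - n * lgamma(d + 1) - (d * d - 1) / 4 - d ** 3 / (12 * n)

# Exact numbers of labeled 3-regular (cubic) graphs on 6, 8 and 10 nodes.
exact = {6: 70, 8: 19355, 10: 11180820}
for n, count in exact.items():
    print(n, count, round(exp(log_num_regular_mw(n, 3))))

# Ratio with the leading term (dn/2) log(n/d): it drifts towards 1 only slowly.
d = 10
for n in (10**3, 10**5, 10**7):
    print(n, log_num_regular_mw(n, d) / (d * n / 2 * log(n / d)))
```

The estimates are within a few percent of the exact counts even for tiny $n$, while the ratio column illustrates the $O(1/\log(n/d))$ relative error.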

Structure of d-regular graphs with many triangles
For fixed $d$, it turns out, perhaps not so surprisingly, that in most elements of $G_{d,c}(n)$, most of the triangles cluster into disjoint $(d+1)$-cliques. To make this statement precise, let us call a node bad if it is not part of a $(d+1)$-clique but it is incident to at least one triangle.
Theorem 2. Let $d$ be fixed and $0 < c < 1$. With high probability a uniformly randomly chosen element of $G_{d,c}(n)$ has fewer than $\frac{\log\log n}{\log n}\,n$ bad nodes. Thus, the number of triangles that are not part of a $(d+1)$-clique is sublinear.
Note that Theorem 2 implies a two-phase structure: the graph consists of many disjoint cliques and an almost triangle-free part.
In Section 2.2 we prove a slightly more general result: with high probability, a uniformly randomly chosen element of $G_{d,c}(n)$ has fewer than $\varepsilon_n n$ bad nodes, for any sequence $\varepsilon_n \to 0$ with $\varepsilon_n \log n \to \infty$.
These results show similarities with studies on the structure of dense maximum entropy random graphs with given edge and triangle densities. In a collection of works by Kenyon, Radin, Ren and Sadun [12,13,20,21] it was shown that the limits of these graphs have a bipodal structure, at least in a narrow range just above the average. This means that the graph is split into two components with specific inter- and intra-component connection probabilities. In our setting we obtain a multipodal structure with a linear number of components, consisting of the $(d+1)$-cliques and the triangle-free part.
We prove a similar result for the case $1 \ll d \ll \sqrt{n}$. Here, however, we cannot expect $(d+1)$-cliques to appear, as it is possible to construct families of examples with the correct leading logarithmic growth rate that contain no $(d+1)$-cliques at all. Instead, we introduce the notion of a pseudo-clique, which turns out to be a very dense subgraph of size $d + o(d)$, with the property that different pseudo-cliques must be disjoint. (See the explanation at the beginning of Section 2.2.2 for details.) It turns out that a typical element of the ensemble consists of a collection of these pseudo-cliques together with an almost triangle-free part.
Theorem 3. Let $1 \ll d \ll \sqrt{n}$ and fix $0 < c < 1$. With high probability, almost all triangles of a uniformly randomly chosen element of $G_{d,c}(n)$ are contained in pseudo-cliques.

d-regular graphs with many k-cliques
As a corollary to our methods, we also obtain similar results for regular graphs with many $k$-cliques. Let $G_{d,c,k}(n)$ denote the set of $d$-regular graphs on $n$ nodes that contain at least $c \cdot T_{k,\max} = c\binom{d}{k-1}\frac{n}{k}$ subgraphs isomorphic to $K_k$. As a natural extension of terminology, we call nodes bad if they are not part of a $(d+1)$-clique but are incident to a $k$-clique.
Furthermore, for fixed $d$, almost all elements of $G_{d,c,k}(n)$ have at most $\varepsilon n$ bad nodes.

The number of regular graphs with a given number of triangles
The proof of Theorem 1 consists of establishing a lower and an upper bound on $\log|G_{d,c}(n)|$. More precisely, we will show that The theorem then follows after dividing by $\frac{dn}{2}\log\frac{n}{d+1}$.

Proof of Theorem 1 (Lower bound). To establish a lower bound we construct a family of graphs by taking $b = cn/(d+1)$ disjoint $(d+1)$-cliques and an arbitrary $d$-regular graph on the remaining $m = n - (d+1)b = (1-c)n$ nodes. Clearly, these graphs have at least $c \cdot T_{\max}$ triangles, since each $(d+1)$-clique contains $\binom{d+1}{3}$ triangles and $b\binom{d+1}{3} = c\binom{d}{2}\frac{n}{3} = c \cdot T_{\max}$. Thus

For $d = o(\sqrt{m})$, McKay and Wormald [19] show that the number of $d$-regular graphs on $m$ nodes is asymptotically

Proof. All three parts follow easily from Stirling's approximation.
Combining these estimates with (4) and (5) we get $\log|G_{d,c}(n)| \ge \log\binom{n}{m} + \log$ In the last step we used that $\log\frac{d}{d+1}$, $\log c$ and $\log(1-c)$ are all $O(1)$.

We now need to prove a matching upper bound on $|G_{d,c}(n)|$. We do this by uncovering the edges of such graphs in a suitably chosen order, and recording whether in each step a new triangle is created.
We use an approach inspired by the configuration model. Let us denote by $G^*_d(n)$ (respectively, $G^*_{d,c}(n)$) the set of $d$-regular graphs (respectively, $d$-regular graphs with at least $c \cdot T_{\max}$ triangles) on $n$ labeled nodes, where additionally the edges leaving each node are assigned labels 1 through $d$. This means that each edge gets two labels, one from each end.
Given $G^* \in G^*_{d,c}(n)$, we define the configuration ordering $\prec$ on the set of edges of $G^*$ as follows. Let $e = \{i_1, j_1\}$ and $f = \{i_2, j_2\}$ be two edges of $G^*$ with $i_1 < j_1$ and $i_2 < j_2$. Let us declare $e \prec f$ if $i_1 < i_2$, or if $i_1 = i_2$ and the label of $e$ is smaller than the label of $f$ at their common node. Let $e_1 \prec e_2 \prec \cdots \prec e_{nd/2}$ denote the edges of $G^*$ in increasing configuration order. Let $G^*[k]$ denote the subgraph of $G^*$ consisting of $e_1, \dots, e_k$.
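Finally we define the "triangle reveal profile" function $\varphi: G^*_{d,c}(n) \to \{0,1\}^{nd/2}$ by setting $\varphi(G^*)(k) = 1$ if adding $e_k$ to $G^*[k-1]$ creates at least one new triangle, and $\varphi(G^*)(k) = 0$ otherwise. For any $x \in \{0,1\}^{nd/2}$ let us denote $|x| = \sum_{j=1}^{nd/2} x(j)$. Then $|\varphi(G^*)|$ is the total number of edges $e_k$ that, upon being added to $G^*[k-1]$, create at least one new triangle. The next lemma gives an upper bound on the number of graphs in $G^*_{d,c}(n)$ that have a fixed triangle reveal profile.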
the electronic journal of combinatorics 29(1) (2022), #P1.7

Proof. The idea is to reconstruct a $G^* \in \varphi^{-1}(x)$ by starting from the empty graph and adding edges one by one, according to the configuration order. Just like in the configuration model, each node starts with $d$ half-edges, labeled 1 through $d$. First we take the half-edge with label 1 at node 1 and join it to any other half-edge. We can do this in $dn - 1$ ways. Then, in each subsequent step, we take the smallest node that still has unmatched half-edges, pick the one with the smallest label, and match it to any other unmatched half-edge. If we did not have constraints on triangles, the total number of possible (multi-)graphs we could create this way would be $(dn-1)(dn-3)\cdots 3 \cdot 1$, which is an upper bound on $|G^*_d(n)|$. In our case, the vector $x$ dictates whether the next edge added has to create a triangle with previously added edges. By the definition of the configuration order, the number of possible choices for the $k$th edge is $dn - (2k-1)$, as the starting half-edge is fixed and there are exactly $dn - (2k-1)$ available half-edges at this step. However, when $x(k) = 1$, the number of choices for the ending half-edge is limited. Suppose the starting half-edge is incident to node $j$. Then, in order for this edge to create a triangle, the ending half-edge must be incident to one of the current second neighbors of $j$. There are never more than $d^2$ second neighbors, and thus never more than $d^3$ possible half-edges to choose from.
Thus we get the upper bound which proves the lemma.
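The configuration order, the triangle reveal profile and the counting in the proof of Lemma 6 can be mirrored in a few lines of Python. This is a sketch under our own conventions: nodes are integers, `labels[(u, v)]` is the label of edge $\{u,v\}$ at $u$, and `profile_bound` is our reading of the product bound described in the proof (choices $dn-(2k-1)$ at step $k$, capped at $d^3$ when $x(k)=1$), since the lemma statement itself is not reproduced here.

```python
from math import prod

def reveal_profile(adj, labels):
    """Triangle reveal profile x of an edge-labeled graph (hypothetical helper).
    Edges {i, j}, i < j, are revealed sorted by (i, label at i), i.e. in
    configuration order; x[k] = 1 iff the k-th edge closes a new triangle."""
    order = sorted((u, labels[(u, v)], v) for u in adj for v in adj[u] if u < v)
    revealed = {v: set() for v in adj}
    x = []
    for i, _, j in order:
        x.append(1 if revealed[i] & revealed[j] else 0)
        revealed[i].add(j)
        revealed[j].add(i)
    return x

def count_pairings(num_half_edges):
    """Pair half-edges as in the proof: always take the first unmatched half-edge
    and join it to any other unmatched one. The count equals (dn - 1)!!."""
    def rec(rest):
        if not rest:
            return 1
        tail = rest[1:]
        return sum(rec(tail[:i] + tail[i + 1:]) for i in range(len(tail)))
    return rec(tuple(range(num_half_edges)))

def profile_bound(x, n, d):
    """Product bound: dn - (2k - 1) choices at step k, at most d^3 when x(k) = 1."""
    return prod(min(d ** 3, d * n - (2 * k - 1)) if bit else d * n - (2 * k - 1)
                for k, bit in enumerate(x, start=1))

def rank_labels(adj):
    """Label the edges at each node 1..d in increasing order of the other endpoint."""
    return {(u, v): r for u in adj for r, v in enumerate(sorted(adj[u]), start=1)}

k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
x = reveal_profile(k4, rank_labels(k4))
print(x, profile_bound(x, 4, 3), count_pairings(12))
```

An edge $\{i,j\}$ with $i<j$ closes a triangle exactly when some common neighbor $w < i$ exists, so the labels only reorder edges that share their smaller endpoint; the number of ones in the profile does not depend on them.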
The main idea for the upper bound is now to consider a specific set of triangle reveal profiles $x \in \{0,1\}^{nd/2}$, in which at least a $c\frac{d-1}{d+1}$ fraction of edges have revealed triangles.

Proof of Theorem 1 (Upper bound). Let us introduce the following shorthand notation, as it will come up frequently. Define Then, by Lemma 6, and using $d^2 \le n$, we see that To finish the proof, we will show that $|G^*_{d,c}(n)| \le \frac{dn}{2}|\varphi^{-1}(L)|$. For this, consider the symmetric group $S_n$, which acts on $G^*_{d,c}(n)$ by permuting the node labels. For $\sigma \in S_n$ and $G^* \in G^*_{d,c}(n)$, let us denote by $G^*_\sigma$ the graph obtained by applying $\sigma$ to the node labels. Furthermore, let $S_nG^* = \{G^*_\sigma : \sigma \in S_n\} \subset G^*_{d,c}(n)$ denote the orbit of $G^*$ under the action of $S_n$. We finish the proof modulo the following result, which we establish at the end of this section.

Lemma 7. For any
In other words, randomly relabeling the nodes of $G^*$ yields, with not too small probability, a graph $G^*_\sigma$ with $\varphi(G^*_\sigma) \ge T_c - 1$.
Summing this inequality over all orbits of the $S_n$ action yields $|G^*_{d,c}(n)| \le \frac{dn}{2}|\varphi^{-1}(L)|$, as claimed above. Note that $|G^*_{d,c}(n)| = |G_{d,c}(n)| \cdot (d!)^n$. Combining this with (7) we get

We are thus left to prove Lemma 7. For this we first show that for a uniformly random permutation $\sigma$, the expected value of $\varphi(G^*_\sigma)$ is at least $c \cdot \frac{dn(d-1)}{2(d+1)}$. Then the lemma will follow from a standard Markov-inequality argument.
Lemma 8. Let $\sigma$ be a uniformly random permutation. Then $\mathbb{E}[\varphi(G^*_\sigma)] \ge c \cdot \frac{dn(d-1)}{2(d+1)}$.

Proof. Let $X_e(\sigma)$ be the indicator variable that the edge $e$ of $G^*$ creates a triangle when it is added in the lexicographic order of $G^*_\sigma$. Then $\varphi(G^*_\sigma) = \sum_e X_e(\sigma)$ and so $\mathbb{E}[\varphi(G^*_\sigma)] = \sum_e P(X_e(\sigma) = 1)$. Let $e = \{i, j\}$ be incident to exactly $t_e$ triangles in $G^*$, and let $v_1, v_2, \dots, v_{t_e}$ denote the third nodes of these triangles. Then $X_e(\sigma) = 1$ if and only if at least one of these triangles is formed at the moment $e$ is added, which is equivalent to at least one of these nodes preceding both $i$ and $j$ in the $\sigma$-order, that is, $\min(\sigma(v_1), \sigma(v_2), \dots, \sigma(v_{t_e})) < \min(\sigma(i), \sigma(j))$. Hence $X_e(\sigma) = 0$ if and only if $i$ or $j$ has the smallest $\sigma$ value among $i, j, v_1, v_2, \dots, v_{t_e}$. Since the $\sigma$-order of these nodes is a uniformly random permutation of $t_e + 2$ elements, we get $P(X_e(\sigma) = 0) = 2/(t_e+2)$ and hence $P(X_e(\sigma) = 1) = 1 - 2/(t_e+2)$.
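The exact per-edge computation above can be checked on tiny graphs by averaging over all relabelings. In the sketch below (our own helper names) an edge $\{i,j\}$ counts as revealed with a triangle when some common neighbor $w$ has $\sigma(w) < \min(\sigma(i), \sigma(j))$, exactly as in the proof. The per-edge formula does not use regularity, so we also try a non-regular example of our own choosing.

```python
from fractions import Fraction
from itertools import permutations

def phi_after_relabel(adj, sigma):
    """Edges {i, j} revealed with a triangle after relabeling by sigma: some common
    neighbour w satisfies sigma[w] < min(sigma[i], sigma[j])."""
    return sum(1 for i in adj for j in adj[i]
               if i < j and any(sigma[w] < min(sigma[i], sigma[j])
                                for w in adj[i] & adj[j]))

def exact_mean_phi(adj):
    """Average of phi(G_sigma) over all |V|! relabelings (tiny graphs only)."""
    nodes = sorted(adj)
    values = [phi_after_relabel(adj, dict(zip(nodes, p)))
              for p in permutations(range(len(nodes)))]
    return Fraction(sum(values), len(values))

def lemma8_sum(adj):
    """Sum over edges of P(X_e = 1) = 1 - 2/(t_e + 2), t_e = number of triangles on e."""
    return sum(1 - Fraction(2, len(adj[i] & adj[j]) + 2)
               for i in adj for j in adj[i] if i < j)

octahedron = {v: {u for u in range(6) if u not in (v, v ^ 1)} for v in range(6)}
k4_plus = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3, 4}, 3: {0, 1, 2, 4}, 4: {2, 3}}
for g in (octahedron, k4_plus):
    print(exact_mean_phi(g), lemma8_sum(g))
```

For the octahedron every edge lies in two triangles, so both columns equal $12 \cdot \frac{1}{2} = 6$.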
Thus, since $t_e \le d - 1$, we get
$$\mathbb{E}[\varphi(G^*_\sigma)] = \sum_e \frac{t_e}{t_e+2} \ge \frac{1}{d+1}\sum_e t_e \ge c \cdot \frac{dn(d-1)}{2(d+1)},$$
where the last inequality follows from $\sum_e t_e$ being 3 times the total number of triangles in $G^*$, which is in turn at least $c \cdot \frac{n}{3}\binom{d}{2}$. This finishes the proof of the lemma.

Proof of Lemma 7. By simple algebraic considerations, for a uniformly random $\sigma \in S_n$,
$$P\big(G^*_\sigma \in \varphi^{-1}(L)\big) = \frac{|S_nG^* \cap \varphi^{-1}(L)|}{|S_nG^*|}. \qquad (9)$$
This is obvious when $G^*$ has no automorphisms (that is, when $S_nG^*$ is in bijection with $S_n$), but it also holds in the general case, since the stabilizers of different elements of the orbit $S_nG^*$ are conjugate and hence have the same cardinality. Consider a uniformly random permutation $\sigma \in S_n$. By (9) it is enough to show that with probability at least $\frac{2}{dn}$ we have $\varphi(G^*_\sigma) \in L$, which is equivalent to $\varphi(G^*_\sigma) \ge T_c - 1$. Observe that $\varphi(G^*_\sigma)$ cannot be bigger than $\frac{dn}{2}$. Hence, using Lemma 8 from which we conclude that
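The Markov-type step can be written out as follows. This is one way to complete it, a sketch assuming only what is stated above: $Y = \varphi(G^*_\sigma)$ takes values in $[0, M]$ with $M = \frac{dn}{2}$, Lemma 8 gives $\mathbb{E}[Y] \ge \mu = c \cdot \frac{dn(d-1)}{2(d+1)}$, we set $a = \mu - 1$, and we assume $\mu \ge 1$ (true for large $n$).

```latex
% Split E[Y] according to whether Y < a or Y >= a, using 0 <= Y <= M:
\mu \le \mathbb{E}[Y] \le a\,P(Y < a) + M\,P(Y \ge a) = a + (M - a)\,P(Y \ge a),
% and since a = \mu - 1 < M, rearranging gives
P(Y \ge \mu - 1) \ge \frac{\mu - a}{M - a} = \frac{1}{M - \mu + 1} \ge \frac{1}{M} = \frac{2}{dn}.
```

The last inequality is where $\mu \ge 1$ is used; the resulting bound matches the probability $\frac{2}{dn}$ required above.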

The structure of regular graphs with a given number of triangles
A simple extension of the methods of the proof of Theorem 1 yields a strong structural description of a typical graph with at least $c \cdot T_{\max}$ triangles. We will treat the cases of fixed $d$ and growing $d$ separately, but the following lemma will be useful for both. As before, we let $t_e$ denote the number of triangles the edge $e$ is incident to. We say the edge $e$ is $\delta$-bad if $1 \le t_e \le d - 1 - \delta d$.
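Computing $t_e$ and the $\delta$-bad edges is straightforward. The sketch below uses our own test graph (a $K_4$ next to a triangular prism, both 3-regular) and exact fractions for $\delta$, to avoid floating point issues at the boundary $t_e = d - 1 - \delta d$.

```python
from fractions import Fraction

def triangle_counts(adj):
    """t_e for each edge e = (u, v), u < v: the number of triangles containing e."""
    return {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}

def delta_bad_edges(adj, d, delta):
    """Edges with 1 <= t_e <= d - 1 - delta * d, sorted."""
    return sorted(e for e, t in triangle_counts(adj).items()
                  if 1 <= t <= d - 1 - delta * d)

def k4_plus_prism():
    """3-regular test graph: K_4 on nodes 0..3 plus a triangular prism on nodes 4..9."""
    adj = {v: {u for u in range(4) if u != v} for v in range(4)}
    adj.update({v: set() for v in range(4, 10)})
    for u, v in [(4, 5), (5, 6), (4, 6), (7, 8), (8, 9), (7, 9), (4, 7), (5, 8), (6, 9)]:
        adj[u].add(v)
        adj[v].add(u)
    return adj

g = k4_plus_prism()
print(delta_bad_edges(g, d=3, delta=Fraction(1, 3)))
```

With $\delta = 1/d$ the $K_4$ edges ($t_e = d - 1$) are not bad, the prism's matching edges ($t_e = 0$) are not bad either, and exactly the six prism triangle edges are flagged.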
Hence, by the same computation as in (10) we get By the previous considerations, for any $G^* \in G^{*\,\varepsilon,\delta}_{d,c}(n)$ we get that Summing the inequality $|S_nG^* \cap \varphi^{-1}(L_{\varepsilon,\delta})| \ge \frac{2}{dn}|S_nG^*|$ over the orbits of the $S_n$ action in $G^{*\,\varepsilon,\delta}_{d,c}(n)$ we obtain the estimate which, combined with Lemma 6, yields Taking logarithms of both sides finishes the proof.

Fixed d
Let us say that a node in G is bad if it is not in a (d + 1)-clique, but it is in a triangle.
The following statement is a (very) slight strengthening of Theorem 2.
Theorem 10. Let ε > 0 and d fixed. Among all d-regular graphs with at least c · T max triangles, the proportion of those where more than εn nodes are bad goes to 0 as n → ∞. This remains true even if ε → 0, as long as ε log n → ∞.
We will make use of the following simple observation, whose proof we omit.
Lemma 11. Let G be a d-regular graph. If all edges incident to a node v are incident to exactly d − 1 triangles, then v is part of a (d + 1)-clique.
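For a $d$-regular graph a node lies in a $(d+1)$-clique exactly when its closed neighborhood is a clique, which gives a direct test for bad nodes. The sketch below (hypothetical helpers, on our own test graph, a $K_4$ plus a triangular prism) lists the bad nodes and checks the implication of Lemma 11 on that graph.

```python
from itertools import combinations

def k4_plus_prism():
    """3-regular test graph: K_4 on nodes 0..3 plus a triangular prism on nodes 4..9."""
    adj = {v: {u for u in range(4) if u != v} for v in range(4)}
    adj.update({v: set() for v in range(4, 10)})
    for u, v in [(4, 5), (5, 6), (4, 6), (7, 8), (8, 9), (7, 9), (4, 7), (5, 8), (6, 9)]:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def in_full_clique(adj, v):
    """In a d-regular graph, v is in a (d+1)-clique iff its neighbours are pairwise adjacent."""
    return all(b in adj[a] for a, b in combinations(adj[v], 2))

def in_triangle(adj, v):
    return any(adj[v] & adj[u] for u in adj[v])

def bad_nodes(adj):
    """Nodes that lie in some triangle but in no (d+1)-clique."""
    return {v for v in adj if in_triangle(adj, v) and not in_full_clique(adj, v)}

def lemma11_holds(adj, d):
    """If every edge at v lies in exactly d - 1 triangles, v must be in a (d+1)-clique."""
    return all(in_full_clique(adj, v) for v in adj
               if all(len(adj[v] & adj[u]) == d - 1 for u in adj[v]))

g = k4_plus_prism()
print(sorted(bad_nodes(g)), lemma11_holds(g, 3))
```

All six prism nodes are bad: each lies in a triangle, but none of its neighborhoods is a clique.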
Proof of Theorem 10. Let us set $\delta = 1/d$ and call $1/d$-bad edges simply "bad". Suppose now that more than $\varepsilon n$ nodes of $G$ are bad. By the definition and Lemma 11, each bad node is incident to at least two bad edges, so there are at least $\varepsilon n$ bad edges. Thus G ∈ G as long as $\varepsilon\log n \to \infty$, proving that with high probability a graph conditioned on having at least $c \cdot T_{\max}$ triangles has $o(n)$ bad nodes, and hence consists almost completely of $(d+1)$-cliques and a triangle-free part.

Growing d
An immediate generalization of Theorem 10 cannot hold in the case $d \gg 1$, because one can exhibit a family of $d$-regular graphs with $c \cdot T_{\max}$ triangles that contain no $(d+1)$-cliques at all, yet have the optimal logarithmic growth rate $(1-c)\frac{d}{2}n\log\frac{n}{d}$. Such a family can be built, for example, by taking the disjoint union of many copies of $H$ together with a random $d$-regular graph, where, for even $d$, $H$ is $K_{d+2}$ minus a perfect matching. Realizing the required $c \cdot T_{\max}$ triangles takes up only slightly more space this way than using copies of $K_{d+1}$, and the resulting decrease in the size of the random part is small enough that it does not affect the logarithmic growth rate. One can push this even further and use disjoint components of size $d + o(d)$ (these still contain on the order of $d^3$ triangles each), together with a large random $d$-regular part of the appropriate size.
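The clique-free construction can be checked numerically. For even $d$, the sketch below (our own helper names) builds $H = K_{d+2}$ minus a perfect matching, verifies that it is $d$-regular, and compares its triangles per node with those of $K_{d+1}$. By direct counting, $H$ has $\binom{d+2}{3} - \frac{(d+2)d}{2}$ triangles, so the per-node ratio is $\frac{d(d-2)/6}{d(d-1)/6} = \frac{d-2}{d-1}$, which tends to 1 as $d$ grows.

```python
from itertools import combinations
from math import comb

def k_minus_matching(d):
    """K_{d+2} without the perfect matching {0,1}, {2,3}, ...; requires d even."""
    if d % 2:
        raise ValueError("d must be even")
    return {v: {u for u in range(d + 2) if u not in (v, v ^ 1)} for v in range(d + 2)}

def count_triangles(adj):
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

for d in (4, 10, 40):
    h = k_minus_matching(d)
    assert all(len(nb) == d for nb in h.values())  # H is d-regular
    per_node_h = count_triangles(h) / (d + 2)
    per_node_clique = comb(d + 1, 3) / (d + 1)     # same quantity for K_{d+1}
    print(d, count_triangles(h), per_node_h / per_node_clique)
```

Each removed matching edge lay in $d$ triangles of $K_{d+2}$ and no triangle contains two matching edges, which is where the closed form comes from.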
We will show in this section that a typical graph in the ensemble does, in fact, resemble an element of this last family. The main reason the previous argument fails for $d \gg 1$ is that we now cannot choose $\delta$ too small in Lemma 9, since otherwise the gain would be smaller in magnitude than the error term $O(dn\log d)$. Nevertheless, if $\log d/\log n$ is small, then the gap between the main term and the error term allows us to choose both $\varepsilon$ and $\delta$ to be small, which is enough to learn something about the typical graphs in the ensemble. In particular, we can choose Then Lemma 9 implies that in a typical $d$-regular graph with at least $c \cdot T_{\max}$ triangles, most edges are incident to either 0 or almost $d$ triangles. As it turns out, this implies a structural description similar to that of Theorem 10.

Let us first informally explain the result. We call a subgraph $H \subset G$ a dense spot if $|H| \le d + 1$ and $\deg_H(x) = d(1 - O(\delta))$ for all $x \in H$. Dense spots satisfy the following simple combinatorial observations:
• Two dense spots are either disjoint, or they intersect in $d(1 - O(\delta))$ nodes.
• The union of a maximal pairwise intersecting family of dense spots has size $d(1 + O(\delta))$. We call these unions pseudo-cliques.
• It follows that any two pseudo-cliques must be disjoint.
The following is a restatement of Theorem 3.
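Proof. We set $\varepsilon$ and $\delta$ according to (11). Then a careful calculation using Lemma 9 shows that $\lim_{n\to\infty} \frac{|G^{\varepsilon,\delta}_{d,c}(n)|}{|G_{d,c}(n)|} = 0$, so it is enough to consider a graph $G \in G_{d,c}(n) \setminus G^{\varepsilon,\delta}_{d,c}(n)$. The graph $G$ then has, by definition, fewer than $\varepsilon\frac{d}{2}n$ edges that are $\delta$-bad. Let us call a $\delta$-bad edge bad for brevity, and the other edges good. Let us start by removing all edges with $t_e = 0$ from $G$, and denote the remaining graph by $G'$. Removing such edges does not change the $t_e$ value of the remaining edges, since an edge with $t_e = 0$ lies in no triangle. Let us call a node $v \in G'$ bad if it is incident to at least $\delta d$ bad edges. Each bad node accounts for at least $\delta d$ bad edges and each bad edge is counted at most twice, so, since $\varepsilon = \delta^2$, the number of bad nodes in $G'$ is less than $2\varepsilon\frac{d}{2}n/(\delta d) = \delta n$.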
The total number of triangles that are incident to either a bad edge or a bad node is at most $\varepsilon\frac{d}{2}n \cdot d + \delta n\binom{d}{2} = O((\varepsilon + \delta) \cdot T_{\max})$. We will show that the rest of the triangles are concentrated in pseudo-cliques. To finish the proof of Theorem 12, we need to show that any triangle that is only incident to good edges and good nodes is contained in a pseudo-clique. We will show slightly more: every good edge connecting two good nodes is contained in a pseudo-clique.
Consider a good edge $uv$ in $G'$, where both $u$ and $v$ are good nodes. Since we already removed the edges with no triangles, $t_{uv} \ge d - \delta d$. In particular, $u$ and $v$ have at least $d - \delta d$ common neighbors. Each of $u$ and $v$ is incident to at most $\delta d$ bad edges. That means that the set $H_0$ of common neighbors of $u$ and $v$ that are connected to both of them via good edges has size $|H_0| \ge d - 3\delta d$. Let $H = H_0 \cup \{u, v\}$. We claim that $H$ is a dense spot. Clearly $|H| \le 1 + \deg(u) = d + 1$, and by construction $\deg_H(u), \deg_H(v) \ge (1 - 3\delta)d \ge (1 - 4\delta)d$. What remains to show is that $\deg_H(x) \ge (1 - 4\delta)d$ for any node $x \in H_0$. But $xu$ is a good edge, hence $x$ and $u$ have at least $(1 - \delta)d$ common neighbors, or equivalently, at most $\delta d$ of $u$'s neighbors are not connected to $x$. Thus $x$ is connected to at least $(1 - 4\delta)d$ nodes in $H$, proving that $H$ is indeed a dense spot. So the edge $uv$ is contained in a dense spot, and thus in a pseudo-clique.
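The construction of $H$ in the proof above is easy to mechanize. The sketch below (hypothetical helper names) builds $H = H_0 \cup \{u, v\}$ from a good edge and tests the dense-spot conditions with the explicit slack $4\delta$ used in the proof in place of the unspecified $O(\delta)$. The test graph, $K_{d+2}$ minus a perfect matching with $d = 10$ and $\delta = 1/5$, is our own choice.

```python
from fractions import Fraction

def bad_edge_set(adj, d, delta):
    """delta-bad edges as frozensets: 1 <= t_e <= d - 1 - delta * d."""
    return {frozenset((u, v)) for u in adj for v in adj[u]
            if 1 <= len(adj[u] & adj[v]) <= d - 1 - delta * d}

def spot_from_edge(adj, u, v, bad):
    """H = H0 union {u, v}: H0 = common neighbours joined to both u and v by good edges."""
    h0 = {x for x in adj[u] & adj[v]
          if frozenset((x, u)) not in bad and frozenset((x, v)) not in bad}
    return h0 | {u, v}

def is_dense_spot(adj, h, d, slack):
    """|H| <= d + 1 and every x in H has at least (1 - slack) * d neighbours in H."""
    return len(h) <= d + 1 and all(len(adj[x] & h) >= (1 - slack) * d for x in h)

d, delta = 10, Fraction(1, 5)
# K_{d+2} minus the perfect matching {0,1}, {2,3}, ...: every edge has t_e = d - 2.
g = {v: {u for u in range(d + 2) if u not in (v, v ^ 1)} for v in range(d + 2)}
bad = bad_edge_set(g, d, delta)
h = spot_from_edge(g, 0, 2, bad)
print(len(bad), sorted(h), is_dense_spot(g, h, d, 4 * delta))
```

Here no edge is bad, $H$ has $d$ nodes, and it passes the test with slack $4\delta$ but not with slack 0, since each node misses its matching partner.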

k-cliques
We can easily extend the above results from triangles to $k$-cliques. Let $G_{d,c,k}(n)$ denote the set of $d$-regular graphs on $n$ nodes that contain at least $c\binom{d}{k-1}\frac{n}{k}$ subgraphs isomorphic to $K_k$. (The maximum possible number of subgraphs isomorphic to $K_k$ is clearly $\binom{d}{k-1}\frac{n}{k}$, since each node lies in at most $\binom{d}{k-1}$ copies of $K_k$.)

Proof of Theorem 4. The idea is a simple reduction to the case $k = 3$. Each $G \in G_{d,c,k}(n)$ has at least $c \cdot T_{\max}$ triangles, since every copy of $K_k$ contains $\binom{k}{3}$ triangles and every triangle lies in at most $\binom{d-2}{k-3}$ copies of $K_k$. Hence $G_{d,c,k}(n) \subset G_{d,c}(n)$, which implies the upper bound of the theorem. On the other hand, the family of graphs constructed in the proof of Theorem 1 contains at least $c\binom{d}{k-1}\frac{n}{k}$ $k$-cliques, so this family is contained in $G_{d,c,k}(n)$, implying the lower bound of the theorem. Finally, the structural statement follows directly from Theorem 10.
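The double counting behind the reduction can be checked with exact arithmetic, using $\binom{d}{k-1}\frac{1}{k} \cdot \binom{k}{3} \big/ \binom{d-2}{k-3} = \binom{d}{2}\frac{1}{3}$. The sketch below (our own helper names) verifies this identity, together with the $k$-clique count of the construction with $b = cn/(d+1)$ cliques, for a range of $d$ and $k$.

```python
from fractions import Fraction
from math import comb

def forced_triangles(num_kk, d, k):
    """Triangles forced by num_kk copies of K_k in a d-regular graph: each K_k has
    binom(k, 3) triangles, each triangle lies in at most binom(d-2, k-3) copies of K_k."""
    return Fraction(num_kk) * comb(k, 3) / comb(d - 2, k - 3)

def check_reduction(d, k, c=Fraction(1, 3), n=1):
    """All counts are linear in n, so n = 1 suffices to check the identities."""
    kk_threshold = c * comb(d, k - 1) * Fraction(n, k)  # c * binom(d, k-1) * n / k
    t_max = Fraction(comb(d, 2) * n, 3)                 # binom(d, 2) * n / 3
    cliques = c * Fraction(n, d + 1)                    # b = c n / (d + 1)
    return (forced_triangles(kk_threshold, d, k) == c * t_max
            and cliques * comb(d + 1, k) == kk_threshold)

results = {(d, k): check_reduction(d, k) for d in range(3, 15) for k in range(3, d + 2)}
print(all(results.values()))
```

The second comparison reflects that each $(d+1)$-clique contains $\binom{d+1}{k} = \frac{d+1}{k}\binom{d}{k-1}$ copies of $K_k$.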