Deducing a variational principle with minimal a priori assumptions

We study the well-known variational and large deviation principles for graph homomorphisms from $\mathbb{Z}^m$ to $\mathbb{Z}$. We provide a robust method to deduce those principles under minimal a priori assumptions. The only ingredient specific to the model is a discrete Kirszbraun theorem, i.e. an extension theorem for graph homomorphisms. All other ingredients are of a general nature and not specific to the model. They include elementary combinatorics, the compactness of Lipschitz functions, and a simplicial Rademacher theorem. Compared to the literature, our proof does not need any other preliminary results such as concentration or strict convexity of the local surface tension. Therefore, the method is very robust and extends to more complex and subtle models, such as the homogenization of limit shapes or graph homomorphisms to a regular tree.
Mathematics Subject Classifications: 82B41, 82B20, 60F10

[Figure 1. Panel (d): graph homomorphisms into the 3-regular tree (see [MT20]). Panel (e): the model studied in this article. The other models are included as illustrations of the universality of the limit shape phenomenon. Each subfigure shows a typical microscopic state for the specified model.]

Figure 1 shows typical configurations for some of these models, and some aspects of their limit shapes are already visible, such as the arctic circle (cf. [JPS98]) in Figure 1a and its analogues in the other figures. Figure 1e is a configuration of the model studied in this article, which we briefly introduce here (see Section 2 for a formal presentation of the model). We take as given a sequence of finite subsets $R_n \subset \mathbb{Z}^m$, such that the rescaled sequence $(\tfrac{1}{n} R_n)_{n \in \mathbb{N}}$ converges to a suitable domain $R \subset \mathbb{R}^m$. The basic objects in the model are graph homomorphisms $h_{R_n} : R_n \to \mathbb{Z}$, which we call Z-homomorphisms or height functions, and the limit objects are Lipschitz continuous maps $h_R : R \to \mathbb{R}$, called asymptotic height functions.
The main results of this article partially describe the asymptotic structure of the set of height functions on $R_n$ subject to certain boundary constraints. The focus is to measure how many height functions are close to the various admissible asymptotic height functions (after rescaling), i.e. the size of the ball
$$B(R_n, h_R, \varepsilon) := \left\{ h_{R_n} : R_n \to \mathbb{Z} \;\middle|\; \sup_{z \in R_n} \left| \tfrac{1}{n} h_{R_n}(z) - h_R\!\left(\tfrac{1}{n} z\right) \right| < \varepsilon \right\},$$
and to determine for which asymptotic height functions $h_R$ the balls $B(R_n, h_R, \varepsilon)$ are largest. We use the following definitions, well-known from statistical physics, to state the main results. For a set of height functions $A \subset \{h_{R_n} : R_n \to \mathbb{Z}\}$, the microscopic entropy $\mathrm{Ent}_{R_n}(A)$ is given by
$$\mathrm{Ent}_{R_n}(A) := -\frac{1}{|R_n|} \log |A|$$
(see Definition 9). The macroscopic entropy $\mathrm{Ent}_R(h_R)$ of an asymptotic height function $h_R : R \to \mathbb{R}$ is defined in Section 2.3.

As an aside, the Z-homomorphism model in two dimensions is equivalent to the six vertex model with uniform weights. Recall that a configuration of the six vertex model is an assignment of one of six admissible states to each vertex in a specific subset of the lattice $\mathbb{Z}^2$, subject to certain local compatibility conditions. Weights $w_1, \dots, w_6$ are associated to the six possible vertex states, and at least in the case of finite volume, configurations are sampled in proportion to the product of the weights of the vertices. When the weights are uniform (i.e. $w_1 = \dots = w_6 = 1$), the induced measure is uniform over admissible configurations, and the related partition function counts the number of configurations. If one follows the conventions of [RS18], then each configuration of the six vertex model has a unique (up to additive constant) associated height function, defined on the faces of the lattice, such that the heights of two adjacent faces differ by ±1. This height function is a Z-homomorphism defined on the dual lattice.
(Note that there is another common convention used to define height functions for the six vertex model, used in [BCG16] among others; a review of the six vertex model is beyond the scope of this article.) A configuration of the six-vertex model is shown in Figure 2, along with its associated height function. Note that the six vertex configuration satisfies the well-studied domain wall boundary conditions, and (therefore) the height function has extremal slope along all four edges of the boundary. As mentioned before, the partition function of the six vertex model counts the number of configurations (because we take uniform weights). Since the configurations are in bijection with their height functions, the six vertex partition function is closely related to the microscopic entropy defined above. As such, the results of this article can be translated to corresponding results for the six vertex model, with uniform weights and appropriately translated boundary conditions. The results of this article imply the variational principle and large deviations principle for the six vertex model with uniform weights and any boundary data. The example above shows the domain wall boundary data, which corresponds to a boundary height function with extremal slope along all four edges of the square.
Now we are prepared to state two of the three main results of this article: the profile theorem and the variational principle. The profile theorem (Theorem 15) holds that, given an asymptotic height function $h_R$,
$$\mathrm{Ent}_{R_n}\big(B(R_n, h_R, \varepsilon)\big) \approx \mathrm{Ent}_R(h_R) \quad \text{as } n \to \infty \text{ and } \varepsilon \to 0. \tag{1}$$
The conclusion of the variational principle (Theorem 16) is essentially that, given boundary data $h_{\partial R} : \partial R \to \mathbb{R}$, the asymptotic identity (2) holds. (Note that a few technical corrections are required in (2), since e.g. the asymptotic boundary height function $h_{\partial R}$ is generally not defined at rescaled lattice points $\tfrac{1}{n} z$, $z \in R_n \subset \mathbb{Z}^m$; see Theorem 16 for the more precise statement.) The third main result is the large deviations principle. There is a conventional notation used in the study of large deviations, which we use in the statement of Theorem 17 below. But, for consistency with the asymptotic identities (1) and (2), we can restate the large deviations principle as an asymptotic identity for a suitable set of asymptotic height functions $A \subset \{h_R : R \to \mathbb{R}\}$. When the macroscopic entropy has a unique minimizer, the large deviations principle implies that almost all microscopic states must approximate this entropy-minimizing profile. Although the current article does not prove uniqueness of this minimizer, we briefly discuss uniqueness in Section 2.5, after the statement of the large deviations principle (Theorem 17).

The results described above are standard within the study of statistical mechanics. Indeed, the intention of this article is neither to prove novel results about the Z-homomorphism model nor to extend well-known results to subtle and technically challenging models (as for example in [MT20]). Rather, the intention of this article is to provide and explain a simplified method of proof that only relies on minimal a priori assumptions. The simple Z-homomorphism model is chosen as the object of study in order to avoid unnecessary technical difficulties that might arise in more complex models.

[Figure panels: (a) Without random field. (c) Unbounded random field. Caption fragment: one panel is equivalent to the model studied in the current article, and the others are drawn from a generalized model that the authors believe is amenable to study using the methods described in the current article.]

the electronic journal of combinatorics 27(4) (2020), #P4.1
Our method of proof emerges from distilling the core arguments of [CKP01,She05,MT20]. Compared to those works, our method does not rely on explicit formulas for the local surface tension, strict convexity, concentration inequalities, or the FKG inequality. The robustness of this method is illustrated in the companion article [KMT18]. There, we show the homogenization of the variational principle of graph homomorphisms to Z. In homogenization, homomorphisms are not chosen according to the uniform measure but instead certain heights are preferred or penalized according to a random field. Mathematically the height function is sampled from a Gibbs measure with respect to a randomized Hamiltonian. The limit shape may change drastically; for example, when the random field is unbounded, simulations show the formation of terraces. See Figures 3 and 4 for examples. We hope that the method outlined in this article can serve as a guiding principle for deducing the variational principle and related results for more complex models.
As hinted above, the three main results of this article are the profile theorem (Theorem 15), the variational principle (Theorem 16), and the large deviations principle (Theorem 17). The majority of the effort in this article goes into proving the profile theorem. The proof starts by proving the profile theorem in a special case (where the domain R is the union of simplices and the limiting profile h R is piecewise affine), then bootstraps  this result to the general case. The main idea of each step in this proof is clear, although some care is needed to account for all the details.
We call attention to two ingredients in the proof. The first ingredient is the simplicial Rademacher theorem, so called because it approximates a Lipschitz function h R uniformly over a large portion of its domain by a piecewise affine approximation h K , where the "pieces" on which h K is affine are simplices. This approximation gives control over both the direct error |h K −h R | and over the error in the derivatives |∇h K −∇h R |. Compared to the classical Rademacher theorem which states that h R is almost everywhere differentiable, the simplicial Rademacher theorem is a surprisingly strong approximation result. This seems like a standard result but the authors have not found it stated in this form in the random surfaces literature, so we give details of the proof.
In order to exploit the simplicial Rademacher theorem, we need robustness of both the macroscopic and microscopic entropy, under changes both to the limiting profile and to the domain. Robustness of the macroscopic entropy follows from elementary analysis, because the simplicial Rademacher approximation has derivative $\nabla h_K$ close to $\nabla h_R$. Robustness of the microscopic entropy rests largely upon the second ingredient that we call attention to: a Kirszbraun theorem for graph homomorphisms (see Theorem 19). This theorem gives conditions under which a height function $h_{R_n} : R_n \to \mathbb{Z}$ may be extended to a larger domain $\tilde{R}_n \supseteq R_n$. We expect that the main challenge in extending the method of this article to other models will be proving a comparable extension theorem.
The profile theorem can be used to prove the variational principle and the large deviations principle. Both proofs are similar, and rely on the local compactness of the space of Lipschitz functions. Moreover, the two proofs are robust: once the profile theorem is proven for a model (with suitable macroscopic state space), the variational principle and large deviations principle follow automatically. We present the proof of the variational principle first and in greater detail. For the large deviations principle we highlight the differences in proof, and we also change notation (replacing symbols like Ent n and Ent), in order to match the conventions of large deviations theory.

Overview of remaining article
In Section 2 we define the setting and formulate the main results of this article. In Section 3 we explain the main idea and the structure of the proofs of the main results. The details are then given in Sections 4 through 8.

Notation
• x and y usually denote points in R m .
• z usually denotes a point in Z m .
• For $x \in \mathbb{R}$, $\lfloor x \rfloor$ denotes the largest integer $\le x$, and $\lceil x \rceil$ denotes the smallest integer $\ge x$.
• Given a set $A$ in some topological space, $A^\circ$ and $\overline{A}$ denote the interior and closure of $A$ respectively.
• R and R n are "nice" domains in R m and Z m respectively (see Assumption 2 below).
• h Rn : R n → Z is a height function (i.e. graph homomorphism).
• θ a,b,c (δ) denotes a function with lim δ↓0 θ a,b,c (δ) = 0, with rate of convergence depending only on the parameters a, b, c.
• For a set A we denote with |A| either the cardinality of A or the Lebesgue measure of A.

Setting and main results
In this section we formally describe the model under study, and state the main results that we prove. In describing the model we err on the side of verbosity and explicitness. Some of the notations used are non-standard (such as the θ-notation for asymptotics described in Section 2.4), but these notations allow for relatively concise and (more importantly) precise statements of the results and proofs to follow. In Section 2.1 we will carefully introduce the basic model, i.e. height functions on "nice" subsets of Z m . In Section 2.2 we describe a canonical family of height functions. In Section 2.3 we use these canonical height functions to define the microscopic entropy, then we go on to define the macroscopic entropy and surface tension. In Section 2.4 we introduce our asymptotic notation, as mentioned above. In Section 2.5 the main results of this article are stated.

Objects of study
Given two graphs $\Gamma_1 = (V_1, E_1)$ and $\Gamma_2 = (V_2, E_2)$, we recall that a graph homomorphism is a function $\phi : V_1 \to V_2$ that maps edges to edges, i.e. $\{\phi(u), \phi(v)\} \in E_2$ whenever $\{u, v\} \in E_1$. In this article, we only consider the case of graph homomorphisms from certain subgraphs $R_n \subset \mathbb{Z}^m$ to a subgraph of $\mathbb{Z}$. For $R_n \subset \mathbb{Z}^m$, we write $R_n^c$ for the complement of $R_n$ in $\mathbb{Z}^m$, and $\partial R_n := \{z \in R_n \mid \exists z' \in R_n^c,\ z \sim z'\}$ for the (inner) boundary of $R_n$.

The graph $\mathbb{Z}^m$ is bipartite, and for concreteness and consistency we label the two parts of $\mathbb{Z}^m$ by parity. Specifically, a vertex $z = (z_1, \dots, z_m) \in \mathbb{Z}^m$ is even if $z_1 + \dots + z_m$ is even, and $z$ is odd otherwise. Every edge $e \in E(\mathbb{Z}^m)$ is incident to one even vertex and one odd vertex. Moreover, every graph homomorphism $h$ from a connected domain $R_n \subset \mathbb{Z}^m$ to $\mathbb{Z}$ either preserves parity at every point (i.e. $h(z)$ has the same parity as $z$) or inverts parity at every point. This is relevant when extending a homomorphism from a disconnected domain $\tilde{R}_n \subset \mathbb{Z}^m$ to a larger connected domain: if $h$ preserves parity on one component of $\tilde{R}_n$ and inverts parity on another component, then no extension is possible; cf. the Kirszbraun theorem (Theorem 19).
For simplicity we restrict our attention to parity preserving homomorphisms on connected domains. To extend to general homomorphisms on connected domains, one can use a bijection between the set of parity preserving homomorphisms and the set of parity inverting homomorphisms, e.g. the map h → (z → h(z) + 1). For disconnected domains S ⊂ Z m , the choice of parity on each component of S contributes to the entropy, so the results become more delicate. We will not consider these generalizations further in this article.
Definition 1. A height function on $R_n$ is a graph homomorphism $h_{R_n} : R_n \to \mathbb{Z}$ that preserves parity, meaning that for every $z = (z_1, \dots, z_m) \in R_n$,
$$h_{R_n}(z) \equiv z_1 + \dots + z_m \pmod 2.$$
We call a height function $h_{\partial R_n} : \partial R_n \to \mathbb{Z}$, defined on the boundary of $R_n$, a boundary height function.
We are interested in sequences of subgraphs {R n | n ∈ N} that converge under a scaling limit to a "nice" region R ⊂ R m . More specifically, we make the following assumptions. Assumption 2. We assume that R ⊂ R m is compact and connected and that R is the closure of its interior (sets with the latter property are called regular closed sets; see e.g. [SS95]).
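We assume that $R_n \subset \mathbb{Z}^m$ and $\partial R_n \subset R_n$ are connected as subgraphs of $\mathbb{Z}^m$, and (for simplicity) we assume that $\tfrac{1}{n} R_n \subset R$. We require that $\tfrac{1}{n} R_n \to R$ in the Hausdorff metric; that is, the metric on $\mathcal{P}(\mathbb{R}^m)$ given by
$$d_H(A, B) := \max\Big\{ \sup_{a \in A} \inf_{b \in B} |a - b|,\ \sup_{b \in B} \inf_{a \in A} |a - b| \Big\}. \tag{3}$$

Remark 3. By equivalence of norms, it does not matter which norm on $\mathbb{R}^m$ is used in (3). Later, we will be interested primarily in the $\ell^1$ norm. This is because the $\ell^1$ norm is the scaling limit of the graph distance on $\mathbb{Z}^m$. More precisely, if $x, x' \in \mathbb{R}^m$ and if $z_n, z_n' \in \mathbb{Z}^m$ satisfy $|\tfrac{1}{n} z_n - x|_1 < \tfrac{m}{n}$ and $|\tfrac{1}{n} z_n' - x'|_1 < \tfrac{m}{n}$, then the graph distance between $z_n$ and $z_n'$, divided by $n$, converges to $|x - x'|_1$ as $n \to \infty$.

For example, when $R$ is a compact, convex polytope, such as a hypercube or a simplex, the sets $R_n := \{z \in \mathbb{Z}^m \mid \tfrac{1}{n} z \in R\}$ satisfy Assumption 2. Just as the microscopic domains $R_n$ have a scaling limit, so do the microscopic height functions $h_{R_n} : R_n \to \mathbb{Z}$.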
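The Hausdorff convergence required in Assumption 2 can be illustrated numerically. The following is a minimal sketch, assuming the standard two-sided definition of the Hausdorff distance, the $\ell^1$ norm, and the example $R = [0,1]^2$ with $R_n = \{z \in \mathbb{Z}^2 \mid \tfrac{1}{n} z \in R\}$. The continuum $R$ is replaced by a fine sample grid, and all function names are hypothetical and chosen for illustration only.

```python
from fractions import Fraction
from itertools import product


def l1_distance(x, y):
    """l^1 distance between two points given as tuples."""
    return sum(abs(a - b) for a, b in zip(x, y))


def hausdorff_l1(A, B):
    """Hausdorff distance between two finite point sets in the l^1 norm."""
    def directed(P, Q):
        # Largest distance from a point of P to its nearest point of Q.
        return max(min(l1_distance(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))


def rescaled_lattice_square(n, m=2):
    """The set (1/n) R_n for R_n = {z in Z^m : z/n in [0,1]^m}."""
    return [tuple(Fraction(k, n) for k in z)
            for z in product(range(n + 1), repeat=m)]


def sampled_unit_square(n, refinement=8, m=2):
    """Finite stand-in for R = [0,1]^m: a grid of mesh 1/(refinement*n).

    This sampling is an assumption of the sketch, not part of the article.
    """
    N = refinement * n
    return [tuple(Fraction(k, N) for k in z)
            for z in product(range(N + 1), repeat=m)]


if __name__ == "__main__":
    for n in (1, 2, 4):
        d = hausdorff_l1(rescaled_lattice_square(n), sampled_unit_square(n))
        print(n, d)
```

In this example the sampled distance shrinks proportionally to $1/n$, which is the behaviour the assumption $\tfrac{1}{n} R_n \to R$ asks for.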
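Definition 4. We call a function $h_R : R \to \mathbb{R}$ an asymptotic height function if $h_R$ is Lipschitz with Lipschitz constant at most 1, with respect to the $\ell^1$-norm on $\mathbb{R}^m$; that is, if
$$|h_R(x) - h_R(y)| \le |x - y|_1 \quad \text{for all } x, y \in R. \tag{4}$$
Likewise, if $h_{\partial R} : \partial R \to \mathbb{R}$ is 1-Lipschitz (with respect to the $\ell^1$-norm), we call $h_{\partial R}$ an asymptotic boundary height function.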
We assume that h ∂Rn : ∂R n → Z are boundary height functions that converge (after rescaling) to an asymptotic boundary height function h ∂R : ∂R → R in following sense: for each n, let d n = d H ( 1 n R n , R). Then, we say Now, we define a few families of height functions and asymptotic height functions. The sets of height functions from Definition 5 below appear frequently in entropies, and the sets of asymptotic height functions from Definition 6 are important for the statement of the variational principle (Theorem 16).
Definition 5. Let R n be a microscopic domain as above, let h Rn : R n → Z be a boundary height function, and let δ > 0. We define: In the last definition, the expression "h R ( 1 n z)" makes sense because of the assumption that 1 n R n ⊂ R in Assumption 2. Definition 6. Let R ⊂ R m be a domain satisfying Assumption 2, let h ∂R : ∂R → R be an asymptotic boundary height function, and let δ > 0. We define:

Affine height functions
Affine height functions play an important role in defining and studying the entropy of our model. For an asymptotic height function h R : R → R, we mean by "affine" the usual property: there exist s ∈ [−1, 1] m and b ∈ R such that h R (x) = s · x + b. The bounds on s ensure that h R satisfies the Lipschitz property (4), so all such functions are indeed asymptotic height functions as per Definition 4.
On microscopic domains R n , we consider best-possible approximations to affine functions. Fix s ∈ [−1, 1] m and b ∈ R. At a lattice point z ∈ Z m , we define h s·x+b Rn (z) to be s · z + b, rounded to the nearest integer of correct parity (see Figure 5). In the rest of this subsection, we formalize this definition, verify that it actually does define a height function, and check that it is consistent.
Let us introduce an auxiliary notation that is used only in this subsection. Given a point $z = (z_1, \dots, z_m) \in \mathbb{Z}^m$, we say $z$ has even or odd parity as $\sum_{i=1}^m z_i \in \mathbb{Z}$ has even or odd parity respectively, and we write $z \bmod 2$ for the parity of $z$.

Given $z \in \mathbb{Z}^m$ and $y \in \mathbb{R}$, we write $[y]_{z \bmod 2}$ for the closest integer to $y$ that has parity $z \bmod 2$. In case of a tie, i.e. if $y$ is an integer that has opposite parity to $z$, we arbitrarily choose to "round up" and set $[y]_{z \bmod 2} = y + 1 \in \mathbb{Z}$. With this notation, the informal definition above reads
$$h^{s \cdot x + b}_{R_n}(z) := [s \cdot z + b]_{z \bmod 2}, \qquad z \in R_n.$$
Note that the symbol $x$ in the superscript of $h^{s \cdot x + b}_{R_n}$ is merely formal; "$s \cdot x + b$" should be read as "the function mapping $x$ to $s \cdot x + b$". Moreover, the choice of domain $R_n$ in the subscript does not affect the values of $h^{s \cdot x + b}_{R_n}$ at any point; for any sets $A_n, B_n \subseteq \mathbb{Z}^m$ and any point $z \in A_n \cap B_n$, one has $h^{s \cdot x + b}_{A_n}(z) = h^{s \cdot x + b}_{B_n}(z)$. An example of a function $h^{s \cdot x + b}_{R_n}$ is provided in Figure 5. From the definition above, it is not clear that the functions $h^{s \cdot x + b}_{R_n}$ are height functions. This is the content of Lemma 7.
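Lemma 7. Let $s \in [-1, 1]^m$ and $b \in \mathbb{R}$. For any adjacent points $z \sim z' \in \mathbb{Z}^m$, the values $h^{s \cdot x + b}_{R_n}(z)$ and $h^{s \cdot x + b}_{R_n}(z')$ differ by exactly 1.

Proof. From the definition of $h^{s \cdot x + b}_{R_n}$, we note two inequalities:
$$\big| h^{s \cdot x + b}_{R_n}(z) - (s \cdot z + b) \big| \le 1 \tag{5}$$
and
$$\big| h^{s \cdot x + b}_{R_n}(z') - (s \cdot z' + b) \big| \le 1. \tag{6}$$
Additionally, since $s \in [-1, 1]^m$, we have
$$\big| (s \cdot z + b) - (s \cdot z' + b) \big| = |s \cdot (z - z')| \le 1. \tag{7}$$
By the triangle inequality, $|h^{s \cdot x + b}_{R_n}(z) - h^{s \cdot x + b}_{R_n}(z')| \le 3$. We shall show that equality cannot hold. Since the difference $h^{s \cdot x + b}_{R_n}(z) - h^{s \cdot x + b}_{R_n}(z')$ is obviously an odd integer, it will follow that the difference is ±1.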
Suppose towards a contradiction that h s·x+b Rn (z) − h s·x+b Rn (z ) = 3. Then (5) and (6) must be equalities. From the definition of [·] z mod 2 , necessarily then s · z + b is an integer with parity opposite that of z, and so We end this section with the following lemma. The conclusion (8) is exactly what is needed later to apply the Kirszbraun theorem (Theorem 19): then Proof. The proof is similar to that of Lemma 7. By the triangle inequality and (7), We want to prove that the left-hand side of (10) is 0. By parity considerations it must be even, and we need only prove it is = 2. Assume for a contradiction that the left-hand side of (10) equals 2. Then equality holds in (9), and in particular As in the proof of Lemma 7, this implies that This is the desired contradiction, which completes the proof.
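The rounding construction of this subsection and the conclusion of Lemma 7 can be checked numerically on a small box. The following is a minimal sketch, assuming exact rational slopes and offsets (via `Fraction`) so that ties are detected exactly. The function names are hypothetical and only serve as an illustration.

```python
import math
from fractions import Fraction
from itertools import product


def round_to_parity(y, parity):
    """Closest integer to y with the given parity (0 = even, 1 = odd).

    Ties (y an integer of the opposite parity) are rounded up, as in the text.
    """
    k = math.floor(y)
    if (k - parity) % 2 != 0:
        k -= 1  # k is now the largest integer <= y with the right parity
    return k if y - k < 1 else k + 2


def affine_height(s, b, z):
    """Value at the lattice point z of the discretized map x -> s.x + b."""
    y = sum(si * zi for si, zi in zip(s, z)) + b
    return round_to_parity(y, sum(z) % 2)


def is_height_function_on_box(s, b, radius):
    """Check parity and the +-1 step condition on {-radius..radius}^m."""
    m = len(s)
    for z in product(range(-radius, radius + 1), repeat=m):
        hz = affine_height(s, b, z)
        if (hz - sum(z)) % 2 != 0:
            return False  # parity is not preserved at z
        for i in range(m):
            w = z[:i] + (z[i] + 1,) + z[i + 1:]
            if abs(affine_height(s, b, w) - hz) != 1:
                return False  # adjacent values do not differ by exactly 1
    return True


if __name__ == "__main__":
    slope = (Fraction(1, 3), Fraction(-1, 2))
    print(is_height_function_on_box(slope, Fraction(1, 4), radius=6))
```

Exact arithmetic is used here because the tie-breaking rule depends on whether $s \cdot z + b$ is exactly an integer, which floating-point rounding could misreport.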

Entropies and surface tensions
In this section we make three more definitions needed for our statement of the main results. First, we define the microscopic entropy of a set of height functions. More precisely, this is the Shannon entropy of the uniform distribution over a finite set of height functions, normalized by the size of their common domain, and negated. (The negative convention is chosen so that the surface tension ent(s), defined later, is convex rather than concave.) The microscopic entropy is essentially the same as the specific free energy of [She05].
Definition 9. Given a finite, non-empty set of height functions A ⊂ M (R n ), we define the microscopic entropy We observe that the microscopic entropy is translation invariant: , δ > 0, and c ∈ R. Then: All of these sets, except for M (R n ), are finite, because of the constraints they impose on h Rn and the Lipschitz property of h Rn . In fact, we can say more. Let us count M (R n , h ∂Rn ), for some boundary height function h ∂Rn ∈ M (∂R n ). The values of h R ∈ M (R n , h ∂Rn ) are fixed on ∂R n , and for each of the |R n | points x in the interior of R n , there are at most 2 admissible values for h R (x). Therefore |M (R n , h ∂Rn )| 2 |Rn| . Similar logic holds for M (R n , h ∂ R n , δ) and B(R n , h R , δ). This leads to the following observation: Observation 11. Let h ∂Rn ∈ M (∂R n ), h R ∈ M (R), and δ > 0. Then: Next, we define the local surface tension. There are in general many equivalent definitions of surface tension (see for example [She05,Chapter 6]). The following definition is easiest to work with for our purposes.
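The counting argument above can be made concrete on very small domains. The following is a minimal sketch, assuming the microscopic entropy is $-\log|A| / |R_n|$ (the negated, normalized Shannon entropy described at the start of this section) and using a flat boundary condition chosen purely for illustration. It enumerates $M(R_n, h_{\partial R_n})$ by brute force, which is only feasible for tiny boxes. All names are hypothetical.

```python
import math
from itertools import product


def neighbors(z):
    """Lattice neighbors of z in Z^m."""
    for i in range(len(z)):
        for step in (-1, 1):
            yield z[:i] + (z[i] + step,) + z[i + 1:]


def inner_boundary(domain):
    """Points of the domain with at least one neighbor outside the domain."""
    return {z for z in domain if any(w not in domain for w in neighbors(z))}


def count_height_functions(domain, boundary_values):
    """Count homomorphisms on the domain that agree with boundary_values."""
    interior = sorted(domain - set(boundary_values))
    values = dict(boundary_values)

    def extend(i):
        if i == len(interior):
            return 1
        z = interior[i]
        # In lexicographic order, each interior point already has an
        # assigned neighbor (its predecessor in the first coordinate).
        known = [values[w] for w in neighbors(z) if w in values]
        total = 0
        for v in (known[0] - 1, known[0] + 1):  # at most 2 admissible values
            if all(abs(v - u) == 1 for u in known):
                values[z] = v
                total += extend(i + 1)
                del values[z]
        return total

    return extend(0)


def flat_boundary(domain):
    """Illustrative boundary data: 0 on even points, 1 on odd points."""
    return {z: sum(z) % 2 for z in inner_boundary(domain)}


if __name__ == "__main__":
    for n in (3, 4, 5):
        box = set(product(range(n), repeat=2))
        count = count_height_functions(box, flat_boundary(box))
        print(n, count, -math.log(count) / len(box))
```

The inner loop tries at most two values per interior point, which mirrors the bound $|M(R_n, h_{\partial R_n})| \le 2^{|R_n|}$ used in the text.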
Definition 12. For s ∈ [−1, 1] m , the local surface tension ent(s) is defined to be the limit where ent n (s) is defined as and where Q n = [0, n) m ∩ Z m is the discrete hypercube of side length n.
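In dimension $m = 1$ the quantity $\mathrm{ent}_n(s)$ can be computed in closed form, which gives a quick sanity check on Definition 12. The following is a minimal sketch, assuming that $\mathrm{ent}_n(s)$ is the microscopic entropy $-\tfrac{1}{|Q_n|} \log |M(Q_n, h^{s \cdot x + 0}_{\partial Q_n})|$ with the boundary values obtained by the rounding rule of Section 2.2 (this normalization is an assumption of the sketch). In one dimension a height function on $Q_n = \{0, \dots, n-1\}$ with fixed endpoint values is a ±1 path, so the count is a binomial coefficient, and by Stirling's formula the limit is the negative binary entropy of $p = (1+s)/2$. The function names are hypothetical.

```python
import math
from fractions import Fraction


def round_to_parity(y, parity):
    """Closest integer to y with the given parity; ties are rounded up."""
    k = math.floor(y)
    if (k - parity) % 2 != 0:
        k -= 1
    return k if y - k < 1 else k + 2


def ent_n_1d(s, n):
    """Assumed form of ent_n(s) for m = 1: -(1/n) log(number of +-1 paths)."""
    steps = n - 1
    h_left = round_to_parity(Fraction(0), 0)
    h_right = round_to_parity(Fraction(s) * steps, steps % 2)
    ups = (steps + (h_right - h_left)) // 2  # number of +1 steps
    return -math.log(math.comb(steps, ups)) / n


def negative_binary_entropy(s):
    """p log p + (1-p) log(1-p) with p = (1+s)/2, using 0 log 0 = 0."""
    p = (1 + s) / 2
    return sum(q * math.log(q) for q in (p, 1 - p) if q > 0)


if __name__ == "__main__":
    for s in (Fraction(0), Fraction(1, 2), Fraction(1)):
        print(s, ent_n_1d(s, 2000), negative_binary_entropy(s))
```

The one-dimensional values illustrate two features used later: the surface tension is bounded, and it vanishes at the extremal slopes $s = \pm 1$ where only one height function is compatible with the boundary data.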
The limit (12) exists by standard subadditivity arguments; we refer the interested reader to e.g. [Dur10]. In fact, by translation invariance (see Observation 10), we may replace h s·x+0 ∂Qn by h s·x+b ∂Qn in (13), for any b ∈ R. Additionally, boundedness passes through the limit in (12). Therefore: Let us now define the macroscopic entropy.

Asymptotic notation
In this section we introduce a notation for asymptotic error. Compared to the Landau big-O notation, our θ-notation abstracts away the rate of convergence of the error, but makes explicit the dependence on parameters. For this purpose we write θ α (δ) for a family of unspecified functions, parameterized by a symbol α, such that θ α (δ) → 0 at a rate depending on the value of the parameter α. That is, for any ε > 0 and any admissible parameter value α, there exists δ 0 = δ 0 (α) > 0 such that 0 < δ < δ 0 implies θ α (δ) < ε.
Extending the above notation, we frequently replace the single parameter α by a list of parameters α, β, γ, . . . . For example, we might write an identity like The identity states that the two entropy terms on the first line differ by a small amount; the difference vanishes as δ and 1 n go to zero, and the rate of convergence depends on several parameters. The "θ(δ)" term depends on the parameters from the setting, namely the ambient dimension m, the region R, the height function h R of interest, and the corresponding discrete objects R n and h Rn . The "θ( 1 n )" term depends on these parameters along with the value of δ. We find that listing out the setting parameters m, R, h R , R n , and h Rn makes the expression harder to read. So for the rest of the article we suppress these parameters from the subscripts of θ terms. Under this convention (14) becomes: As mentioned above, the advantage of our θ notation is that it abstracts away the exact rates of convergence, but leaves explicit the dependencies between parameters. For example, suppose we want to make the error in approximation in (14) to be less than ε. We should first choose δ so that (say) θ(δ) < 1 2 ε, then choose n depending on δ (and on the suppressed parameters m, R, etc.) so that θ δ ( 1 n ) < 1 2 ε.

Main results
The main results of this article are the profile theorem, the variational principle, and the large deviations principle: Theorem 15 (Profile theorem). For any h R ∈ M (R), δ > 0, and n ∈ N, The second main result is the variational principle: Theorem 16 (Variational principle). For any δ > 0 and n ∈ N, Finally, we prove a large deviations principle for the model. We adopt the conventions of large deviations (see for example [DZ09,RAS15]).
Theorem 17 (Large deviations principle). Consider the space M (R) of asymptotic height functions (i.e. Lipschitz functions with Lipschitz constant 1), endowed with the topology of uniform convergence (induced by the supremum norm).
For δ > 0 and n ∈ N, define a probability measure µ δ,n on M (R) by whereh Rn is the Lipschitz function given by rescaling and interpolating h Rn so as to make it an asymptotic height function, i.e. for z ∈ R n ,h Rn ( 1 n z) = 1 n h Rn (z). The measures (µ δ,n ) δ>0,n∈N satisfy a large deviations principle with speed r δ,n := |R n | and tight rate function I : M (R) → [0, ∞] given by where as usual lim and lim denote the limit inferior and superior respectively.
Remark 18. It is straightforward to reduce the double limits in Theorem 17 to (single) sequential limits, which are more common in large deviations theory. For example, one may choose any sequences (δ k ) k∈N , (ε k ) k∈N such that δ k → 0 and ε k → 0 as k → ∞. Then, choose n k large enough that and define µ k := µ δ k ,n k and r k := r δ k ,n k . Then For the study of limit shapes, it is useful to prove two additional results: existence and uniqueness of the minimizer of the rate function from the large deviations principle, i.e. there exists a unique h min Indeed, this holds for the simple model studied in the current article. See for example [She05] for proofs and discussion of these results. Even in more subtle models, the existence of the minimizer is often easy to show: the proof is standard as long as the local surface tension is convex and bounded below. To show uniqueness is harder. Uniqueness of the minimizer may be proved using strict convexity of the local surface tension; see for example [DSS08, Proposition 4.5]. We do not prove these results in the current article, but rather focus on the variational principle and large deviations principle.
Once existence and uniqueness of the minimizer are established (or in the language of the current model, once it is known that the macroscopic entropy functional admits a unique minimizing height function), one can explain the appearance of a limit shape in the following way. The set of asymptotic height functions that lie within distance ε of this minimizer is an open ball in the space M (R). By applying the large deviations principle on the set-theoretical complement, one sees that the percentage of microscopic height functions in M (R n , h ∂Rn , δ) that do not lie ε-close to the minimizer decays exponentially. In other words, with high probability a randomly chosen height function is close to the minimizer, and therefore the minimizer is the limit shape.

Outline and discussion of proof of main results
In this section we briefly outline the proof of the main results and summarize some key ideas. Then we analyze the ingredients in the proof with an eye toward extending the proof to other random surface models.
In Section 4, we provide auxiliary results including basic properties of the local surface tension and microscopic entropy. A central ingredient of the overall argument is discussed in Section 5. There we prove the profile theorem in the special case of piecewise affine asymptotic height functions. In Section 6, we extend the profile theorem to general asymptotic height functions by an approximation argument, yielding the first main result (Theorem 15). In Section 7, we use the profile theorem and a compactness argument to prove the variational principle (Theorem 16). The argument is based on compactness of the space of asymptotic height functions with fixed boundary values M (R, h ∂R ). Finally in Section 8, we extend the proof of the variational principle in order to prove the large deviations principle (Theorem 17).
As one can see from this outline, the main idea of the argument is to reduce the proof of the profile theorem from general domains and asymptotic height functions to simpler domains and asymptotic height functions by an approximation argument. This means that the left-hand side of (15), i.e. the macroscopic entropy, and the right-hand side, i.e. the microscopic entropy, must both be robust with respect to approximations.
The macroscopic entropy is robust because ent(s) is bounded and uniformly continuous, and Lipschitz functions can be approximated very well by linear interpolations on a simplex domain; see the simplicial Rademacher theorem (Lemma 31). This approximation lemma was formulated for two dimensions in [CKP01]. The result is interesting in its own right and for the convenience of the reader we state and prove it for arbitrary dimension in Section 6.
The microscopic entropy is robust under approximations because the microscopic surface tension is very robust: even with fluctuations in the boundary values and the geometry of the boundary, one still gets the same limit in (12). This result is proved in Section 4, using a Kirszbraun theorem for graph homomorphisms stated below. This theorem gives conditions under which a graph homomorphism can be extended from a smaller domain to a larger domain. This is a discrete analogue to the classical result [Kir34], which deals with Lipschitz functions defined on subsets of R d . We also note that more general forms of the Kirszbraun theorem for graph homomorphisms are known, e.g. [CPT18].
Theorem 19 (Kirszbraun theorem for Z m ). Let Λ be a connected region of Z m , let S ⊂ Λ, and let h : S → Z be a graph homomorphism that preserves parity. There exists a graph homomorphism h : Λ → Z such that h = h on S if and only if for all x, y in S, Remark 20. The parity condition is necessary in general; consider for example the function h defined on {0, 2} ⊂ Z by h(0) = 0, h(2) = 1. The parity condition in Theorem 19 is the reason for the parity condition in Definition 1.
Two of the authors gave a proof of a more general version of this theorem in [MT20, Theorem 4.1]. The proof is restated below for the reader's convenience. This proof is also simplified by only addressing the model from this article, where the height functions take values in Z rather than in a d-regular tree.
Proof of Theorem 19. Obviously if an extension h of h exists, then h satisfies (18). So, suppose instead that (18), and let us prove that an extension h exists. For y ∈ Λ, set We must check two things: first that h(y) = h(y) when y ∈ S, and second that |h(y) − h(ỹ)| = 1 when y ∼ỹ are adjacent points in Λ.
To prove that h| S = h, let y ∈ S and consider any point x ∈ S. By the Lipschitz To prove that h is a graph homomorphism, let y ∼ỹ be adjacent points in Λ, and let x,x be points in S that attain the maximum in (19) For every x ∈ S, the map y → h(x) + |x − y| 1 preserves parity (recall the assumption that h preserves parity), and therefore so does h. So h is a parity-preserving map such that |h(y) − h(ỹ)| 1 whenever y andỹ are neighbors. This proves that h is a graph homomorphism. Now, we describe further how to prove the central theorem of this article, i.e. the profile theorem in the special case of piecewise affine height functions. We derive the desired asymptotic equality by showing two inequalities. One direction of the inequality arises by overcounting the number of height functions that are close to the piecewise affine height profile; the opposite direction arises by undercounting the same set. In both directions, we subdivide the region into small blocks, so that we can compare the entropy on each block to the local surface tension (see Definition 12 and Figure 10). To overcount, we consider all choices of boundary values on the boundaries of the blocks, and for each boundary value function we count all possible extensions into the interior of the blocks. To undercount we have to use much smaller blocks, with boundary values fixed to match the desired affine function exactly (after rescaling, and up to rounding). The details of the proof are given in Section 5. The more difficult part of the proof is the overcounting argument, which relies on robustness of the microscopic entropy. We expect this to be a major source of difficulty when adapting our methods to other models.
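The extension used in the proof of Theorem 19 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation, under two assumptions: the region Λ is a box, so that the graph distance in Λ equals the $\ell^1$ distance, and the extension takes the "maximum" form $\bar h(y) = \max_{x \in S} \big( h(x) - |x - y|_1 \big)$. All names are hypothetical. The second example reproduces the function from Remark 20, which satisfies the Lipschitz condition but violates parity and therefore admits no extension.

```python
from itertools import product


def l1(x, y):
    """l^1 distance between lattice points."""
    return sum(abs(a - b) for a, b in zip(x, y))


def satisfies_lipschitz(h):
    """Assumed form of the extension condition: |h(x) - h(y)| <= |x - y|_1."""
    return all(abs(h[x] - h[y]) <= l1(x, y) for x in h for y in h)


def preserves_parity(h):
    """h(z) has the same parity as z_1 + ... + z_m at every point."""
    return all((v - sum(z)) % 2 == 0 for z, v in h.items())


def kirszbraun_extend(h, region):
    """Candidate extension: the largest 1-Lipschitz minorant of the data."""
    return {y: max(h[x] - l1(x, y) for x in h) for y in region}


def is_homomorphism(h):
    """Adjacent points of the domain of h have values differing by exactly 1."""
    for z, v in h.items():
        for i in range(len(z)):
            w = z[:i] + (z[i] + 1,) + z[i + 1:]
            if w in h and abs(h[w] - v) != 1:
                return False
    return True


if __name__ == "__main__":
    region = list(product(range(5), repeat=2))
    data = {(0, 0): 0, (4, 4): 2, (2, 1): 1, (0, 4): -2}
    ext = kirszbraun_extend(data, region)
    print(satisfies_lipschitz(data), preserves_parity(data), is_homomorphism(ext))

    # Remark 20: Lipschitz but not parity preserving, so no extension exists.
    bad = {(0,): 0, (2,): 1}
    bad_ext = kirszbraun_extend(bad, [(0,), (1,), (2,)])
    print(satisfies_lipschitz(bad), preserves_parity(bad), is_homomorphism(bad_ext))
```

For a non-convex connected region one would replace `l1` by the graph distance inside the region, for instance computed by breadth-first search; that variant is not covered by this sketch.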
As one can see, the framework of this argument is quite general and it can be adapted to more complicated models and settings. For example, the model of graph homomorphism into the infinite d-regular tree, studied by some of the current authors in [MT20], is amenable to this approach. Additionally, the authors have applied the current strategy to Z-valued homomorphisms sampled according to a random environment. This means that the underlying combinatorial model is the same as in the current article, but in the definition of the microscopic entropy, the (uniform) counting measure on M (R n , h ∂Rn ) is replaced by a randomly perturbed measure. The conclusion is a homogenized variational principle, meaning that the microscopic entropy Ent n (M (R n , h ∂Rn )), now a random variable depending on the realization of the environment, converges in probability to the minimum of the macroscopic entropy, which is still a deterministic quantity. Furthermore, we hope the method applies to other height function models, such as domino tilings (as studied in e.g. [CKP01]), and perhaps even more general tilings (as in e.g. [She01,Thu90]).

Microscopic entropy and surface tension
In this section, we prove basic properties of the microscopic entropy and local surface tension. More precisely, we prove that ent(s) is continuous (see Lemma 23), that ent n (s) → ent(s) uniformly (see Lemma 24), and that Ent Rn is robust under small changes to boundary values (see Lemma 25).
All three of these proofs split into two cases: values of the slope $s$ that are close to 1 (that is, such that $|s|_\infty \ge 1 - \varepsilon$), where there are comparatively few possible states because of the steep slope; and slopes away from 1 (i.e. $|s|_\infty \le 1 - \varepsilon$), where we can make arguments based on extending height functions from one domain to another via the Kirszbraun theorem (Theorem 19).
The first result we state is about the microscopic entropy for slopes close to 1. This lemma is used in the remainder of the section to handle the case of s close to 1.
Lemma 21. Let δ > 0, let s ∈ [−1, 1] m with |s| ∞ > 1 − δ, and let n ∈ N. Consider any boundary height function h ∂Qn ∈ M (∂Q n ) such that Then, First, consider the one-dimensional case, i.e. m = 1. Then the problem reduces to a simple calculation. The main idea is that the large slope s forces a height function h Qn ∈ M (Q n , h ∂Qn ) to closely follow a line of slope ±1. By counting the number of deviating edges we overestimate the number of height functions.
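Indeed, we assume without loss of generality that $s > 1 - \delta$ (the case $s < -(1 - \delta)$ is symmetric). We want to count height functions $h_{Q_n} \in M(Q_n, h_{\partial Q_n})$. The line graph $Q_n$ has $2n$ edges, and a height function with fixed boundary values is determined by the set of edges along which it decreases. If $k$ denotes the number of such edges, then the number of height functions is at most $\binom{2n}{k} \le \binom{2n}{\delta n}$, and the limit is an easy calculation using Stirling's formula.
Figure 6: A one-dimensional height function with slope $s > 1 - \delta$. Because the slope is close to 1, there cannot be many edges along which $h(x)$ decreases.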
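For higher dimensions, we reduce to the one-dimensional case by treating the hypercube $\{-n, \dots, n\}^m$ as the union of $(2n+1)^{m-1}$ independent lines. In so doing we overestimate $|M(Q_n, h_{\partial Q_n})|$, because we relax the graph homomorphism condition between lines. Thus
$$|M(Q_n, h_{\partial Q_n})| \le \binom{2n}{\delta n}^{(2n+1)^{m-1}}.$$
Taking a logarithm and dividing by $-|Q_n| = -(2n+1)^m$ yields
$$\mathrm{Ent}_{Q_n}\bigl(M(Q_n, h_{\partial Q_n})\bigr) \ge -\frac{1}{2n+1} \log \binom{2n}{\delta n} = \theta(\delta) + \theta(\tfrac{1}{n}),$$
which completes the proof (and in particular shows why the $\theta$ error terms do not depend on the dimension $m$).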
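The one-dimensional count used in this proof can be checked numerically. The sketch below, with hypothetical helper names `count_paths` and `entropy_1d`, counts height functions on the line $\{-n, \dots, n\}$ with fixed endpoint values by dynamic programming, compares the result with the binomial coefficient $\binom{2n}{k}$ (where $k$ is the number of decreasing edges), and evaluates the normalized quantity $-\frac{1}{2n+1}\log\binom{2n}{k}$. The normalization follows the convention of dividing by $-|Q_n|$ used above; the specific values of $n$ and $k$ are illustrative only.

```python
import math


def count_paths(n, a, b):
    """Number of h: {-n, ..., n} -> Z with h(-n) = a, h(n) = b and
    |h(x + 1) - h(x)| = 1 along each of the 2n edges."""
    counts = {a: 1}
    for _ in range(2 * n):
        nxt = {}
        for value, c in counts.items():
            for w in (value - 1, value + 1):
                nxt[w] = nxt.get(w, 0) + c
        counts = nxt
    return counts.get(b, 0)


def entropy_1d(n, k):
    """Normalized entropy -(1/(2n+1)) * log C(2n, k) of the set of
    one-dimensional height functions with exactly k decreasing edges."""
    return -math.log(math.comb(2 * n, k)) / (2 * n + 1)


# With k decreasing edges out of 2n, the total increase is 2n - 2k.
n, k = 5, 2
paths = count_paths(n, 0, 2 * n - 2 * k)
```

For a fixed box size, fewer decreasing edges (that is, a slope closer to 1) give an entropy closer to 0, in line with the statement that steep slopes leave comparatively few states.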
While Lemma 21 deals with slopes $s$ close to 1, a different approach is needed for slopes away from 1. We use Theorem 19, which is a Kirszbraun theorem for graph homomorphisms. It gives a simple criterion for when a height function can be extended to a larger domain. Lemma 22 applies the Kirszbraun theorem to derive entropy estimates. In particular, for two box sizes $n < \hat n$, the lemma compares $\mathrm{Ent}_{Q_n}(M(Q_n, h_{\partial Q_n}))$ and $\mathrm{Ent}_{Q_{\hat n}}(M(Q_{\hat n}, h_{\partial Q_{\hat n}}))$. The key idea is that any height function on the smaller box $Q_n$ can be extended to a height function on $Q_{\hat n}$, respecting the boundary data $h_{\partial Q_{\hat n}}$. Therefore (up to vanishing error terms), $\mathrm{Ent}_{Q_{\hat n}}(M(Q_{\hat n}, h_{\partial Q_{\hat n}})) \le \mathrm{Ent}_{Q_n}(M(Q_n, h_{\partial Q_n}))$.
The extension requires that the boundary data $h_{\partial Q_n}$ and $h_{\partial Q_{\hat n}}$ be sufficiently similar. In particular, we will assume that both boundary height functions are close to linear height functions, with slopes $s$ and $\hat s$ respectively. The parameter $\varepsilon$ quantifies how close $h_{\partial Q_n}$ and $h_{\partial Q_{\hat n}}$ are to their respective linear height functions.
We also require that the slopes $s$ and $\hat s$ be close to each other, which is obviously necessary to apply the Kirszbraun theorem in our setting. Finally, we require that the two box sizes $n$ and $\hat n$ be not too different. In particular, we take $\hat n = (1 + \delta)n$, where $\delta$ is a second approximation parameter. The parameter $\delta$ also shows up in a few other bounds, and in the conclusion of the lemma as a $\theta_m(\delta)$ error term.
This is not the simplest lemma of its kind that we could state, nor is it the most general. We choose to state these conditions because they are sufficient for our applications in this section. Moreover, they are necessary in the sense that simplifying any condition, e.g. by using only a single slope $s$ rather than two slopes, or by using linear boundary height functions rather than allowing $\varepsilon$ fluctuations, would not suffice for our purposes.
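For the reverse inequality, choose $s = s^{(i)}$, $\hat s = s^*$, and exchange the roles of $n$ and $\hat n$. Repeating the work above, we deduce the inequality $\mathrm{ent}_n(s) \le \mathrm{ent}(s) + \varepsilon$, which completes the proof of Lemma 24.
Then,
Proof. Suppose first that $|s|_\infty \ge 1 - \varepsilon^{1/2}$. Then Lemma 21 applies to both $h_{\partial Q_n}$ and $h^s_{\partial Q_n}$, so $\mathrm{ent}_{\hat n}(s) + \theta_m(\delta) + \theta_m(\tfrac{1}{n})$. Since $\delta$ is determined by $\varepsilon$, we may replace $\delta$ by $\varepsilon$ in the $\theta$ terms above. And as before, Lemma 24 implies that the local surface tensions at the two auxiliary box sizes converge to $\mathrm{ent}(s)$ as $n \to \infty$, at a rate depending only on the dimension and on $\delta$ (since both auxiliary sizes differ from $n$ by a factor of $(1 + \delta)^{\pm 1}$). Therefore $\mathrm{Ent}_{Q_n}(M(Q_n, h_{\partial Q_n})) = \mathrm{ent}(s) + \theta_m(\varepsilon) + \theta_{m,\varepsilon}(\tfrac{1}{n})$ as claimed.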

Profile theorem for piecewise affine functions
In this section we prove a simpler version of the profile theorem, restricted to the case where the domain $R$ is a finite union of simplices and where the asymptotic height function $h_R$ is piecewise affine, that is, affine when restricted to a single simplex. On the one hand, this case is simple enough that we can prove the profile theorem directly via over- and under-counting arguments (see the proof of Theorem 29 below). On the other hand, this case is sufficiently powerful to approximate general domains and height functions very well (see the proof of Theorem 15 and especially Lemma 31).
We must impose some regularity assumption on the simplices chosen; in particular we need the isoperimetric ratio to be bounded above (that is, the surface area of a simplex must not be too large in comparison to its volume). For simplicity we restrict our attention to certain families of simplices. Now let us introduce a standard notation describing these simplices.
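In this exposition we follow [She05]. Given $w = (w_1, \dots, w_m) \in \mathbb{R}^m$, we recall from the list of notations that $\lfloor w \rfloor := (\lfloor w_1 \rfloor, \dots, \lfloor w_m \rfloor) \in \mathbb{Z}^m$. For a typical point $w \in \mathbb{R}^m$, let $\sigma(w)$ denote the permutation of $\{1, \dots, m\}$ which rank-orders the components of $w - \lfloor w \rfloor$. In particular,
$$(w - \lfloor w \rfloor)_{\sigma(1)} > (w - \lfloor w \rfloor)_{\sigma(2)} > \dots > (w - \lfloor w \rfloor)_{\sigma(m)}.$$
For example, if $m = 3$ and the largest coordinate of $w - \lfloor w \rfloor$ is at index 2, the second largest coordinate is at index 3, and the third largest (i.e. the smallest) is at index 1, then $\sigma(w) = (2\ 3\ 1)$.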
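The assignment $w \mapsto (\lfloor w \rfloor, \sigma(w))$ described above can be sketched in a few lines of Python. The function name `simplex_index` and the sample points are hypothetical and chosen only for illustration. The sketch assumes a typical point, i.e. one whose fractional parts are pairwise distinct, and uses 1-based indices for $\sigma$ to match the text.

```python
import math


def simplex_index(w):
    """Return (floor(w), sigma(w)) for a point w given as a tuple of floats.

    sigma lists the 1-based coordinate indices in order of decreasing
    fractional part, so sigma[0] is the index of the largest one.
    """
    v = tuple(math.floor(c) for c in w)
    frac = [c - f for c, f in zip(w, v)]
    sigma = tuple(sorted(range(1, len(w) + 1), key=lambda j: -frac[j - 1]))
    return v, sigma


# Illustrative point: largest fractional part at index 2, then 3, then 1.
v, sigma = simplex_index((0.25, 0.75, 0.5))
```

With this convention, a point $(x, y) \in [0,1]^2$ with $x > y$ is assigned the permutation $(1\ 2)$.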
Moreover, any two simplices C(v 1 , σ 1 ) and C(v 2 , σ 2 ) are isometric. That is, there exists a distance-preserving bijection f : R m → R m such that f (C(v 1 , σ 1 )) = C(v 2 , σ 2 ). This ensures that all the simplices C(v, σ) have the same isoperimetric ratio. For our purposes we will also make reference to rescaled simplices.
Definition 27. For $\ell > 0$, $v \in \mathbb{Z}^m$, and $\sigma \in S_m$, we write
$$\ell C(v, \sigma) := \{\ell x \mid x \in C(v, \sigma)\}$$
for the scaled copy of the simplex $C(v, \sigma)$, scaled out from the origin.
For example, in dimension $m = 2$, the simplex $C(0, (1\ 2))$ is the closure of the set of points $(x, y) \in [0, 1]^2$ such that $x > y$, and $C(0, (2\ 1))$ is the closure of the points with $y > x$. The other simplices $\{C(v, \sigma) \mid v \in \mathbb{Z}^2, \sigma \in S_2\}$ are translates of these two simplices. As before, we observe that for any $\ell > 0$, the collection of simplices $\{\ell C(0, \sigma) \mid \sigma \in S_m\}$ tiles the hypercube $[0, \ell]^m$. Therefore again, the family of translated simplices $\{\ell C(v, \sigma) \mid v \in \mathbb{Z}^m, \sigma \in S_m\}$ tiles $\mathbb{R}^m$. To approximate a general domain $R$ that satisfies Assumption 2, we consider domains which are the union of simplices.
Definition 28. For $\ell > 0$, a simplex domain of scale $\ell$ is a region $K \subset \mathbb{R}^m$ that is the union of finitely many simplices of scale $\ell$. We further require that simplex domains be connected, so that a simplex domain $K$ automatically meets the requirements from Assumption 2.
For example, the union of the two simplices in Figure 8 is a simplex domain of scale 1. It is clear that simplex domains can approximate more general domains R ⊂ R m ; we make this observation more precise in Lemma 31 below. Now, let us formulate the main result of this section, the simplicial profile theorem (Theorem 29). It is a special case of the profile theorem for simplex domains and piecewise affine height functions; cf. the general profile theorem (Theorem 15).
Figure 10: Decomposition of a single simplex into hypercubes at two scales (left panel: $q \approx \varepsilon\ell$; right panel: $q \approx \varepsilon^{1/2}\ell$). In both images, the shaded squares are the $Q_i$ from the proof of Theorem 29. The smaller squares on the left are used when undercounting the set $B(K_n, h_K, \varepsilon\ell)$ and the larger squares on the right are used when overcounting this set.
Theorem 29. Let $K = \Delta_1 \cup \dots \cup \Delta_k$ be a simplex domain of scale $\ell$, in the sense of Definition 28. Fix a height function $h_K \in M(K)$ such that each restriction $h_K|_{\Delta_j}$, $j = 1, \dots, k$, is affine. Let $\varepsilon > 0$, let $n \in \mathbb{N}$, and let $K_n := \{z \in \mathbb{Z}^m \mid \tfrac{1}{n} z \in K\}$. Then, for any slope $s \in [-1, 1]^m$, the asymptotic identity (22) holds.
Remark 30. In reading the proof of Theorem 29 for the first time, we encourage the reader to consider only a single simplex $\Delta$ rather than a simplex domain $K = \Delta_1 \cup \dots \cup \Delta_k$. The key ideas are clearer when thinking about a single simplex. In particular, the simplex is decomposed into hypercubes two times, using hypercubes of a different scale each time.
The two scales of hypercubes are illustrated in Figure 10. One decomposition is used to overestimate the microscopic entropy by undercounting the set $B(K_n, h_K, \varepsilon\ell)$. The other is used to underestimate the entropy by overcounting the set.
In the more general case of a simplex domain we still decompose twice, using hypercubes of a different size each time. A typical decomposition is illustrated in Figure 11. In particular we keep only those hypercubes that lie inside a single simplex, so that h K has a single, well-defined slope on each Q i . Both sides of (22) are approximately additive over the simplices, but we will not explicitly prove this result here, nor do we rely on it.
Proof. As mentioned in Remark 30, we subdivide the region $K \subset \mathbb{R}^m$ into hypercubes $Q_1, Q_2, \dots, Q_r$ of equal side length $q$. Two different values for the side length parameter $q$ are used at different times. The cubes $Q_i$ lie in a grid with their corners on the rescaled lattice $q\mathbb{Z}^m \subset \mathbb{R}^m$. The set $\{Q_1, \dots, Q_r\}$ enumerates all such hypercubes that lie inside exactly one of the simplices $\Delta_1, \dots, \Delta_k$, as illustrated in Figure 11. This ensures that $h_K$ has constant derivative on $Q_i$, which makes later arguments simpler. For $i = 1, \dots, r$ we choose $s_i \in [-1, 1]^m$ and $b_i \in \mathbb{R}$ so that $h_K|_{Q_i} = h^{s_i \cdot x + b_i}_{Q_i}$. Specifically, this means that $s_i = \nabla h_K(x_i)$ and $b_i = h_K(x_i) - s_i \cdot x_i$ for an arbitrarily chosen sample point $x_i$ from the interior of $Q_i$.
the electronic journal of combinatorics 27(4) (2020), #P4.1
Figure 11: In Theorem 29, $K$ may be a simplex domain rather than a single simplex. Then only hypercubes that lie inside one of the simplices are part of the collection $\{Q_i\}$. These hypercubes are shaded. Note that there are still two different scales of hypercubes used, as illustrated in Figure 10, but only one scale is shown above. The set $G_n$ from later in the proof is the set of grid lines contained inside the simplex domain, and $U_n$ is the unshaded part of the simplex domain. $S_n$ is the union of $G_n$ and $U_n$.
The hypercubes $Q_i$ induce a decomposition of the discrete set $K_n$ into subsets $Q_{i,n} := \{z \in \mathbb{Z}^m \mid \tfrac{1}{n} z \in Q_i\}$, plus a negligible remainder $K_n \setminus \bigcup_{i=1}^r Q_{i,n}$. This remainder is the unshaded part inside the triangles in Figure 11. We write $qn$ for the side length of the discrete hypercubes $Q_{i,n}$. Technically, each $Q_{i,n}$ has an integer side length $q_{i,n} \in \mathbb{Z}$ that is equal to either $\lfloor qn \rfloor$ or $\lceil qn \rceil$, but for simplicity we elide this detail in the rest of the proof.
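The size of the uncovered remainder in this decomposition can be illustrated in the simplest case. The sketch below takes the two-dimensional simplex $\{(x, y) \mid 0 \le y \le x \le 1\}$ (scale $\ell = 1$), uses grid squares of side $q = 1/N$, and computes the fraction of the simplex area covered by squares lying entirely inside it. The function name `covered_fraction` is hypothetical, and the assumption that $q$ divides $\ell$ is made only to keep the arithmetic exact.

```python
from fractions import Fraction


def covered_fraction(N):
    """Fraction of the triangle {0 <= y <= x <= 1} covered by grid squares
    of side 1/N (corners on the lattice (1/N) Z^2) lying inside it."""
    inside = 0
    for i in range(N):
        for j in range(N):
            # The square [i/N, (i+1)/N] x [j/N, (j+1)/N] lies in {y <= x}
            # exactly when its top-left corner (i/N, (j+1)/N) does.
            if j + 1 <= i:
                inside += 1
    square_area = Fraction(1, N * N)
    triangle_area = Fraction(1, 2)
    return inside * square_area / triangle_area


coarse = covered_fraction(4)
fine = covered_fraction(100)
```

In this example the uncovered fraction is exactly $1/N = q/\ell$, so it shrinks linearly in the ratio of the hypercube side length to the simplex scale.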
Let us first sketch the main idea of the proof. We start with the integral on the right-hand side of (22). Since $h_K$ is piecewise affine, the integral reduces to the finite sum (23), where we recall that $s_i = \nabla h_K(x_i)$ for $x_i \in Q_i$. Both of the two values for the hypercube side length parameter $q$ are chosen so that $\theta_m(\tfrac{q}{\ell}) = \theta_m(\varepsilon)$. The $\theta_m(\tfrac{q}{\ell})$ errors arise from the uncovered region $K \setminus \bigcup_{i=1}^r Q_i$, i.e. the unshaded parts of the simplex domain in Figure 11. Indeed, one simply compares the measure $|\Delta_j| = \tfrac{1}{m!} \ell^m$ against that of the smaller simplex $\Delta_j'$ with sides moved $\sqrt{m}\, q$ units inwards. Any hypercube $Q_i$ that intersects $\Delta_j'$ must lie inside $\Delta_j$. The $\theta_m(\tfrac{q}{\ell})$ error bound follows.
Now, we turn to the left-hand side of (22). Our goal is to relate $\mathrm{Ent}_{K_n}(B(K_n, h_K, \varepsilon\ell))$ to the sum on the right-hand side of (23). Towards this end, we will under- and over-count the set of height functions $B(K_n, h_K, \varepsilon\ell)$, in order to derive the over- and under-estimates (24) and (25). Equations (23), (24), and (25), together with the observation made above that $\theta_m(\tfrac{q}{\ell}) = \theta_m(\varepsilon)$, suffice to prove the theorem.
In order to prove (24), we will undercount height functions in $B(K_n, h_K, \varepsilon\ell)$. We choose $q \approx \varepsilon\ell$ and consider only height functions that agree with $h_{K_n}$ on each boundary $\partial Q_{i,n}$. These boundary data, together with the small size of $Q_{i,n}$, ensure that $h_{K_n}$ satisfies the $\ell^\infty$ condition for membership in $B(K_n, h_K, \varepsilon\ell)$. Then, to prove (25), we overcount height functions. We choose a larger value $q \approx \varepsilon^{1/2}\ell$. Any height function $h_n \in B(K_n, h_K, \varepsilon\ell)$, when restricted to one of the boundary sets $\partial Q_{i,n}$ and rescaled appropriately, fluctuates away from $h_K$ by at most $\varepsilon\ell n = \varepsilon^{1/2} qn$. When $\varepsilon$ is small, this allows us to compare the entropy on each $Q_{i,n}$ to the local surface tension $\mathrm{ent}(s_i)$ (see Lemma 25).
Now, let us describe the undercounting argument in detail. We seek to derive (24), an overestimate of $\mathrm{Ent}_{K_n}(B(K_n, h_K, \varepsilon\ell))$, by undercounting the set $B(K_n, h_K, \varepsilon\ell)$. We take the side length of the hypercubes $\{Q_i\}$ to be $q = \tfrac{1}{4}\varepsilon\ell$.
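We define an injection from the Cartesian product $\prod_i M(Q_{i,n}, h^{s_i \cdot x + b_i}_{\partial Q_{i,n}})$ into the ball $B(K_n, h_K, \varepsilon\ell)$ in the natural way: given a tuple of height functions $(h_{Q_{i,n}})_{i=1}^r$ in the product, we glue them into a single function $h_{K_n}$ on $K_n$, which on the remainder $K_n \setminus \bigcup_{i=1}^r Q_{i,n}$ follows $h_K$ (after rescaling, and up to rounding). It follows from Lemma 8 that this function $h_{K_n}$ is a height function.
Let us check that $h_{K_n} \in B(K_n, h_K, \varepsilon\ell)$. For $z \in K_n \setminus \bigcup_{i=1}^r Q_{i,n}$, the estimate $|\tfrac{1}{n} h_{K_n}(z) - h_K(\tfrac{1}{n} z)| \le \varepsilon\ell$ is immediate from the definition of $h_{K_n}(z)$. So, suppose that $z \in Q_{i,n}$ for some $i = 1, \dots, r$, and let $z' \in Q_{i,n}$ be a boundary point in $\partial Q_{i,n}$ that minimizes the $\ell^1$ distance from $z$. In particular, $|z - z'|_1 \le qn$, so the triangle inequality, together with the Lipschitz property of both $h_{K_n}$ and $h_K$, gives the required bound on $|\tfrac{1}{n} h_{K_n}(z) - h_K(\tfrac{1}{n} z)|$, at least for $n$ large enough that $\tfrac{1}{n} \le q = \tfrac{1}{4}\varepsilon\ell$. Therefore $h_{K_n} \in B(K_n, h_K, \varepsilon\ell)$ as desired. Thus
$$\mathrm{Ent}_{K_n}\bigl(B(K_n, h_K, \varepsilon\ell)\bigr) \le \sum_{i=1}^r \frac{|Q_{i,n}|}{|K_n|}\, \mathrm{Ent}_{Q_{i,n}}\bigl(M(Q_{i,n}, h^{s_i \cdot x + b_i}_{\partial Q_{i,n}})\bigr). \qquad (26)$$
Now, $\mathrm{Ent}_{Q_{i,n}}(M(Q_{i,n}, h^{s_i \cdot x + b_i}_{\partial Q_{i,n}})) = \mathrm{ent}_{qn}(s_i)$ by translation invariance (see Observation 10). Moreover, because $qn = \tfrac{1}{4}\varepsilon\ell n$, we have $\mathrm{ent}_{qn}(s_i) \to \mathrm{ent}(s_i)$ as $n \to \infty$ at a rate dependent on $\varepsilon\ell$; by Lemma 24, the convergence is uniform with respect to $s_i$. In other words, $\mathrm{ent}_{qn}(s_i) = \mathrm{ent}(s_i) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n})$. Therefore, recalling (26), we have
$$\mathrm{Ent}_{K_n}\bigl(B(K_n, h_K, \varepsilon\ell)\bigr) \le \sum_{i=1}^r \frac{|Q_{i,n}|}{|K_n|}\, \mathrm{ent}(s_i) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n}). \qquad (27)$$
Now, the difference between $\tfrac{|Q_{i,n}|}{|K_n|}$ and $\tfrac{1}{r}$ is $\theta_m(\varepsilon) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n})$, where the first error term accounts for the unshaded part of Figure 11, and the second term is due to discretization effects. Therefore (27) simplifies to
$$\mathrm{Ent}_{K_n}\bigl(B(K_n, h_K, \varepsilon\ell)\bigr) \le \frac{1}{r} \sum_{i=1}^r \mathrm{ent}(s_i) + \theta_m(\varepsilon) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n}),$$
which is exactly the overestimate (24).
Now, we turn to (25). We will overcount $B(K_n, h_K, \varepsilon\ell)$ in order to underestimate the entropy $\mathrm{Ent}_{K_n}(B(K_n, h_K, \varepsilon\ell))$. We will take the side length $q$ of the hypercubes $Q_i$ to be $q = \varepsilon^{1/2}\ell$ for this part of the argument.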
The basic idea is the following: we choose a subset S n ⊂ K n , and we only enforce the condition that | 1 n h Kn (x) − h K ( 1 n x)| < ε from Definition 5 on S n rather than on all of K n . S n is the complement of the (interiors of the) grid cells Q i,n , so for any fixed height values on S n , we can count the number of all extensions into the grid cells using a sum of entropy over the cells. There are many possible height values on S n that satisfy the ε error condition, but ultimately not too many because S n is small (compared to K n ).
Let us provide more detail. We define $S_n$ as follows. Let $G_n$ denote the grid formed by the boundaries of $Q_{i,n}$, i.e. the part of the grid lines from Figure 11 that lies inside the simplex domain. Let $U_n$ denote the points in $K_n$ that lie outside of any hypercube $Q_{i,n}$, i.e. the unshaded part of the simplex domain in Figure 11. Let $S_n := G_n \cup U_n$. (As claimed, the complement $K_n \setminus S_n$ is the interior of the grid cells $Q_{i,n}$.) Additionally, let $\mathrm{Adm}(S_n)$ denote the set
$$\mathrm{Adm}(S_n) := \{\text{"admissible" height functions on } S_n\}, \qquad (28)$$
where "admissible" means those height functions $h_{S_n} \in M(S_n)$ that admit an extension to a height function in $B(K_n, h_K, \varepsilon\ell)$.
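We claim that there is an injection from $B(K_n, h_K, \varepsilon\ell)$ into
$$\bigsqcup_{h_{S_n} \in \mathrm{Adm}(S_n)} \ \prod_{i=1}^r M(Q_{i,n}, h_{S_n}|_{\partial Q_{i,n}}), \qquad (29)$$
where "$\bigsqcup$" denotes the disjoint union (so for distinct height functions $h_{S_n}$ and $\tilde h_{S_n}$ in $\mathrm{Adm}(S_n)$, the product sets $\prod_i M(Q_{i,n}, h_{S_n}|_{\partial Q_{i,n}})$ and $\prod_i M(Q_{i,n}, \tilde h_{S_n}|_{\partial Q_{i,n}})$ are considered disjoint inside the set from (29)). Indeed, for any $h_{K_n} \in B(K_n, h_K, \varepsilon\ell)$, the function $h_{S_n} := h_{K_n}|_{S_n}$ is by definition in $\mathrm{Adm}(S_n)$, and $(h_{K_n}|_{Q_{i,n}})_{i=1}^r$ lies in the corresponding Cartesian product in (29). To see that this map is an injection, suppose that $h_{K_n}$ and $\tilde h_{K_n}$ map to the same point. Then by definition of the (purported) injection, $h_{K_n}|_{Q_{i,n}} = \tilde h_{K_n}|_{Q_{i,n}}$ for each hypercube $Q_{i,n}$. Additionally, since the set in (29) is a disjoint union, we have $h_{K_n}|_{S_n} = \tilde h_{K_n}|_{S_n}$. Since $K_n = \bigcup_i Q_{i,n} \cup S_n$, the two height functions $h_{K_n}$ and $\tilde h_{K_n}$ are identical. Therefore, the map is an injection, and so
$$|B(K_n, h_K, \varepsilon\ell)| \le |\mathrm{Adm}(S_n)| \cdot \max_{h_{S_n} \in \mathrm{Adm}(S_n)} \prod_{i=1}^r |M(Q_{i,n}, h_{S_n}|_{\partial Q_{i,n}})|.$$
Taking logarithms and multiplying by $-\tfrac{1}{|K_n|}$, we see that
$$\mathrm{Ent}_{K_n}\bigl(B(K_n, h_K, \varepsilon\ell)\bigr) \ge -\frac{1}{|K_n|} \log|\mathrm{Adm}(S_n)| + \min_{h_{S_n} \in \mathrm{Adm}(S_n)} \sum_{i=1}^r \frac{|Q_{i,n}|}{|K_n|}\, \mathrm{Ent}_{Q_{i,n}}\bigl(M(Q_{i,n}, h_{S_n}|_{\partial Q_{i,n}})\bigr). \qquad (30)$$
Now, we are almost done. We use three more asymptotic identities in the right-hand side of (30) to derive (25). First and simplest, since $\tfrac{|Q_{i,n}|}{|K_n|} = \tfrac{1}{r} + \theta_m(\varepsilon) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n})$ and $\mathrm{Ent}_{Q_{i,n}}$ is bounded (Observation 11), we replace $\tfrac{|Q_{i,n}|}{|K_n|}$ by $\tfrac{1}{r}$ in (30).
Second, we apply Lemma 25 to replace $\mathrm{Ent}_{Q_{i,n}}(M(Q_{i,n}, h_{S_n}|_{\partial Q_{i,n}}))$ by $\mathrm{ent}(s_i)$. We fix a height function $h_{S_n}$ that achieves the minimum, then apply the lemma on each of the hypercubes $Q_{i,n}$. We recall that the hypercubes have side length $qn = \varepsilon^{1/2}\ell n$. Since $h_{S_n}$ is admissible and since $\varepsilon\ell n = \varepsilon^{1/2} qn$, the boundary values $h_{S_n}|_{\partial Q_{i,n}}$ deviate from the affine height function of slope $s_i$ by at most $\varepsilon^{1/2} qn$. So, Lemma 25 applies and yields
$$\mathrm{Ent}_{Q_{i,n}}\bigl(M(Q_{i,n}, h_{S_n}|_{\partial Q_{i,n}})\bigr) = \mathrm{ent}(s_i) + \theta_m(\varepsilon^{1/2}) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n}) = \mathrm{ent}(s_i) + \theta_m(\varepsilon) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n}),$$
where the last line is just a matter of hiding the functions like $\varepsilon^{1/2}$ inside our $\theta$-notation.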
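(See Section 2.4 for the definition of $\theta$-notation.) Finally, we claim that $\tfrac{1}{|K_n|} \log|\mathrm{Adm}(S_n)| = \theta_m(\varepsilon) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n})$, where we recall that the set $\mathrm{Adm}(S_n)$ was defined in (28). To see this, fix a base point $z_0 \in S_n$. There are at most $(2\varepsilon\ell n + 1)$ choices for $h_{S_n}(z_0)$, by definition of $B(K_n, h_K, \varepsilon\ell)$. Then, since $S_n \subset \mathbb{Z}^m$ is connected (in the sense of graph theory), there are fewer than $2^{|S_n|}$ ways to extend $h_{S_n}$ to the rest of $S_n$. So, we must estimate $|S_n|$. We recall that $S_n = G_n \cup U_n$, where $G_n$ is the grid and $U_n$ the unshaded region in Figure 11. Since $|G_n|$ grows like $n^{m-1}$ while $|K_n|$ grows like $n^m$, we have $\tfrac{|G_n|}{|K_n|} = \theta_{m,\varepsilon,\ell}(\tfrac{1}{n})$. Next, the part of $K \subset \mathbb{R}^m$ that lies outside of any hypercube $Q_i$, that is, the unshaded part of the simplex domain in Figure 11, is a $\theta_m(\varepsilon)$ fraction of the total volume of $K$. Even with discretization errors, $\tfrac{|U_n|}{|K_n|} = \theta_m(\varepsilon) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n})$. Altogether,
$$\frac{1}{|K_n|} \log|\mathrm{Adm}(S_n)| \le \frac{\log(2\varepsilon\ell n + 1)}{|K_n|} + \frac{|S_n|}{|K_n|} \log 2 = \theta_m(\varepsilon) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n}).$$
Applying the three asymptotic identities above in (30), we derive
$$\mathrm{Ent}_{K_n}\bigl(B(K_n, h_K, \varepsilon\ell)\bigr) \ge \frac{1}{r} \sum_{i=1}^r \mathrm{ent}(s_i) + \theta_m(\varepsilon) + \theta_{m,\varepsilon,\ell}(\tfrac{1}{n}),$$
which is exactly (25). This completes the proof.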

Proof of the profile theorem
In this section we extend Theorem 29, the profile theorem for piecewise affine height functions on simplex domains, to general asymptotic height functions on general domains (subject to Assumption 2). The proof is an approximation argument, and we will need some auxiliary results. The most helpful is the simplicial Rademacher theorem, which states that Lipschitz functions are well-approximated by piecewise affine functions on a simplex domain. The other auxiliary results are about robustness of the microscopic and macroscopic entropies under changes in the domain and in the asymptotic height profile.
The simplicial Rademacher theorem is a general fact about Lipschitz functions. There is nothing particular to our setting, except for the use of our term "asymptotic height function" instead of "Lipschitz function." Related results include [Sch14], which extends Lemma 31 from Lipschitz functions to Sobolev functions, but weakens the approximation somewhat and is therefore not suitable for our purposes here. The statement of the simplicial Rademacher theorem is adapted from [CKP01, Lemma 2.2], and the proof is inspired by the proof there.
Lemma 31 (Simplicial Rademacher theorem). Let $R \subseteq \mathbb{R}^m$ be a region satisfying Assumption 2, and let $h_R \in M(R, h_{\partial R})$ be an asymptotic height function on $R$. For any $\varepsilon > 0$ and any $\ell > 0$ sufficiently small (depending on $\varepsilon$), we may choose a simplex domain $K = \Delta_1 \cup \dots \cup \Delta_k \subseteq R$ of scale $\ell$ (see Definition 28) and a piecewise affine asymptotic height function $h_K : K \to \mathbb{R}$ (that is, an asymptotic height function such that each restriction $h_K|_{\Delta_i} : \Delta_i \to \mathbb{R}$ is affine) that satisfy the following properties:
(a) $|R \setminus K| < \varepsilon$, where $|\cdot|$ denotes the Lebesgue measure, and $d_H(K, R) < \varepsilon$, where $d_H$ denotes the Hausdorff metric;
(b) $\max_{x \in K} |h_K(x) - h_R(x)| < \tfrac{1}{2}\varepsilon\ell$; and
(c) on at least a $(1 - \varepsilon)$ fraction of the points in $K$ (by Lebesgue measure), the gradients $\nabla h_K(x)$ and $\nabla h_R(x)$ agree to within $\varepsilon$; more precisely,
$$\frac{1}{|K|} \bigl|\{x \in K \mid |\nabla h_K(x) - \nabla h_R(x)|_2 > \varepsilon\}\bigr| < \varepsilon.$$
Remark 32. We recall that the Rademacher theorem states that a Lipschitz function $h_R$ is differentiable almost everywhere. However, $\nabla h_R$ may be poorly behaved. The Rademacher theorem gives no control over $\nabla h_R$, and the Lipschitz property only implies boundedness of the derivative, not regularity. The simplicial Rademacher theorem provides an approximation both to $h_R$ and to its derivative. Moreover, the approximating function $h_K$ has a very simple derivative, despite the potential wildness of $\nabla h_R$. The cost is that $h_K$ only approximates $h_R$ well on a (large) portion of the domain rather than almost everywhere, but for our purposes this is a good trade-off.
In fact, it is not necessary that the function h R be Lipschitz. Almost everywhere differentiability is sufficient.
Before giving the proof of Lemma 31, we state and prove the following lemma about the standard simplices from Definition 26.
Lemma 33. Let $\Delta$ be any of the simplices $C(v, \sigma)$ for $v \in \mathbb{Z}^m$ and $\sigma \in S_m$. The $m + 1$ vertices of $\Delta$ can be labelled $x^{(0)}, \dots, x^{(m)}$ in such a way that, for each $i = 1, \dots, m$,
$$x^{(i)} - x^{(i-1)} = e^{(\sigma(i))},$$
where for $1 \le j \le m$, $e^{(j)}$ denotes the $j$-th standard basis vector (i.e., all entries of $e^{(j)}$ are 0, except the $j$-th entry, which is 1).
Remark 34. We encourage the reader to keep Figure 9 in mind (or better, in sight) while reading this proof.
Proof. For simplicity, we assume without loss of generality that $v = 0$. We use the permutation $\sigma$ to define a path between vertices of the simplex $C(0, \sigma)$ starting at $(0, \dots, 0)$ and ending at $(1, \dots, 1)$. To construct the path, first observe that
$$C(0, \sigma) = \{x \in [0, 1]^m \mid x_{\sigma(1)} \ge x_{\sigma(2)} \ge \dots \ge x_{\sigma(m)}\}.$$
In other words, the $\sigma(1)$-th component of $x$ must be greater than or equal to the $\sigma(2)$-th, which is greater than or equal to the $\sigma(3)$-th, and so on. The path travels from $(0, \dots, 0)$ along the $\sigma(1)$-th axis to $e^{(\sigma(1))}$, then parallel to the $\sigma(2)$-th axis to $e^{(\sigma(1))} + e^{(\sigma(2))}$, and so on up to $\sum_{i=1}^m e^{(i)} = (1, \dots, 1)$. Numbering the vertices of the path from $x^{(0)}$ to $x^{(m)}$ proves the lemma.
Now, we are ready for the proof of Lemma 31.
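The vertex path constructed in the proof of Lemma 33 is easy to generate explicitly. The following sketch, in which the function name `simplex_vertices` is a hypothetical helper, lists the vertices $x^{(0)}, \dots, x^{(m)}$ of $C(v, \sigma)$ by starting at $v$ and adding the basis vectors $e^{(\sigma(1))}, e^{(\sigma(2))}, \dots$ in turn. The permutation is given as a tuple of 1-based indices.

```python
def simplex_vertices(sigma, v=None):
    """Vertices x^(0), ..., x^(m) of the simplex C(v, sigma).

    sigma is a tuple of 1-based indices; v defaults to the origin.
    Consecutive vertices differ by the basis vector e^(sigma(i)).
    """
    m = len(sigma)
    x = list(v) if v is not None else [0] * m
    path = [tuple(x)]
    for i in range(m):
        x[sigma[i] - 1] += 1  # step along the sigma(i)-th axis
        path.append(tuple(x))
    return path


# Illustrative permutation in dimension 3.
path = simplex_vertices((2, 3, 1))
```

The path always starts at $v$ and ends at $v + (1, \dots, 1)$, visiting $m + 1$ vertices in total.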
Proof of Lemma 31. Let $\ell > 0$. We choose the simplex domain $K = \Delta_1 \cup \dots \cup \Delta_k$ such that $\{\Delta_1, \dots, \Delta_k\}$ enumerates all simplices of scale $\ell$ (cf. Definition 27) that are contained in $R$. We define the asymptotic height function $h_K \in M(K)$ to agree with $h_R$ on the vertices of the simplices in $K$, and we extend $h_K$ into the rest of each $\Delta_i$ by linear interpolation. We will show that, once $\ell$ is small enough, properties (a), (b), and (c) from Lemma 31 all hold.
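The gradient of the interpolant on a single simplex can be read off from the vertex path of Lemma 33. The sketch below assumes that the scaled simplex $\ell C(v, \sigma)$ has vertices $\ell x^{(0)}, \dots, \ell x^{(m)}$ with $x^{(i)} - x^{(i-1)} = e^{(\sigma(i))}$, so that the affine interpolant has partial derivative $(f(\ell x^{(i)}) - f(\ell x^{(i-1)}))/\ell$ in direction $\sigma(i)$. The function name `interpolant_gradient` and the test function `f` are hypothetical and used only for illustration.

```python
def interpolant_gradient(f, v, sigma, scale):
    """Gradient of the affine interpolant of f on the simplex scale * C(v, sigma).

    f takes a tuple of floats; sigma is a tuple of 1-based indices.
    The sigma(i)-th partial derivative is the difference quotient of f
    between consecutive vertices of the path from Lemma 33.
    """
    m = len(sigma)
    x = [scale * c for c in v]
    grad = [0.0] * m
    previous = f(tuple(x))
    for i in range(m):
        j = sigma[i] - 1
        x[j] += scale  # move to the next vertex of the path
        current = f(tuple(x))
        grad[j] = (current - previous) / scale
        previous = current
    return tuple(grad)


# Illustrative affine function: the interpolant recovers its gradient exactly.
def f(p):
    return 0.5 * p[0] - 0.25 * p[1] + 1.0


grad = interpolant_gradient(f, (0, 0), (2, 1), 0.5)
```

For a function that is 1-Lipschitz in each coordinate direction, every difference quotient computed this way lies in $[-1, 1]$, so the interpolant is again an admissible asymptotic height function on the simplex.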
First we prove (a). The fact that $|K|$ tends to $|R|$ as $\ell \to 0$ is elementary measure theory, and we omit the proof.
Recall from (3) that we define the Hausdorff metric $d_H$ in terms of the $\ell^1$ metric on $\mathbb{R}^m$, for reasons explained in Remark 3. Therefore, for the second part of (a), it suffices to show that $R \subset K + B_{\ell^1}(0, \varepsilon) := \{x + y \mid x \in K,\ |y|_1 < \varepsilon\}$.
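We do this by constructing a subset $R' \subset R$ such that $R \subset R' + B_{\ell^1}(0, \varepsilon)$ and $R' + B_{\ell^1}(0, \alpha) \subset R$ for some $\alpha < \varepsilon$. The latter condition ensures that, for $\ell$ small enough, $R' \subset K$. Indeed, the $\ell^1$-diameter of a simplex of scale $\ell$ is at most $m\ell$ (i.e. $\mathrm{diam}_{\ell^1} \Delta := \max\{|x - y|_1 \mid x, y \in \Delta\} \le m\ell$), so as long as $\ell < \tfrac{\alpha}{m}$, every point $x \in R'$ belongs to a simplex $\Delta_i$ of scale $\ell$ which is part of $K$. Therefore $R' \subset K$, so $R \subset K + B_{\ell^1}(0, \varepsilon)$ as intended.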
For later use we strengthen the volume estimate from (a). Choose $\ell$ smaller so that In particular, this implies that
Let us describe the key idea used to prove (b) and (c). We consider points $x$ where $h_R$ is differentiable, and indeed where $h_R$ is locally approximated well by its first-order Taylor polynomial. Once the simplices $\Delta_i$ are small enough and contain a "good" point $x$, the vertices all lie close to $x$, so we can use the Taylor polynomial to estimate the values of $h_R$ on the vertices. This yields the proof of (b) and (c).
To be more precise, we define a set $S_{\rho_0}$ of "good" points. Recall that the Lipschitz function $h_R$ is almost everywhere differentiable, by the Rademacher theorem. Consider any point $x \in R$ at which $\nabla h_R$ exists. Define the Taylor polynomial
$$L_x(y) := h_R(x) + \nabla h_R(x) \cdot (y - x).$$
By the definition of differentiability,
$$\lim_{y \to x} \frac{|h_R(y) - L_x(y)|}{|y - x|_2} = 0,$$
so there exists $r_0(x) > 0$ such that the estimate (33) holds for any $y \in R$ with $|y - x|_2 < r_0(x)$. (Recall that $m \in \mathbb{N}$ is the dimension parameter; we could replace the parenthesized expression in (33) by $\tfrac{\varepsilon}{4m}$, but the expressions $\tfrac{\varepsilon}{4\sqrt{m}}$ and $\tfrac{\varepsilon}{2m}$ are useful later.) For $\rho > 0$, define the set $S_\rho \subset R$ by $S_\rho := \{x \in R \mid r_0(x) \ge \rho\}$.
As $\rho \to 0$, the sets $S_\rho$ increase to the full-measure subset of $R$ on which $h_R$ is differentiable. Therefore $|S_\rho| \to |R|$ as $\rho \to 0$, and in particular, there exists $\rho_0 > 0$ such that
We choose $\ell_0 \le \tfrac{\rho_0}{\sqrt{m}}$ and assume $\ell < \ell_0$. By the Pythagorean theorem (in $m$ dimensions), if $x, y$ are two points that lie in a simplex $\Delta_i$ and if $x \in S_{\rho_0}$, then $|x - y|_2 \le \sqrt{m}\,\ell < \rho_0 \le r_0(x)$. Therefore by (33),
There are two more steps to prove (b). First, under the assumption that $x \in \Delta_i \cap S_{\rho_0}$, we have compared $h_R|_{\Delta_i}$ to the Taylor polynomial of $h_R$ centered at $x$; we should also compare $h_K$ to the same polynomial. Second, we show that at least $(1 - \varepsilon)k$ of the simplices have some intersection with $S_{\rho_0}$. Then it is straightforward to complete the proof of (b).
Regarding $h_K$, recall that on the vertices $y_0, \dots, y_m$ of $\Delta_i$, $h_K$ agrees with $h_R$. Therefore by (35),
where we recall that $L_x(y) = h_R(x) + \nabla h_R(x) \cdot (y - x)$ is the first-order Taylor polynomial of $h_R$ at $x$. Combining these two inequalities,
Since
Because $h_K$ is the linear interpolation of $h_R$ from the vertices $y_0, \dots, y_m$ to the rest of $\Delta_i$, we see that the first term on the left-hand side of (37) is
And of course, $(y_i - y_{i-1})/|y_i - y_{i-1}|_2 = e^{(\sigma(i))}$, so the second term on the left-hand side of (37) is
The last three equations hold for all $i = 1, \dots, m$. Therefore we may drop the permutation $\sigma(i)$ from the partial derivatives and conclude that, for every $i$,
Thus
The next three lemmas regard the robustness of the macroscopic entropy and microscopic entropy to changes in the domain and to changes in the limiting asymptotic height function. As seen in the simplicial Rademacher theorem (Lemma 31), we will change both the domain and the asymptotic height function. As long as these changes are small enough (in the appropriate senses), these lemmas show that the macroscopic entropy and microscopic entropy change by a small amount.
First, we deal with robustness of the macroscopic entropy. Because Ent R is an integral function with continuous and bounded integrand, robustness with respect to changes in both domain and asymptotic height function is easy to prove by standard analytic arguments. The main requirement is control over the change in the derivative of the asymptotic height function, as is provided by (c) from the simplicial Rademacher theorem (see Lemma 31).
The set $\{x \in \tilde R \mid |\nabla h_{\tilde R}(x) - \nabla h_R(x)|_2 \ge \varepsilon\}$ has measure less than $\varepsilon$ by hypothesis. Since $\mathrm{ent}(s)$ is bounded (see Observation 13), the contribution of the points in this set to $\mathrm{Ent}_{\tilde R}(h_{\tilde R})$ is within $\theta(\varepsilon)$ of the contribution to $\mathrm{Ent}_R(h_R)$.
Likewise, the set $R \setminus \tilde R$ has measure at most $\varepsilon$, so the contribution to $\mathrm{Ent}_R(h_R)$ is $\theta(\varepsilon)$. Of course, this set does not contribute to $\mathrm{Ent}_{\tilde R}(h_{\tilde R})$.
Finally, for the remaining points $x$, $|\nabla h_{\tilde R}(x) - \nabla h_R(x)|_2 < \varepsilon$. Since $\mathrm{ent}(s)$ is uniformly continuous on its domain $s \in [-1, 1]^m$, we have $|\mathrm{ent}(\nabla h_{\tilde R}(x)) - \mathrm{ent}(\nabla h_R(x))| < \theta_m(\varepsilon)$. Since the integrands differ by at most $\theta_m(\varepsilon)$ and since the integrals are normalized by $\tfrac{1}{|\tilde R|}$ and $\tfrac{1}{|R|}$, the contribution from this third part of the domain is also $\theta_m(\varepsilon)$.
Now, we turn to the microscopic entropy. Here it is easier to record two separate robustness results. The first is robustness with respect to changes in the asymptotic height function, and the second is robustness with respect to changes in domain. Robustness with respect to changes in the asymptotic height function comes immediately from the definition of the balls $B(R_n, h_R, \varepsilon)$.
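Lemma 36. Let $\varepsilon > 0$ and $n \in \mathbb{N}$. Let $R \subset \mathbb{R}^m$ satisfy Assumption 2, and let $R_n \subset \mathbb{Z}^m$ satisfy $\tfrac{1}{n} R_n \subset R$. Let $h_R, \tilde h_R \in M(R)$ be two asymptotic height functions such that $\sup_{x \in R} |h_R(x) - \tilde h_R(x)| \le \varepsilon$. Then $\mathrm{Ent}_{R_n}(B(R_n, h_R, 2\varepsilon)) \le \mathrm{Ent}_{R_n}(B(R_n, \tilde h_R, \varepsilon))$.
Proof. It suffices to notice that $B(R_n, h_R, 2\varepsilon) \supseteq B(R_n, \tilde h_R, \varepsilon)$. This follows from the triangle inequality: for any $h_{R_n} \in B(R_n, \tilde h_R, \varepsilon)$ and any $z \in R_n$,
$$|\tfrac{1}{n} h_{R_n}(z) - h_R(\tfrac{1}{n} z)| \le |\tfrac{1}{n} h_{R_n}(z) - \tilde h_R(\tfrac{1}{n} z)| + |\tilde h_R(\tfrac{1}{n} z) - h_R(\tfrac{1}{n} z)| < \varepsilon + \varepsilon = 2\varepsilon.$$
More care is needed to state and prove robustness of the microscopic entropy with respect to changes in domain. The main idea is straightforward. Given two microscopic domains $\tilde R_n \subset R_n$, we will consider the extension map from $B(\tilde R_n, h_R, \varepsilon)$ to $B(R_n, h_R, \varepsilon)$ and the restriction map in the opposite direction. So long as every height function on the smaller domain admits an extension, we have $|B(\tilde R_n, h_R, \varepsilon)| \le |B(R_n, h_R, \varepsilon)|$. In the opposite direction, the restriction map is not generally an injection but the pre-images are not too large; at most $2^N$ height functions on $R_n$ restrict to any specific height function on $\tilde R_n$, where $N = |R_n \setminus \tilde R_n|$.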
Most of the complications arise in the extension step. Our primary extension result, namely the Kirszbraun theorem (Theorem 19), is insufficient. It states that a height function $h_{\tilde R_n} \in B(\tilde R_n, h_R, \varepsilon)$ admits an extension to $R_n$, but that extension is not necessarily in $B(R_n, h_R, \varepsilon)$. There are two ways forward: to prove a stronger extension theorem specialized to the problem under consideration, or to leverage the Lipschitz property to control the extension. For greater generality, we prefer the second method. However, there are a few difficulties: the Kirszbraun theorem is subtle when the asymptotic height profile has maximal slope ($|\nabla h_K|_\infty = 1$) in part of the region, and the extension cannot generally be kept within distance $\varepsilon$ of $h_R$. This leads to the following somewhat complex formulation.
Let $h_R \in M(R)$ be an asymptotic height function with $\mathrm{Lip}(h_R) \le 1 - c\varepsilon$ for some fixed $c \in (0, 1]$. Then,
Remark 38. The assumptions in Lemma 37 quantify the imprecise statements that $\tilde R$, $\tfrac{1}{n} R_n$, and $\tfrac{1}{n} \tilde R_n$ respectively approximate $R$, $R$, and $\tilde R$ from inside. If we take the simplicial approximation $K$ from Lemma 31 to be $\tilde R$, and its discretization $K_n := \{z \in \mathbb{Z}^m \mid \tfrac{1}{n} z \in K\}$ to be $\tilde R_n$, and if we recall Assumption 2 about $R$ and $R_n$, then the hypotheses of Lemma 37 are satisfied, and moreover, we may replace all instances of $\theta(\varepsilon)$ in the conclusion by $\theta(\varepsilon\ell)$.
To justify (40) we argue as follows. Consider the continuum region $\bigcup_{z \in R_n} \left([0, \tfrac{1}{n}]^m + \tfrac{1}{n} z\right)$, i.e. the union of hypercubes of side length $\frac{1}{n}$ translated by the points in $\frac{1}{n} R_n$. Clearly this region has Lebesgue measure equal to $|R_n| n^{-m}$, and (like $\frac{1}{n} R_n$) it lies at Hausdorff distance $\theta(\varepsilon)$ from $R$. This implies (40). Equation (41) is analogous. Further arithmetic yields (42), and then $|R_n \setminus \tilde R_n| = |R_n| \left(\theta(\varepsilon) + \theta_\varepsilon(\tfrac{1}{n})\right)$. Therefore the restriction map from $B(R_n, h_R, \varepsilon)$ to $B(\tilde R_n, h_R, \varepsilon)$ satisfies the following property: every height function $h_{\tilde R_n}$ in its image has at most $(2^{|R_n|})^{\theta(\varepsilon) + \theta_\varepsilon(1/n)}$ pre-images; i.e. the restriction map is at most $(2^{|R_n|})^{\theta(\varepsilon) + \theta_\varepsilon(1/n)}$-to-1. Inequality (38) follows immediately.

Now let us turn to (39). We want an injection from $B(\tilde R_n, h_R, \tfrac{c}{3} \varepsilon^2)$ into $B(R_n, h_R, \varepsilon)$. Fix a function $h_{\tilde R_n} \in B(\tilde R_n, h_R, \tfrac{c}{3} \varepsilon^2)$; we will construct an extension $h_{R_n} \in M(R_n)$. We first prescribe the extension at the points of $R_n$ whose $\ell^1$-distance to $\tilde R_n$ is greater than $\tfrac{\varepsilon}{3} n$. For such a point $z$, we arrange for the extension to satisfy $|h_{R_n}(z) - n h_R(\tfrac{1}{n} z)| \le 1$. When $n h_R(\tfrac{1}{n} z)$ is not an integer, or is an integer but has the same parity as $z$, this inequality uniquely determines the value of $h_{R_n}(z)$. In the remaining case, there are two candidate values; we arbitrarily choose to "round down" to the lower value. Later it is important that we consistently round down (or up).
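The rounding rule just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the parity convention $h(z) \equiv z_1 + \dots + z_m \pmod 2$, the helper name `round_to_parity`, and the toy one-dimensional profile $h_R(x) = x/2$ are all choices made for the example, and exact rational arithmetic is used to avoid floating-point ties.

```python
import math
from fractions import Fraction

def round_to_parity(t, z):
    """Return the integer v with v = z_1 + ... + z_m (mod 2) and |v - t| <= 1.
    When two such integers exist (t is an integer of the wrong parity),
    the lower one is returned, i.e. we consistently round down."""
    parity = sum(z) % 2
    v = math.ceil(t - 1)      # smallest integer >= t - 1
    if v % 2 != parity:
        v += 1                # the next integer has the required parity
    return v

n = 10
h_R = lambda x: Fraction(1, 2) * x     # toy profile with Lipschitz constant 1/2

# Heights prescribed on the path {0, ..., n} by rounding n * h_R(z / n).
heights = [round_to_parity(n * h_R(Fraction(z, n)), (z,)) for z in range(n + 1)]
print(heights)
```

For this profile the rounded values differ by exactly one between neighbouring points, so they already form a height function on the path; in general the rounding only guarantees the weaker pairwise bound discussed next.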
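Let us check the hypotheses of the Kirszbraun theorem. If $\tilde z \in \tilde R_n$ and if $z \in R_n$ is a point where the extension was prescribed by rounding, then $|z - \tilde z|_1 > \tfrac{\varepsilon}{3} n$. Therefore the Kirszbraun inequality holds for the pair $(\tilde z, z)$. The argument for two points $z_1, z_2 \in R_n$ where the extension was prescribed by rounding is similar to the arguments made in Section 2.2. By the triangle inequality, $|h_{R_n}(z_1) - h_{R_n}(z_2)| \le |z_1 - z_2|_1 + 2$. Equality holds only if both $n h_R(\tfrac{1}{n} z_1)$ and $n h_R(\tfrac{1}{n} z_2)$ are integers of the same parity as $z_1$ and $z_2$, respectively. In this case $h_{R_n}$ is rounded down at both points, so the Kirszbraun inequality is still satisfied.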
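So, there exists an extension $h_{R_n}$ of $h_{\tilde R_n}$ such that $|h_{R_n}(z) - n h_R(\tfrac{1}{n} z)| \le 1$ at every point $z \in R_n$ where the value was prescribed by rounding. We claim that $h_{R_n} \in B(R_n, h_R, \varepsilon)$. Since $\tfrac{c}{3} \varepsilon^2 \le \varepsilon$ and since $1 \le \varepsilon n$, it suffices to consider points $z \in R_n$ where the value was not prescribed by rounding. Fix such a $z$. By construction, there exists $\tilde z \in \tilde R_n$ such that $|z - \tilde z|_1 \le \tfrac{\varepsilon}{3} n$. Note that $\tfrac{c}{3} \varepsilon^2 \le \tfrac{\varepsilon}{3}$, since $c, \varepsilon \le 1$. By the Lipschitz property of $h_R$ and $h_{R_n}$, By symmetry, $h_{R_n}(z) \ge n h_R(\tfrac{1}{n} z) - \varepsilon n$, and so $h_{R_n} \in B(R_n, h_R, \varepsilon)$. This extension process defines an injection from $B(\tilde R_n, h_R, \tfrac{c}{3} \varepsilon^2)$ into $B(R_n, h_R, \varepsilon)$, which proves (39).

Finally, we derive the conclusion from (38) and (39) by taking logarithms and normalizing, using (42) to account for the difference in normalizing factors $-\frac{1}{|R_n|}$ and $-\frac{1}{|\tilde R_n|}$.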
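One way to spell out the Lipschitz estimate used in this proof is the following chain. It is a sketch under the assumption that both $h_{R_n}$ and $n h_R(\tfrac{1}{n}\,\cdot)$ are 1-Lipschitz with respect to the $\ell^1$ norm, for a point $z$ and a point $\tilde z \in \tilde R_n$ with $|z - \tilde z|_1 \le \tfrac{\varepsilon}{3} n$; the constants in the original display may be arranged differently.

```latex
\begin{align*}
h_{R_n}(z)
  &\le h_{R_n}(\tilde z) + |z - \tilde z|_1
     && \text{($h_{R_n}$ is a height function)} \\
  &< n\, h_R\!\left(\tfrac{1}{n} \tilde z\right) + \tfrac{c}{3}\varepsilon^2 n + |z - \tilde z|_1
     && \text{($h_{\tilde R_n} \in B(\tilde R_n, h_R, \tfrac{c}{3}\varepsilon^2)$)} \\
  &\le n\, h_R\!\left(\tfrac{1}{n} z\right) + |z - \tilde z|_1 + \tfrac{c}{3}\varepsilon^2 n + |z - \tilde z|_1
     && \text{($h_R$ is 1-Lipschitz)} \\
  &\le n\, h_R\!\left(\tfrac{1}{n} z\right) + \tfrac{\varepsilon}{3} n + \tfrac{\varepsilon}{3} n + \tfrac{\varepsilon}{3} n
   = n\, h_R\!\left(\tfrac{1}{n} z\right) + \varepsilon n .
\end{align*}
```

The lower bound follows in the same way with the roles of the two sides exchanged.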
Now, let us prove the profile theorem (Theorem 15). The main idea is straightforward: we approximate $h_R$ by a piecewise affine function (given by the simplicial Rademacher theorem, i.e. Lemma 31), for which we have already proven the simplicial profile theorem (Theorem 29). Then we use the robustness results (Lemma 35, Lemma 36, and Lemma 37) to deduce the profile theorem for $h_R$. However, in order to apply Lemma 37 we must first reduce to the case where the Lipschitz constant $\mathrm{Lip}(h_R) := \inf\{\lambda > 0 \mid h_R \text{ is } \lambda\text{-Lipschitz}\}$ is strictly less than 1.
Proof of Theorem 15. First we reduce to the case where $\mathrm{Lip}(h_R) \le 1 - c\delta$, for a constant $c > 0$ depending only on the domain $R$. Then, we reduce to the piecewise affine case of Theorem 29.
We make two calculations. First, Second, So once we prove (43) with the extra hypothesis that $\mathrm{Lip}(h_R) \le 1 - c\delta$, the general result follows.
Reduction to piecewise linear height functions. We will apply Lemma 31 to derive a simplex domain $K$ and a piecewise linear height function $h_K$ approximating $R$ and $h_R$, then appeal to Theorem 29. In so doing we introduce two parameters: $\varepsilon$, which controls how well $K$ approximates $R$, and $\ell$, which controls the size of the simplices in $K$.
There are a few important properties of $\varepsilon$ and $\ell$. First, $\delta = \varepsilon \ell$, so there is actually only one degree of freedom. Second, $\ell$ must be chosen to be sufficiently small, as is required by the simplicial Rademacher theorem (see Lemma 31). Third, as $\delta \to 0$ we must have $\varepsilon \to 0$, so that $\theta(\varepsilon) = \theta(\delta)$.
Let us describe explicitly how we choose $\varepsilon$ and $\ell$ satisfying these constraints. We fix a sequence $\varepsilon_k \downarrow 0$ (e.g. $\varepsilon_k = \frac{1}{k}$), and for each $k$ set $\ell_k := \frac{1}{2} \sup\{\ell > 0 \mid \text{Lemma 31 applies with } \varepsilon = \varepsilon_k\}$.

Proof of the variational principle
Besides the profile theorem (Theorem 15), the proof of the variational principle (Theorem 16) relies on compactness of the space of asymptotic height functions. For robustness, we give a proof that does not assume that the macroscopic entropy functional admits a minimum. Note that the existence of such a minimizer is standard as soon as the local surface tension is convex and bounded below; see for example [CKP01, Section 2] or [She05].
However, for greater generality we work with the infimum of the macroscopic entropy and we do not assume that a minimizer exists. At any rate, it will be necessary to deal with infima (rather than minima) later when proving the large deviations principle.
Proof of Theorem 16. First, we shall prove inequality (49) by undercounting the number of height functions in $M(R_n, h_{\partial R_n}, \delta)$. The strategy is simple: we only count those height functions that are close to a "near-minimizer" of the macroscopic entropy. If we assume that a minimizer exists, i.e. that there exists $h_R^{\min} \in M(R, h_{\partial R})$ such that $\mathrm{Ent}_R(h_R^{\min}) = \inf_{h_R \in M(R, h_{\partial R})} \mathrm{Ent}_R(h_R)$, then the following proof suffices. For any $\delta > 0$ and $n \in \mathbb{N}$, Definition 5 implies that It follows immediately that so after applying the profile theorem and replacing $2\delta$ by $\delta$,

However, as mentioned above, we want to give a proof that does not rely on the existence of a minimizer. This idea is also important for proving the large deviation principle below (see the paragraphs following (58) below). The first step is to replace $h_R^{\min}$ by a sequence of approximations, say $h_R^{(k)}$. Also let $\theta^{(k)}(\delta)$ and $\theta^{(k)}_\delta(\tfrac{1}{n})$ denote the $\theta$ terms from the profile theorem (Theorem 15) for the height function $h_R^{(k)}$. At this point one may be tempted to simply take the limit $k \to \infty$ for fixed $\delta$ and $n$. The problem is that the sequence $\theta^{(k)}(\delta)$ is not necessarily controlled as $k$ goes to infinity, and could in general diverge for any fixed $\delta > 0$, and likewise for $\theta^{(k)}_\delta(\tfrac{1}{n})$. To correct this, we proceed as follows. Let $\delta_0 = +\infty$ and $n_0 = 0$. For $k \ge 1$, choose $\delta_k$ such that $0 < \delta_k \le \frac{1}{2} \delta_{k-1}$ and $\theta^{(k)}(\delta_k) \le \frac{1}{k}$.
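The diagonal choice of $\delta_k$ can be mimicked numerically. The sketch below is only an illustration: the error functions `thetas` are hypothetical stand-ins for the terms $\theta^{(k)}$ (chosen to blow up in $k$ for fixed $\delta$), the cap of the first candidate at 1 is an implementation convenience, and the helper names are not from the paper.

```python
import math

def choose_deltas(thetas):
    """Given error functions theta_1, theta_2, ... with theta_k(d) -> 0 as d -> 0,
    pick delta_k with 0 < delta_k <= delta_{k-1} / 2 and theta_k(delta_k) <= 1/k."""
    deltas = [math.inf]                      # delta_0 = +infinity
    for k, theta in enumerate(thetas, start=1):
        delta = min(1.0, deltas[-1] / 2)     # start from an admissible candidate
        while theta(delta) > 1 / k:          # shrink until the error is at most 1/k
            delta /= 2
        deltas.append(delta)
    return deltas

def index_for(delta, deltas):
    """Return the k with delta in (delta_k, delta_{k-1}], or None if delta
    lies below every delta_k that has been computed."""
    for k in range(1, len(deltas)):
        if deltas[k] < delta <= deltas[k - 1]:
            return k
    return None

# Hypothetical error terms that are not uniformly controlled in k.
thetas = [lambda d, k=k: k * k * d for k in range(1, 9)]
deltas = choose_deltas(thetas)
print(deltas[1:4], index_for(0.2, deltas))
```

Because $\delta_k$ at least halves at each step, every small $\delta$ falls into exactly one interval $(\delta_k, \delta_{k-1}]$, and the associated index $k$ grows as $\delta$ shrinks, which is what makes the bound $\frac{1}{k}$ an error term in $\delta$.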
By the profile theorem (Theorem 15) applied to $h_R^{(k)}$, By choice of $\delta_k$, Finally, since $k$ and $\delta_k$ are determined from $\delta$, Putting it all together, we have

Now we prove the reverse inequality, namely (50). Let $\varepsilon > 0$. For each $h_R \in M(R, h_{\partial R}, 2\delta)$, by the profile theorem (Theorem 15) For each $h_R \in M(R, h_{\partial R})$, fix $\eta(h_R) > 0$ such that the $\theta(\delta)$ term in (51) satisfies Recall from Definition 6 that This set is compact as an easy consequence of the Arzelà-Ascoli theorem. Choose $h_R^{(1)}, \dots, h_R^{(k)}$ as in (54). Note that the number $k$ of sets in this cover depends only on $\delta$. We abbreviate $\eta_i := \eta(h_R^{(i)})$. Moreover, we fix $n_i \in \mathbb{N}$ such that for all $n \ge n_i$, the $\theta_\delta(\tfrac{1}{n})$ from (51) satisfies

The covering $M(R, h_{\partial R}, 2\delta) \subseteq \bigcup_{i=1}^{k} B(R, h_R^{(i)}, \eta_i)$ means that for any discrete height function $h_{R_n} \in M(R_n, h_{\partial R_n}, \delta)$ with continuous (rescaled) interpolation $\tilde h_{R_n} \in M(R, h_{\partial R}, 2\delta)$ (note that $\delta$ increases to $2\delta$ from discretization errors), there is $i \in \{1, \dots, k\}$ such that $\sup_{x \in R} |\tilde h_{R_n}(x) - h_R^{(i)}(x)| < \eta_i$. Hence,

Let us estimate $|B(R_n, h_R^{(i)}, \eta_i)|$. Assuming that $n$ is larger than all of the constants $n_1, \dots, n_k$, then for each $i = 1, \dots, k$, (By (51), (52), and (53)) inf In other words, R , η 1 ) e 5ε|Rn| . We apply this last estimate in (55) to derive Ent Rn M (R n , h ∂Rn , δ) − 1 |R n | log k B(R n , h (1) Here, note that since $k$ depends only on $\delta$, $\frac{k}{|R_n|} = \theta_\delta(\tfrac{1}{n})$. Because $\varepsilon > 0$ was arbitrary, this yields the desired estimate (50).
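The counting step behind the covering argument can be written out as follows. This is a sketch, assuming the normalization $\mathrm{Ent}_{R_n}(A) = -\frac{1}{|R_n|} \log |A|$ and writing $B_i := B(R_n, h_R^{(i)}, \eta_i)$ as a shorthand introduced only for this display; the precise constants of the original estimates are not reproduced.

```latex
\begin{align*}
M(R_n, h_{\partial R_n}, \delta) \subseteq \bigcup_{i=1}^{k} B_i
\quad &\Longrightarrow \quad
|M(R_n, h_{\partial R_n}, \delta)| \le \sum_{i=1}^{k} |B_i| \le k \max_{1 \le i \le k} |B_i| , \\
\mathrm{Ent}_{R_n}\bigl(M(R_n, h_{\partial R_n}, \delta)\bigr)
  &\ge -\frac{1}{|R_n|} \log\Bigl(k \max_{1 \le i \le k} |B_i|\Bigr)
   = -\frac{\log k}{|R_n|} + \min_{1 \le i \le k} \mathrm{Ent}_{R_n}(B_i) .
\end{align*}
```

Since $k$ depends only on $\delta$, the first term on the right vanishes as $n \to \infty$ for fixed $\delta$, and each $\mathrm{Ent}_{R_n}(B_i)$ is controlled by the profile theorem.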

Large deviations principle
In this section we prove Theorem 17, the large deviations principle. For the reader's convenience, we recall the following definitions from the statement of the theorem in Section 2.5. For $\delta > 0$, $n \in \mathbb{N}$, and $h_R \in M(R)$:

$$\mu_{\delta,n}(A) := \frac{\bigl|\{h_{R_n} \in M(R_n, h_{\partial R_n}, \delta) \mid \tilde h_{R_n} \in A\}\bigr|}{|M(R_n, h_{\partial R_n}, \delta)|}, \qquad r_{\delta,n} := |R_n|, \qquad I(h_R) := \begin{cases} \mathrm{Ent}_R(h_R) - E & \text{if } h_R \in M(R, h_{\partial R}), \\ +\infty & \text{otherwise,} \end{cases}$$

where $\tilde h_{R_n}$ is the piecewise-affine interpolation of the function $\frac{1}{n} z \mapsto \frac{1}{n} h_{R_n}(z)$ on the simplex domain with vertices $\{\frac{1}{n} z\}_{z \in R_n}$, and where $E = \inf_{h_R \in M(R, h_{\partial R})} \mathrm{Ent}_R(h_R)$. The proof of the large deviations principle that we give here is based on the proof of the variational principle, Theorem 16, given in Section 7. We encourage the reader to read Section 7 first.
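The measure $\mu_{\delta,n}$ and the rate $r_{\delta,n}$ can be computed by brute force in a toy one-dimensional case. The sketch below is an illustration only: the path domain $\{0, \dots, n\}$, the zero boundary data, the parity convention $h(z) \equiv z \pmod 2$, the tent-shaped target profile, and the function names are assumptions made for the example. Because the tent profile is affine between lattice points, the supremum over the interpolation is attained at lattice points.

```python
import math
from itertools import product

def height_functions(n, delta):
    """Paths h(0), ..., h(n) with steps +-1, h(z) = z (mod 2), and both
    endpoint heights within delta * n of the boundary value 0."""
    starts = [v for v in range(-n, n + 1) if v % 2 == 0 and abs(v) / n < delta]
    for start in starts:
        for steps in product((-1, 1), repeat=n):
            h = [start]
            for s in steps:
                h.append(h[-1] + s)
            if abs(h[-1]) / n < delta:
                yield tuple(h)

def mu(n, delta, event):
    """Fraction of admissible height functions whose rescaling lies in the event."""
    total = hits = 0
    for h in height_functions(n, delta):
        total += 1
        hits += bool(event(h, n))
    return hits / total

def near_tent(h, n, radius=0.25):
    """Event: the rescaled height function stays within `radius` of the tent profile."""
    tent = lambda x: min(x, 1 - x)
    return all(abs(h[z] / n - tent(z / n)) < radius for z in range(n + 1))

n, delta = 10, 0.15
p = mu(n, delta, near_tent)
rate = math.log(p) / (n + 1)       # (1 / r) log mu with r = |R_n| = n + 1
print(p, rate)
```

The quantity `rate` is the finite-size analogue of the expression whose limit the large deviations principle describes.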
Proof of Theorem 17. First, we prove the LDP lower bound (16). Without loss of generality we may assume that $A$ is open. We may also assume that $\inf_{h_R \in A} I(h_R) < \infty$, or else (16) is trivial. By using these assumptions and replacing the symbols $\mu_{\delta,n}$, $r_{\delta,n}$, and $I(h_R)$ by their definitions, (16) simplifies to (57). After cancelling the corresponding terms in (57), and after replacing $\lim$ by our preferred $\theta$ asymptotics, it suffices to show (58). Note the analogy between (58) and inequality (49) from the proof of the variational principle. Indeed, we prove (58) in a similar manner to (49). We fix a sequence of asymptotic height functions $h_R^{(k)} \in A$ that saturates the infimum; for concreteness, choose h Let $\theta^{(k)}(\delta)$ and θ (c) $\theta^{(k)}(\delta_k) \le \frac{1}{k}$. Given $\delta > 0$, choose $k$ such that $\delta \in (\delta_k, \delta_{k-1}]$. We claim that for large enough $n$ (depending on $A$ and $\delta$), the inclusion (59) holds.

Indeed, if $h_{R_n} \in B(R_n, h_R^{(k)}, \delta_k)$, then for $z \in \partial R_n$ and $x \in \partial R$, By choosing $x$ close to $\frac{1}{n} z$ (which is possible, since $\frac{1}{n} R_n$ converges to $R$), the second error term tends to 0. So does the third, because $\frac{1}{n} h_{\partial R_n}$ converges to $h_{\partial R}$ by hypothesis. Since $\delta_k < \delta$, this proves the inclusion $B(R_n, h_R^{(k)}, \delta_k) \subseteq M(R_n, h_{\partial R_n}, \delta)$. It remains to show that each $h_{R_n} \in B(R_n, h_R^{(k)}, \delta_k)$ satisfies $\tilde h_{R_n} \in A$. But by criterion (b) above, at every lattice point z ∈ R n , R − h R for each $z \in R_n$, and since $\tilde h_{R_n}$ is extended from these points to $R$ by piecewise affine interpolation, the same inequality holds for all $x \in R$. In particular, this implies that $\tilde h_{R_n} \in A$, and the inclusion (59) holds as claimed.
Using this inclusion together with the profile theorem for $h_R^{(k)}$, we derive the inequalities By criterion (c), we have $\theta^{(k)}(\delta_k) \le \frac{1}{k}$. Since we chose $k$ such that $\delta \in (\delta_k, \delta_{k-1}]$, we have $k \to \infty$ as $\delta \to 0$, so $\frac{1}{k} = \theta_A(\delta)$. (The dependence on $A$ comes from the choice of k

We observe that $(\mu_{\delta,n})_{\delta,n}$ is exponentially tight, i.e. that for every $b \in (0, \infty)$, there exists $K_b \subset M(R)$ such that $\lim_{\delta \to 0} \lim_{n \to \infty} \frac{1}{r_{\delta,n}} \log \mu_{\delta,n}(K_b^c) \le -b$.
Indeed, we may take $K_b$ to be the closure of $M(R, h_{\partial R}, 1)$, independent of $b$. Any $h_{R_n} \in M(R_n, h_{\partial R_n}, \delta)$ satisfies $\tilde h_{R_n} \in M(R, h_{\partial R}, 1)$ by the triangle inequality, so $\mu_{\delta,n}(K_b^c) = 0$. By the general theory of large deviations, exponential tightness implies that it is sufficient to prove the upper bound (17) for compact sets $A \subset M(R)$.
If $\inf_{h_R \in A} I(h_R) = \infty$, then every height function in $A$ differs from $h_{\partial R}$ at some point on the boundary. In fact, by compactness, there exists $\delta_0 > 0$ such that for every $h_R \in A$, $\sup_{x \in \partial R} |h_{\partial R}(x) - h_R(x)| \ge \delta_0$. Clearly, as in the proof of exponential tightness above, this implies that $\{h_{R_n} \in M(R_n, h_{\partial R_n}, \delta) \mid \tilde h_{R_n} \in A\}$ is empty once $\delta$ is small enough and $n$ is large enough. For all such $\delta, n$ we have $\mu_{\delta,n}(A) = 0$, and (17) follows.
It remains to prove the upper bound (17) when $\inf_{h_R \in A} I(h_R) < \infty$ and $A$ is compact. Just like for the lower bound before, we reduce to proving the following inequality:

$$\inf_{h_R \in A} \mathrm{Ent}_R(h_R) \le \mathrm{Ent}_{R_n}\bigl(\{h_{R_n} \in M(R_n, h_{\partial R_n}, \delta) \mid \tilde h_{R_n} \in A\}\bigr) + \theta_A(\delta) + \theta_{A,\delta}(\tfrac{1}{n}). \tag{60}$$

We will closely follow the proof of (50) from the proof of Theorem 16 in Section 7. Let $\varepsilon > 0$. As in (54), choose $h_R^{(1)}, \dots, h_R^{(k)}$ and radii $\eta_1, \dots, \eta_k$, where $\eta_1, \dots, \eta_k$ are chosen so that for each $i$, the $\theta(\delta)$ term from the profile theorem for $h_R^{(i)}$ satisfies $\theta(\eta_i) \le \varepsilon$. Exactly as in the proof of Theorem 16, we obtain the analogue of (55), and from it the analogue of (56) for $\mathrm{Ent}_{R_n}\bigl(\{h_{R_n} \in M(R_n, h_{\partial R_n}, \delta) \mid \tilde h_{R_n} \in A\}\bigr)$. This completes the proof of (60) and of the large deviations principle.