Szemerédi's regularity lemma via martingales

We prove a variant of the abstract probabilistic version of Szemerédi's regularity lemma, due to Tao, which applies to a number of structures (including graphs, hypergraphs, hypercubes, graphons, and many more) and works for random variables in $L_p$ for any $p>1$. Our approach is based on martingale difference sequences.


Introduction
1.1. The aim of the present paper is to prove a variant of the abstract probabilistic version of Szemerédi's regularity lemma, due to Tao [26,27,28]. This variant applies to a number of combinatorial structures (including graphs, hypergraphs, hypercubes, graphons, and many more) and works for random variables in $L_p$ for any $p > 1$. A proper exposition of our main result requires some preparatory work and hence, at this point, we will not discuss it in detail. Instead, we will focus on the following model case which is representative of the contents of this paper.

1.2. A very basic fact of probability theory is that the set of simple functions is dense in $L_1$. Actually, this fact is so basic that it is hardly mentioned when applied. But how do we approximate a given random variable by a simple function? More precisely, given an integrable random variable $f\colon [0,1] \to \mathbb{R}$ and a real $0 < \varepsilon \leq 1$ (that we regard as an error) we are asking for an effective method to locate a simple function $s\colon [0,1] \to \mathbb{R}$ such that $\|f - s\|_{L_1} \leq \varepsilon$.
It turns out that there is a natural greedy algorithm for this problem which we are about to describe. We start by setting $\mathcal{F}_0 = \{\emptyset, [0,1]\}$ and $f_0 = \mathbb{E}(f \mid \mathcal{F}_0)$. That is, $\mathcal{F}_0$ is the trivial σ-algebra on $[0,1]$ and $f_0$ is the conditional expectation of $f$ with respect to $\mathcal{F}_0$ (see, e.g., [2]). The σ-algebra $\mathcal{F}_0$ is finite, and so $f_0$ is a simple function. Thus, if $\|f - f_0\|_{L_1} \leq \varepsilon$, then we are done. Otherwise, by considering the support of the positive part or the negative part of $f - f_0$, we may select a measurable subset $A_0$ of $[0,1]$ such that
$$\Big|\int_{A_0} (f - f_0)\, dt\Big| > \varepsilon/2. \tag{1.1}$$
Next we set $\mathcal{F}_1 = \sigma(\mathcal{F}_0 \cup \{A_0\})$ and $f_1 = \mathbb{E}(f \mid \mathcal{F}_1)$. (That is, $\mathcal{F}_1$ is the smallest σ-algebra on $[0,1]$ that contains all elements of $\mathcal{F}_0$ and $A_0$, and $f_1$ is the conditional expectation of $f$ with respect to $\mathcal{F}_1$.) Observe that, by (1.1) and the fact that $A_0 \in \mathcal{F}_1$, we have
$$\|f_1 - f_0\|_{L_1} \geq \Big|\int_{A_0} (f_1 - f_0)\, dt\Big| = \Big|\int_{A_0} (f - f_0)\, dt\Big| > \varepsilon/2. \tag{1.2}$$
Also notice that $f_1$ is a simple function, and so if $\|f - f_1\|_{L_1} \leq \varepsilon$, then we can stop this process. On the other hand, if $\|f - f_1\|_{L_1} > \varepsilon$, then we select a measurable subset $A_1$ of $[0,1]$ such that $|\int_{A_1} (f - f_1)\, dt| > \varepsilon/2$ and we continue similarly.

The next thing that one is led to analyze is whether this algorithm will eventually terminate and, if yes, at what speed. To this end notice that if the algorithm runs forever, then it produces an increasing sequence $(\mathcal{F}_i)$ of finite σ-algebras of $[0,1]$ and a sequence $(f_i)$ of random variables with $f_i = \mathbb{E}(f \mid \mathcal{F}_i)$ for every $i \in \mathbb{N}$ and such that $\|f_i - f_{i-1}\|_{L_1} > \varepsilon/2$ if $i \geq 1$. In other words, $(f_i)$ is a martingale adapted to the filtration $(\mathcal{F}_i)$ whose successive differences are bounded away from zero in the $L_1$ norm. This last piece of information is the key observation of this analysis since successive differences of martingales, known as martingale difference sequences, are highly structured sequences of random variables.
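The greedy scheme just described can be sketched on a finite model. The following Python fragment is only an illustration under stated assumptions: the interval $[0,1]$ is replaced by a finite set of equally weighted points, the random variable is a list of values, and the names `conditional_expectation` and `greedy_simple_approximation` are ours. At each step the set $A_i$ is taken to be the support of the positive part of $f - f_i$; since $f - f_i$ has mean zero, this set carries half of the $L_1$ norm of the difference.

```python
def conditional_expectation(f, labels):
    """Average f over each atom of the partition encoded by `labels` (uniform measure)."""
    totals, counts = {}, {}
    for value, atom in zip(f, labels):
        totals[atom] = totals.get(atom, 0.0) + value
        counts[atom] = counts.get(atom, 0) + 1
    return [totals[atom] / counts[atom] for atom in labels]


def greedy_simple_approximation(f, eps):
    """Return (f_i, i) with ||f - f_i||_{L1} <= eps, following the greedy scheme."""
    n = len(f)
    labels = [0] * n  # trivial sigma-algebra F_0: a single atom
    steps = 0
    while True:
        f_i = conditional_expectation(f, labels)
        diff = [x - y for x, y in zip(f, f_i)]
        if sum(abs(d) for d in diff) / n <= eps:  # L1 norm under the uniform measure
            return f_i, steps
        # A_i: support of the positive part of f - f_i.
        in_A = [d > 0 for d in diff]
        # F_{i+1} = sigma(F_i and A_i): split every atom along A_i and relabel.
        keys = {}
        labels = [keys.setdefault((atom, flag), len(keys))
                  for atom, flag in zip(labels, in_A)]
        steps += 1
```

On a finite space the loop always stops, because every unsuccessful step splits at least one atom. The interest of the analysis in the text is that the number of steps can be bounded independently of the size of the space.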
In particular, if the given random variable $f$ belongs to $L_p$ for some $1 < p \leq 2$, then for every integer $n \geq 1$ we have This functional analytic estimate follows by combining some deep results from probability theory and Banach space theory which we discuss in Appendix A. Here we simply mention that the main ingredients of its proof are the work of Burkholder on the unconditionality of martingale difference sequences and the fact that the Banach space $L_p$ has cotype 2 for any $1 \leq p \leq 2$. Of course, with inequality (1.3) at our disposal, it is very easy to analyze the greedy algorithm described above. Precisely, by (1.3) and the monotonicity of the $L_p$ norms, we see that if $f \in L_p$ for some $1 < p \leq 2$, then this algorithm will terminate after at most $\lfloor 64\, \|f\|_{L_p}^2\, \varepsilon^{-2} (p-1)^{-2} \rfloor + 1$ iterations.
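To get a feeling for the size of this bound, one may evaluate it numerically. The helper below is a small sketch; the function name and the argument checks are ours, and the formula is simply the iteration bound stated above.

```python
import math


def greedy_iteration_bound(norm_f_lp, eps, p):
    """Evaluate floor(64 * ||f||_{L_p}^2 * eps^(-2) * (p-1)^(-2)) + 1."""
    if not 1 < p <= 2:
        raise ValueError("the bound is stated for 1 < p <= 2")
    if not 0 < eps <= 1:
        raise ValueError("the error eps is assumed to satisfy 0 < eps <= 1")
    return math.floor(64 * norm_f_lp ** 2 * eps ** -2 * (p - 1) ** -2) + 1
```

For instance, with $\|f\|_{L_p} = 1$, $\varepsilon = 1/2$ and $p = 2$ the bound equals 257, and it deteriorates like $(p-1)^{-2}$ as $p$ approaches 1.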

1.3. Our main result (Theorem 3.1 in Section 3) follows the method outlined above but with two important extra features. First, our approximation scheme is more demanding in the sense that the simple function we wish to locate is required to be a linear combination of characteristic functions of sets belonging to a given class. It is useful to view the sets in this class as being "structured", though for the purpose of performing the greedy algorithm only some (not particularly restrictive) stability properties are needed. These properties are presented in Definition 2.1 in Section 2, together with several related examples.
Second, the error term of the approximation is controlled not only by the $L_p$ norm but also by a certain "uniformity norm" which depends on the class of "structured" sets with which we are dealing (see Definition 2.2 in Section 2). This particular feature is already present in Tao's work and can be traced back to [13].
Finally, we note that in Section 4 we discuss some applications, including a regularity lemma for hypercubes and an extension of the strong regularity lemma to $L_p$ graphons for any $p > 1$. These applications are definitely not exhaustive, but are quite representative of the scope and flexibility of our main result.
1.4. By $\mathbb{N} = \{0, 1, 2, \dots\}$ we denote the set of natural numbers. As usual, for every positive integer $n$ we set $[n] := \{1, \dots, n\}$. For every function $f\colon \mathbb{N} \to \mathbb{N}$ and every $\ell \in \mathbb{N}$ by $f^{(\ell)}\colon \mathbb{N} \to \mathbb{N}$ we shall denote the $\ell$-th iteration of $f$ defined recursively by $f^{(0)}(n) = n$ and $f^{(\ell+1)}(n) = f\big(f^{(\ell)}(n)\big)$ for every $n \in \mathbb{N}$. All other pieces of notation we use are standard.
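The recursive definition of the iterates can be transcribed directly. The following Python sketch is only meant to fix ideas; the name `iterate` is ours.

```python
def iterate(f, ell):
    """Return the ell-th iteration of f: f^(0)(n) = n and f^(ell+1)(n) = f(f^(ell)(n))."""
    def f_ell(n):
        for _ in range(ell):
            n = f(n)
        return n
    return f_ell
```

For example, for $f(n) = 2n + 1$ the successive values of $f^{(\ell)}(0)$ for $\ell = 0, 1, 2, 3$ are $0, 1, 3, 7$.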

Semirings and their uniformity norms
We begin by introducing the following slight strengthening of the classical concept of a semiring of sets (see also [3]).
Definition 2.1. Let Ω be a nonempty set and k a positive integer. Also let S be a collection of subsets of Ω. We say that S is a k-semiring on Ω if the following properties are satisfied.
As we have already indicated in the introduction, we view every element of a k-semiring S as a "structured" set and a linear combination of few characteristic functions of elements of S as a "simple" function. We will use the following norm in order to quantify how far from being "simple" a given function is.
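Definition 2.2. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $k$ a positive integer and $\mathcal{S}$ a $k$-semiring on $\Omega$ with $\mathcal{S} \subseteq \mathcal{F}$. For every $f \in L_1(\Omega, \mathcal{F}, \mathbb{P})$ we set
$$\|f\|_{\mathcal{S}} = \sup\Big\{ \Big|\int_S f\, d\mathbb{P}\Big| : S \in \mathcal{S} \Big\}.$$
The quantity $\|f\|_{\mathcal{S}}$ will be called the $\mathcal{S}$-uniformity norm of $f$.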
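The $\mathcal{S}$-uniformity norm is, in general, a seminorm. Note, however, that if the $k$-semiring $\mathcal{S}$ is sufficiently rich, then the function $\|\cdot\|_{\mathcal{S}}$ is indeed a norm. More precisely, the function $\|\cdot\|_{\mathcal{S}}$ is a norm if and only if the family $\{\mathbf{1}_S : S \in \mathcal{S}\}$ separates points in $L_1(\Omega, \mathcal{F}, \mathbb{P})$, that is, for every $f, g \in L_1(\Omega, \mathcal{F}, \mathbb{P})$ with $f \neq g$ there exists $S \in \mathcal{S}$ with $\int_S f\, d\mathbb{P} \neq \int_S g\, d\mathbb{P}$.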
The simplest example of a $k$-semiring on a nonempty set $\Omega$ is an algebra of subsets of $\Omega$. Indeed, observe that a family of subsets of $\Omega$ is a 1-semiring if and only if it is an algebra. Another basic example is the collection of all intervals of a linearly ordered set, a family which is easily seen to be a 2-semiring. More interesting (and useful) $k$-semirings can be constructed with the following lemma.

Lemma 2.3. Let $\Omega$ be a nonempty set. Also let $m, k_1, \dots, k_m$ be positive integers and set $k = \sum_{i=1}^m k_i$. If $\mathcal{S}_i$ is a $k_i$-semiring on $\Omega$ for every $i \in [m]$, then the family is a $k$-semiring on $\Omega$.
Proof. Clearly we may assume that m 2. Notice, first, that the family S satisfies properties (P1) and (P2) in Definition 2.1. To see that property (P3) is also satisfied, fix S, T ∈ S and write S = Observe that the sets P 1 , . . . , P m are pairwise disjoint. Moreover, . . , m − 1} and C m = Ω, and invoking the definition of the sets P 1 , . . . , P m we obtain that and observe that |I| k. For every (j, n) ∈ I let U j n = B j ∩ R j n ∩ C j and notice that U j n ∈ S, U j n ⊆ R j n and U j n ⊆ P j . It follows that the family {U j n : (j, n) ∈ I} is contained in S and consists of pairwise disjoint sets. Moreover, by (2.5), we have Hence, the family S satisfies property (P3) in Definition 2.1, as desired.
By Lemma 2.3, we have the following corollary.
Corollary 2.4. The following hold.
(a) Let Ω be a nonempty set. Also let k be a positive integer and for every i ∈ [k] let A i be an algebra on Ω. Then the family is a k-semiring on Ω.
(b) Let $d, k_1, \dots, k_d$ be positive integers and set $k = \sum_{i=1}^d k_i$. Also let $\Omega_1, \dots, \Omega_d$ be nonempty sets and for every $i \in [d]$ let $\mathcal{S}_i$ be a $k_i$-semiring on $\Omega_i$. Then the family is a $k$-semiring on $\Omega_1 \times \cdots \times \Omega_d$.
Next we isolate some basic properties of the S-uniformity norm.
Lemma 2.5. Let (Ω, F , P) be a probability space, k a positive integer and S a k-semiring on Ω with S ⊆ F . Also let f ∈ L 1 (Ω, F , P). Then the following hold.
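Proof. Part (a) is straightforward. For part (b), fix a σ-algebra $\mathcal{B}$ on $\Omega$ with $\mathcal{B} \subseteq \mathcal{S}$ and set $P = \{\omega \in \Omega : \mathbb{E}(f \mid \mathcal{B})(\omega) \geq 0\}$ and $N = \Omega \setminus P$. Notice that $P, N \in \mathcal{B} \subseteq \mathcal{S}$. Hence, for every $S \in \mathcal{S}$ we have Finally, assume that $\mathcal{S}$ is a σ-algebra and notice that $\int_S f\, d\mathbb{P} = \int_S \mathbb{E}(f \mid \mathcal{S})\, d\mathbb{P}$ for every $S \in \mathcal{S}$. In particular, we have $\|f\|_{\mathcal{S}} \leq \|\mathbb{E}(f \mid \mathcal{S})\|_{L_1}$. Also let, as above, $P = \{\omega \in \Omega : \mathbb{E}(f \mid \mathcal{S})(\omega) \geq 0\}$ and $N = \Omega \setminus P$. Since $P, N \in \mathcal{S}$ we obtain that and the proof is completed.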
We close this section by presenting some examples of $k$-semirings which are relevant from a combinatorial perspective. In the first example the underlying space is the Cartesian product of a finite sequence of nonempty finite sets. The corresponding semirings are related to the development of Szemerédi's regularity method for hypergraphs.

Example 1. Let $d \in \mathbb{N}$ with $d \geq 2$ and $V_1, \dots, V_d$ nonempty finite sets. We view the Cartesian product $V_1 \times \cdots \times V_d$ as a discrete probability space equipped with the uniform probability measure. For every nonempty subset $F$ of $[d]$ let $\pi_F\colon \prod_{i \in [d]} V_i \to \prod_{i \in F} V_i$ be the natural projection and set The family $\mathcal{A}_F$ is an algebra of subsets of $V_1 \times \cdots \times V_d$ and consists of those sets which depend only on the coordinates determined by $F$. More generally, let $\mathcal{F}$ be a family of nonempty subsets of $[d]$. Set $k = |\mathcal{F}|$ and observe that, by Corollary 2.4, we may associate to the family $\mathcal{F}$ a $k$-semiring $\mathcal{S}_{\mathcal{F}}$ on $V_1 \times \cdots \times V_d$ defined by the rule Notice that if the family $\mathcal{F}$ satisfies $[d] \notin \mathcal{F}$ and $\cup \mathcal{F} = [d]$, then it gives rise to a non-trivial semiring whose corresponding uniformity norm is a genuine norm.
It turns out that there is a minimal non-trivial semiring $\mathcal{S}_{\min}$ one can obtain in this way. It corresponds to the family $\mathcal{F}_{\min} = \binom{[d]}{1}$ and is particularly easy to grasp since it consists of all rectangles of $V_1 \times \cdots \times V_d$. The $\mathcal{S}_{\min}$-uniformity norm is known as the cut norm and was introduced by Frieze and Kannan [13].
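For very small examples the cut norm can be computed by brute force. The following Python sketch treats the case $d = 2$, with a function on $V_1 \times V_2$ given as a list of rows. It assumes that the uniformity norm of Definition 2.2 is the supremum of $|\int_S f\, d\mathbb{P}|$ over the sets $S$ of the semiring, here the rectangles $A \times B$; the name `cut_norm` is ours. For a fixed set $A$ of rows the optimal set $B$ of columns is found exactly, by collecting either all positive or all negative column sums, so only the subsets of $V_1$ are enumerated.

```python
from itertools import product


def cut_norm(f):
    """Brute-force cut norm of a matrix f (list of rows) under the uniform measure."""
    rows, cols = len(f), len(f[0])
    best = 0.0
    for row_mask in product([False, True], repeat=rows):
        # Column sums of f restricted to the chosen set A of rows.
        col_sums = [sum(f[i][j] for i in range(rows) if row_mask[i])
                    for j in range(cols)]
        # For fixed A, the best B gathers all positive or all negative column sums.
        positive = sum(c for c in col_sums if c > 0)
        negative = -sum(c for c in col_sums if c < 0)
        best = max(best, positive, negative)
    return best / (rows * cols)
```

The running time is exponential in $|V_1|$, so this is useful only as a sanity check. For the $2 \times 2$ matrix with rows $(1, -1)$ and $(-1, 1)$ the cut norm is $1/4$, whereas the $L_1$ norm is $1$.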
At the other extreme, this construction also yields a maximal non-trivial semiring $\mathcal{S}_{\max}$ on $V_1 \times \cdots \times V_d$. It corresponds to the family $\mathcal{F}_{\max} = \binom{[d]}{d-1}$ and consists of those subsets of the product which can be written as $A_1 \cap \cdots \cap A_d$ where for every $i \in [d]$ the set $A_i$ does not depend on the $i$-th coordinate. The $\mathcal{S}_{\max}$-uniformity norm is known as the Gowers box norm and was introduced by Gowers [15,16].
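In the second example the underlying space is of the form $\Omega \times \Omega$ where $\Omega$ is the sample space of a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The corresponding semirings are related to the theory of convergence of graphs (see, e.g., [6,18]).

Example 2. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $\mathcal{S}_\square = \{S \times T : S, T \in \mathcal{F}\}$. That is, $\mathcal{S}_\square$ is the family of all measurable rectangles of $\Omega \times \Omega$. By Corollary 2.4, we see that $\mathcal{S}_\square$ is a 2-semiring on $\Omega \times \Omega$. The $\mathcal{S}_\square$-uniformity norm is also referred to as the cut norm and is usually denoted by $\|\cdot\|_\square$. In particular, for every integrable random variable $f\colon \Omega \times \Omega \to \mathbb{R}$ we have
$$\|f\|_\square = \sup\Big\{ \Big|\int_{S \times T} f\Big| : S, T \in \mathcal{F} \Big\}, \tag{2.14}$$
where the integral is taken with respect to the product measure on $\Omega \times \Omega$. There is another natural semiring in this context which was introduced by Bollobás and Nikiforov [3] and can be considered as the "symmetric" version of $\mathcal{S}_\square$. Specifically, let
$$\Sigma_\square = \{S \times T : S, T \in \mathcal{F} \text{ and either } S = T \text{ or } S \cap T = \emptyset\} \tag{2.15}$$
and observe that $\Sigma_\square$ is a 4-semiring which is contained, of course, in $\mathcal{S}_\square$. On the other hand, note that the family $\mathcal{S}_\square$ is not much larger than $\Sigma_\square$ since every element of $\mathcal{S}_\square$ can be written as the disjoint union of at most 4 elements of $\Sigma_\square$. Therefore, for every integrable random variable $f\colon \Omega \times \Omega \to \mathbb{R}$ we have
$$\|f\|_{\Sigma_\square} \leq \|f\|_\square \leq 4\, \|f\|_{\Sigma_\square}. \tag{2.16}$$

In the last example the underlying space is the hypercube
$$A^n = \{(a_0, \dots, a_{n-1}) : a_0, \dots, a_{n-1} \in A\} \tag{2.17}$$
where $n$ is a positive integer and $A$ is a finite alphabet (i.e., a finite set) with at least two letters. The building blocks of the corresponding semirings were introduced by Shelah [24] in his work on the Hales-Jewett numbers, and are essential tools in all known combinatorial proofs of the density Hales-Jewett theorem (see [10,22,28]).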
Example 3. Let $n$ be a positive integer and $A$ a finite alphabet with $|A| \geq 2$. As in Example 1, we view the hypercube $A^n$ as a discrete probability space equipped with the uniform probability measure. Now let $a, b \in A$ with $a \neq b$. Also let $z, y \in A^n$ and write $z = (z_0, \dots, z_{n-1})$ and $y = (y_0, \dots, y_{n-1})$. We say that $z$ and $y$ are $(a,b)$-equivalent provided that for every $i \in \{0, \dots, n-1\}$ and every $\gamma \in A \setminus \{a, b\}$ we have $z_i = \gamma$ if and only if $y_i = \gamma$. In other words, $z$ and $y$ are $(a,b)$-equivalent if they possibly differ only in the coordinates taking values in $\{a, b\}$. Clearly, the notion of $(a,b)$-equivalence defines an equivalence relation on $A^n$. The sets which are invariant under this equivalence relation are called $(a,b)$-insensitive. That is, a subset $X$ of $A^n$ is $(a,b)$-insensitive provided that for every $z \in X$ and every $y \in A^n$ if $z$ and $y$ are $(a,b)$-equivalent, then $y \in X$. We denote by $\mathcal{A}_{\{a,b\}}$ the family of all $(a,b)$-insensitive subsets of $A^n$. It follows readily from the above definitions that the family $\mathcal{A}_{\{a,b\}}$ is an algebra of subsets of $A^n$. The algebras $\{\mathcal{A}_{\{a,b\}} : \{a,b\} \in \binom{A}{2}\}$ can then be used to construct various $k$-semirings on $A^n$. Specifically, let $\mathcal{F} \subseteq \binom{A}{2}$ and set $k = |\mathcal{F}|$. By Corollary 2.4, we see that the family constructed from the algebras $\{\mathcal{A}_{\{a,b\}} : \{a,b\} \in \mathcal{F}\}$ via formula (2.7) is a $k$-semiring on $A^n$.
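The notions of $(a,b)$-equivalence and $(a,b)$-insensitivity are easy to test on small hypercubes. The Python sketch below represents words of $A^n$ as tuples of letters. It assumes, as in the verbal description above, that two words are $(a,b)$-equivalent exactly when they agree in every coordinate where either of them carries a letter outside $\{a,b\}$; the function names are ours, and the insensitivity check enumerates all of $A^n$, so it is practical only for tiny alphabets and small $n$.

```python
from itertools import product


def are_equivalent(z, y, a, b):
    """Check (a,b)-equivalence: z and y may differ only where both letters lie in {a, b}."""
    return len(z) == len(y) and all(
        zi == yi or (zi in (a, b) and yi in (a, b)) for zi, yi in zip(z, y)
    )


def is_insensitive(X, alphabet, n, a, b):
    """Check that X (a set of length-n tuples) is closed under (a,b)-equivalence."""
    X = set(X)
    return all(
        y in X
        for z in X
        for y in product(alphabet, repeat=n)
        if are_equivalent(z, y, a, b)
    )
```

For instance, over the alphabet $\{a, b, c\}$ the set of all words of length 2 whose first letter is $c$ is $(a,b)$-insensitive, while the singleton $\{(a, c)\}$ is not, since $(b, c)$ is $(a,b)$-equivalent to $(a, c)$.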
The maximal semiring obtained in this way corresponds to the family $\binom{A}{2}$. We shall denote it by $\mathcal{S}(A^n)$. In particular, we have that $\mathcal{S}(A^n)$ is a $K$-semiring on $A^n$ where $K = |A|(|A|+1)2^{-1}$. Note that $K$ is independent of $n$. Also observe that if $|A| \geq 3$, then the $\mathcal{S}(A^n)$-uniformity norm is actually a norm.

The main result
First we introduce some terminology and some pieces of notation. We say that a function $F\colon \mathbb{N} \to \mathbb{R}$ is a growth function provided that: (i) $F$ is increasing, and (ii) $F(n) \geq n + 1$ for every $n \in \mathbb{N}$. Moreover, for every nonempty set $\Omega$ and every finite partition $\mathcal{P}$ of $\Omega$ by $\mathcal{A}_{\mathcal{P}}$ we shall denote the σ-algebra on $\Omega$ generated by $\mathcal{P}$. Clearly, the σ-algebra $\mathcal{A}_{\mathcal{P}}$ is finite and its atoms are precisely the members of $\mathcal{P}$. Also note that if $\mathcal{Q}$ and $\mathcal{P}$ are two finite partitions of $\Omega$, then $\mathcal{Q}$ is a refinement of $\mathcal{P}$ if and only if $\mathcal{A}_{\mathcal{Q}} \supseteq \mathcal{A}_{\mathcal{P}}$. Now for every pair $k, \ell$ of positive integers, every $0 < \sigma \leq 1$, every $1 < p \leq 2$ and every growth function $F\colon \mathbb{N} \to \mathbb{R}$ we define $h\colon \mathbb{N} \to \mathbb{N}$ recursively by the rule Finally, we define Note that if $n \in \mathbb{N}$ and $F\colon \mathbb{N} \to \mathbb{N}$ is a primitive recursive growth function which belongs to the class $\mathcal{E}^n$ of Grzegorczyk's hierarchy (see, e.g., [23]), then the numbers $\mathrm{Reg}(k, \ell, \sigma, p, F)$ are controlled, essentially, by a primitive recursive function belonging to the class $\mathcal{E}^m$ where $m = \max\{4, n + 2\}$.
We are now ready to state the main result of this paper.
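Theorem 3.1. Let $k, \ell$ be positive integers, $0 < \sigma \leq 1$, $1 < p \leq 2$ and $F\colon \mathbb{N} \to \mathbb{R}$ a growth function. Also let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(\mathcal{S}_i)$ an increasing sequence of $k$-semirings on $\Omega$ with $\mathcal{S}_i \subseteq \mathcal{F}$ for every $i \in \mathbb{N}$. Finally, let $\mathcal{C}$ be a family in $L_p(\Omega, \mathcal{F}, \mathbb{P})$ such that $\|f\|_{L_p} \leq 1$ for every $f \in \mathcal{C}$ and with $|\mathcal{C}| = \ell$. Then there exist (a) a natural number $N$ with $N \leq \mathrm{Reg}(k, \ell, \sigma, p, F)$, (b) a partition $\mathcal{P}$ of $\Omega$ with $\mathcal{P} \subseteq \mathcal{S}_N$ and $|\mathcal{P}| \leq (k+1)^N$, and (c) a finite refinement $\mathcal{Q}$ of $\mathcal{P}$ with $\mathcal{Q} \subseteq \mathcal{S}_i$ for some $i \geq N$ such that for every $f \in \mathcal{C}$, we have the estimates

The case "$p = 2$" in Theorem 3.1 is essentially due to Tao [26,27,28]. His approach, however, is somewhat different since he works with σ-algebras instead of $k$-semirings.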
The increasing sequence $(\mathcal{S}_i)$ of $k$-semirings can be thought of as the higher-complexity analogue of the classical concept of a filtration in the theory of martingales. In fact, this is more than an analogy since, by applying Theorem 3.1 to appropriately selected filtrations, one is able to recover the fact that, for any $1 < p \leq 2$, every $L_p$ bounded martingale is $L_p$ convergent. We discuss these issues in Appendix B.
We also note that the idea to obtain "uniformity" estimates with respect to an arbitrary growth function has been considered by several authors. This particular feature is essential when one wishes to iterate this structural decomposition (this is the case, for instance, in the context of hypergraphs; see, e.g., [26]). On the other hand, the need to "regularize", simultaneously, a finite family of random variables appears frequently in extremal combinatorics and related parts of Ramsey theory (see, e.g., [11]). Nevertheless, in most applications (including the applications presented in Section 4), one deals with a single random variable and with a single semiring. Hence, we will isolate this special case in order to facilitate future references.
To this end, for every positive integer $k$, every $0 < \sigma \leq 1$, every $1 < p \leq 2$ and every growth function $F\colon \mathbb{N} \to \mathbb{R}$ we set where $F'\colon \mathbb{N} \to \mathbb{R}$ is the growth function defined by the rule $F'(n) = F\big((k+1)^n\big)$ for every $n \in \mathbb{N}$. We have the following corollary.
we have the estimates .
Finally, we notice that the assumption that $1 < p \leq 2$ in the above results is not restrictive, since the case of random variables in $L_p$ for $p > 2$ is reduced to the case $p = 2$. On the other hand, we remark that Theorem 3.1 does not hold true for $p = 1$ (see Appendix B). Thus, the range of $p$ in Theorem 3.1 is optimal.
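3.1. Proof of Theorem 3.1. We start with the following lemma.

Lemma 3.3. Let $k$ be a positive integer, $p \geq 1$ and $0 < \delta \leq 1$. Also let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $\Sigma$ a $k$-semiring on $\Omega$ with $\Sigma \subseteq \mathcal{F}$, $\mathcal{Q}$ a finite partition of $\Omega$ with $\mathcal{Q} \subseteq \Sigma$ and $f \in L_p(\Omega, \mathcal{F}, \mathbb{P})$ with $\|f - \mathbb{E}(f \mid \mathcal{A}_{\mathcal{Q}})\|_{\Sigma} > \delta$. Then there exists a refinement $\mathcal{R}$ of $\mathcal{Q}$ with $\mathcal{R} \subseteq \Sigma$ and $|\mathcal{R}| \leq |\mathcal{Q}|(k+1)$, and such that

Proof. By our assumptions, there exists $S \in \Sigma$ such that Since $\Sigma$ is a $k$-semiring on $\Omega$, there exists a refinement $\mathcal{R}$ of $\mathcal{Q}$ such that: (i) $\mathcal{R} \subseteq \Sigma$, (ii) $|\mathcal{R}| \leq |\mathcal{Q}|(k+1)$, and (iii) $S \in \mathcal{A}_{\mathcal{R}}$. It follows, in particular, that Hence, by (3.9) and the monotonicity of the $L_p$ norms, we obtain that and the proof is completed.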
We proceed with the following lemma.
Lemma 3.4. Let $k, \ell$ be positive integers, $0 < \delta, \sigma \leq 1$ and $1 < p \leq 2$, and set Also let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $(\Sigma_i)$ be an increasing sequence of $k$-semirings on $\Omega$ with $\Sigma_i \subseteq \mathcal{F}$ for every $i \in \mathbb{N}$. Finally, let $m \in \mathbb{N}$ and $\mathcal{P}$ a partition of $\Omega$ with $\mathcal{P} \subseteq \Sigma_m$ and $|\mathcal{P}| \leq (k+1)^m$. Then for every family $\mathcal{C}$ in $L_p(\Omega, \mathcal{F}, \mathbb{P})$ with $|\mathcal{C}| = \ell$ there exist $j \in \{m, \dots, m+n\}$ and a refinement $\mathcal{Q}$ of $\mathcal{P}$ with $\mathcal{Q} \subseteq \Sigma_j$ and $|\mathcal{Q}| \leq (k+1)^j$, and such that either

The case "$p = 2$" in Lemma 3.4 can be proved with an "energy increment strategy" which ultimately depends upon the fact that martingale difference sequences are orthogonal in $L_2$ (see, e.g., [27, Theorem 2.11]). In the non-Hilbertian case (that is, when $1 < p < 2$) the geometry is more subtle and we will rely, instead, on Proposition A.1. The argument can therefore be seen as the $L_p$-version of the "energy increment strategy". More applications of this $L_p$-method are given in [12].
Proof of Lemma 3.4. Assume that the first part of the lemma is not satisfied. Note that this is equivalent to saying that (H1) for every $j \in \{m, \dots, m+n\}$, every refinement $\mathcal{Q}$ of $\mathcal{P}$ with $\mathcal{Q} \subseteq \Sigma_j$ and $|\mathcal{Q}| \leq (k+1)^j$ and every We will use hypothesis (H1) to show that part (b) is satisfied. To this end we will argue by contradiction. Let $j \in \{m, \dots, m+n\}$ and let $\mathcal{Q}$ be a refinement of $\mathcal{P}$ with $\mathcal{Q} \subseteq \Sigma_j$ and $|\mathcal{Q}| \leq (k+1)^j$. Observe that hypothesis (H1) and our assumption that part (b) does not hold true imply that there exists $f \in \mathcal{C}$ (possibly depending on the partition $\mathcal{Q}$) such that $\|f - \mathbb{E}(f \mid \mathcal{A}_{\mathcal{Q}})\|_{\Sigma_{j+1}} > \delta$. Since the sequence $(\Sigma_i)$ is increasing, Lemma 3.3 can be applied to the $k$-semiring $\Sigma_{j+1}$, the partition $\mathcal{Q}$ and the random variable $f$. Hence, we obtain that (H2) for every $j \in \{m, \dots, m+n\}$ and every refinement $\mathcal{Q}$ of $\mathcal{P}$ with $\mathcal{Q} \subseteq \Sigma_j$ and $|\mathcal{Q}| \leq (k+1)^j$ there exist $f \in \mathcal{C}$ and a refinement $\mathcal{R}$ of $\mathcal{Q}$ with $\mathcal{R} \subseteq \Sigma_{j+1}$ and $|\mathcal{R}| \leq (k+1)^{j+1}$, and such that Recursively and using hypothesis (H2), we select a finite sequence $\mathcal{P}_0, \dots, \mathcal{P}_n$ of partitions of $\Omega$ with $\mathcal{P}_0 = \mathcal{P}$ and a finite sequence $f_1, \dots, f_n$ in $\mathcal{C}$ such that for every $i \in [n]$ we have: (P1) $\mathcal{P}_i$ is a refinement of $\mathcal{P}_{i-1}$, (P2) $\mathcal{P}_i \subseteq \Sigma_{m+i}$ and It follows, in particular, that $(\mathcal{A}_{\mathcal{P}_i})_{i=0}^n$ is an increasing sequence of finite sub-σ-algebras of $\mathcal{F}$. Also note that, by the classical pigeonhole principle and the fact that $|\mathcal{C}| = \ell$, there exist $g \in \mathcal{C}$ and $I \subseteq [n]$ with $|I| \geq n/\ell$ and such that $g = f_i$ for every $i \in I$.
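Next, set $f = g - \mathbb{E}(g \mid \mathcal{A}_{\mathcal{P}})$ and let $(d_i)_{i=0}^n$ be the difference sequence associated with the finite martingale $\mathbb{E}(f \mid \mathcal{A}_{\mathcal{P}_0}), \dots, \mathbb{E}(f \mid \mathcal{A}_{\mathcal{P}_n})$. Observe that for every $i \in I$ we have $d_i = \mathbb{E}(g \mid \mathcal{A}_{\mathcal{P}_i}) - \mathbb{E}(g \mid \mathcal{A}_{\mathcal{P}_{i-1}})$ and so, by the choice of $I$ and property (P3), we obtain that $\|d_i\|_{L_p} > \delta$ for every $i \in I$. Therefore, by Proposition A.1, we have On the other hand, by properties (P1) and (P2), we see that $\mathcal{P}_n$ is a refinement of $\mathcal{P}$ with $\mathcal{P}_n \subseteq \Sigma_{m+n}$ and $|\mathcal{P}_n| \leq (k+1)^{m+n}$. Therefore, by hypothesis (H1), we must have $\|\mathbb{E}(g \mid \mathcal{A}_{\mathcal{P}_n}) - \mathbb{E}(g \mid \mathcal{A}_{\mathcal{P}})\|_{L_p} \leq \sigma$ which contradicts, of course, the estimate in (3.13). The proof of Lemma 3.4 is thus completed.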
The following lemma is the last step of the proof of Theorem 3.1.
Also let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $(\Sigma_i)$ be an increasing sequence of $k$-semirings on $\Omega$ with $\Sigma_i \subseteq \mathcal{F}$ for every $i \in \mathbb{N}$. Finally, let $\mathcal{C}$ be a family in $L_p(\Omega, \mathcal{F}, \mathbb{P})$ such that $\|f\|_{L_p} \leq 1$ for every $f \in \mathcal{C}$ and with $|\mathcal{C}| = \ell$. Then there exist $j \in \{0, \dots, L-1\}$, $J \in \{n_j, \dots, n_{j+1}\}$ and two partitions $\mathcal{P}, \mathcal{Q}$ of $\Omega$ with the following properties: (i) $\mathcal{P} \subseteq \Sigma_{n_j}$ and $\mathcal{Q} \subseteq \Sigma_J$, (ii) $|\mathcal{P}| \leq (k+1)^{n_j}$ and $|\mathcal{Q}| \leq (k+1)^J$, (iii) $\mathcal{Q}$ is a refinement of $\mathcal{P}$, and (iv)

Proof. It is similar to the proof of Lemma 3.4. Indeed, assume, towards a contradiction, that the lemma is false. Recursively and using Lemma 3.4, we select a finite sequence $J_0, \dots, J_L$ in $\mathbb{N}$ with $J_0 = 0$, a finite sequence $\mathcal{P}_0, \dots, \mathcal{P}_L$ of partitions of $\Omega$ with $\mathcal{P}_0 = \{\emptyset, \Omega\}$ and a finite sequence $f_1, \dots, f_L$ in $\mathcal{C}$ such that for every $i \in [L]$ we have that: (P1) $J_i \in \{n_{i-1}, \dots, n_i\}$, (P2) the partition $\mathcal{P}_i$ is a refinement of $\mathcal{P}_{i-1}$, As in the proof of Lemma 3.4, we observe that $(\mathcal{A}_{\mathcal{P}_i})_{i=0}^L$ is an increasing sequence of finite sub-σ-algebras of $\mathcal{F}$, and we select $g \in \mathcal{C}$ and $I \subseteq [L]$ with $|I| \geq L/\ell$ and such that $g = f_i$ for every $i \in I$. Let $(d_i)_{i=0}^L$ be the difference sequence associated with the finite martingale $\mathbb{E}(g \mid \mathcal{A}_{\mathcal{P}_0}), \dots, \mathbb{E}(g \mid \mathcal{A}_{\mathcal{P}_L})$. Notice that, by property (P4), we have $\|d_i\|_{L_p} > \sigma$ for every $i \in I$. Hence, by the choice of $L$, Proposition A.1 and the fact that $\|g\|_{L_p} \leq 1$, we conclude that which is clearly a contradiction. The proof of Lemma 3.5 is completed.
We are ready to complete the proof of Theorem 3.1.
Proof of Theorem 3.1. Fix the data $k, \ell, \sigma, p$, the growth function $F$, the sequence $(\mathcal{S}_i)$ and the family $\mathcal{C}$. We define $H\colon \mathbb{N} \to \mathbb{R}$ by the rule $H(n) = F^{(n+2)}(0)$ and we observe that $H$ is a growth function. Moreover, for every $i \in \mathbb{N}$ let $m_i = F^{(i)}(0)$ and set $\Sigma_i = \mathcal{S}_{m_i}$. Notice that $(\Sigma_i)$ is an increasing sequence of $k$-semirings of $\Omega$ with $\Sigma_i \subseteq \mathcal{F}$ for every $i \in \mathbb{N}$. Let $j, J, \mathcal{P}$ and $\mathcal{Q}$ be as in Lemma 3.5 when applied to $k, \ell, \sigma, p, H$, the sequence $(\Sigma_i)$ and the family $\mathcal{C}$. We set
$$N = m_{n_j} = F^{(n_j)}(0) \tag{3.16}$$
and we claim that the natural number $N$ and the partitions $\mathcal{P}$ and $\mathcal{Q}$ are as desired. Indeed, notice first that $n_j \leq n_{L-1}$. Since $F$ is a growth function, by the choice of h and R in (3.1) and (3.2) respectively, we have On the other hand, note that $n_j \leq F^{(n_j)}(0) = N$ and so $|\mathcal{P}| \leq (k+1)^{n_j} \leq (k+1)^N$ and $\mathcal{P} \subseteq \Sigma_{n_j} = \mathcal{S}_N$. Moreover, by Lemma 3.5, we see that $\mathcal{Q}$ is a finite refinement of $\mathcal{P}$ with $\mathcal{Q} \subseteq \mathcal{S}_i$ for some $i \geq N$. It follows that $N$, $\mathcal{P}$ and $\mathcal{Q}$ satisfy the requirements of the theorem. Finally, let $f \in \mathcal{C}$ be arbitrary and write Invoking Lemma 3.5, we obtain that Also observe that $n_j + 1 \leq J + 1$ which is easily seen to imply that $\mathcal{S}_{F(N)} \subseteq \Sigma_{J+1}$. Therefore, using Lemma 3.5 once again, for every $i \in \{0, \dots, F(N)\}$ we have The proof of Theorem 3.1 is completed.

Uniform partitions.
In this section we will discuss some applications of our main result (more applications can be found in [9]). We start with a consequence of Theorem 3.1 which is closer in spirit to the original formulation of Szemerédi's regularity lemma [25].
Recall that if $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, $f \in L_1(\Omega, \mathcal{F}, \mathbb{P})$ and $S \in \mathcal{F}$ is an event of non-zero probability, then $\mathbb{E}(f \mid S)$ stands for the conditional expectation of $f$ with respect to $S$, that is, $\mathbb{E}(f \mid S) = \big(\int_S f\, d\mathbb{P}\big)/\mathbb{P}(S)$. If $\mathbb{P}(S) = 0$, then by convention we set $\mathbb{E}(f \mid S) = 0$. We have the following definition.
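Definition 4.1. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $k$ a positive integer and $\mathcal{S}$ a $k$-semiring on $\Omega$ with $\mathcal{S} \subseteq \mathcal{F}$. Also let $f \in L_1(\Omega, \mathcal{F}, \mathbb{P})$, $0 < \eta \leq 1$ and $S \in \mathcal{S}$. We say that the set $S$ is $(f, \mathcal{S}, \eta)$-uniform if for every $T \subseteq S$ with $T \in \mathcal{S}$ we have
$$\Big|\int_T \big(f - \mathbb{E}(f \mid S)\big)\, d\mathbb{P}\Big| \leq \eta \cdot \mathbb{P}(S).$$
Moreover, for every $\mathcal{C} \subseteq \mathcal{S}$ we set $\mathrm{Unf}(\mathcal{C}, f, \eta) = \{C \in \mathcal{C} : C \text{ is } (f, \mathcal{S}, \eta)\text{-uniform}\}$.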
Notice that if $S \in \mathcal{S}$ with $\mathbb{P}(S) = 0$, then the set $S$ is $(f, \mathcal{S}, \eta)$-uniform for every $0 < \eta \leq 1$. The same remark of course applies if the random variable $f$ is constant on $S$. Also note that the concept of $(f, \mathcal{S}, \eta)$-uniformity is closely related to the $\mathcal{S}$-uniformity norm. Indeed, let $S \in \mathcal{S}$ with $\mathbb{P}(S) > 0$ and observe that the set $S$ is $(f, \mathcal{S}, \eta)$-uniform if and only if the function $f - \mathbb{E}(f \mid S)$, viewed as a random variable in $L_1(\Omega, \mathcal{F}, \mathbb{P}_S)$, has $\mathcal{S}$-uniformity norm less than or equal to $\eta$. (Here, $\mathbb{P}_S$ stands for the conditional probability measure of $\mathbb{P}$ relative to $S$.) In particular, the set $\Omega$ is $(f, \mathcal{S}, \eta)$-uniform if and only if $\|f - \mathbb{E}(f)\|_{\mathcal{S}} \leq \eta$.
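On a finite probability space with the uniform measure, uniformity of a set can be checked directly. The Python sketch below makes two assumptions that should be kept in mind: the semiring is given explicitly as a finite collection of `frozenset` objects, and the uniformity condition is taken to be $|\int_T (f - \mathbb{E}(f \mid S))\, d\mathbb{P}| \leq \eta \cdot \mathbb{P}(S)$ for every $T \subseteq S$ in the semiring, which is the form consistent with the remarks above. The name `is_uniform` is ours.

```python
def is_uniform(f, S, semiring, eta):
    """Check (f, semiring, eta)-uniformity of S on a finite space with uniform measure.

    f        : dict mapping each point of the space to a real value
    S        : frozenset of points, assumed to belong to the semiring
    semiring : iterable of frozensets
    """
    n = len(f)
    if not S:
        return True  # sets of probability zero are uniform
    mean_on_S = sum(f[x] for x in S) / len(S)  # E(f | S)
    for T in semiring:
        if T <= S:
            # |integral over T of (f - E(f | S)) dP| under the uniform measure
            deviation = abs(sum(f[x] - mean_on_S for x in T)) / n
            if deviation > eta * len(S) / n:
                return False
    return True
```

For example, on a four-point space with the algebra of all subsets and $f$ the indicator of a two-point set $D$, the whole space is uniform for $\eta = 0.3$ but not for $\eta = 0.2$, while $D$ itself is uniform for every $\eta$ since $f$ is constant on it.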
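We have the following proposition (see also [29, Section 11.6]).

The following lemma will enable us to reduce Proposition 4.2 to Corollary 3.2.

Lemma 4.3. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $k$ a positive integer and $\mathcal{S}$ a $k$-semiring on $\Omega$ with $\mathcal{S} \subseteq \mathcal{F}$. Also let $\mathcal{P}$ be a finite partition of $\Omega$ with $\mathcal{P} \subseteq \mathcal{F}$, $f \in L_1(\Omega, \mathcal{F}, \mathbb{P})$ and $0 < \eta \leq 1$. Assume that the function $f$ admits a decomposition $f = f_{\mathrm{str}} + f_{\mathrm{err}} + f_{\mathrm{unf}}$ into integrable random variables such that $f_{\mathrm{str}}$ is constant on each $S \in \mathcal{P}$ and the functions $f_{\mathrm{err}}$ and $f_{\mathrm{unf}}$ obey the estimates $\|f_{\mathrm{err}}\|_{L_1} \leq \eta^2/8$ and $\|f_{\mathrm{unf}}\|_{\mathcal{S}} \leq (\eta^2/8)|\mathcal{P}|^{-1}$. Then we have

Proof. Fix $S \notin \mathrm{Unf}(\mathcal{P}, f, \eta)$. We select $T \subseteq S$ with $T \in \mathcal{S}$ such that The function $f_{\mathrm{str}}$ is constant on $S$ and so, by (4.4), we see that Next observe that Finally, notice that $\mathbb{P}(S) > 0$ since $S \notin \mathrm{Unf}(\mathcal{P}, f, \eta)$. Thus, setting , we obtain that $\mathcal{P} \setminus \mathrm{Unf}(\mathcal{P}, f, \eta) \subseteq \mathcal{A} \cup \mathcal{B}$.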
Since the family P is a partition, it consists of pairwise disjoint sets. Hence, Moreover, By (4.9) and (4.10) and using the inclusion P \ Unf(P, f, η) ⊆ A ∪ B, we conclude that the estimate in (4.3) is satisfied and the proof is completed.
We proceed to the proof of Proposition 4.2.
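Proof of Proposition 4.2. Fix $k$, $p$ and $\eta$. We set $\sigma = \eta^2/8$ and we define $F\colon \mathbb{N} \to \mathbb{R}$ by the rule $F(n) = (n/\sigma) + 1 = (8n/\eta^2) + 1$ for every $n \in \mathbb{N}$. Notice that $F$ is a growth function. We set
$$U(k, p, \eta) = \mathrm{Reg}'(k, p, \sigma, F) \tag{4.11}$$
and we claim that $U(k, p, \eta)$ is as desired. Indeed, let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $\mathcal{S}$ a $k$-semiring on $\Omega$ with $\mathcal{S} \subseteq \mathcal{F}$. Also let $f \in L_p(\Omega, \mathcal{F}, \mathbb{P})$ with $\|f\|_{L_p} \leq 1$. By Corollary 3.2, there exist a positive integer $M \leq U(k, p, \eta)$, a partition $\mathcal{P}$ of $\Omega$ with $\mathcal{P} \subseteq \mathcal{S}$ and $|\mathcal{P}| = M$, and a finite refinement $\mathcal{Q}$ of $\mathcal{P}$ with $\mathcal{Q} \subseteq \mathcal{S}$ such that, setting we have the estimates $\|f_{\mathrm{err}}\|_{L_p} \leq \sigma$ and $\|f_{\mathrm{unf}}\|_{\mathcal{S}} \leq 1/F(M)$. It follows that $f$ admits a decomposition $f = f_{\mathrm{str}} + f_{\mathrm{err}} + f_{\mathrm{unf}}$ into integrable random variables such that $f_{\mathrm{str}}$ is constant on each $S \in \mathcal{P}$, $\|f_{\mathrm{err}}\|_{L_p} \leq \sigma$ and $\|f_{\mathrm{unf}}\|_{\mathcal{S}} \leq 1/F(M)$. Notice that, by the monotonicity of the $L_p$ norms, we have $\|f_{\mathrm{err}}\|_{L_1} \leq \sigma$. Hence, by Lemma 4.3 and the choice of $\sigma$ and $F$, we conclude that the estimate in (4.2) is satisfied and the proof of Proposition 4.2 is completed.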
We close this subsection by presenting an application of Proposition 4.2 for subsets of hypercubes (see also [28]). Specifically, let $A$ be a finite alphabet with $|A| \geq 2$ and set $K = |A|(|A|+1)2^{-1}$. Also let $n$ be a positive integer. As in Example 3, we view $A^n$ as a discrete probability space equipped with the uniform probability measure which we shall denote by $\mathbb{P}$. More generally, for every nonempty subset $S$ of $A^n$ by $\mathbb{P}_S$ we shall denote the uniform probability measure concentrated on $S$, that is, $\mathbb{P}_S(X) = |X \cap S|/|S|$ for every $X \subseteq A^n$. Recall that $\mathcal{S}(A^n)$ stands for the $K$-semiring on $A^n$ consisting of all subsets $X$ of $A^n$ which are written as where $X_{\{a,b\}}$ is $(a,b)$-insensitive for every $\{a,b\} \in \binom{A}{2}$. Now let $D$ be a subset of $A^n$, $0 < \varepsilon \leq 1$ and $S \in \mathcal{S}(A^n)$ with $S \neq \emptyset$. Notice that the set $S$ is $(\mathbf{1}_D, \mathcal{S}(A^n), \varepsilon^2)$-uniform if and only if for every nonempty $T \subseteq S$ with $T \in \mathcal{S}(A^n)$ we have
$$|\mathbb{P}_T(D) - \mathbb{P}_S(D)| \cdot \mathbb{P}(T) \leq \varepsilon^2 \cdot \mathbb{P}(S). \tag{4.14}$$
In particular, if $S$ is nonempty and $(\mathbf{1}_D, \mathcal{S}(A^n), \varepsilon^2)$-uniform, then for every $T \subseteq S$ with $T \in \mathcal{S}(A^n)$ and $|T| \geq \varepsilon|S|$ we have $|\mathbb{P}_T(D) - \mathbb{P}_S(D)| \leq \varepsilon$. Thus, by Proposition 4.2 and taking into account these remarks, we obtain the following corollary.
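Corollary 4.4. For every integer $k \geq 2$ and every $0 < \varepsilon \leq 1$ there exists a positive integer $N(k, \varepsilon)$ with the following property. If $n$ is a positive integer, $A$ is an alphabet with $|A| = k$ and $D$ is a subset of $A^n$, then there exist a positive integer $M \leq N(k, \varepsilon)$, a partition $\mathcal{P}$ of $A^n$ with $\mathcal{P} \subseteq \mathcal{S}(A^n)$ and $|\mathcal{P}| = M$, and a subfamily $\mathcal{P}' \subseteq \mathcal{P}$ with $\mathbb{P}(\cup \mathcal{P}') \geq 1 - \varepsilon$ such that $|\mathbb{P}_T(D) - \mathbb{P}_S(D)| \leq \varepsilon$ for every $S \in \mathcal{P}'$ and every $T \subseteq S$ with $T \in \mathcal{S}(A^n)$ and $|T| \geq \varepsilon|S|$.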

$L_p$ graphons.
Our last application is an extension of the so-called strong regularity lemma for $L_2$ graphons (see, e.g., [18,19]). To state this extension we need to introduce some terminology and notation related to graphons. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and recall that a graphon is an integrable random variable $W\colon \Omega \times \Omega \to \mathbb{R}$ which is symmetric, that is, $W(x, y) = W(y, x)$ for every $x, y \in \Omega$. If $p > 1$ and $W$ is a graphon which belongs to $L_p$, then $W$ is said to be an $L_p$ graphon (see, e.g., [5]). Now let $\mathcal{R}$ be a finite partition of $\Omega$ with $\mathcal{R} \subseteq \mathcal{F}$ and notice that the family
$$\mathcal{R}^2 = \{S \times T : S, T \in \mathcal{R}\} \tag{4.16}$$
is a finite partition of $\Omega \times \Omega$. As in Section 3, let $\mathcal{A}_{\mathcal{R}^2}$ be the σ-algebra on $\Omega \times \Omega$ generated by $\mathcal{R}^2$ and observe that $\mathcal{A}_{\mathcal{R}^2}$ consists of measurable sets. If $W\colon \Omega \times \Omega \to \mathbb{R}$ is a graphon, then the conditional expectation of $W$ with respect to $\mathcal{A}_{\mathcal{R}^2}$ is usually denoted by $W_{\mathcal{R}}$. Note that $W_{\mathcal{R}}$ is also a graphon and satisfies (see, e.g., [18])
$$\|W_{\mathcal{R}}\|_\square \leq \|W\|_\square \tag{4.17}$$
where $\|\cdot\|_\square$ is the cut norm defined in (2.14). On the other hand, by standard properties of the conditional expectation (see, e.g., [2]), we have $\|W_{\mathcal{R}}\|_{L_p} \leq \|W\|_{L_p}$ for any $p \geq 1$. It follows, in particular, that $W_{\mathcal{R}}$ is an $L_p$ graphon provided, of course, that $W \in L_p$.
We have the following corollary.
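Corollary 4.5 (Strong regularity lemma for $L_p$ graphons). For every $0 < \varepsilon \leq 1$, every $1 < p \leq 2$ and every positive function $h\colon \mathbb{N} \to \mathbb{R}$ there exists a positive integer $s(\varepsilon, p, h)$ with the following property. If $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space and $W\colon \Omega \times \Omega \to \mathbb{R}$ is an $L_p$ graphon with $\|W\|_{L_p} \leq 1$, then there exist a partition $\mathcal{R}$ of $\Omega$ with $\mathcal{R} \subseteq \mathcal{F}$ and $|\mathcal{R}| \leq s(\varepsilon, p, h)$, and an $L_p$ graphon $U\colon \Omega \times \Omega \to \mathbb{R}$ such that $\|W - U\|_{L_p} \leq \varepsilon$ and $\|U - U_{\mathcal{R}}\|_\square \leq h(|\mathcal{R}|)$.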
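Proof. Fix the constants $\varepsilon$, $p$ and the function $h$, and define $F\colon \mathbb{N} \to \mathbb{R}$ by the rule Notice that $F$ is a growth function. We set and we claim that with this choice the result follows. Indeed, let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and fix an $L_p$ graphon $W\colon \Omega \times \Omega \to \mathbb{R}$ with $\|W\|_{L_p} \leq 1$. Also let $\Sigma_\square$ be the 4-semiring on $\Omega \times \Omega$ which is defined via formula (2.15) for the given probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We apply Corollary 3.2 to $\Sigma_\square$ and the random variable $W$ and we obtain (a) a partition $\mathcal{P}$ of $\Omega \times \Omega$ with $\mathcal{P} \subseteq \Sigma_\square$ and $|\mathcal{P}| \leq \mathrm{Reg}'(4, \varepsilon, p, F)$, and (b) a finite refinement $\mathcal{Q}$ of $\mathcal{P}$ with $\mathcal{Q} \subseteq \Sigma_\square$ such that, writing the graphon $W$ as $W_{\mathrm{str}} + W_{\mathrm{err}} + W_{\mathrm{unf}}$ where $W_{\mathrm{str}} = \mathbb{E}(W \mid \mathcal{A}_{\mathcal{P}})$, , we have the estimates $\|W_{\mathrm{err}}\|_{L_p} \leq \varepsilon$ and $\|W_{\mathrm{unf}}\|_{\Sigma_\square} \leq 1/F(|\mathcal{P}|)$. Note that, by (a) and (b) and the definition of the 4-semiring $\Sigma_\square$ in (2.15), there exist two finite partitions $\mathcal{R}, \mathcal{Z}$ of $\Omega$ with $\mathcal{R}, \mathcal{Z} \subseteq \mathcal{F}$ and such that $\mathcal{P} = \mathcal{R}^2$ and $\mathcal{Q} = \mathcal{Z}^2$. It follows, in particular, that the random variables $W_{\mathrm{str}}$, $W_{\mathrm{err}}$ and $W_{\mathrm{unf}}$ are all $L_p$ graphons.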
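We will show that the partition $\mathcal{R}$ and the $L_p$ graphon $U := W_{\mathrm{str}}+W_{\mathrm{unf}}$ are as desired. To this end notice first that
\[ |\mathcal{R}| \le |\mathcal{R}^2| = |\mathcal{P}| \le \mathrm{Reg}'(4,\varepsilon,p,F) = s(\varepsilon,p,h). \tag{4.20} \]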
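Next observe that
\[ \|W-U\|_{L_p} = \|W_{\mathrm{err}}\|_{L_p} \le \varepsilon. \tag{4.21} \]
Finally note that, by (4.17), we have $\|(W_{\mathrm{unf}})_{\mathcal{R}}\|_{\square}\le\|W_{\mathrm{unf}}\|_{\square}$. Moreover, the fact that $\mathcal{P}=\mathcal{R}^2$ and the choice of $W_{\mathrm{str}}$ yield that $(W_{\mathrm{str}})_{\mathcal{R}}=W_{\mathrm{str}}$. Therefore, $U-U_{\mathcal{R}}=W_{\mathrm{unf}}-(W_{\mathrm{unf}})_{\mathcal{R}}$ and so $\|U-U_{\mathcal{R}}\|_{\square}\le2\|W_{\mathrm{unf}}\|_{\square}$. Combined with the estimate on $\|W_{\mathrm{unf}}\|_{\Sigma}$ and the choice of $F$ in (4.18), this yields that $\|U-U_{\mathcal{R}}\|_{\square}\le h(|\mathcal{R}|)$, and the proof of Corollary 4.5 is completed.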

Remark 1.
Recently, Borgs, Chayes, Cohn and Zhao [5] extended the weak regularity lemma to $L_p$ graphons for any $p>1$. Their extension follows, of course, from Corollary 4.5, but this reduction is rather ineffective since the bound obtained by Corollary 4.5 is quite poor. However, this estimate can be significantly improved if, instead of invoking Corollary 3.2, one argues directly as in the proof of Lemma 3.4. More precisely, note that for every $0<\varepsilon\le1$, every $1<p\le2$, every probability space $(\Omega,\mathcal{F},\mathbb{P})$ and every $L_p$ graphon $W\colon\Omega\times\Omega\to\mathbb{R}$ with $\|W\|_{L_p}\le1$ there exists a partition $\mathcal{R}$ of $\Omega$ with $\mathcal{R}\subseteq\mathcal{F}$, whose cardinality satisfies the bound (4.23), and such that $\|W-W_{\mathcal{R}}\|_{\square}\le\varepsilon$. The estimate in (4.23) matches the bound for the weak regularity lemma for the case of $L_2$ graphons (see, e.g., [18]) and is essentially optimal.

A.2. It is easy to see that martingale difference sequences are monotone basic sequences in $L_p$ for any $p\ge1$. That is, if $(d_i)_{i=0}^n$ is a martingale difference sequence in $L_p$ for some $p\ge1$, then for every $0\le k\le n$ and every $a_0,\dots,a_n\in\mathbb{R}$ we have
\[ \Big\|\sum_{i=0}^k a_i d_i\Big\|_{L_p} \le \Big\|\sum_{i=0}^n a_i d_i\Big\|_{L_p}. \tag{A.2} \]
It follows, in particular, that
\[ \Big\|\sum_{i=k}^{\ell} d_i\Big\|_{L_p} \le 2\,\Big\|\sum_{i=0}^n d_i\Big\|_{L_p} \tag{A.3} \]
for every $0\le k\le\ell\le n$. Another basic property of martingale difference sequences is that they are orthogonal in $L_2$. Therefore, for every martingale difference sequence $(d_i)_{i=0}^n$ in $L_2$ we have
\[ \Big(\sum_{i=0}^n \|d_i\|_{L_2}^2\Big)^{1/2} = \Big\|\sum_{i=0}^n d_i\Big\|_{L_2}. \tag{A.4} \]
We have the following extension of this fact (see also [10, Proposition 3]).
Proposition A.1. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $1<p\le2$. Then every martingale difference sequence $(d_i)_{i=0}^n$ in $L_p(\Omega,\mathcal{F},\mathbb{P})$ satisfies the estimate (A.5).

A.3. Proposition A.1 is a consequence of some results from probability theory and Banach space theory which we will briefly discuss in the rest of this appendix. The first basic ingredient of its proof is the celebrated result of Burkholder [7] asserting that martingale difference sequences are unconditional in $L_p$ for any $p>1$. More precisely, by [8] and [21, Theorem 8.32], it follows that for every probability space $(\Omega,\mathcal{F},\mathbb{P})$ and every $p>1$, if $(d_i)_{i=0}^n$ is a martingale difference sequence in $L_p(\Omega,\mathcal{F},\mathbb{P})$, then for every choice $(\varepsilon_i)_{i=0}^n$ of signs we have
\[ \Big\|\sum_{i=0}^n \varepsilon_i d_i\Big\|_{L_p} \le (p^*-1)\,\Big\|\sum_{i=0}^n d_i\Big\|_{L_p} \tag{A.6} \]
where $p^*=\max\{p,p'\}$ with $p'=p/(p-1)$ the conjugate exponent of $p$. The constant $p^*-1$ in (A.6) is best possible. We also point out that there are several approaches to Burkholder's theorem, some of which are applicable even for Banach space valued martingales (see, e.g., [21, Chapter 8] for a detailed presentation of this material). However, the estimate in (A.6) is at the right level of generality for the circle of problems around Szemerédi's regularity lemma.
A.4. The second ingredient of the proof of Proposition A.1 is the fact that the space $L_p$ has cotype 2 for any $1\le p\le2$. Recall that a Banach space $X$ is said to have cotype 2 if there exists a constant $C>0$ such that for every finite sequence $(x_i)_{i=0}^n$ in $X$ we have
\[ \Big(\sum_{i=0}^n \|x_i\|_X^2\Big)^{1/2} \le C\,\Big(\int_0^1 \Big\|\sum_{i=0}^n r_i(t)x_i\Big\|_X^2\,dt\Big)^{1/2} \tag{A.7} \]
where $(r_i)$ is the sequence of Rademacher functions. The best constant in (A.7) is denoted by $C_2(X)$ and is called the cotype 2 constant of $X$.
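For illustration, the simplest instance of cotype 2 is a Hilbert space. The following computation is a sketch, assuming that (A.7) is stated with the $L_2$ average over the Rademacher functions on its right-hand side; the symbol $H$ and the inner product notation are introduced here only for the example.

```latex
% Assumption: H is a real Hilbert space with inner product <.,.>,
% and (A.7) uses the L_2 Rademacher average on the right-hand side.
% The Rademacher functions are orthonormal in L_2[0,1], hence
\[
  \int_0^1 \Big\|\sum_{i=0}^n r_i(t)x_i\Big\|_H^2\,dt
  = \sum_{i,j=0}^n \langle x_i,x_j\rangle \int_0^1 r_i(t)r_j(t)\,dt
  = \sum_{i=0}^n \|x_i\|_H^2 .
\]
% So the cotype 2 inequality holds in H with C = 1, and a single
% nonzero vector (n = 0) shows that no constant C < 1 works.
\[
  C_2(H) = 1 .
\]
```

In a general Banach space no such identity is available, which is why the Kahane-Khintchine constants are needed in order to compare different Rademacher averages.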
The classical Kahane-Khintchine inequality yields the estimate (A.8) for the cotype 2 constant of $L_p$ for every $1\le p\le2$ (see, e.g., [1,14,20]). Here, for any $q,r>0$, $K_{q,r}$ stands for the optimal constant $K>0$ such that for every Banach space $X$ and every finite sequence $(x_i)_{i=0}^n$ in $X$ we have
\[ \Big(\int_0^1 \Big\|\sum_{i=0}^n r_i(t)x_i\Big\|_X^q\,dt\Big)^{1/q} \le K\,\Big(\int_0^1 \Big\|\sum_{i=0}^n r_i(t)x_i\Big\|_X^r\,dt\Big)^{1/r}. \tag{A.9} \]
An effective way to estimate these constants is via the so-called "two-point inequality" [4] which yields that
\[ K_{q,r} \le \sqrt{\frac{q-1}{r-1}} \tag{A.10} \]
for every $1<r<q<\infty$. From this one derives the estimate (A.11) for $K_{2,p}$. Indeed, fix $1\le p<2$ and let $(x_i)_{i=0}^n$ be a finite sequence in a Banach space $X$. Define the random variable $S\colon[0,1]\to\mathbb{R}$ by the rule $S(t)=\big\|\sum_{i=0}^n r_i(t)x_i\big\|_X$ and let $0<\lambda<1$. By Hölder's inequality and (A.9) applied for "$q=\frac{2-\lambda p}{1-\lambda}$" and "$r=2$", we obtain an upper bound for $\|S\|_{L_2}$ in terms of $\|S\|_{L_p}$ with a constant depending on $\lambda$. Taking the limit as $\lambda\to0$ we obtain (A.14), which is equivalent to saying that (A.11) is satisfied. The optimal constant for the end-point case "$p=1$" in (A.11) was computed by Latała and Oleszkiewicz [17] who showed that
\[ K_{2,1} = \sqrt{2}. \tag{A.15} \]
This implies, of course, that the estimate in (A.11) is not optimal, but this has a minor impact upon Proposition A.1. (It affects, in particular, only the factor 4 in the right-hand side of (A.5).)

A.5. Proof of Proposition A.1. Let $(d_i)_{i=0}^n$ be an arbitrary martingale difference sequence in $L_p(\Omega,\mathcal{F},\mathbb{P})$ and observe that (A.16) holds. Since $1<p\le2$, by (A.11) and (A.15), we see that
\[ K_{2,p}\,K_{2,1}\,(p^*-1) \le \frac{4}{p-1} \tag{A.17} \]
and the proof is completed.
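The interpolation step described in A.4 can be sketched as follows. This is a reconstruction under stated assumptions: we use (A.10) in the form $K_{q,2}\le\sqrt{q-1}$, and the resulting constant $e^{(2-p)/p}$ is what the computation gives; whether it coincides with the exact form of (A.11) is an assumption.

```latex
% Setting: S >= 0 as in A.4, 1 <= p < 2, 0 < lambda < 1,
% q = (2 - lambda p)/(1 - lambda), so that q > 2.
% Hoelder with exponents 1/lambda and 1/(1-lambda), applied to
% S^2 = S^{lambda p} * S^{2 - lambda p}:
\[
  \|S\|_{L_2}^2 \le \|S\|_{L_p}^{\lambda p}\,\|S\|_{L_q}^{2-\lambda p}.
\]
% Assumed form of (A.10): \|S\|_{L_q} <= sqrt(q-1) \|S\|_{L_2}. Hence
\[
  \|S\|_{L_2} \le (q-1)^{\frac{2-\lambda p}{2\lambda p}}\,\|S\|_{L_p},
  \qquad q-1 = 1+\frac{\lambda(2-p)}{1-\lambda}.
\]
% Letting lambda -> 0, the exponent times log(q-1) tends to (2-p)/p:
\[
  \|S\|_{L_2} \le e^{\frac{2-p}{p}}\,\|S\|_{L_p}.
\]
% Consistency check with (A.17): for 1 < p <= 2 we have
% e^{(2-p)/p} < e, and sqrt(2) * e = 3.844... <= 4.
```

At $p=1$ this computation gives the constant $e$, which is larger than the optimal value $\sqrt{2}$ of Latała and Oleszkiewicz; this is consistent with the remark that the estimate is not optimal at the end-point.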
Our goal in this appendix is to use Theorem 3.1 to show that, for any $1<p\le2$, every $L_p$-bounded martingale is $L_p$-convergent. Besides its intrinsic interest, this result also implies that Theorem 3.1 does not hold true for the end-point case $p=1$. In fact, based on the argument below, one can easily construct a counterexample to Theorem 3.1 using any $L_1$-bounded martingale which is not $L_1$-convergent.
We will need the following known approximation result (see, e.g., [21]). We recall the proof for the convenience of the reader.
Lemma B.1. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $p\ge1$. Also let $(g_i)$ be a martingale in $L_p(\Omega,\mathcal{F},\mathbb{P})$ and $\delta>0$. Then there exist an increasing sequence $(\mathcal{F}_i)$ of finite sub-$\sigma$-algebras of $\mathcal{F}$ and a martingale $(f_i)$ adapted to the filtration $(\mathcal{F}_i)$ such that $\|g_i-f_i\|_{L_p}\le\delta$ for every $i\in\mathbb{N}$.
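Proof. Fix a filtration $(\mathcal{B}_i)$ such that $(g_i)$ is adapted to $(\mathcal{B}_i)$ and let $(\Delta_i)$ be the martingale difference sequence associated with $(g_i)$. Recursively and using the fact that the set of simple functions is dense in $L_p$, we select an increasing sequence $(\mathcal{F}_i)$ of finite sub-$\sigma$-algebras of $\mathcal{F}$ and a sequence $(s_i)$ of simple functions such that for every $i\in\mathbb{N}$ we have that: (i) $\mathcal{F}_i$ is contained in $\mathcal{B}_i$, (ii) $\|\Delta_i-s_i\|_{L_p}\le\delta/2^{i+2}$, and (iii) $s_i\in L_p(\Omega,\mathcal{F}_i,\mathbb{P})$. For every $i\in\mathbb{N}$ let $d_i=\mathbb{E}(\Delta_i\,|\,\mathcal{F}_i)$ and notice that the sequence $(d_i)$ is a martingale difference sequence since, by (i),
\[ \mathbb{E}(d_i\,|\,\mathcal{F}_{i-1}) = \mathbb{E}(\Delta_i\,|\,\mathcal{F}_{i-1}) = \mathbb{E}\big(\mathbb{E}(\Delta_i\,|\,\mathcal{B}_{i-1})\,\big|\,\mathcal{F}_{i-1}\big) = 0 \]
for every $i\ge1$. Thus, setting $f_i=d_0+\dots+d_i$, we see that $(f_i)$ is a martingale adapted to the filtration $(\mathcal{F}_i)$. Moreover, by (ii) and (iii), for every $i\in\mathbb{N}$ we have
\[ \|g_i-f_i\|_{L_p} \le \sum_{k=0}^i \|\Delta_k-d_k\|_{L_p} \le 2\sum_{k=0}^i \|\Delta_k-s_k\|_{L_p} \le \sum_{k=0}^i \frac{\delta}{2^{k+1}} \le \delta \]
and the proof is completed.

Now fix $1<p\le2$ and a probability space $(\Omega,\mathcal{F},\mathbb{P})$, and assume, towards a contradiction, that there exists a bounded martingale $(g_i)$ in $L_p(\Omega,\mathcal{F},\mathbb{P})$ which is not norm convergent. By (A.3), we see that $(g_i)$ has no convergent subsequence whatsoever. Therefore, by passing to a subsequence of $(g_i)$ and rescaling, we may assume that there exists $\varepsilon>0$ such that: (i) $\|g_i\|_{L_p}\le1/2$ for every $i\in\mathbb{N}$, and (ii) $\|g_i-g_j\|_{L_p}\ge3\varepsilon$ for every $i,j\in\mathbb{N}$ with $i\ne j$. By Lemma B.1 applied to the martingale $(g_i)$ and the constant $\delta=\min\{\varepsilon,1/2\}$, there exist (P1) an increasing sequence $(\mathcal{F}_i)$ of finite sub-$\sigma$-algebras of $\mathcal{F}$, and (P2) a martingale $(f_i)$ adapted to the filtration $(\mathcal{F}_i)$ such that $\|g_i-f_i\|_{L_p}\le\delta$ for every $i\in\mathbb{N}$. Hence, (P3) $\|f_i\|_{L_p}\le1$ for every $i\in\mathbb{N}$, and (P4) $\|f_i-f_j\|_{L_p}\ge\varepsilon$ for every $i,j\in\mathbb{N}$ with $i\ne j$.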
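Properties (P3) and (P4) follow from (i), (ii) and (P2) by the triangle inequality. A short verification, assuming only the choice $\delta=\min\{\varepsilon,1/2\}$ made above, reads as follows.

```latex
% (P3): uses (i), (P2) and delta <= 1/2.
\[
  \|f_i\|_{L_p} \le \|g_i\|_{L_p} + \|f_i-g_i\|_{L_p}
  \le \tfrac12 + \delta \le 1 .
\]
% (P4): uses (ii), (P2) and delta <= epsilon, for i \neq j.
\[
  \|f_i-f_j\|_{L_p}
  \ge \|g_i-g_j\|_{L_p} - \|g_i-f_i\|_{L_p} - \|g_j-f_j\|_{L_p}
  \ge 3\varepsilon - 2\delta \ge \varepsilon .
\]
```

This is the reason for the constant $3\varepsilon$ in (ii): two approximation errors of size at most $\varepsilon$ still leave a separation of $\varepsilon$ between the terms of $(f_i)$.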
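Notice that, by (P1), for every $i\in\mathbb{N}$ the space $L_p(\Omega,\mathcal{F}_i,\mathbb{P})$ is finite-dimensional. Since $\|\cdot\|_{\mathcal{F}_i}$ is a norm on $L_p(\Omega,\mathcal{F}_i,\mathbb{P})$, there exists a constant $C_i\ge1$ such that (B.3) holds. Let $(\mathcal{S}_i)$ be defined by $\mathcal{S}_i=\mathcal{F}_i$ if $i\le n$ and $\mathcal{S}_i=\mathcal{F}_n$ if $i>n$. Clearly, $(\mathcal{S}_i)$ is an increasing sequence of 1-semirings on $\Omega$. We apply Theorem 3.1 to the probability space $(\Omega,\mathcal{F}_n,\mathbb{P})$, the sequence $(\mathcal{S}_i)$ and the random variable $f_n$, and we obtain a natural number $N\le\mathrm{Reg}(1,1,\varepsilon/8,p,F)$, a finite partition $\mathcal{P}$ of $\Omega$ with $\mathcal{P}\subseteq\mathcal{S}_N$ and a finite refinement $\mathcal{Q}$ of $\mathcal{P}$ such that, writing $f_n=f_{\mathrm{str}}+f_{\mathrm{err}}+f_{\mathrm{unf}}$ where $f_{\mathrm{str}}=\mathbb{E}(f_n\,|\,\mathcal{A}_{\mathcal{P}})$, $f_{\mathrm{err}}=\mathbb{E}(f_n\,|\,\mathcal{A}_{\mathcal{Q}})-\mathbb{E}(f_n\,|\,\mathcal{A}_{\mathcal{P}})$ and $f_{\mathrm{unf}}=f_n-\mathbb{E}(f_n\,|\,\mathcal{A}_{\mathcal{Q}})$, the estimates provided by Theorem 3.1 are satisfied. Finally, notice that $\mathbb{E}(f_{\mathrm{unf}}\,|\,\mathcal{F}_N)\in L_p(\Omega,\mathcal{F}_N,\mathbb{P})$. Thus, by (B.3) and Lemma 2.5, we obtain a bound on its $L_p$ norm, and identical arguments yield the companion estimate. Combining (B.7)-(B.12), we conclude that $\|f_N-f_{N+1}\|_{L_p}\le\varepsilon/2$ which contradicts, of course, property (P4). Hence, every bounded martingale in $L_p(\Omega,\mathcal{F},\mathbb{P})$ is norm convergent, as desired.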