A Sauer-Shelah-Perles Lemma for Sumsets

We show that any family of subsets $A\subseteq 2^{[n]}$ satisfies $\lvert A\rvert \leq O\bigl(n^{\lceil{d}/{2}\rceil}\bigr)$, where $d$ is the VC dimension of $\{S\triangle T \,\vert\, S,T\in A\}$, and $\triangle$ is the symmetric difference operator. We also observe that replacing $\triangle$ by either $\cup$ or $\cap$ fails to satisfy an analogous statement. Our proof is based on the polynomial method; specifically, on an argument due to [Croot, Lev, Pach '17].


Introduction
Let $A\subseteq 2^{[n]}$ be a family of subsets of an $n$-element set (without loss of generality, $[n]$). The VC dimension of $A$, denoted by $\text{VC-dim}(A)$, is the size of the largest $Y\subseteq[n]$ such that $\{S\cap Y \mid S\in A\}=2^{Y}$. One of the most useful facts about the VC dimension is given by the Sauer-Shelah-Perles Lemma.
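For small examples the definition can be checked by brute force. The sketch below is our own illustration, not part of the paper; the function name `vc_dim` is hypothetical, sets are Python `frozenset`s over the ground set $\{0,\dots,n-1\}$, and the running time is exponential in $n$.

```python
from itertools import combinations

# Illustrative sketch; `vc_dim` is a hypothetical helper name.
def vc_dim(family, n):
    """Brute-force VC dimension of a family of subsets of {0, ..., n-1}."""
    family = [frozenset(S) for S in family]
    best = 0
    for k in range(1, n + 1):
        # Y is shattered iff the traces S & Y realize all 2^k subsets of Y
        if not any(len({S & frozenset(Y) for S in family}) == 2 ** k
                   for Y in combinations(range(n), k)):
            break  # subsets of shattered sets are shattered, so larger Y fail too
        best = k
    return best

n = 5
small_sets = [frozenset(c) for k in range(3) for c in combinations(range(n), k)]
print(vc_dim(small_sets, n))  # 2: the family of all sets of size <= 2
```

Stopping at the first size with no shattered set is safe because shattering is inherited by subsets.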
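Theorem 1.1 (Sauer-Shelah-Perles Lemma [Sauer, 1972, Shelah, 1972]). If $\text{VC-dim}(A)\le d$, then
$$|A|\le\binom{n}{\le d}:=\sum_{i=0}^{d}\binom{n}{i}.$$

The Sauer-Shelah-Perles Lemma has numerous applications in model theory, probability theory, geometry, combinatorics, and various fields of computer science. A simple-yet-useful corollary of this lemma is that if $\text{VC-dim}(A)\le d$ and $\star$ is any binary set operation (e.g. $\star\in\{\cap,\cup,\triangle\}$), then
$$\text{VC-dim}\bigl(\{S\star T \mid S,T\in A\}\bigr)=O(d).$$
This corollary is used, for example, by Blumer et al. [1989] to derive closure properties for PAC learnability.

Let $A\star A$ denote the family $\{S\star T \mid S,T\in A\}$. In this work we explore the converse direction: does an upper bound on $\text{VC-dim}(A\star A)$ imply an upper bound on $|A|$? It is not hard to see that $\text{VC-dim}(A)\le\text{VC-dim}(A\star A)$ for $\star\in\{\cup,\cap,\triangle\}$, and therefore, by Theorem 1.1,
$$|A|\le\binom{n}{\le d}=O(n^{d}),\qquad\text{where } d=\text{VC-dim}(A\star A).$$
Our main result quadratically improves this naive bound when $\star$ is symmetric difference:

Theorem 1.2. Let $A\subseteq 2^{[n]}$ and let $d=\text{VC-dim}(A\triangle A)$. Then $|A|\le O\bigl(n^{\lceil d/2\rceil}\bigr)$.

We note that Theorem 1.2 does not hold when $\star\in\{\cup,\cap\}$: pick $d\ge 2$, and set
$$A=\{S\subseteq[n] \mid |S|\le d\}.$$
Note that $A=A\cap A$ and therefore $d=\text{VC-dim}(A)=\text{VC-dim}(A\cap A)$. However, $|A|=\binom{n}{\le d}=\Theta(n^{d})$, which is not upper bounded by $O(n^{\lceil d/2\rceil})$. Picking $A=\{S\subseteq[n] \mid |S|\ge n-d\}$ shows that $\cup$ behaves like $\cap$ in this context.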
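The counterexamples for $\cap$ and $\cup$ can be verified numerically for small parameters. This is our own sketch; the helper names `vc_dim` and `binom_le` are hypothetical, and $n=6$, $d=2$ are arbitrary choices.

```python
from itertools import combinations
from math import comb

# Illustrative sketch; `vc_dim` and `binom_le` are hypothetical helper names.
def vc_dim(family, n):
    """Brute-force VC dimension of a family of frozensets over {0, ..., n-1}."""
    best = 0
    for k in range(1, n + 1):
        if not any(len({S & frozenset(Y) for S in family}) == 2 ** k
                   for Y in combinations(range(n), k)):
            break  # shattering is hereditary, so no larger set is shattered
        best = k
    return best

def binom_le(n, d):
    """binom(n, <= d): the number of subsets of an n-set of size at most d."""
    return sum(comb(n, i) for i in range(d + 1))

n, d = 6, 2
A_cap = {frozenset(c) for k in range(d + 1) for c in combinations(range(n), k)}
A_cup = {frozenset(range(n)) - S for S in A_cap}  # all sets of size >= n - d

cap = {S & T for S in A_cap for T in A_cap}
cup = {S | T for S in A_cup for T in A_cup}
print(cap == A_cap, vc_dim(cap, n), len(A_cap) == binom_le(n, d))  # True 2 True
print(cup == A_cup, vc_dim(cup, n), len(A_cup) == binom_le(n, d))  # True 2 True
```

Both families are closed under their operation, so $A\star A=A$ and the VC dimension stays $d$ while $|A|$ grows like $n^d$.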
Each of the above examples rules out the analog of Theorem 1.2 for exactly one of $\cup$, $\cap$. This suggests the following open question:
Another natural question is whether this phenomenon extends to several applications of the symmetric difference operator, for example:
Question 2. Does there exist an $\varepsilon<1/2$ such that for every $d\le n$ and every $A\subseteq 2^{[n]}$:
In Section 3 we derive a related statement when $\triangle$ is replaced by addition modulo $p$ for a prime $p$, and the VC dimension is replaced by the interpolation degree (which is defined in the next section).

Interpolation degree
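Since our proof method is algebraic, it is convenient to view $A\subseteq 2^{[n]}$ as a subset of the $n$-dimensional vector space $\mathbb{F}_2^n$ over the field of two elements, identifying each set with its indicator vector. In this setting $A\triangle A$ is the sumset of $A$, denoted $A+A=\{a+b \mid a,b\in A\}$.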
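Concretely, encoding each set as an integer bitmask turns symmetric difference into bitwise XOR, that is, addition in $\mathbb{F}_2^n$. A minimal sketch of our own; `to_mask` and `sumset` are hypothetical helper names.

```python
# Illustrative sketch; `to_mask` and `sumset` are hypothetical helper names.
def to_mask(S):
    """Encode a subset of {0, ..., n-1} as an integer bitmask (a vector in F_2^n)."""
    m = 0
    for i in S:
        m |= 1 << i
    return m

def sumset(A):
    """A + A in F_2^n: pairwise XOR of bitmasks, i.e. pairwise symmetric difference."""
    return {a ^ b for a in A for b in A}

S, T = {0, 2, 3}, {2, 4}
assert to_mask(S) ^ to_mask(T) == to_mask(S ^ T)  # Python's set ^ is symmetric difference
A = [to_mask(s) for s in ({0}, {1}, {0, 1})]
print(sorted(sumset(A)))  # [0, 1, 2, 3]
```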
Theorem 1.2 will follow from a stronger statement involving a quantity referred to in some places as the regularity (as a special case of Castelnuovo-Mumford regularity from algebraic geometry) [Remscrim, 2016] and in others as the interpolation degree [Moran and Rashtchian, 2016]. We will use the more descriptive term interpolation degree for the rest of this paper. We begin with some preliminary notation and definitions.
Let $A\subseteq\mathbb{F}_2^n$. It is a basic fact that for each function $f:\mathbb{F}_2^n\to\mathbb{F}_2$ there exists a unique multilinear polynomial $P_f\in\mathbb{F}_2[x_1,\dots,x_n]$ such that $f(a)=P_f(a)$ for all $a\in\mathbb{F}_2^n$ (existence is via simple interpolation and uniqueness follows from dimension counting). For a partial function $f:A\to\mathbb{F}_2$ there are many (precisely $2^{2^n-|A|}$) multilinear polynomials whose restriction to $A$ computes $f$. Let $\deg_A(f)$ denote the minimal degree of any polynomial whose restriction to $A$ computes $f$. We define the interpolation degree of $A$, denoted $\text{int-deg}(A)$, to be the maximum of $\deg_A(f)$ taken over all functions $f:A\to\mathbb{F}_2$. In other words, $\text{int-deg}(A)$ is the smallest $d$ such that any function from $A$ to $\mathbb{F}_2$ can be realized by a polynomial of degree at most $d$. Clearly, $\text{int-deg}(A)$ is an integer between $0$ and $n$. It is also not hard to see that if $A$ is a proper subset of $\mathbb{F}_2^n$ then $\text{int-deg}(A)<n$. Our interest in $\text{int-deg}(A)$ comes from the following connection to VC dimension.

Lemma 1.3 (Babai and Frankl [1992], Gurvits [1997], Smolensky [1997], Moran and Rashtchian [2016]). For every $A\subseteq\mathbb{F}_2^n$, $\text{int-deg}(A)\le\text{VC-dim}(A)$.
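Since every function $A\to\mathbb{F}_2$ is realizable in degree $d$ exactly when the evaluation map from polynomials of degree at most $d$ to $\mathbb{F}_2^A$ is onto, $\text{int-deg}(A)$ can be computed as the least $d$ for which the monomials of degree at most $d$, restricted to $A$, have rank $|A|$ over $\mathbb{F}_2$. The sketch below is our own (helper names `gf2_rank`, `int_deg`, `vc_dim` are hypothetical; points are bitmasks) and also spot-checks Lemma 1.3 on random families.

```python
import random
from itertools import combinations

# Illustrative sketch; all helper names are hypothetical.
def gf2_rank(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks (Gaussian elimination)."""
    pivots = {}  # leading bit -> stored vector with that leading bit
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]  # clears bit h, so the leading bit strictly drops
    return len(pivots)

def int_deg(A, n):
    """Smallest d such that every function A -> F_2 is computed on A by a
    multilinear polynomial of degree <= d. Points of A are bitmasks < 2**n."""
    pts = sorted(set(A))
    for d in range(n + 1):
        cols = []
        for k in range(d + 1):
            for S in combinations(range(n), k):
                s = sum(1 << i for i in S)
                # column of the monomial x_S: x_S(a) = 1 iff S is contained in a
                cols.append(sum(1 << j for j, a in enumerate(pts) if a & s == s))
        # all functions on A are reachable iff the evaluation map has rank |A|
        if gf2_rank(cols) == len(pts):
            return d
    raise ValueError("unreachable: degree n always suffices")

def vc_dim(A, n):
    """Brute-force VC dimension of A, viewing each bitmask as a subset of [n]."""
    best = 0
    for k in range(1, n + 1):
        masks = (sum(1 << i for i in Y) for Y in combinations(range(n), k))
        if not any(len({a & m for a in A}) == 2 ** k for m in masks):
            break
        best = k
    return best

print(int_deg([0, 1, 2, 4], 3))  # 1: the points of Hamming weight <= 1
rng = random.Random(0)
n = 5
for _ in range(200):
    A = rng.sample(range(2 ** n), rng.randint(1, 2 ** n))
    assert int_deg(A, n) <= vc_dim(A, n)  # Lemma 1.3
```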
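This lemma, under various formulations, was proved in several works. The formulation that appears here can be found in [Moran and Rashtchian, 2016]. For completeness, we next sketch the proof. Since the multilinear monomials (including those of degree larger than $\text{VC-dim}(A)$) span the space of functions $f:A\to\mathbb{F}_2$, it suffices to show that every monomial (viewed as a function $A\to\mathbb{F}_2$) can be represented by a polynomial of degree at most $d=\text{VC-dim}(A)$. The crucial observation is that if $x_S=\prod_{i\in S}x_i$ is a monomial of degree larger than $d$, then $S$ is not shattered by $A$. This means that there is a pattern $v:S\to\{0,1\}$ that does not appear in any of the vectors in $A$, and therefore
$$\prod_{i\in S}\bigl(x_i+v_i+1\bigr)=_A 0,$$
where "$=_A$" means equality as functions over $A$ (over $\mathbb{F}_2$, each factor equals $1$ exactly when $x_i=v_i$). Now, expanding this product and rearranging the equation yields a representation of $x_S$ as a sum of monomials $x_{S'}$ with $S'\subsetneq S$, which by induction can also be represented by polynomials of degree at most $d$. Lemma 1.3 reduces Theorem 1.2 to the following stronger statement, which is proved in the next section.

Theorem 1.4. Let $d\ge 0$ and let $A\subseteq\mathbb{F}_2^n$ be such that $|A|>2\binom{n}{\le\lfloor d/2\rfloor}$. Then $\text{int-deg}(A+A)>d$.

Indeed, if $d=\text{VC-dim}(A\triangle A)$, then $\text{int-deg}(A+A)\le d$ by Lemma 1.3, so Theorem 1.4 gives $|A|\le 2\binom{n}{\le\lfloor d/2\rfloor}=O(n^{\lfloor d/2\rfloor})$, which is in particular $O(n^{\lceil d/2\rceil})$.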
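The explicit bound $|A|\le 2\binom{n}{\le\lfloor d/2\rfloor}$ with $d=\text{VC-dim}(A\triangle A)$ can be sanity-checked on random families in $\mathbb{F}_2^5$. This is our own sketch with hypothetical helper names (`vc_dim`, `binom_le`); random sampling illustrates the statement but proves nothing.

```python
import random
from itertools import combinations
from math import comb

# Illustrative sketch; `vc_dim` and `binom_le` are hypothetical helper names.
def vc_dim(A, n):
    """Brute-force VC dimension of A, viewing each bitmask as a subset of [n]."""
    best = 0
    for k in range(1, n + 1):
        masks = (sum(1 << i for i in Y) for Y in combinations(range(n), k))
        if not any(len({a & m for a in A}) == 2 ** k for m in masks):
            break
        best = k
    return best

def binom_le(n, k):
    """binom(n, <= k) = sum of binom(n, i) for i = 0..k."""
    return sum(comb(n, i) for i in range(k + 1))

rng = random.Random(0)
n = 5
worst = 0.0
for _ in range(300):
    A = rng.sample(range(2 ** n), rng.randint(1, 12))
    d = vc_dim({a ^ b for a in A for b in A}, n)  # VC dimension of A triangle A
    bound = 2 * binom_le(n, d // 2)
    assert len(A) <= bound
    worst = max(worst, len(A) / bound)
print(f"largest observed |A| / bound: {worst:.2f}")
```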

Proof of Theorem 1.4
The main technical tool will be a lemma of Croot, Lev and Pach [Croot et al., 2017] that was the main ingredient in the recent solution of the cap-set problem [Ellenberg and Gijswijt, 2017] and has found many other applications since then (e.g., [Green, 2016, Solymosi, 2018, Dvir and Edelman, 2017, Fox and Lovász, 2017], to name a few).
Lemma 2.1 (CLP lemma [Croot et al., 2017]). Let $P\in\mathbb{F}_q[x_1,\dots,x_n]$ be a polynomial of degree at most $d$ over any finite field $\mathbb{F}_q$, and let $M$ denote the $q^n\times q^n$ matrix with entries $M_{x,y}=P(x+y)$ for $x,y\in\mathbb{F}_q^n$. Then $\operatorname{rank}(M)\le 2\cdot m_{\lfloor d/2\rfloor}(q,n)$, where $m_k(q,n)$ denotes the number of monomials in $n$ variables $x_1,\dots,x_n$ such that each variable appears with individual degree at most $q-1$ and the total degree of the monomial is at most $k$.
Specializing to our setting of multilinear polynomials over $\mathbb{F}_2$, we see that $m_k(2,n)=\binom{n}{\le k}$, and so we conclude:

Corollary 2.2. Let $P\in\mathbb{F}_2[x_1,\dots,x_n]$ be a polynomial of degree at most $d$ and let $M$ be as in Lemma 2.1. Then $\operatorname{rank}(M)\le 2\binom{n}{\le\lfloor d/2\rfloor}$.
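Corollary 2.2 is easy to test for small $n$: draw a random multilinear polynomial of degree at most $d$, build $M_{x,y}=P(x+y)$ with $x+y$ computed as the XOR of bitmasks, and compare its rank over $\mathbb{F}_2$ with $2\binom{n}{\le\lfloor d/2\rfloor}$. This is our own sketch; `gf2_rank`, `random_poly`, `evaluate` and `clp_matrix_rank` are hypothetical names.

```python
import random
from itertools import combinations
from math import comb

# Illustrative sketch; all helper names are hypothetical.
def gf2_rank(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    pivots = {}
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def random_poly(n, d, rng):
    """Random multilinear F_2 polynomial of degree <= d, as a set of monomial bitmasks."""
    monos = [sum(1 << i for i in S) for k in range(d + 1) for S in combinations(range(n), k)]
    return {m for m in monos if rng.random() < 0.5}

def evaluate(P, x):
    """P(x) over F_2: parity of the number of monomials x_S with S contained in x."""
    return sum(1 for s in P if x & s == s) % 2

def clp_matrix_rank(P, n):
    N = 2 ** n
    table = [evaluate(P, x) for x in range(N)]
    # row x of M as a bitmask whose bit y is P(x + y) = P(x XOR y)
    rows = [sum(table[x ^ y] << y for y in range(N)) for x in range(N)]
    return gf2_rank(rows)

rng = random.Random(1)
n = 6
for d in range(n + 1):
    P = random_poly(n, d, rng)
    r = clp_matrix_rank(P, n)
    bound = 2 * sum(comb(n, i) for i in range(d // 2 + 1))
    assert r <= bound  # Corollary 2.2
    print(f"d={d}: rank {r}, bound {bound}")
```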
We are now ready to prove Theorem 1.4.
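Proof of Theorem 1.4. Suppose $A\subseteq\mathbb{F}_2^n$ is such that $|A|>2\binom{n}{\le\lfloor d/2\rfloor}$. Let $f:A+A\to\mathbb{F}_2$ be such that $f(\mathbf{0})=1$, where $\mathbf{0}$ is the all-zero vector in $\mathbb{F}_2^n$, and $f(a)=0$ for all non-zero $a\in A+A$. It suffices to show that $\deg_{A+A}(f)>d$ (notice that since $A\neq\emptyset$ it follows that $\mathbf{0}\in A+A$, and so $f$ is not constantly $0$ on $A+A$). Let $d_f=\deg_{A+A}(f)$ denote the smallest degree of a polynomial whose restriction to $A+A$ computes $f$, and let $P$ be such a polynomial of degree $d_f$. Let $M$ be the $2^n\times 2^n$ matrix whose rows and columns are indexed by $\mathbb{F}_2^n$, with entries $M_{x,y}=P(x+y)$. For $a,b\in A$ we have $a+b\in A+A$, and $a+b=\mathbf{0}$ if and only if $a=b$; hence, by our definition of $f$, the sub-matrix of $M$ whose rows and columns are indexed by $A$ is the $|A|\times|A|$ identity matrix. This implies $\operatorname{rank}(M)\ge|A|$.
Applying Corollary 2.2 to $P$ we get that
$$\operatorname{rank}(M)\le 2\binom{n}{\le\lfloor d_f/2\rfloor}.$$
Combining the two inequalities on $\operatorname{rank}(M)$ and using the bound on the size of $A$, we get that
$$2\binom{n}{\le\lfloor d/2\rfloor}<|A|\le\operatorname{rank}(M)\le 2\binom{n}{\le\lfloor d_f/2\rfloor},$$
which implies $\lfloor d/2\rfloor<\lfloor d_f/2\rfloor$, since $\binom{n}{\le k}$ is non-decreasing in $k$. This means that $d_f>d$, and so $\text{int-deg}(A+A)>d$.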
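The statement just proved can be spot-checked by computer for tiny $n$: sample $A\subseteq\mathbb{F}_2^n$ just above the threshold $2\binom{n}{\le\lfloor d/2\rfloor}$ and compute $\text{int-deg}(A+A)$ by linear algebra over $\mathbb{F}_2$. A hedged sketch of our own; `gf2_rank` and `int_deg` are hypothetical helpers, and random sampling only illustrates the theorem.

```python
import random
from itertools import combinations
from math import comb

# Illustrative sketch; `gf2_rank` and `int_deg` are hypothetical helper names.
def gf2_rank(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    pivots = {}
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def int_deg(A, n):
    """Least d such that degree-<=d monomials restricted to A span all functions on A."""
    pts = sorted(set(A))
    for d in range(n + 1):
        cols = []
        for k in range(d + 1):
            for S in combinations(range(n), k):
                s = sum(1 << i for i in S)
                cols.append(sum(1 << j for j, a in enumerate(pts) if a & s == s))
        if gf2_rank(cols) == len(pts):
            return d
    raise ValueError("unreachable: degree n always suffices")

rng = random.Random(2)
n = 5
for d in range(4):  # for d >= 4 the threshold 2 * binom(5, <= 2) = 32 already equals 2^5
    threshold = 2 * sum(comb(n, i) for i in range(d // 2 + 1))
    for _ in range(20):
        A = rng.sample(range(2 ** n), threshold + 1)
        # the A x A block of M is the identity: a XOR b == 0 exactly when a == b
        assert all(((a ^ b) == 0) == (a == b) for a in A for b in A)
        assert int_deg({a ^ b for a in A for b in A}, n) > d  # Theorem 1.4
print("Theorem 1.4 held on all samples")
```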

Generalization to sums modulo p
In this section we observe that our proof can be generalized to give stronger bounds in the case when we take $p$-fold sums of Boolean vectors over $\mathbb{F}_p$. The case proved in the last section corresponds to (two-fold) sums modulo 2. For a subset $A\subseteq\mathbb{F}_p^n$ and a positive integer $k$, we denote by $k\cdot A=\{a_1+\dots+a_k \mid a_i\in A\}$ the $k$-fold sumset of $A$. To formally define the interpolation degree over $\mathbb{F}_p$ we need to consider, instead of multilinear polynomials, polynomials in which each variable has degree at most $p-1$. We call such polynomials $p$-reduced polynomials. The space of all $p$-reduced polynomials has dimension $p^n$, and every function $f:\mathbb{F}_p^n\to\mathbb{F}_p$ is represented by a unique $p$-reduced polynomial. The degree of such a function is defined to be the total degree of the unique $p$-reduced polynomial representing it, and can range between $0$ and $(p-1)n$. The interpolation degree of a set $A\subseteq\mathbb{F}_p^n$ is the minimum $d$ such that any function $f:A\to\mathbb{F}_p$ can be represented by a $p$-reduced polynomial of degree at most $d$. To avoid confusion, we will denote the interpolation degree over $\mathbb{F}_p$ by $\text{int-deg}_p(A)$. We denote by $M_d(p,n)$ the set of monomials in $n$ variables $x_1,\dots,x_n$ in which each variable has degree at most $p-1$ and the total degree is at most $d$. When $p=2$ we have the closed formula $|M_d(2,n)|=\binom{n}{\le d}$. When $p>2$ the quantity $|M_d(p,n)|$ is a bit more tricky to compute, but it is known to satisfy certain asymptotic inequalities (e.g., large deviation bounds [Rassoul-Agha and Seppäläinen, 2015] show that $|M_{\delta n}(p,n)|\le 2^{\epsilon n}$ with $\epsilon=\epsilon(\delta)$ going to zero with $\delta$).
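Although $|M_d(p,n)|$ has no simple closed form for $p>2$, it is easy to compute exactly: it is the sum of the coefficients of $t^0,\dots,t^d$ in $(1+t+\dots+t^{p-1})^n$. A small sketch of our own; `num_reduced_monomials` is a hypothetical name.

```python
from math import comb

# Illustrative sketch; `num_reduced_monomials` is a hypothetical helper name.
def num_reduced_monomials(p, n, d):
    """|M_d(p, n)|: monomials in n variables with every exponent in {0, ..., p-1}
    and total degree <= d, read off from the generating function
    (1 + t + ... + t^(p-1))^n."""
    coeffs = [1]  # coeffs[k] = number of p-reduced monomials of degree exactly k
    for _ in range(n):
        new = [0] * (len(coeffs) + p - 1)
        for deg, c in enumerate(coeffs):
            for e in range(p):  # exponent of the newly added variable
                new[deg + e] += c
        coeffs = new
    return sum(coeffs[: d + 1])

# For p = 2 this recovers binom(n, <= d)
assert all(num_reduced_monomials(2, n, d) == sum(comb(n, i) for i in range(d + 1))
           for n in range(8) for d in range(n + 1))
print(num_reduced_monomials(3, 4, 2))  # 15 = 1 + 4 + (4 + 6)
```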
The following theorem generalizes Theorem 1.4 to $p>2$; for $p=2$ it coincides with Theorem 1.4.
Theorem 3.1. Let $p$ be any prime number and let $A\subseteq\{0,1\}^n\subseteq\mathbb{F}_p^n$ be such that $|A|>p\cdot|M_{\lfloor d/p\rfloor}(p,n)|$. Then $\text{int-deg}_p(p\cdot A)>d$.
The proof of the theorem requires the notion of slice rank of a tensor, which was introduced by Tao in his symmetric interpretation of the proof of the cap-set conjecture [Tao, 2016]. By a $k$-fold tensor of dimension $D$ over a field $\mathbb{F}$ we mean a function $T$ mapping ordered tuples $(j_1,\dots,j_k)\in[D]^k$ to $\mathbb{F}$. The slice rank of a $k$-fold tensor $T$ is the smallest integer $R$ such that $T$ can be written as a sum $T=\sum_{i=1}^{R}T_i$ such that, for every $i\in[R]$, there are some $\ell_i\in[k]$ and functions $a_i$, $b_i$ so that
$$T_i(j_1,\dots,j_k)=a_i(j_{\ell_i})\,b_i(j_1,\dots,j_{\ell_i-1},j_{\ell_i+1},\dots,j_k).$$
In other words, we define the 'rank one' tensors to be those in which the dependence on one of the variables is multiplicative (via a function $a_i(j_{\ell_i})$), and the slice rank of a tensor is the smallest number of rank one tensors needed to describe it. For 2-fold tensors (that is, matrices) this notion coincides with the usual definition of matrix rank.
The proof of Theorem 3.1 will follow from a combination of two lemmas regarding slice rank. The first lemma generalizes the Croot-Lev-Pach lemma (and is proved in a similar way).
Lemma 3.2. Let $f:\mathbb{F}_p^n\to\mathbb{F}_p$ be of degree at most $d$. Then the $p$-fold, $p^n$-dimensional tensor $T:(\mathbb{F}_p^n)^p\to\mathbb{F}_p$ defined by $T(X_1,\dots,X_p)=f(X_1+\dots+X_p)$ has slice rank at most $p\cdot|M_{\lfloor d/p\rfloor}(p,n)|$.
Proof. Consider $T$ as a polynomial in $p$ groups of variables $X_i=(x^i_1,\dots,x^i_n)$ with $i=1,2,\dots,p$. Since the degree of $f$ is at most $d$, the degree of $T$ as a polynomial is also at most $d$. This means that, in each monomial of $T(X_1,\dots,X_p)=f(X_1+\dots+X_p)$, the degree of at least one group of variables is at most $\lfloor d/p\rfloor$. Grouping together monomials according to which group has low degree (if there is more than one such group, take the one with the lowest index), and replacing the low-degree part of each monomial by the $p$-reduced monomial that computes the same function on $\mathbb{F}_p^n$ (using $x^p=x$, which does not increase degrees), we can represent $T$ as a sum of $p$ tensors, each having slice rank at most $|M_{\lfloor d/p\rfloor}(p,n)|$. This completes the proof.
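The grouping argument can be traced on a small case: expand $f(X_1+\dots+X_p)$ over $\mathbb{F}_p$, assign each monomial to the lowest-index group of degree at most $\lfloor d/p\rfloor$, and count the distinct pairs (group, reduced low-degree monomial); each pair is one slice, so the count bounds the slice rank. This is our own sketch with hypothetical helper names (`poly_mul`, `expand_sum`, `reduce_exp`, `slices_used`); polynomials are dictionaries from exponent tuples to coefficients, and $p=3$, $n=2$ are arbitrary.

```python
import random
from itertools import product

# Illustrative sketch; all helper names are hypothetical.
def poly_mul(P, Q, p):
    """Multiply polynomials over F_p stored as {exponent tuple: coefficient}."""
    R = {}
    for m1, c1 in P.items():
        for m2, c2 in Q.items():
            m = tuple(a + b for a, b in zip(m1, m2))
            R[m] = (R.get(m, 0) + c1 * c2) % p
    return {m: c for m, c in R.items() if c}

def expand_sum(f, n, p):
    """Expand T(X_1, ..., X_p) = f(X_1 + ... + X_p) over F_p.
    f maps exponent tuples of length n to coefficients; in the result,
    variable x^i_j (group i, coordinate j) sits at position i * n + j."""
    T = {}
    for e, c in f.items():
        term = {(0,) * (p * n): c % p}
        for j, ej in enumerate(e):
            # the linear form x^0_j + x^1_j + ... + x^{p-1}_j
            lin = {tuple(int(k == i * n + j) for k in range(p * n)): 1 for i in range(p)}
            for _ in range(ej):
                term = poly_mul(term, lin, p)
        for m, v in term.items():
            T[m] = (T.get(m, 0) + v) % p
    return {m: v for m, v in T.items() if v}

def reduce_exp(e, p):
    # x^e and x^e' agree on F_p when e' = ((e - 1) mod (p - 1)) + 1, since x^p = x
    return 0 if e == 0 else (e - 1) % (p - 1) + 1

def slices_used(T, n, p, d):
    """Number of slices in the decomposition from the proof of Lemma 3.2."""
    slices = set()
    for m in T:
        degs = [sum(m[i * n:(i + 1) * n]) for i in range(p)]
        i = next(k for k in range(p) if degs[k] <= d // p)  # exists: sum(degs) <= d
        low = tuple(reduce_exp(x, p) for x in m[i * n:(i + 1) * n])
        slices.add((i, low))  # all monomials sharing (i, low) form one slice
    return len(slices)

rng = random.Random(3)
p, n = 3, 2
for d in range((p - 1) * n + 1):
    f = {e: rng.randrange(1, p) for e in product(range(p), repeat=n) if sum(e) <= d}
    T = expand_sum(f, n, p)
    M = sum(1 for e in product(range(p), repeat=n) if sum(e) <= d // p)
    assert slices_used(T, n, p, d) <= p * M  # the bound of Lemma 3.2
    print(d, slices_used(T, n, p, d), p * M)
```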
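The second lemma needed to prove Theorem 3.1 is due to Tao and shows that the 'diagonal' tensor has full slice rank.

Lemma 3.3 ([Tao, 2016]). Let $I$ be a finite set, let $k\ge 2$, and let $\delta:I^k\to\mathbb{F}$ be the diagonal tensor given by $\delta(j_1,\dots,j_k)=1$ if $j_1=\dots=j_k$ and $\delta(j_1,\dots,j_k)=0$ otherwise. Then the slice rank of $\delta$ is $|I|$.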
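Proof of Theorem 3.1. To prove the bound on $\text{int-deg}_p(p\cdot A)$ we describe a function $f:p\cdot A\to\mathbb{F}_p$ that cannot be represented by a low-degree polynomial. We take $f$ to be equal to $1$ on the zero vector (which lies in $p\cdot A$, since $a+\dots+a=p\,a=\mathbf{0}$ for any $a\in A$) and zero otherwise. We now consider the tensor $T(X_1,\dots,X_p)=f(X_1+\dots+X_p)$ defined on $A^p$. Notice that, since $A\subseteq\{0,1\}^n$, the sum of $p$ elements of $A$ is equal to zero if and only if all $p$ summands are identical: in each coordinate, a sum of $p$ bits lies in $\{0,\dots,p\}$ and is divisible by $p$ only when the bits are all $0$ or all $1$. This implies that $T$ is the diagonal tensor $\delta$ of Lemma 3.3 (with $I=A$ and $k=p$), and hence has slice rank equal to $|A|$. On the other hand, if the degree of $f$ (over $p\cdot A$) is at most $d$, then $f$ agrees on $p\cdot A$ with some $g:\mathbb{F}_p^n\to\mathbb{F}_p$ of degree at most $d$; by Lemma 3.2 the tensor $g(X_1+\dots+X_p)$ on $(\mathbb{F}_p^n)^p$ has slice rank at most $p\cdot|M_{\lfloor d/p\rfloor}(p,n)|$, and so does its restriction $T$ to $A^p$. Since we assume that $|A|>p\cdot|M_{\lfloor d/p\rfloor}(p,n)|$, this cannot happen, and so $\text{int-deg}_p(p\cdot A)>d$.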
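The combinatorial fact used above (for vectors in $\{0,1\}^n$, a sum of $p$ of them vanishes modulo $p$ exactly when all summands coincide) can be confirmed by exhaustive search for tiny parameters; the last line shows why the hypothesis $A\subseteq\{0,1\}^n$ matters. Our own sketch; `sums_to_zero_mod_p` and `diagonal_property` are hypothetical names.

```python
from itertools import product

# Illustrative sketch; helper names are hypothetical.
def sums_to_zero_mod_p(vectors, p):
    """True iff the coordinate-wise sum of the given integer vectors is 0 mod p."""
    return all(sum(col) % p == 0 for col in zip(*vectors))

def diagonal_property(p, n):
    """Check: p vectors from {0,1}^n sum to 0 mod p iff they are all equal."""
    cube = list(product((0, 1), repeat=n))
    for tup in product(cube, repeat=p):
        all_equal = all(v == tup[0] for v in tup)
        if sums_to_zero_mod_p(tup, p) != all_equal:
            return False
    return True

print(all(diagonal_property(p, n) for p in (2, 3, 5) for n in (1, 2)))  # True
# Outside {0,1}^n the property fails: 0 + 1 + 2 = 3 is 0 mod 3 with distinct summands
print(sums_to_zero_mod_p([(0,), (1,), (2,)], 3))  # True
```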