$d$-Galvin families

The Galvin problem asks for the minimum size of a family $\mathcal{F} \subseteq \binom{[n]}{n/2}$ with the property that, for any set $A$ of size $\frac n 2$, there is a set $S \in \mathcal{F}$ which is balanced on $A$, meaning that $|S \cap A| = |S \cap \overline{A}|$. We consider a generalization of this question that comes from a possible approach in complexity theory. In the generalization the required property is, for any $A$, to be able to find $d$ sets from a family $\mathcal{F} \subseteq \binom{[n]}{n/d}$ that form a partition of $[n]$ and such that each part is balanced on $A$. We construct such families of size polynomial in the parameters $n$ and $d$.

A family $\mathcal{F} \subseteq \binom{[n]}{n/2}$ is a Galvin family if, for any $A \in \binom{[n]}{n/2}$, there exists a set $S \in \mathcal{F}$ which is balanced on $A$ (i.e., $|S \cap A| = n/4$). The Galvin problem asks for the minimal size, denoted by $m(n)$, of a Galvin family. An upper bound of $m(n) \le n/2$ follows from the family given by the sets $S_i = \{i, i+1, \dots, i + n/2 - 1\}$ for $i \in [n/2]$. Lower bounds for the size of Galvin families are more subtle. An easy counting argument shows that $m(n) \ge \binom{n}{n/2} / \binom{n/2}{n/4}^2 = \Theta(\sqrt{n})$, which is far from $n/2$. Frankl and Rödl [4] established that $m(n) \ge \varepsilon n$ for some $\varepsilon > 0$ whenever $n/4$ is odd, as a corollary to a strong result in extremal set theory. This linear bound was later strengthened by Enomoto, Frankl, Ito and Nomura [3] to $m(n) = n/2$, with the same parity constraint, thus showing the optimality of the construction in this special case. Later, using Gröbner basis methods and linear algebra, Hegedűs [5] obtained that $m(n) \ge n/4$ whenever $n/4 > 3$ is a prime.

1.2. Generalizations and related works. Surprisingly, problems closely related to the Galvin problem have proved useful in arithmetic complexity theory, where they yield lower bounds on the size of arithmetic circuits computing some target polynomials. This connection was first noticed by Jansen [7], and was recently used successfully by Alon, Kumar and Volk [2]. There the elements of the Galvin family $\mathcal{F}$ are allowed to be sets of size between $2\tau$ and $n - 2\tau$ ($\tau$ being an integer). Moreover, instead of asking for the existence of a set $S \in \mathcal{F}$ perfectly balanced on $A$, the authors look for a set $S$ which is nearly balanced, i.e., $\bigl| |S \cap A| - |S|/2 \bigr| < \tau$ for the same $\tau$. For this setting, Alon, Kumar and Volk [2] showed, using the so-called polynomial method, that $m(n) = \Omega(n/\tau)$.
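The interval construction behind the upper bound $m(n) \le n/2$ can be checked exhaustively for small $n$. The following is a minimal sketch, not part of the original argument: the function names `interval_family` and `is_galvin` are illustrative, and the search enumerates all $\binom{n}{n/2}$ sets $A$, so it is only practical for small ground sets.

```python
from itertools import combinations


def interval_family(n):
    """Sets S_i = {i, i+1, ..., i + n/2 - 1} for i in [n/2], over the ground set 1..n."""
    half = n // 2
    return [frozenset(range(i, i + half)) for i in range(1, half + 1)]


def is_galvin(family, n):
    """Exhaustively check that every n/2-subset A has some S in the family with |S & A| = n/4."""
    half, quarter = n // 2, n // 4
    for A in combinations(range(1, n + 1), half):
        A = set(A)
        if not any(len(S & A) == quarter for S in family):
            return False
    return True


for n in (4, 8, 12):
    family = interval_family(n)
    print(n, len(family), is_galvin(family, n))
```

The check succeeds because $|S_i \cap A|$ changes by at most one when $i$ increases by one, while $S_1$ and its complement are on opposite sides of $n/4$.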
Alon, Bergmann, Coppersmith, and Odlyzko [1] investigate a problem dealing with $\{-1, +1\}$ vectors which looks similar to the Galvin problem. When rephrased as an extremal problem over sets, it reads as follows: what is the minimal size $K(n, c)$ of a family $\mathcal{F} \subseteq \mathcal{P}([n])$ such that for every $A \subseteq [n]$ some $S \in \mathcal{F}$ has $|A \triangle S|$ within a margin of $n/2$ governed by $c$, where $\triangle$ denotes the symmetric difference? Setting $c = 0$ and asking all sets to be of size $n/2$ gives exactly the Galvin problem. However, there does not seem to be any evident dependency between the two problems.
We consider here a different type of generalization. Asking for a set $S \in \mathcal{F}$ to be balanced on $A$ is equivalent (up to a factor 2 in the family size) to asking for a partition of $[n]$ into two parts, namely $(S, \overline{S})$, such that each part is balanced on $A$ and such that $S, \overline{S}$ are elements of $\mathcal{F}$. Instead of splitting $[n]$ into two parts, we look for partitions that involve more sets. Introducing a parameter $d \in \mathbb{N}$, we want, for a given $A$, to be able to find $d$ sets in $\mathcal{F}$ that form a partition of $[n]$ and such that each set is balanced on $A$.
The original motivation for considering this generalization stems from arithmetic circuits. There, an open question is whether there is a separation between two models of computation called multilinear algebraic branching programs (ml-ABPs) and multilinear circuits (ml-circuits). By "separation", we mean that there is some specific polynomial $f$ that can be computed by a small ml-circuit but any ml-ABP for $f$ must be of size superpolynomial in the degree and the number of variables of $f$. Proving that any generalized Galvin family (i.e., with $d$ parts in the partitions; see below for a formal definition) must be of superpolynomial size (in $n$, the size of the ground set, and $d$, the number of parts) would imply a separation between ml-ABPs and ml-circuits. Since our main result is that generalized Galvin families of polynomial size exist, this approach is unfortunately not promising. Note that this does not call into question either the plausible separation between ml-ABPs and ml-circuits or the approach through a proof that ml-ABPs cannot efficiently compute so-called "full rank polynomials". It only rules out one specific approach to the question of whether ml-ABPs can efficiently compute full rank polynomials. However, we believe that the construction is of intrinsic combinatorial interest.

Definition (d-Galvin family). Given two integers $d, n \in \mathbb{N}$ with $2d \mid n$, we say that a family $\mathcal{F} \subseteq \binom{[n]}{n/d}$ is d-Galvin if for any $A \in \binom{[n]}{n/2}$, $A$ is handled by $\mathcal{F}$, meaning that there exist $d$ sets $S_1, \dots, S_d \in \mathcal{F}$ such that $S_1, \dots, S_d$ form a partition of $[n]$ and each $S_i$ is balanced on $A$, i.e., $|S_i \cap A| = |S_i|/2 = n/(2d)$.

Somewhat surprisingly, small d-Galvin families exist.
Theorem 1. For any $d, n \in \mathbb{N}$ such that $2d \mid n$, there exists a d-Galvin family of size $\tilde{\Theta}(n^2 d^9)$.
The next section is devoted to the construction of a d-Galvin family, yielding a proof of the main theorem.

2.2.
Proof of Theorem 1. For technical reasons, we need to distinguish two cases in the proof of Theorem 1: we start by giving a construction when d is reasonably small, then we show how to adapt it to handle larger d.
The overall idea is to construct a family $\mathcal{F}$ of size $\tilde{\Theta}(n d^9)$ such that a random set $A \in \binom{[n]}{n/2}$ is handled by $\mathcal{F}$ with probability at least $1/2$. Taking the random family $\mathcal{G}$ which is the union of $n$ independent such $\mathcal{F}$ increases this probability to at least $1 - 2^{-n}$. By the union bound, the probability that $\mathcal{G}$ handles all sets $A$ is non-zero, yielding the existence of the desired family. We now focus on the construction of such a family $\mathcal{F}$.

Construction of F
For a set $X$, we use the notation $A \sim X$ to denote that $A$ is a set chosen uniformly at random from $X$. We let $k := \frac{n}{2d}$ for the rest of the paper.
Before going into the construction, let us see how we can prove the main theorem, with Lemma 1 in hand.
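Proof of Theorem 1, first case. Let $\sigma_1, \dots, \sigma_n$ be $n$ permutations of $[n]$, chosen independently and uniformly at random. For each of these, construct the family $\mathcal{F}_{\sigma_i} = \sigma_i(\mathcal{F})$, i.e., the family from Lemma 1 where every element $e \in [n]$ has been replaced by $\sigma_i(e)$. Consider the family $\mathcal{G} := \bigcup_{i \in [n]} \mathcal{F}_{\sigma_i}$. We aim to prove that $\mathcal{G}$ is d-Galvin with non-zero probability. Given a set $A$, let $H_i$ be the event "$A$ is handled by $\mathcal{F}_{\sigma_i}$". Since $\sigma_i^{-1}(A)$ is uniformly distributed over $\binom{[n]}{n/2}$, Lemma 1 gives $\Pr[H_i] \ge 1/2$. The events $H_i$ are independent, so $A$ fails to be handled by $\mathcal{G}$ with probability at most $2^{-n}$. As there are $\binom{n}{n/2} < 2^n$ sets $A$, the union bound shows that there is a non-zero probability that $\mathcal{G}$ handles all sets $A$, concluding the proof of the theorem.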
The rest of the section consists of a proof of Lemma 1. The overall strategy is to divide the elements of $[n]$ into buckets, denoted by $\chi_i$, and build the sets $S$ from any pair of buckets $(\chi_i, \chi_j)$. Suppose the amounts by which these buckets are unbalanced on $A$ are $R_i$ and $R_j$ respectively. If half the elements of $S$ are chosen from bucket $\chi_i$ and half from bucket $\chi_j$, then the amount by which $S$ is unbalanced on $A$ will be close to a normal distribution with expectation depending on $R_i$ and $R_j$. Given a good upper bound on the $|R_i|$, the probability that $S$ is balanced is reasonably large, and picking only polynomially many random sets $S$ is sufficient. In fact, we must be slightly more careful because the bucket errors accumulate as we pick many sets $S$. Fortunately, we can manage this by taking an ordering $\pi$ of the buckets such that the error of $\bigcup_{j \le i} \chi_{\pi(j)}$ stays small for all $i$.
Proof of Lemma 1. First, we divide $[n]$ into several intervals (recall that $k = \frac{n}{2d}$).
Now, we claim that such a random $\mathcal{F}$ handles $A \sim \binom{[n]}{n/2}$ with probability at least $1/2$, giving the existence of the desired family. As there are $\Theta(d^2)$ pairs $(i, j)$ to consider and for each one we add $\tilde{\Theta}((n^{1/2} d^{7/2})^2)$ sets $S$ to $\mathcal{F}$, this gives a total size $|\mathcal{F}| = \tilde{\Theta}(n d^9)$.
We fix $\pi$ to be a permutation that fulfills Claim 1 for the rest of the paper.

Claim 2. With probability at least $3/4$, we have $|R_i| \le \ln(13d)\sqrt{k}$ for all $i$.

Proof. For $i \in [1, d-1]$, each $R_i$ follows a hypergeometric distribution $H(n/2, n, 2k)$. A tail bound due to Hoeffding [6] controls $\Pr[|R_i| \ge x]$; with $x = \ln(13d)\sqrt{k}$ this bound becomes $2\exp(-\ln(13d)) = \frac{2}{13} \cdot \frac{1}{d}$. $R_0$ and $R_d$ follow the distribution $H(n/2, n, k)$, which yields an even stronger bound for $i = 0$ and $i = d$. Applying a union bound over all $i$, the probability that at least one $|R_i|$ exceeds $\ln(13d)\sqrt{k}$ is at most $1/4$.

[…] Since the $\{S_j\}_{j<i}$ are balanced, a computation using (1) gives $|A \cap T_{\pi(i-1)}| = k/2 - t$. For $S_i$ to be balanced we must have $|A \cap T_{\pi(i)}| + |A \cap T_{\pi(i-1)}| = k$. This means that the probability that $S_i$ is balanced is the probability that $|A \cap T_{\pi(i)}| = k/2 + t$. Let $x := |A \cap T_{\pi(i)}|$ and $R := R_{\pi(i)}$. Then $x$ follows a hypergeometric distribution with parameters $H(k + R, 2k, k)$. Claim 4 below suffices to establish Claim 3. To estimate binomial coefficients in Claim 4 we rely on an easy lemma (Lemma 2), a proof of which can be found in Spencer and Florescu [8].

Claim 4. We have that $x = k/2 + t$ with probability at least […].

Proof. As $x$ follows a hypergeometric distribution with parameters $H(k + R, 2k, k)$, we have that

$$\Pr\left[x = \tfrac{k}{2} + t\right] = \frac{\binom{k+R}{k/2+t}\binom{k-R}{k/2-t}}{\binom{2k}{k}}. \qquad (2)$$

As long as $(R/2 - t)^3 = o(k^2)$, which is the case when $d < n/(\ln n)^3$, we may apply Lemma 2 to estimate (2). By Claim 2 we have $0 \le t, R \le \ln(13d)\sqrt{k} = o(k)$, which yields the lower bound of the claim.

Combining Claim 2 and Claim 3, each random choice of $T_{\pi(i)}$ balances $S_i$ with probability at least […]; with the number of independent choices made in the construction, the probability that some choice of $T_{\pi(i)}$ balances $S_i$ is at least $1 - \frac{1}{4d}$. By the union bound, the chance that some $|R_i|$ violates the bound of Claim 2 or that some $S_i$ is unbalanced is at most $\frac{1}{4} + d \cdot \frac{1}{4d} = \frac{1}{2}$. Hence the probability that we get a d-Galvin partition is at least $1/2$, as desired.
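To get a feeling for the quantities in Claim 4, the probability that $x = k/2 + t$ can be evaluated exactly. The sketch below is illustrative only: it reads $H(k + R, 2k, k)$ as $k$ draws without replacement from a bucket of $2k$ elements of which $k + R$ lie in $A$ (an assumption about the parameter order), takes $k$ even, and uses the hypothetical helper names `balance_probability` and `trials_needed`.

```python
from fractions import Fraction
from math import ceil, comb, log, log1p


def balance_probability(k, R, t):
    """Exact Pr[x = k/2 + t] when x counts elements of A among k draws
    from 2k elements, k + R of which belong to A (assumed reading of H(k+R, 2k, k))."""
    hits = k // 2 + t
    return Fraction(comb(k + R, hits) * comb(k - R, k - hits), comb(2 * k, k))


def trials_needed(p, d):
    """Number m of independent tries, each succeeding with probability p,
    that makes the failure probability (1 - p)^m at most 1/(4d)."""
    return ceil(log(4 * d) / -log1p(-float(p)))


p = balance_probability(100, 10, 3)
print(float(p), trials_needed(p, 5))
```

Since a single try succeeds with probability of order $1/\sqrt{k}$ when $R$ and $t$ are of order $\sqrt{k}$ up to logarithmic factors, polynomially many tries per bucket are enough, which is the mechanism behind the polynomial size of $\mathcal{F}$.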
In the above proof we used $d < n/(\ln n)^3$ to apply Lemma 2. While this could perhaps be improved to $d = n/\ln n$, there is a real barrier here. When $d$ is this large we expect some buckets to be entirely empty of elements from $A$ and the above proof does not work. We now handle the case where $d$ is larger.
Proof of Theorem 1, second case. First, observe that Galvin families compose nicely: if $\mathcal{F}$ is an $a$-Galvin family over $[n]$, and if we take a $b$-Galvin family $\mathcal{F}_S$ over $S$ for each set $S \in \mathcal{F}$, then the union of all $\mathcal{F}_S$ forms an $ab$-Galvin family.
Set $d' = n/(\ln n)^3$ and assume for the moment that $d'$ and $d/d'$ are valid factors of $d$. The idea is to start by constructing a $d'$-Galvin family $\mathcal{F}$ over $[n]$, using the previous construction. We then recursively apply the construction to get a $d/d'$-Galvin family $\mathcal{F}_S$ for each $S \in \mathcal{F}$, and the final family is the union of all $\mathcal{F}_S$. The elements of $\mathcal{F}$ are sets of size $(\ln n)^3$, therefore the families $\mathcal{F}_S$ are of size $\tilde{\Theta}(1)$, and the overall construction is of size $\tilde{\Theta}(n^2 d^9)$.
In the case that $d'$ and $d/d'$ are not valid factors of $d$, we do the following. Let $k' = \lfloor d/d' \rfloor$. The idea is to construct a family $\mathcal{F}$ with sets of size $2k'k$ and $2(k'+1)k$ that behaves like a Galvin family: we ask that for any set $A$ there is a partition of $[n]$ into sets from $\mathcal{F}$, each of which is balanced on $A$. We then recursively apply the construction to split the sets of size $2k'k$ and $2(k'+1)k$ until we get sets of size $2k = n/d$. To create the family $\mathcal{F}$, we adapt the construction of the Galvin family for $d < n/(\ln n)^3$ in the following way. Note that in any partition of $[n]$ into sets of these sizes, the numbers of sets of size $2k'k$ and $2(k'+1)k$ are fixed (given by $d$ and $n$). We denote these numbers by $f$ and $c$. We need to ensure that the $T^h_i \cup T^l_j$ are of the correct sizes (i.e., $2k'k$ or $2(k'+1)k$). For that, we change the sizes of the $\chi_i$ in the following way: […] We then choose the $T^h_i$ to be of size $k'k$, except for $i = 0$ where the unique $T_0$ remains $\emptyset$. This gives the desired sizes for $|S_i|$, and the proof carries over to this case with straightforward modifications.
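The composition property used at the start of the second case can be verified by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: it uses cyclic intervals (the interval family together with the complements) as a stand-in 2-Galvin family rather than the randomized construction of Lemma 1, and the names `cyclic_intervals`, `handles` and `is_d_galvin` are hypothetical.

```python
from itertools import combinations


def cyclic_intervals(elems):
    """All cyclic intervals of length len(elems)/2 over the ordered list elems.
    Assumption: used here as a small 2-Galvin family closed under complement."""
    m, half = len(elems), len(elems) // 2
    return {frozenset(elems[(i + j) % m] for j in range(half)) for i in range(m)}


def handles(family, ground, A, d):
    """Search for d disjoint sets of the family covering ground, each balanced on A."""
    balanced = [S for S in family if 2 * len(S & A) == len(S)]

    def extend(remaining, parts_left):
        if parts_left == 0:
            return not remaining
        if not remaining:
            return False
        x = min(remaining)  # the part covering x has to be picked at this step
        return any(extend(remaining - S, parts_left - 1)
                   for S in balanced if x in S and S <= remaining)

    return extend(frozenset(ground), d)


def is_d_galvin(family, ground, d):
    """Check every subset A of half the ground set (exponential; small cases only)."""
    half = len(ground) // 2
    return all(handles(family, ground, frozenset(A), d)
               for A in combinations(sorted(ground), half))


ground = list(range(1, 9))
F = cyclic_intervals(ground)                                # 2-Galvin over [8]
G = set().union(*(cyclic_intervals(sorted(S)) for S in F))  # a 2-Galvin family over each S
print(is_d_galvin(F, ground, 2), is_d_galvin(G, ground, 4))
```

The composition works because a set $S$ balanced on $A$ contains exactly $|S|/2$ elements of $A$, so $A \cap S$ is a valid input for the family $\mathcal{F}_S$ built over $S$.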

2.3.
Galvin family without the divisibility condition. The previous definition of a d-Galvin family requires 2d | n. Here we present a relaxed version, which can be defined without the divisibility condition, and prove that such families of polynomial size can be obtained using our previous construction.
When the divisibility condition does not hold we would like $d$ sets to be exactly or almost exactly balanced on $A$ and for those sets to be as close in size as possible. To be exactly balanced a set must have an even number of elements, so if $n$ is odd then we must include a set of odd size which is imbalanced by 1 element. As for the remaining sets, the closest they can come in size is to differ by 2 elements, being of size either $2\lfloor k \rfloor$ or $2\lceil k \rceil$. We are able to achieve this best possible outcome.
Definition 3 (d-Galvin family, second version). Given two integers $d, n \in \mathbb{N}$ with $d \le n$, we say that a family $\mathcal{F} \subseteq 2^{[n]}$ is d-Galvin if for any $A \in \binom{[n]}{n/2}$, $A$ is handled by $\mathcal{F}$, meaning that there exist $d$ sets $S_1, \dots, S_d \in \mathcal{F}$ such that: […]
Theorem 2. There exists a d-Galvin family of size polynomial in $d$ and $n$.
Sketch of the proof. We modify the previous construction slightly in order to handle this more general setting. This is very similar to the proof of Theorem 1 in the case $d \ge n/(\ln n)^3$. Suppose $k$ is not an integer and write $k' := \lfloor k \rfloor$. Furthermore, assume for the moment that $k = \omega((\ln n)^3)$ so that the construction from Claim 3 holds. Note that in any partition of $[n]$ into sets that respect properties (1) and (2) of the definition, the numbers of sets of size $2k'$, $2k'+1$, and $2(k'+1)$ are fixed (given by $d$ and $n$). We denote these numbers by $f$, $m$ and $c$. We need to ensure that the $T^h_i \cup T^l_j$ are of the correct size in order to fulfill our definition. For that, we change the size of the $\chi_i$ in the following way: […] We then choose the $T^h_i$ to be of size $k'$, except for $i = 0$ where the unique $T_0$ remains $\emptyset$. By doing so, the partitions from the family respect properties (1) and (2), and again the proof that this gives a valid construction is very close to the original proof and we omit the details.
Figure 6. For $n = 29$, $d = 6$, we have three sets of size $2\lfloor k \rfloor$, two sets of size $2\lceil k \rceil$, and one set of size $\lfloor k \rfloor + \lceil k \rceil$.
Finally, if $k = O((\ln n)^3)$ then we may have to simultaneously apply the adjustments above and the ones in the proof of the second case of Theorem 1.
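The numbers $f$, $m$ and $c$ of parts of each size are determined by $n$ and $d$, and can be computed directly. The following is a minimal sketch under the assumption that there is exactly one odd-sized part when $n$ is odd and none otherwise; the function name `part_sizes` is illustrative.

```python
def part_sizes(n, d):
    """Return (f, m, c): the numbers of parts of size 2*floor(k), 2*floor(k) + 1
    and 2*floor(k) + 2 in a partition of [n] into d parts, where k = n / (2d)."""
    k_floor = n // (2 * d)
    m = n % 2                            # assumption: one odd part iff n is odd
    c = (n - m - 2 * k_floor * d) // 2   # parts that receive two extra elements
    f = d - m - c                        # remaining parts keep the smaller size
    # The three kinds of parts must cover the ground set exactly.
    assert f * 2 * k_floor + m * (2 * k_floor + 1) + c * (2 * k_floor + 2) == n
    return f, m, c


print(part_sizes(29, 6))  # (3, 1, 2)
```

For $n = 29$ and $d = 6$ this gives three parts of size 4, one of size 5 and two of size 6, in agreement with the example of Figure 6.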

Discussion and open questions
The current construction is probabilistic and it would be interesting to derandomize it without increasing the size of the family too much. One way to tackle the problem is to carefully design the sets $T_i$ belonging to $G_i$ instead of taking them randomly.
The given upper bound is polynomial in $n$ and $d$ but it is unlikely to be tight. We suspect that even modifications of the current construction can yield some improvements. In particular, the family $\mathcal{F}$ from Lemma 1 is constructed by taking the union $T_i \cup T_j$ over all possible pairs $(i, j)$. It might be possible to restrict $(i, j)$ to come from the edges of a sparse graph over the vertices $[d]$, and still prove Claim 1, maybe in some slightly weaker form, possibly saving a factor close to $d$. Even if this is possible, the resulting family is still unlikely to be of optimal size, and hence we have not investigated this approach in detail: it would lead to considerable complications and we prefer a simple construction. A truly optimal construction is likely to require some new ideas.
While there is a linear lower bound for the original Galvin problem, it is not clear how to derive from it linear lower bounds for d-Galvin families for $d > 2$. An easy counting argument, similar to the one for the original Galvin problem, gives that $|\mathcal{F}|^{d-1} \ge \binom{n}{n/2} / \binom{n/d}{n/(2d)}^d$: there are at most $|\mathcal{F}|^{d-1}$ partitions of $[n]$ into $d$ sets of $\mathcal{F}$, and each of them is balanced on exactly $\binom{n/d}{n/(2d)}^d$ sets $A$.
When focusing on large $d$ we get the simple bound below, which is an improvement in the regime $d = \Omega(n^{1/5})$.
Claim 5. A d-Galvin family must be of size at least $d^2/2$.
Proof. Let us fix a d-Galvin family $\mathcal{F}$ over $[n]$, and consider the set $B = \{(S, x) : S \in \mathcal{F}, x \in S\}$.
We first prove that for any $x \in [n]$, there must be at least $d/2$ sets from $\mathcal{F}$ that contain $x$. Suppose this is not the case for a particular $a \in [n]$, and consider a set $A$ of size $n/2$ that contains $\bigcup_{S \ni a} S$ (such an $A$ exists since, by the assumption, the union has size at most $n/2$). Any set $S \in \mathcal{F}$ that contains $a$ is completely included in $A$, and thus cannot be balanced on $A$. Therefore $A$ is not handled by $\mathcal{F}$.
Finally, observe that the previous remark implies that $|B| \ge nd/2$. As each set $S \in \mathcal{F}$ is of size $n/d$, the number of sets in $\mathcal{F}$ must be at least $d^2/2$.
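The two lower bounds can be compared numerically. The sketch below assumes the counting bound in the form $|\mathcal{F}|^{d-1} \ge \binom{n}{n/2} / \binom{n/d}{n/(2d)}^d$ (at most $|\mathcal{F}|^{d-1}$ partitions, each balanced on $\binom{n/d}{n/(2d)}^d$ sets $A$) and compares it with the bound $d^2/2$ of Claim 5; the function names are illustrative and $2d \mid n$ is assumed.

```python
from math import comb, exp, log


def counting_bound(n, d):
    """Lower bound on |F| implied by |F|^(d-1) >= C(n, n/2) / C(n/d, n/(2d))^d.
    Logarithms are used because the binomial coefficients are very large integers."""
    log_ratio = log(comb(n, n // 2)) - d * log(comb(n // d, n // (2 * d)))
    return exp(log_ratio / (d - 1))


def incidence_bound(d):
    """Lower bound d^2 / 2 from the double counting argument of Claim 5."""
    return d * d / 2


for d in (2, 10, 100):
    print(d, counting_bound(1200, d), incidence_bound(d))
```

For small $d$ the counting bound is the stronger of the two, while for large $d$ it is dominated by $d^2/2$, which matches the claim that the second bound only helps when $d$ is polynomially large in $n$.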