A Curved Brunn-Minkowski Inequality for the Symmetric Group

In this paper, we construct an injection $A \times B \rightarrow M \times M$ from the product of any two nonempty subsets of the symmetric group into the square of their midpoint set, where the metric is that corresponding to the conjugacy class of transpositions. If $A$ and $B$ are disjoint, our construction allows us to inject two copies of $A \times B$ into $M \times M$. These injections imply a positively curved Brunn-Minkowski inequality for the symmetric group analogous to that obtained by Ollivier and Villani for the hypercube. However, while Ollivier and Villani's inequality is optimal, we believe that the curvature term in our inequality can be improved. We identify a hypothetical concentration inequality in the symmetric group and prove that it yields an optimally curved Brunn-Minkowski inequality.


Introduction
The classical Brunn-Minkowski inequality may be formulated as follows: given two compact nonempty sets $A, B \subset \mathbb{R}^n$, one has
$$\log |M_t| \geq (1-t) \log |A| + t \log |B|$$
for any $0 \leq t \leq 1$, where
$$M_t = \{(1-t)a + tb : (a, b) \in A \times B\}$$
is the set of $t$-midpoints of $A$ and $B$, and $|\cdot|$ is Lebesgue measure. If $\mathbb{R}^n$ is replaced by a smooth complete Riemannian manifold $X$ with positive Ricci curvature bounded below by $K > 0$, the Brunn-Minkowski inequality can be strengthened to
$$\log |M_t| \geq (1-t) \log |A| + t \log |B| + \frac{K}{2}\, t(1-t)\, d(A, B)^2, \tag{1.1}$$
where $d$ is the Hausdorff distance and $M_t$ the set of points of the form $\gamma(t)$ with $\gamma$ a geodesic in $X$ such that $\gamma(0) \in A$ and $\gamma(1) \in B$, see [4] and references therein. The heuristic here is that midpoint sets are larger in positively curved space than in flat space, and the degree of this distortion is controlled by Ricci curvature.

Y. Ollivier and C. Villani [4] have shown that a curved Brunn-Minkowski inequality analogous to (1.1) holds for the discrete hypercube $\mathbb{Z}_2^N$ equipped with the Hamming distance. While the definition of $t$-midpoints in discrete space is somewhat messy, in the case $t = \frac{1}{2}$, at least, it is reasonable to define the midpoint set $M = M_{1/2}$ of $A, B \subseteq \mathbb{Z}_2^N$ to be the collection of $m \in \mathbb{Z}_2^N$ which satisfy
$$d(a, m) + d(m, b) = d(a, b) \quad \text{and} \quad d(a, m) = d(m, b) + \varepsilon, \quad \varepsilon \in \{-1, 0, 1\},$$
for some $(a, b) \in A \times B$. Adopting these definitions, Ollivier and Villani proved the following curved Brunn-Minkowski inequality for the hypercube [4, Theorem 1].

Theorem 1.1. For any nonempty sets $A, B \subseteq \mathbb{Z}_2^N$,
$$\log |M| \geq \frac{1}{2} \log |A| + \frac{1}{2} \log |B| + \frac{K}{8}\, d(A, B)^2,$$
where $K = \frac{1}{2N}$.

Ollivier and Villani moreover verify that the dependence of $K$ on $N$ in their result is optimal. As discussed in [4], this result supports the statement that the "discrete Ricci curvature" of $\mathbb{Z}_2^N$ is of order $N^{-1}$.

For any $n \geq 2N$, there is an injective group homomorphism $\mathbb{Z}_2^N \to S(n)$ from the $N$-dimensional hypercube into the symmetric group of rank $n$ determined by
$$e_i \mapsto (2i-1 \to 2i), \quad 1 \leq i \leq N,$$
where $e_i \in \mathbb{Z}_2^N$ is the bitstring in which all bits are zero except the $i$th bit, and $(2i-1 \to 2i) \in S(n)$ is the transposition which swaps $2i$ and $2i-1$.
If one equips $S(n)$ with the metric induced by the word norm corresponding to the conjugacy class of transpositions, this injection is an isometric embedding of $\mathbb{Z}_2^N$ in $S(n)$. It is thus natural to seek an extension of Theorem 1.1 to the symmetric group, viewed as a metric space in this way. In this paper, we prove the following curved Brunn-Minkowski inequality for $S(n)$.

Theorem 1.2. For any nonempty sets $A, B \subseteq S(n)$,
$$\log |M| \geq \frac{1}{2} \log |A| + \frac{1}{2} \log |B| + \frac{K}{8}\, d(A, B)^2,$$
where $K = \frac{4 \log 2}{(n-1)^2}$.

The Brunn-Minkowski inequality presented in Theorem 1.2 is only slightly curved, and we believe that Theorem 1.2 in fact holds with $K = c\, n^{-1}$, $c$ a positive constant. Although we do not prove this, we identify a hypothetical concentration inequality in the symmetric group which generalizes the hypercube concentration inequality of Ollivier and Villani [4, Corollary 6], and demonstrate that it implies an optimally curved Brunn-Minkowski inequality for the symmetric group.
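The isometric embedding described above can be checked by brute force for small $N$. The following is a minimal sketch, not taken from the paper: the function names (`embed`, `dist`, `hamming`) are hypothetical, points are indexed from 0 rather than 1, and the distance is computed through the standard fact that the transposition distance between $a$ and $b$ equals $n$ minus the number of cycles of $a^{-1}b$.

```python
from itertools import product

def embed(bits, n):
    """Image of a bitstring in Z_2^N: the product of the transpositions
    swapping points 2i and 2i+1 (0-based) for each set bit i."""
    perm = list(range(n))
    for i, bit in enumerate(bits):
        if bit:
            perm[2 * i], perm[2 * i + 1] = perm[2 * i + 1], perm[2 * i]
    return tuple(perm)

def num_cycles(p):
    """Number of cycles of the permutation i -> p[i], fixed points included."""
    seen, count = set(), 0
    for start in range(len(p)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = p[j]
    return count

def dist(a, b):
    """Transposition distance: n minus the number of cycles of a^{-1} b."""
    n = len(a)
    a_inv = [0] * n
    for i, image in enumerate(a):
        a_inv[image] = i
    return n - num_cycles(tuple(a_inv[b[i]] for i in range(n)))

def hamming(x, y):
    """Hamming distance between two bitstrings of equal length."""
    return sum(xi != yi for xi, yi in zip(x, y))

N, n = 3, 6  # any n >= 2N works
cube = list(product((0, 1), repeat=N))
is_isometry = all(
    dist(embed(x, n), embed(y, n)) == hamming(x, y) for x in cube for y in cube
)
is_injective = len({embed(x, n) for x in cube}) == len(cube)
```

For these parameters both `is_isometry` and `is_injective` should evaluate to `True`, in line with the claim that the hypercube sits isometrically inside $S(n)$.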

Symmetric Group Basics
In this section we fix basic notation and terminology concerning the symmetric group $S(n)$. We identify $S(n)$ with its right Cayley graph as generated by the conjugacy class of transpositions. Thus $a, b, c, \dots \in S(n)$ are the vertices of our graph, and $\{a, b\}$ is an edge if and only if $a^{-1}b$ fixes all but two points of $\{1, \dots, n\}$. In this way $S(n)$ becomes a graded graph: it decomposes as the disjoint union
$$S(n) = \bigsqcup_{r=0}^{n-1} L_r,$$
where $L_r$ is the set of permutations which factor into exactly $n - r$ disjoint cycles; each $L_r$ is an independent set; and, finally, there exists an edge between $L_r$ and $L_{r'}$ if and only if $|r - r'| = 1$. Figure 1 shows the case $n = 4$.
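Each level $L_r$ of $S(n)$ further decomposes as the disjoint union
$$L_r = \bigsqcup_{\lambda \vdash n,\ \ell(\lambda) = n - r} C_\lambda,$$
where the union is over partitions $\lambda$ of $n$ with $n - r$ parts, and $C_\lambda$ is the set of permutations with cycle type $\lambda$. The sets $C_\lambda$ are the conjugacy classes of $S(n)$.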
In this paper we make use of a decomposition of $S(n)$ which is finer than the usual decomposition into conjugacy classes. Given $p \in S(n)$, factor $p$ into disjoint cycles, and present each cycle so that its leftmost element is its minimal element. That is, each cycle of $p$ is presented in the form
$$(i_1 \to i_2 \to i_3 \to \cdots),$$
where $i_1 < \min\{i_2, i_3, \dots\}$. Next, list the cycles of $p$ from left to right in increasing order of their minimal elements. Thus $p$ is presented in the form
$$p = (i_1 \to i_2 \to \cdots)(j_1 \to j_2 \to \cdots)(k_1 \to k_2 \to \cdots) \cdots,$$
where $i_1 < j_1 < k_1 < \cdots$. We call this the ordered cycle factorization of $p$. Figure 1 displays the elements of $S(4)$ using their ordered cycle factorizations. We refer to the vector $(i_1, j_1, k_1, \dots)$ as the sequence of cycle minima of $p$. The vector $\mu = (\mu_1, \mu_2, \mu_3, \dots)$ recording the lengths of the successive cycles in the ordered cycle factorization of $p$ is a composition of $n$ which we call the ordered cycle type of $p$. We denote by $\ell(\mu)$ the number of parts of $\mu$, so that $r = n - \ell(\mu)$ if $p \in L_r$. Given two permutations $p, p'$ of the same ordered cycle type, there is a unique permutation $u$ which both conjugates $p$ into $p'$ and transforms the sequence of cycle minima of $p$ into that of $p'$.
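Given a composition $\mu \vDash n$, we denote by $C_\mu$ the set of all permutations whose ordered cycle type is $\mu$. Then each conjugacy class $C_\lambda$ in $S(n)$ decomposes as the disjoint union
$$C_\lambda = \bigsqcup_{\mu} C_\mu,$$
where $\mu$ ranges over all compositions obtained by permuting the parts of $\lambda$. For the symmetric group $S(4)$, the successive decompositions we have discussed are
$$S(4) = L_0 \sqcup L_1 \sqcup L_2 \sqcup L_3,$$
with $L_0 = C_{(1,1,1,1)}$, $L_1 = C_{(2,1,1)} \sqcup C_{(1,2,1)} \sqcup C_{(1,1,2)}$, $L_2 = C_{(3,1)} \sqcup C_{(1,3)} \sqcup C_{(2,2)}$ and $L_3 = C_{(4)}$, where all subscripts are compositions.

Each class $C_\mu$ contains a canonical permutation $p_\mu$, which acts by cyclically permuting the first $\mu_1$ positive integers in the canonical way, cyclically permuting the next $\mu_2$ positive integers in the canonical way, and so on. Given $p \in C_\mu$, we denote by $u_p \in S(n)$ the unique permutation which both conjugates $p_\mu$ to $p$ and transforms the sequence of cycle minima of $p_\mu$ into that of $p$.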
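The ordered cycle factorization is straightforward to compute, and enumerating $S(4)$ gives a quick check on the sizes of the classes $C_\mu$. The sketch below is illustrative only: the helper names are hypothetical, and permutations are stored as 0-based tuples (so point $i$ of the text is index $i - 1$).

```python
from collections import Counter
from itertools import permutations

def ordered_cycles(p):
    """Ordered cycle factorization of i -> p[i] (0-based points).
    Scanning start points in increasing order makes every cycle begin
    at its minimum and lists the cycles by increasing minimum."""
    seen, cycles = set(), []
    for start in range(len(p)):
        if start not in seen:
            cycle, j = [], start
            while j not in seen:
                seen.add(j)
                cycle.append(j)
                j = p[j]
            cycles.append(tuple(cycle))
    return cycles

def ordered_cycle_type(p):
    """Composition of n given by the successive cycle lengths."""
    return tuple(len(cycle) for cycle in ordered_cycles(p))

def cycle_minima(p):
    """Sequence of cycle minima, reported 1-based as in the text."""
    return tuple(cycle[0] + 1 for cycle in ordered_cycles(p))

# Sizes of the classes C_mu for S(4), keyed by composition mu.
class_sizes = Counter(ordered_cycle_type(p) for p in permutations(range(4)))
```

For instance, the conjugacy class of transpositions in $S(4)$ has six elements, and the enumeration splits it into classes of sizes 3, 2 and 1 for the compositions $(2,1,1)$, $(1,2,1)$ and $(1,1,2)$, according to the smallest point moved.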
We equip $S(n)$ with the graph theory distance $d$. Thus level $L_r$ in the Cayley graph coincides with the sphere of radius $r$ centred at the identity permutation $e \in S(n)$. The following properties of $d$ are easily checked: $d(a, b) = d(e, a^{-1}b)$ for all $a, b \in S(n)$; $d(e, p) = n - \ell(\mu)$ for all $p \in C_\mu$; and $d$ is invariant under left translation and under conjugation. In particular, the diameter of the Cayley graph is $n - 1$.

We have already mentioned the fact that the set of permutations which lie on a geodesic path from the identity permutation $e$ to an involution $v$ is isometrically isomorphic to a hypercube whose dimension is half the size of the support of $v$. We will also make use of the fact that a permutation lies on a geodesic path from $e$ to a forward cycle $f$ if and only if it is a product of forward cycles which together induce a noncrossing partition of the support of $f$. A proof of this folklore result may be found in [3, Lecture 23].

Midpoint Calculus
In this section, we generalize the encoding/decoding formalism of Ollivier and Villani from the hypercube to the symmetric group. Where possible, we try to be consistent with the notation and terminology of [4].
3.1. Crossovers. Let $(a, b) \in S(n) \times S(n)$ be a pair of permutations, and let $M(a, b)$ be the corresponding midpoint set. Our first observation is that $M(a, b)$ is the isometric image of a "standard" set of midpoints. More precisely, let $\mu$ be the ordered cycle type of $a^{-1}b$, and let $\mathrm{Cr}(\mu)$ denote the set of midpoints of $e$, the identity permutation, and $p_\mu$, the canonical permutation of ordered cycle type $\mu$. Adopting the terminology of [4], we call the elements of $\mathrm{Cr}(\mu)$ crossovers, or $\mu$-crossovers to be precise. Consider the function $S(n) \to S(n)$ defined by
$$c \mapsto \varphi_c(a, b) := a\, u_{a^{-1}b}\, c\, u_{a^{-1}b}^{-1},$$
where $u_{a^{-1}b}$ is the unique permutation which conjugates $p_\mu$ to $a^{-1}b$ and transforms the sequence of cycle minima of $p_\mu$ into that of $a^{-1}b$. This function is an isometry, being composed of rotation by $u_{a^{-1}b}$ followed by translation by $a$. Moreover, under this mapping $e \mapsto a$ and $p_\mu \mapsto b$. Thus the mapping restricts to a bijection
$$\mathrm{Cr}(\mu) \to M(a, b),$$
and we view $\varphi_c(a, b)$, $c \in \mathrm{Cr}(\mu)$, as a parameterization of the locus $M(a, b)$ by a "standard" set of midpoints. Following the terminology of [4], we call $\varphi_c(a, b) \in M(a, b)$ the midpoint of $a$ and $b$ encoded by the crossover $c \in \mathrm{Cr}(\mu)$.
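The parameterization of $M(a, b)$ by crossovers can be verified exhaustively in $S(4)$. The sketch below rests on several assumptions that are not spelled out in this section: permutations are 0-based tuples multiplied as functions, the encoded midpoint is taken to be $a\,u\,c\,u^{-1}$ with $u = u_{a^{-1}b}$, midpoints are defined with the defect $\varepsilon \in \{-1, 0, 1\}$ as in the Introduction, and all function names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Product pq as functions: (pq)(i) = p(q(i))."""
    return tuple(p[j] for j in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def ordered_cycles(p):
    """Cycles of p, each starting at its minimum, listed by increasing minimum."""
    seen, cycles = set(), []
    for start in range(len(p)):
        if start not in seen:
            cycle, j = [], start
            while j not in seen:
                seen.add(j)
                cycle.append(j)
                j = p[j]
            cycles.append(cycle)
    return cycles

def dist(a, b):
    """Transposition distance: n minus the number of cycles of a^{-1} b."""
    return len(a) - len(ordered_cycles(compose(inverse(a), b)))

def midpoints(a, b):
    """Brute-force midpoint set M(a, b)."""
    total = dist(a, b)
    return {m for m in permutations(range(len(a)))
            if dist(a, m) + dist(m, b) == total
            and abs(dist(a, m) - dist(m, b)) <= 1}

def canonical(mu):
    """p_mu: forward cycles on consecutive blocks of sizes mu_1, mu_2, ..."""
    p, start = [], 0
    for part in mu:
        p.extend(list(range(start + 1, start + part)) + [start])
        start += part
    return tuple(p)

def rotation(p):
    """Return (u_p, mu): u_p maps the ordered cycle factorization of p_mu
    entry by entry onto that of p, so it conjugates p_mu to p."""
    cycles = ordered_cycles(p)
    mu = tuple(len(cycle) for cycle in cycles)
    u = [0] * len(p)
    for src, dst in zip(ordered_cycles(canonical(mu)), cycles):
        for i, j in zip(src, dst):
            u[i] = j
    return tuple(u), mu

def phi(c, a, b):
    """Midpoint of a and b encoded by c: a u c u^{-1} with u = u_{a^{-1} b}."""
    u, _ = rotation(compose(inverse(a), b))
    return compose(a, compose(u, compose(c, inverse(u))))

n = 4
group = list(permutations(range(n)))
identity = tuple(range(n))
parameterization_ok = True
for a in group:
    for b in group:
        _, mu = rotation(compose(inverse(a), b))
        crossovers = midpoints(identity, canonical(mu))
        parameterization_ok &= {phi(c, a, b) for c in crossovers} == midpoints(a, b)
```

The flag `parameterization_ok` stays `True` exactly when, for every pair $(a, b)$, the encoded midpoints exhaust $M(a, b)$.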

3.2. Duality. There is a natural geometric operation on crossovers: we view a crossover as the lower half of a geodesic path from the identity to the canonical permutation with a given ordered cycle type, and map it to the corresponding upper half. More precisely, given $c \in \mathrm{Cr}(\mu)$, its dual $c^\vee$ is defined by
$$c^\vee := c^{-1} p_\mu.$$
We now establish some technical properties of the operation $c \mapsto c^\vee$ which will be needed below. First is the basic but important closure property.

Lemma 3.1. If $c \in \mathrm{Cr}(\mu)$, then $c^\vee \in \mathrm{Cr}(\mu)$.

Proof. Let $c$ be a midpoint of $e$ and $p_\mu$; we have to check that $c^\vee$ is a midpoint of $e$ and $p_\mu$. We have
$$d(e, c^\vee) = d(e, c^{-1}p_\mu) = d(c, p_\mu) \quad \text{and} \quad d(c^\vee, p_\mu) = d(c^{-1}p_\mu, p_\mu) = d(e, c),$$
so the two defining conditions for $c^\vee$ follow from those for $c$.

Note that while the map $\mathrm{Cr}(\mu) \to \mathrm{Cr}(\mu)$ defined by $c \mapsto c^\vee$ is bijective, it is not involutive: the dual of the dual of $c$ is $c$ conjugated by $p_\mu^{-1}$. For future use, we extend the duality operation from points to sets: given $C \subseteq \mathrm{Cr}(\mu)$, we define
$$C^\vee := \{c^\vee : c \in C\}.$$
Next is the following important property of crossover duals.

Lemma 3.2. If $c \in \mathrm{Cr}(\mu)$, then $c^{-1}c^\vee$ has both the same ordered cycle type and the same sequence of cycle minima as $p_\mu$.
Proof. First note that $c^{-1}c^\vee = c^{-2}p_\mu$. Since $c$ lies on a geodesic path linking $e$ to $p_\mu$, each cycle of $c$ is a subcycle of some cycle of $p_\mu$. Since the cycles of $p_\mu$ are intervals, our task reduces to proving the following general statement: whenever $c_1 c_2 \cdots$ is a product of forward cycles which induce a noncrossing partition of $\{1, \dots, k\}$, the product $(c_1 c_2 \cdots)^{-2} (1 \to 2 \to \cdots \to k)$ is a cyclic permutation of the numbers $1, \dots, k$. If this statement holds, then left multiplication of $p_\mu$ by $c^{-2}$ will change neither the cycle structure nor cycle minima of $p_\mu$. Note that we can assume each cycle $c_i$ is of length at least two. Suppose that the first cycle, $c_1$, is
$$c_1 = (i_1 \to i_2 \to \cdots \to i_{k_1}),$$
where $1 \leq i_1 < i_2 < \cdots < i_{k_1} \leq k$. Let us write the full forward $k$-cycle in the form where the $I_*$'s are intervals. Since Now, since the cycles $c_1, c_2, \dots$ induce a noncrossing partition of $\{1, \dots, k\}$, the cycle $c_2$ is contained in one of the intervals $I_1, \dots, I_{k_1+1}$. Thus the same argument applies to compute

3.3. Encoding. The duality operation has been introduced, and its basic properties developed, in order to make available a structured means of encoding pairs of midpoints using crossovers. To this end, we introduce the mapping
$$\Phi_c(a, b) := (\varphi_c(a, b), \varphi_{c^\vee}(a, b)).$$
Thus $(x, y) = \Phi_c(a, b)$ is the pair of midpoints of $a$ and $b$ encoded by $c$ and $c^\vee$. The duality relationship between $c$ and $c^\vee$ induces algebraic and geometric relations between the pairs $(a, b)$ and $(x, y)$ which may be collectively called duality of midpoints.

Proposition 3.3. Let $(a, b) \in S(n) \times S(n)$ with $a^{-1}b \in C_\mu$, let $c \in \mathrm{Cr}(\mu)$, and let $(x, y) = \Phi_c(a, b)$. Then:
(1) $x^{-1}y$ has the same ordered cycle type and sequence of cycle minima as $a^{-1}b$;
(2) $a$ and $b$ are midpoints of $x$ and $y$;
(3) $u_{x^{-1}y} = u_{a^{-1}b}\, u_{c^{-1}c^\vee}$.

Proof. Let us prove these assertions in order.
(1) First, we have
$$x^{-1}y = u_{a^{-1}b}\, c^{-1}c^\vee\, u_{a^{-1}b}^{-1}.$$
Let $\mu$ be the ordered cycle type of $a^{-1}b$. By definition, $u_{a^{-1}b}$ conjugates $p_\mu$ into $a^{-1}b$ and transforms the sequence of cycle minima of $p_\mu$ into that of $a^{-1}b$. By Lemma 3.2, $c^{-1}c^\vee$ has both the same ordered cycle type and sequence of cycle minima as $p_\mu$. Thus, $u_{a^{-1}b}\, c^{-1}c^\vee\, u_{a^{-1}b}^{-1}$ has both the same ordered cycle type and sequence of cycle minima as $a^{-1}b$.
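(2) We now show that $a$ and $b$ are midpoints of $x$ and $y$. Since $d(x, a) = d(e, c)$, $d(a, y) = d(e, c^\vee) = d(c, p_\mu)$ and, by Part (1), $d(x, y) = d(e, p_\mu)$, we have
$$d(x, a) + d(a, y) = d(e, c) + d(c, p_\mu) = d(e, p_\mu) = d(x, y).$$
Similarly, since $d(e, c) = d(c, p_\mu) + \varepsilon$, we have
$$d(x, a) = d(a, y) + \varepsilon,$$
where $\varepsilon \in \{-1, 0, 1\}$ because $c$ is a midpoint of $e$ and $p_\mu$. Thus, $a$ is a midpoint of $x$ and $y$. The proof that $b$ is a midpoint of $x$ and $y$ is just the same.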
(3) Finally, we prove the identity
$$u_{x^{-1}y} = u_{a^{-1}b}\, u_{c^{-1}c^\vee},$$
which will be needed below in the proof of Proposition 3.4. Since
$$x^{-1}y = u_{a^{-1}b}\, c^{-1}c^\vee\, u_{a^{-1}b}^{-1} \quad \text{and} \quad c^{-1}c^\vee = u_{c^{-1}c^\vee}\, p_\mu\, u_{c^{-1}c^\vee}^{-1},$$
we have that $u_{a^{-1}b}\, u_{c^{-1}c^\vee}$ conjugates $p_\mu$ into $x^{-1}y$. This is one of the two properties that uniquely define the permutation $u_{x^{-1}y}$, the other being that it transforms the sequence of cycle minima of $p_\mu$ into the sequence of cycle minima of $x^{-1}y$. Since $u_{c^{-1}c^\vee}$ stabilizes the sequence of cycle minima of $p_\mu$ (by Lemma 3.2), conjugation of $p_\mu$ by $u_{a^{-1}b}\, u_{c^{-1}c^\vee}$ produces a permutation whose sequence of cycle minima coincides with that of $a^{-1}b$, and hence with that of $x^{-1}y$ by Part (1).

3.4. Decoding. Our constructions so far may be thought of in cryptographic terms, as follows. Alice and Bob wish to transmit messages to one another across an insecure channel. They meet at a secure location, and agree on a composition $\mu \vDash n$ and a crossover $c \in \mathrm{Cr}(\mu)$ to be used as a secret encryption key. Alice and Bob then return to their respective locations on opposite ends of the channel. The plaintext messages to be transmitted are pairs $(a, b) \in S(n) \times S(n)$ such that $a^{-1}b \in C_\mu$. To send the message $(a, b)$ to Bob, Alice computes the ciphertext $(x, y) = \Phi_c(a, b)$ and transmits it to Bob across the channel. Bob receives the ciphertext $(x, y)$, and wishes to recover the plaintext message. Our next result proves that there is a well-defined decryption key $\delta(c)$ such that $(a, b) = \Phi_{\delta(c)}(x, y)$; in fact, the proof explains how to compute $\delta(c)$.

Proposition 3.4. For each composition $\mu \vDash n$ there exists a function $\delta_\mu : \mathrm{Cr}(\mu) \to \mathrm{Cr}(\mu)$ such that
$$\Phi_{\delta_\mu(c)}(\Phi_c(a, b)) = (a, b)$$
holds for all $c \in \mathrm{Cr}(\mu)$ and each $(a, b) \in S(n) \times S(n)$ verifying $a^{-1}b \in C_\mu$. Moreover, this function is an involution.
We call the function $\delta_\mu$ of Proposition 3.4 the decoding function of type $\mu$. In order to lighten the notation, we will henceforth omit the dependence of $\delta$ on $\mu$.
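Proof. Fix a composition $\mu \vDash n$. We claim that the corresponding decoding function is given by
$$\delta(c) = u_{c^{-1}c^\vee}^{-1}\, c^{-1}\, u_{c^{-1}c^\vee}.$$
First, let us check that the codomain of $\delta$ is indeed $\mathrm{Cr}(\mu)$, i.e. that $\delta(c)$ is in fact a midpoint of $e$ and $p_\mu$. We have
$$d(e, \delta(c)) = d(e, c) \quad \text{and} \quad d(\delta(c), p_\mu) = d(c, p_\mu).$$
Thus, since $c$ is a midpoint of $e$ and $p_\mu$, we have
$$d(e, \delta(c)) + d(\delta(c), p_\mu) = d(e, c) + d(c, p_\mu) = d(e, p_\mu)$$
and
$$d(e, \delta(c)) - d(\delta(c), p_\mu) = d(e, c) - d(c, p_\mu) \in \{-1, 0, 1\}.$$

Next, let $(a, b) \in S(n) \times S(n)$ be a valid plaintext, i.e. a pair of permutations such that $a^{-1}b \in C_\mu$, and let $(x, y) = \Phi_c(a, b)$ be the corresponding ciphertext. By Proposition 3.3, Part (1), we have $x^{-1}y \in C_\mu$, so we may re-encode $(x, y)$ using $\delta(c)$ as an encryption key, arriving at a new ciphertext $(x', y') = \Phi_{\delta(c)}(x, y)$. We claim that this re-encoding decrypts $(x, y)$, i.e. that $(x', y') = (a, b)$. Indeed, writing $u = u_{a^{-1}b}$ and $w = u_{c^{-1}c^\vee}$,
$$x' = x\, u_{x^{-1}y}\, \delta(c)\, u_{x^{-1}y}^{-1} = (a u c u^{-1})(u w)(w^{-1} c^{-1} w)(u w)^{-1} = a,$$
where we made use of Proposition 3.3, Part (3). Similarly, we have $y' = b$.

It remains to show that $\delta : \mathrm{Cr}(\mu) \to \mathrm{Cr}(\mu)$ is an involution. Let $c$ be a $\mu$-crossover, let $(a, b)$ be a valid message, and consider the triple encoding $\Phi_{\delta^2(c)}(\Phi_{\delta(c)}(\Phi_c(a, b)))$. Since $\delta(c)$ is the decryption key corresponding to the encryption key $c$, we have
$$\Phi_{\delta^2(c)}(\Phi_{\delta(c)}(\Phi_c(a, b))) = \Phi_{\delta^2(c)}(a, b).$$
On the other hand, since $\delta^2(c)$ is the decryption key corresponding to the encryption key $\delta(c)$, we also have
$$\Phi_{\delta^2(c)}(\Phi_{\delta(c)}(\Phi_c(a, b))) = \Phi_c(a, b).$$
Hence $\Phi_{\delta^2(c)}(a, b) = \Phi_c(a, b)$, which readily implies $\delta^2(c) = c$.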
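The encoding and decoding maps can be exercised exhaustively in $S(4)$. The sketch below is an illustration under stated assumptions rather than the authors' construction verbatim: permutations are 0-based tuples multiplied as functions, the dual is taken to be $c^\vee = c^{-1}p_\mu$, and the decryption key is assumed to be $\delta(c) = w^{-1}c^{-1}w$ with $w = u_{c^{-1}c^\vee}$. All function names are hypothetical.

```python
from functools import lru_cache
from itertools import permutations

def compose(p, q):
    """Product pq as functions: (pq)(i) = p(q(i))."""
    return tuple(p[j] for j in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def ordered_cycles(p):
    """Cycles of p, each starting at its minimum, listed by increasing minimum."""
    seen, cycles = set(), []
    for start in range(len(p)):
        if start not in seen:
            cycle, j = [], start
            while j not in seen:
                seen.add(j)
                cycle.append(j)
                j = p[j]
            cycles.append(cycle)
    return cycles

def dist(a, b):
    """Transposition distance: n minus the number of cycles of a^{-1} b."""
    return len(a) - len(ordered_cycles(compose(inverse(a), b)))

def canonical(mu):
    """p_mu: forward cycles on consecutive blocks of sizes mu_1, mu_2, ..."""
    p, start = [], 0
    for part in mu:
        p.extend(list(range(start + 1, start + part)) + [start])
        start += part
    return tuple(p)

def rotation(p):
    """Return (u_p, mu), where u_p conjugates p_mu to p and matches cycle minima."""
    cycles = ordered_cycles(p)
    mu = tuple(len(cycle) for cycle in cycles)
    u = [0] * len(p)
    for src, dst in zip(ordered_cycles(canonical(mu)), cycles):
        for i, j in zip(src, dst):
            u[i] = j
    return tuple(u), mu

@lru_cache(maxsize=None)
def crossovers(mu):
    """Cr(mu): midpoints of the identity and p_mu, found by brute force."""
    e, p_mu = tuple(range(sum(mu))), canonical(mu)
    return frozenset(m for m in permutations(e)
                     if dist(e, m) + dist(m, p_mu) == dist(e, p_mu)
                     and abs(dist(e, m) - dist(m, p_mu)) <= 1)

def dual(c, mu):
    """Assumed dual crossover: c^vee = c^{-1} p_mu."""
    return compose(inverse(c), canonical(mu))

def encode(c, a, b):
    """Phi_c(a, b): the midpoints of a and b encoded by c and by c^vee."""
    u, mu = rotation(compose(inverse(a), b))
    place = lambda k: compose(a, compose(u, compose(k, inverse(u))))
    return place(c), place(dual(c, mu))

def decoding_key(c, mu):
    """Assumed formula delta(c) = w^{-1} c^{-1} w with w = u_{c^{-1} c^vee}."""
    w, _ = rotation(compose(inverse(c), dual(c, mu)))
    return compose(inverse(w), compose(inverse(c), w))

group = list(permutations(range(4)))
decoding_ok = True
for a in group:
    for b in group:
        _, mu = rotation(compose(inverse(a), b))
        for c in crossovers(mu):
            key = decoding_key(c, mu)
            decoding_ok &= key in crossovers(mu)               # delta maps Cr(mu) to itself
            decoding_ok &= encode(key, *encode(c, a, b)) == (a, b)  # decryption
            decoding_ok &= decoding_key(key, mu) == c          # involution
```

If the assumed formula is right, `decoding_ok` remains `True`, which matches the three assertions of Proposition 3.4 on this small case.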

The Brunn-Minkowski inequality
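4.1. Without a curvature term. We now put our encoding-decoding formalism to work. We begin by proving the "flat" Brunn-Minkowski inequality for $S(n)$.

Theorem 4.1. For any nonempty sets $A, B \subseteq S(n)$,
$$\log |M| \geq \frac{1}{2} \log |A| + \frac{1}{2} \log |B|.$$

Proof. It suffices to construct an injection $\Phi : A \times B \to M \times M$. To construct such an injection, let $\mu_1, \dots, \mu_m$ be an enumeration of the ordered cycle types of the permutations $a^{-1}b$, $(a, b) \in A \times B$. For each $1 \leq i \leq m$, choose an encryption key $c_i \in \mathrm{Cr}(\mu_i)$, and let $d_i = \delta(c_i)$ be the corresponding decryption key.
Partition $A \times B$ into classes
$$C_i = \{(a, b) \in A \times B : a^{-1}b \in C_{\mu_i}\}, \quad 1 \leq i \leq m,$$
and consider the map $\Phi : A \times B \to M \times M$ whose restriction to each $C_i$ is given by encryption using key $c_i$:
$$\Phi|_{C_i} = \Phi_{c_i}.$$
We claim that $\Phi$ is injective. Indeed, if $\Phi(a, b) = \Phi(a', b')$ for some $(a, b), (a', b') \in A \times B$, then we must have $(a, b), (a', b') \in C_i$ for some $1 \leq i \leq m$ by Proposition 3.3. Hence
$$(a, b) = \Phi_{d_i}(\Phi(a, b)) = \Phi_{d_i}(\Phi(a', b')) = (a', b').$$

The desired inequality now follows from the fact that the diameter of $S(n)$ is $n - 1$.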
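In order to construct $\Phi$ as required, let $\mu_1, \dots, \mu_m$ and $C_1, \dots, C_m$ be as in the proof of Theorem 4.1. Choose a system of encryption keys
$$c_1 \in \mathrm{Cr}(\mu_1), \dots, c_m \in \mathrm{Cr}(\mu_m),$$
with decryption keys $d_i = \delta(c_i)$, and consider a second system of encryption keys $\tilde{c}_1, \dots, \tilde{c}_m$ obtained from the first system by setting
$$\tilde{c}_i := \delta(d_i^\vee).$$
The keys $\tilde{c}_i$ are defined in this way so that, by the involutive property of $\delta$, their corresponding decryption keys are the duals of the decryption keys of the first system:
$$\tilde{d}_i := \delta(\tilde{c}_i) = d_i^\vee. \tag{4.1}$$
We build $\Phi$ from the above data as follows. First, partition $A \times B \times \{0, 1\}$ into the $2m$ sets
$$C_i^{(j)} = C_i \times \{j\}, \quad 1 \leq i \leq m, \ j \in \{0, 1\}.$$
We define $\Phi$ by declaring its restriction to $C_i^{(0)}$ to be $(a, b, 0) \mapsto \Phi_{c_i}(a, b)$ and its restriction to $C_i^{(1)}$ to be $(a, b, 1) \mapsto \Phi_{\tilde{c}_i}(a, b)$.

We now prove that $\Phi$ so defined is an injection. Suppose $(a, b, j) \in C_i^{(j)}$ and $(a', b', j') \in C_{i'}^{(j')}$ are such that $(x, y) = \Phi(a, b, j) = \Phi(a', b', j')$. Then, by the first part of Proposition 3.3, we must have $i = i'$. We claim that also $j = j'$. If not, then (relabelling if necessary) we have $j = 0$ and $j' = 1$, so that
$$(a, b) = \Phi_{d_i}(x, y) = (\varphi_{d_i}(x, y), \varphi_{d_i^\vee}(x, y))$$
and, using (4.1),
$$(a', b') = \Phi_{d_i^\vee}(x, y) = (\varphi_{d_i^\vee}(x, y), \varphi_{(d_i^\vee)^\vee}(x, y)).$$
Comparing the second component of the first identity with the first component of the second, this implies $a' = b$, which is impossible since $A$ and $B$ are disjoint.
There are now two cases: $j = j' = 0$ and $j = j' = 1$. In the first case, we have
$$(a, b) = \Phi_{d_i}(x, y) = (a', b').$$
In the second case, we have
$$(a, b) = \Phi_{\tilde{d}_i}(x, y) = (a', b').$$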
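The cardinality bounds that these injections produce, $|M|^2 \geq |A||B|$ in general and $|M|^2 \geq 2|A||B|$ for disjoint sets, can be confirmed by exhaustive search over all pairs of nonempty subsets of $S(3)$. The sketch below only tests the resulting inequalities, not the injections themselves; the names are hypothetical, and the midpoint set is computed by brute force with the defect $\varepsilon \in \{-1, 0, 1\}$ as in the Introduction.

```python
from itertools import combinations, permutations

def num_cycles(p):
    """Number of cycles of i -> p[i], fixed points included."""
    seen, count = set(), 0
    for start in range(len(p)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = p[j]
    return count

def dist(a, b):
    """Transposition distance: n minus the number of cycles of a^{-1} b."""
    n = len(a)
    a_inv = [0] * n
    for i, image in enumerate(a):
        a_inv[image] = i
    return n - num_cycles(tuple(a_inv[b[i]] for i in range(n)))

group = list(permutations(range(3)))

# Midpoint set M(a, b) for every pair of permutations, computed once.
pair_midpoints = {
    (a, b): frozenset(
        m for m in group
        if dist(a, m) + dist(m, b) == dist(a, b)
        and abs(dist(a, m) - dist(m, b)) <= 1
    )
    for a in group for b in group
}

def midpoint_set(A, B):
    """M for subsets A and B: the union of M(a, b) over all pairs."""
    return frozenset().union(*(pair_midpoints[a, b] for a in A for b in B))

subsets = [frozenset(s) for r in range(1, len(group) + 1)
           for s in combinations(group, r)]

# Flat bound for all pairs of nonempty subsets.
flat_ok = all(len(midpoint_set(A, B)) ** 2 >= len(A) * len(B)
              for A in subsets for B in subsets)

# Doubled bound for disjoint pairs of nonempty subsets.
doubled_ok = all(len(midpoint_set(A, B)) ** 2 >= 2 * len(A) * len(B)
                 for A in subsets for B in subsets if not (A & B))
```

Both flags should come out `True`. The same search in $S(4)$ is far too large to run naively, since there are $2^{24}$ subsets, so a random sample of pairs would be the practical variant there.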

4.3. With an optimal curvature term. We conjecture that the curved Brunn-Minkowski inequality proved in Theorem 4.2 can be improved to the following optimal statement.

While we are at present unable to prove Conjecture 4.3, we show here that it is implied by the following conjectural concentration inequality.

Remark. In the case where $\mu = (\mu_1, \mu_2, \dots)$ satisfies $\mu_i \in \{1, 2\}$, Conjecture 4.4 is true: via the embedding of the hypercube described in the Introduction, it is equivalent to Corollary 6 in [4].
We now explain how Conjecture 4.3 can be deduced from Conjecture 4.4. This argument, which lifts the proof of [4, Theorem 1] from the hypercube to the symmetric group, differs substantially from the proofs of Theorems 4.1 and 4.2. To prove these coarser results, we used static encoding to construct our injections, i.e. a predetermined list of encryption keys. Here, we use an adaptive coding scheme in which all crossovers associated to $A \times B$ are employed.

where $M$ is the midpoint set of $A$ and $B$ and $K = \frac{4\varepsilon}{n-1}$.
Proof. Let $\mu_1, \dots, \mu_m$ and $C_1, \dots, C_m$ be as in the proof of Theorem 4.1. Put These sets are pairwise disjoint, but note that they are not subsets of $A \times B$. Rather, Taking logs, we obtain the curved Brunn-Minkowski inequality with $K = \frac{4\varepsilon}{n-1}$.