The expected characteristic and permanental polynomials of the random Gram matrix

A t by n random matrix A is formed by sampling n independent random column vectors, each containing t components. The random Gram matrix of size n, G_n = transpose(A) A, contains the dot products between all pairs of column vectors of the randomly generated matrix A. The eigenvalues of G_n are the squares of the singular values of A. Furthermore, the sequences det(G_i) and perm(G_i) (for i = 0, 1, ..., n) are factors that comprise the expected coefficients of the characteristic and permanental polynomials of G_n. We prove theorems relating the generating functions and recursions for the traces of matrix powers, expected characteristic coefficients, expected determinants E(det(G_n)), and expected permanents E(perm(G_n)). Using the derived recursions, we exhibit the efficient computation of the expected determinant and expected permanent of a random Gram matrix G_n formed according to any underlying distribution. These theoretical results may be used both to speed up numerical algorithms and to investigate the numerical properties of the expected characteristic and permanental coefficients of any matrix comprised of independently sampled columns.


Introduction
Let w be a t-tall vector whose components w_i are random variables (not necessarily independent). Next, sample n independent copies w^(1), . . . , w^(n) of w, creating a t × n matrix A. Then the random Gram matrix of size n, G_n = A^T A, has a distribution that depends on the underlying distribution of the random vector w. (The superscript T denotes transpose.) Some general features and convergence properties of the eigenvalues of certain random Gram matrices were derived by Fannes and Spincemaille [9]. Fyodorov formulated correlation functions for permanental polynomials of certain random matrices and noted some similarities and differences between their characteristic and permanental polynomials [10]. Our paper presents combinatorial theory and an efficient algorithm for calculating E(det(G_n)) and E(perm(G_n)), which are factors comprising the coefficients of the expected characteristic and expected permanental polynomials of G_n.
The computation of the determinant is equivalent in complexity to matrix multiplication and is therefore contained in the complexity class P (see Chapter 16 of [5]). Currently, the fastest published asymptotic algorithm for matrix multiplication is O(n^2.376) [8], with a recent unpublished work [23] claiming an improvement to O(n^2.3727). Some researchers have suggested that group-theoretic observations imply that O(n^2) algorithms also exist [20].
At the other complexity extreme, even though the sign is the only difference between the formula for the determinant and the formula for the permanent, the computation of the permanent is #P-complete [21,3]. The standard reference for properties of permanents is Minc [16]. The most efficient algorithm currently known for calculating the exact permanent, due to Ryser [18], has complexity O(2^n n^2). Jerrum and Sinclair provided a fully polynomial randomized approximation scheme for approximating permanents of nonnegative matrices [13,14]. Matrix permanents have found applications in physics for calculating Bose-Einstein corrections [24] and in quantum computing for encoding quantum circuit amplitudes [17,4]. Permanental polynomials have been used as invariants for chemical structures [6,7,15].

The rest of the paper is organized as follows: Section 2 is a statement of results, Section 3 contains all proofs, Section 4 reports on some numerical experiments, Section 5 points out a connection to prior work involving the cycle index polynomial of the symmetric group, and Section 6 presents a summary and conclusions.
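Ryser's O(2^n n^2) inclusion-exclusion formula mentioned above can be sketched in a few lines of Python (a straightforward version without the Gray-code refinement; the function name is ours):

```python
from itertools import combinations

def ryser_permanent(A):
    """Permanent of a square matrix via Ryser's inclusion-exclusion formula:
    per(A) = (-1)^n * sum over column subsets S of (-1)^|S| * prod_i sum_{j in S} A[i][j]."""
    n = len(A)
    total = 0
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            prod = 1
            for row in A:
                prod *= sum(row[j] for j in S)
            total += (-1) ** r * prod
    return (-1) ** n * total
```

For the all-ones 3 × 3 matrix this returns 3! = 6, the number of permutations, in agreement with the definition of the permanent.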

Statement of Results
Before stating our results, we explain all notation. Let w be a t-tall vector whose components w_i are random variables (not necessarily independent). Let A be a t × n matrix whose columns are a random sample of n vectors w^(1), . . . , w^(n). Let G_n = A^T A; we call G_n the random Gram matrix of size n, it being understood that the exact distribution of G_n depends on the underlying distribution on t-dimensional vectors w = (w_1, . . . , w_t)^T. Although G_n is an n × n matrix, its rank is at most t, and generally speaking we take the viewpoint henceforth that n is much larger than t. One may even regard t as fixed and n → ∞, as we study the effect of taking larger and larger samples. We are especially interested in two expected values, the determinant and the permanent of G_n; these are denoted a_n and p_n, respectively:

a_n = E(det(G_n)),    p_n = E(perm(G_n)).
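As a concrete illustration of this setup (the function names and the toy coin-flip distribution are our own choices, not the paper's), one can sample A column by column and form G_n = A^T A as the matrix of pairwise dot products:

```python
import random

def sample_gram_matrix(n, sample_w, seed=None):
    """Draw n independent t-tall columns w^(1), ..., w^(n) using sample_w(rng)
    and return G_n = A^T A, the n x n matrix of pairwise dot products."""
    rng = random.Random(seed)
    cols = [sample_w(rng) for _ in range(n)]          # columns of A
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return [[dot(u, v) for v in cols] for u in cols]  # (G_n)_ij = w^(i) . w^(j)

# Example: t = 2, each component of w an independent fair coin flip in {0, 1}.
G = sample_gram_matrix(5, lambda rng: [rng.randint(0, 1) for _ in range(2)], seed=0)
```

By construction G is symmetric with nonnegative diagonal, and (though n = 5 here) its rank is at most t = 2.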
We define M to be the t × t matrix of underlying second moments, M = E(w w^T), and define the infinite sequence t_n as the traces of the powers of M:

t_n = trace(M^n),    n = 1, 2, . . . .

Finally, we define c_i, 0 ≤ i ≤ t, to be the sign-adjusted coefficients of the characteristic polynomial of M, with the familiar indexing:

det(λI − M) = Σ_{i=0}^{t} (−1)^i c_i λ^{t−i}.

Theorem 1 Let a_n, p_n, t_n denote E(det(G_n)), E(perm(G_n)), trace(M^n), respectively, as given above. Then

Σ_{n=0}^{∞} a_n x^n / n! = exp( Σ_{n=1}^{∞} (−1)^{n−1} t_n x^n / n )     (3)

and

Σ_{n=0}^{∞} p_n x^n / n! = exp( Σ_{n=1}^{∞} t_n x^n / n ).     (4)

The generating function identities in the previous theorem lead immediately to recursions for the sequences a_n, p_n, as given in the corollary:

Corollary 2 Let a_n, p_n, t_n denote E(det(G_n)), E(perm(G_n)), trace(M^n), respectively, as given above. Then, with a_0 = p_0 = 1, we have the recursions

a_n = Σ_{i=1}^{n} (−1)^{i−1} ((n−1)!/(n−i)!) t_i a_{n−i}     (5)

and

p_n = Σ_{i=1}^{n} ((n−1)!/(n−i)!) t_i p_{n−i}.     (6)

The next theorem relates the expected values E(det(G_n)), E(perm(G_n)) to the coefficients c_i of the characteristic polynomial of M.
Theorem 3 Let G_n, M, and c_n be, respectively, the random Gram matrix of size n, the underlying t × t matrix of second moments, and the sign-adjusted coefficients of the characteristic polynomial of M. Then

E(det(G_n)) = n! c_n

and

Σ_{n=0}^{∞} p_n x^n / n! = 1 / det(I − xM) = ( Σ_{i=0}^{t} (−1)^i c_i x^i )^{−1}.

The last theorem concerns the expected values of the coefficients of the characteristic and permanental polynomials of G_n.
Theorem 4 Let a_n, p_n denote E(det(G_n)), E(perm(G_n)), respectively, as given above. Let b_i, d_i be the sign-adjusted coefficients of, respectively, the characteristic and permanental polynomials of G_n:

det(λI − G_n) = Σ_{i=0}^{n} (−1)^i b_i λ^{n−i},    perm(λI − G_n) = Σ_{i=0}^{n} (−1)^i d_i λ^{n−i}.

Then

E(b_i) = (n choose i) a_i     (7)

and

E(d_i) = (n choose i) p_i.     (8)

Remark. The characteristic polynomials det(λI − AA^T) and det(λI − A^T A) have exactly the same nonzero roots (including multiplicity). With A a t × n matrix, and assuming n ≥ t, the latter characteristic polynomial has a factor of λ^{n−t}, and so b_i = 0 for i > t. This is consistent with the fact that the a_i are nonzero for at most 0 ≤ i ≤ t.

Proofs
Proof of Theorem 1: The Leibniz formula for the determinant is

det(G_n) = Σ_{σ ∈ S_n} sgn(σ) term_σ,    with    term_σ = Π_{i=1}^{n} (G_n)_{i,σ(i)},

where S_n is the symmetric group on {1, 2, . . . , n}, and sgn(σ) signifies the sign of the permutation σ. Similarly, for the permanent,

perm(G_n) = Σ_{σ ∈ S_n} term_σ.

Since expectation is a linear operator, a_n = E(det(G_n)) may be obtained by the following strategy:

(*) 1. compute E(term_σ) for each permutation σ; 2. attach the sign sgn(σ); 3. sum over all σ ∈ S_n, grouping the permutations by cycle structure.

Furthermore, p_n = E(perm(G_n)) can be obtained in the same manner but omitting step 2.
Suppose the permutation σ contains k_i cycles of size i, where k_i ≥ 0 and n = k_1 + 2k_2 + · · · . The cycle structure of σ alone is enough to determine its sign, by the relation sgn(σ) = (−1)^{k_2 + k_4 + ···}. What can be said about the expected value E(term_σ), given only the cycle structure of σ? We claim that, like the sign, this expected value is determined completely by the cycle structure, as given in the relation

E(term_σ) = t_1^{k_1} t_2^{k_2} t_3^{k_3} · · · .     (11)

Indeed, if (i_1, i_2, . . . , i_l) is a cycle of σ having length l, then the quantity product_C, defined by

product_C = (G_n)_{i_1 i_2} (G_n)_{i_2 i_3} · · · (G_n)_{i_l i_1},

is a subproduct of term_σ. Moreover, it is seen that the various subproducts associated with the different cycles comprising σ have no rows or columns of the matrix G_n in common. These subproducts are consequently independent, and we have

E(term_σ) = Π_{cycles C of σ} E(product_C).

The entry (G_n)_{ij} of the Gram matrix is the dot product w^(i) · w^(j) of columns in the sample matrix A, and so

product_C = (w^(i_1) · w^(i_2)) (w^(i_2) · w^(i_3)) · · · (w^(i_l) · w^(i_1)).

From this we observe that the expectation E(product_C) depends only on the length of the cycle C, and not on the particular columns of A which are involved. That the common value of E(product_C) over all cycles C of length l is equal to t_l, the trace of the power M^l, is seen as follows: expanding each dot product over components and using the independence of the l distinct columns,

E(product_C) = Σ_{j_1, . . . , j_l} E(w_{j_1} w_{j_2}) E(w_{j_2} w_{j_3}) · · · E(w_{j_l} w_{j_1}) = Σ_{j_1, . . . , j_l} M_{j_1 j_2} M_{j_2 j_3} · · · M_{j_l j_1} = trace(M^l).

Thus the claim (11) is justified.
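The identity E(product_C) = t_l = trace(M^l) can be checked exactly for a small discrete distribution by enumerating every assignment of sample vectors to the l positions of the cycle (a sketch; the two-point toy distribution is our own choice):

```python
from fractions import Fraction
from itertools import product

# Toy discrete distribution on t = 2 vectors: w = (1,0) or (1,1), each with prob 1/2.
support = [((1, 0), Fraction(1, 2)), ((1, 1), Fraction(1, 2))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def expected_cycle_product(l):
    """E[(w1.w2)(w2.w3)...(wl.w1)] over l independent draws from the distribution."""
    total = Fraction(0)
    for choice in product(support, repeat=l):
        prob = Fraction(1)
        for _, p in choice:
            prob *= p
        ws = [w for w, _ in choice]
        val = 1
        for i in range(l):
            val *= dot(ws[i], ws[(i + 1) % l])
        total += prob * val
    return total

def trace_power(l):
    """trace(M^l) with M = E(w w^T) for the same distribution."""
    t = 2
    M = [[sum(p * w[i] * w[j] for w, p in support) for j in range(t)] for i in range(t)]
    P = [[Fraction(int(i == j)) for j in range(t)] for i in range(t)]
    for _ in range(l):
        P = [[sum(P[i][k] * M[k][j] for k in range(t)) for j in range(t)] for i in range(t)]
    return sum(P[i][i] for i in range(t))
```

For this distribution M = [[1, 1/2], [1/2, 1/2]], and both functions agree for every cycle length l, exactly as claim (11) requires.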
Continuing the proof, we introduce the sign of the permutation to obtain

E(sgn(σ) term_σ) = Π_{i≥1} ((−1)^{i−1} t_i)^{k_i}.     (12)

We are now in position to carry out the three-step strategy (*) proposed above. The number of permutations σ which have a given cycle structure (k_1, k_2, . . . ) is

n! / (1^{k_1} k_1! 2^{k_2} k_2! 3^{k_3} k_3! · · · ).

If we multiply the right side of (12) by the latter multiplicity and by x^n/n! (note the resulting cancellation of n!) and then sum over all sequences (k_1, k_2, . . . ) of nonnegative integers which are zero from some point on, we obtain the desired exponential generating function. Hence,

Σ_{n=0}^{∞} a_n x^n / n! = Σ_{(k_1, k_2, ...)} Π_{i≥1} (1/k_i!) ( (−1)^{i−1} t_i x^i / i )^{k_i} = Π_{i≥1} exp( (−1)^{i−1} t_i x^i / i ) = exp( Σ_{i=1}^{∞} (−1)^{i−1} t_i x^i / i ),

as was to be shown. This completes the proof of the first part of Theorem 1. The second part of the theorem, equation (4), giving the exponential generating function of the sequence of permanents p_n, is proven in a similar manner.
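The count of permutations with a given cycle structure, n!/(1^{k_1} k_1! 2^{k_2} k_2! · · ·), is a standard fact that can be verified by brute force over S_n for small n (the helper names below are ours):

```python
from itertools import permutations
from math import factorial
from collections import Counter

def cycle_type(perm):
    """Return (k_1, ..., k_n): k_i = number of cycles of size i in perm."""
    n = len(perm)
    seen, counts = [False] * n, [0] * n
    for start in range(n):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            counts[length - 1] += 1
    return tuple(counts)

n = 5
observed = Counter(cycle_type(p) for p in permutations(range(n)))
for ks, count in observed.items():
    divisor = 1
    for i, k in enumerate(ks, start=1):
        divisor *= i ** k * factorial(k)  # 1^{k_1} k_1! 2^{k_2} k_2! ...
    assert count == factorial(n) // divisor
```

For instance, in S_5 there are 4! = 24 five-cycles (cycle type k_5 = 1), matching 5!/(5 · 1!) = 24.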
Proof of Corollary 2: These are proven in the standard manner by comparing the coefficients of x^n on both sides of the identities obtained from (3) and (4) by differentiating with respect to x.
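The recursions of Corollary 2 translate directly into an O(n^2) routine: given the traces t_1, . . . , t_n, both expected sequences follow with exact integer arithmetic (a sketch; the function name and list conventions are ours):

```python
from math import factorial

def expected_det_perm(traces, n):
    """Given traces[i-1] = t_i = trace(M^i) for i = 1..n, return the lists
    (a_0..a_n, p_0..p_n) via the recursions of Corollary 2:
      a_n = sum_{i=1}^n (-1)^(i-1) (n-1)!/(n-i)! t_i a_{n-i}
      p_n = sum_{i=1}^n          (n-1)!/(n-i)! t_i p_{n-i}"""
    a, p = [1], [1]
    for m in range(1, n + 1):
        am = pm = 0
        for i in range(1, m + 1):
            coef = factorial(m - 1) // factorial(m - i)  # always an integer
            am += (-1) ** (i - 1) * coef * traces[i - 1] * a[m - i]
            pm += coef * traces[i - 1] * p[m - i]
        a.append(am)
        p.append(pm)
    return a, p
```

For example, if M is the 2 × 2 identity then t_i = 2 for every i; the recursion gives a = [1, 2, 2, 0] (the expected determinants vanish beyond n = t = 2, as Theorem 3 predicts) and p = [1, 2, 6, 24].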
In the next proof, of Theorem 3, we use the identity

det(exp(B)) = exp(trace(B)),     (13)

valid for any complex square matrix B. See, for example, Section 1.1.10, item 7, page 11 of [12], where the identity is attributed to Jacobi.
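A quick numerical sanity check of the Jacobi identity det(exp(B)) = exp(trace(B)), using a truncated Taylor series for the matrix exponential (a sketch in floating point, so the two sides agree only up to rounding):

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_exp(B, terms=30):
    """exp(B) by the truncated Taylor series sum_k B^k / k! (fine for small ||B||)."""
    n = len(B)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    power = [row[:] for row in result]
    fact = 1.0
    for k in range(1, terms):
        power = mat_mul(power, B)
        fact *= k
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / fact
    return result

B = [[0.3, -0.2], [0.5, 0.1]]
E = mat_exp(B)
det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]
assert abs(det_E - math.exp(B[0][0] + B[1][1])) < 1e-9  # det(exp(B)) = exp(trace(B))
```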

Proof of Theorem 3:
We start with the expected determinant:

Σ_{n=0}^{∞} a_n x^n / n! = exp( Σ_{n=1}^{∞} (−1)^{n−1} t_n x^n / n ) = exp( trace( log(I + xM) ) ) = det( exp( log(I + xM) ) ) = det(I + xM) = Σ_{i=0}^{t} c_i x^i.

The first equality comes from (3), the third from (13), and the rest are straightforward manipulations. Comparing coefficients of x^n gives E(det(G_n)) = n! c_n. The proof for the expected permanent is similar:

Σ_{n=0}^{∞} p_n x^n / n! = exp( Σ_{n=1}^{∞} t_n x^n / n ) = exp( trace( −log(I − xM) ) ) = det( exp( −log(I − xM) ) ) = det( (I − xM)^{−1} ) = 1 / det(I − xM).

This time the first equality follows from (4), the third again is from (13), and, as was the case with the determinant, the rest are straightforward manipulations.
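Theorem 3 can be sanity-checked for a 2 × 2 moment matrix, where the sign-adjusted coefficients are simply c_1 = trace(M) and c_2 = det(M); the recursion for a_n is repeated inline so the snippet is self-contained (the toy matrix is our choice):

```python
from fractions import Fraction as F
from math import factorial

M = [[F(3), F(1)], [F(1), F(2)]]  # a toy 2 x 2 matrix of second moments
M2 = [[sum(M[i][k] * M[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
M3 = [[sum(M2[i][k] * M[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
t = [M[0][0] + M[1][1], M2[0][0] + M2[1][1], M3[0][0] + M3[1][1]]  # t_1, t_2, t_3

# Recursion (5): a_n = sum_i (-1)^(i-1) (n-1)!/(n-i)! t_i a_{n-i}.
a = [F(1)]
for n in range(1, 4):
    a.append(sum((-1) ** (i - 1) * (factorial(n - 1) // factorial(n - i))
                 * t[i - 1] * a[n - i] for i in range(1, n + 1)))

c1 = M[0][0] + M[1][1]                       # trace(M)
c2 = M[0][0] * M[1][1] - M[0][1] * M[1][0]   # det(M)
assert a[1] == factorial(1) * c1             # E(det(G_1)) = 1! c_1
assert a[2] == factorial(2) * c2             # E(det(G_2)) = 2! c_2
assert a[3] == 0                             # coefficients vanish beyond n = t
```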

Remark. The function

ζ_G(x) = 1 / det(I − xT),

where T is Hashimoto's edge adjacency operator, is called the Ihara zeta function of the graph G; see [22], [19].
Proof of Theorem 4: For simplicity, let us assume the probability distribution on vectors w is discrete; say, v_1, v_2, . . . with probabilities p(v_1), p(v_2), . . . . In order to obtain a term λ^{n−i} in the expansion of det(λI − A^T A), we choose n − i λ's from the main diagonal and then expand a remaining principal submatrix of size i. Since the remaining submatrix is that of −A^T A, we obtain the sign adjustment (−1)^i and find

det(λI − A^T A) = Σ_{i=0}^{n} (−1)^i λ^{n−i} Σ_{S ⊆ {1,...,n}, |S| = i} det( (A^T A)_S ),

where (A^T A)_S denotes the principal submatrix with rows and columns indexed by S. Since expectation is a linear operator, and each principal submatrix (A^T A)_S with |S| = i is itself a random Gram matrix of size i, the expected value of the i'th sign-adjusted coefficient of the characteristic polynomial of A^T A is

E(b_i) = Σ_{|S| = i} E( det( (A^T A)_S ) ) = (n choose i) a_i.

This proves the first assertion of the theorem, and the second assertion regarding the permanental polynomial is demonstrated in a similar manner.
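The relation E(b_i) = (n choose i) a_i of Theorem 4 can be verified exactly for a tiny discrete model by enumerating every possible sample matrix A (an illustrative check; the two-point distribution and all names are our own):

```python
from fractions import Fraction as F
from itertools import product
from math import comb, factorial

# Toy discrete distribution on t = 2 vectors (ours, for illustration).
support = [((1, 0), F(1, 2)), ((1, 1), F(1, 2))]
t_dim, n = 2, 3

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Exact expected sign-adjusted characteristic coefficients b_1, b_2, b_3 of G_3.
Eb = [F(0)] * (n + 1)
for choice in product(support, repeat=n):
    cols = [w for w, _ in choice]
    prob = F(1)
    for _, q in choice:
        prob *= q
    G = [[dot(u, v) for v in cols] for u in cols]
    Eb[1] += prob * sum(G[i][i] for i in range(n))                    # trace
    Eb[2] += prob * sum(G[i][i] * G[j][j] - G[i][j] * G[j][i]
                        for i in range(n) for j in range(i + 1, n))   # 2x2 principal minors
    Eb[3] += prob * (G[0][0] * (G[1][1] * G[2][2] - G[1][2] * G[2][1])
                     - G[0][1] * (G[1][0] * G[2][2] - G[1][2] * G[2][0])
                     + G[0][2] * (G[1][0] * G[2][1] - G[1][1] * G[2][0]))  # determinant

# a_i from recursion (5), with M the 2 x 2 second-moment matrix of the distribution.
M = [[sum(q * w[i] * w[j] for w, q in support) for j in range(t_dim)] for i in range(t_dim)]
tr, P = [], [[F(int(i == j)) for j in range(t_dim)] for i in range(t_dim)]
for _ in range(n):
    P = [[sum(P[i][k] * M[k][j] for k in range(t_dim)) for j in range(t_dim)]
         for i in range(t_dim)]
    tr.append(sum(P[i][i] for i in range(t_dim)))
a = [F(1)]
for m in range(1, n + 1):
    a.append(sum((-1) ** (i - 1) * (factorial(m - 1) // factorial(m - i))
                 * tr[i - 1] * a[m - i] for i in range(1, m + 1)))

assert all(Eb[i] == comb(n, i) * a[i] for i in range(1, n + 1))  # E(b_i) = C(n,i) a_i
```

Note that Eb[3] = 0 automatically here, since every 3 × 3 Gram matrix of rank at most t = 2 is singular, in agreement with a_3 = 0.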

Experimental results
We wrote a Matlab program to compare the expected characteristic and permanental polynomials given by Theorem 4 to those of randomly sampled matrices of various sizes. We computed all permanents using a programmatic link with Maple via the Maple Toolbox for Matlab, and all characteristic polynomials using the Matlab command poly.
There are, of course, infinitely many different distributions which might underlie the vectors w. We chose to use one based on sample counts. This is an easily understood distribution, and of interest for possible applications. The idea is to assume a set X of size t, {x_1, . . . , x_t}, with probabilities p_1, . . . , p_t, where Σ_{i=1}^{t} p_i = 1. We take a sample, with replacement, from X of size ℓ, and record w_i as the number of times that the element x_i is chosen. Then each of our t-tall vectors w is integral, satisfies Σ_{i=1}^{t} w_i = ℓ, and the distribution on these vectors is the familiar multinomial distribution:

P(w_1, . . . , w_t) = ( ℓ! / (w_1! · · · w_t!) ) p_1^{w_1} · · · p_t^{w_t}.

The corresponding matrix M of second moments is found to be

M_ij = E(w_i w_j) = ℓ(ℓ − 1) p_i p_j + ℓ p_i δ_ij.

We have derived the matrix M for several other scenarios which seem natural for applications, but do not report any of these results in the present paper, with one exception. Namely, suppose that the random vectors w are generated as counts, much as above, except the sample size ℓ is also a random quantity. That is, the w come about by a compound process. If we assume the sample size ℓ to be given by a distribution prob(ℓ), then

M_ij = E( ℓ(ℓ − 1) ) p_i p_j + E(ℓ) p_i δ_ij,

where the expectations are taken with respect to prob(ℓ). In our experiments, we used the above multinomial distribution to generate random vectors with ℓ = 10, t = 4 and p = [3/8, 1/4, 1/4, 1/8].
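The multinomial second moments E(w_i w_j) = ℓ(ℓ−1) p_i p_j + ℓ p_i δ_ij (a standard computation) can be confirmed exactly for small parameters by enumerating all ordered samples; the sketch below uses parameters scaled down from the ℓ = 10, t = 4 experiment:

```python
from fractions import Fraction as F
from itertools import product

t, ell = 3, 4                      # scaled-down parameters (our choice)
p = [F(1, 2), F(1, 3), F(1, 6)]

# Exact E(w_i w_j) by enumerating every ordered sample of size ell with replacement.
M = [[F(0)] * t for _ in range(t)]
for draw in product(range(t), repeat=ell):
    prob = F(1)
    for x in draw:
        prob *= p[x]
    w = [draw.count(i) for i in range(t)]  # multinomial count vector
    for i in range(t):
        for j in range(t):
            M[i][j] += prob * w[i] * w[j]

# Closed form: M_ij = ell(ell-1) p_i p_j + ell p_i [i == j].
for i in range(t):
    for j in range(t):
        assert M[i][j] == ell * (ell - 1) * p[i] * p[j] + (ell * p[i] if i == j else 0)
```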
We know, theoretically, that a_i = 0 for i > t; however, we computed these values for confirmation. Thus, we computed t_i for i up to 7 and then, from recursion (5), computed the a_i. We also generated 1000 matrices G_4000 = A^T A at random (by sampling 4000 random vectors from the counting distribution described above to form A) and computed the mean and standard deviation of the appropriate coefficient in det(λI − G_4000), divided by the binomial coefficient as given in equation (7). The exact values and the sample values were then compared; the comparison is summarized in Figure 1, whose plots show that, for this distribution, the standard deviation decreases according to a power law.
Using the same t_n as above, we used the permanental coefficient recursion (6) to compute the exact values of p_n = E(perm(G_n)). We also computed the coefficients of the permanental polynomials of 1000 random Gram matrices G_18 = A^T A (created by sampling 18 random vectors from the distribution) and subsequently compared the sampled results with those provided by the recursion, after dividing each by the binomial coefficient as given in (8). Due to the intractable computational complexity of computing the exact permanent, we were limited to matrices with 18 randomly sampled columns. The results are shown as boxplots in Figure 2. (In both figures, each coefficient of (−λ)^{t−i} was normalized by division by the binomial coefficient before plotting; the green line in Figure 1 represents E(det(G_i)) as computed by recursion (5), and the green line in Figure 2 represents E(perm(G_i)) as computed by recursion (6).)

Connection to the cycle index polynomial
For a permutation σ ∈ S_n in the symmetric group of degree n, let N_i(σ) denote the number of cycles in σ of size i. Let X_1, X_2, . . . be a countably infinite sequence of variables, and define the polynomial P_n(X_1, . . . , X_n) by

P_n(X_1, . . . , X_n) = Σ_{σ ∈ S_n} Π_{i=1}^{n} X_i^{N_i(σ)}.

Then the quotient P_n(X_1, . . . , X_n)/n! is called the cycle index polynomial of the symmetric group. The generating function identity

Σ_{n=0}^{∞} P_n(X_1, . . . , X_n) x^n / n! = exp( Σ_{i=1}^{∞} X_i x^i / i )

was observed in [2]. The latter paper was devoted to proving that assigning nonnegative real values to the variables X_i subject to certain inequalities would result in the real values P_n(X_1, . . . , X_n) satisfying similar inequalities. Coincidentally, the pairs of quantities ((−1)^{n−1} t_n, a_n) and (t_n, p_n) studied in this paper satisfy identical generating function identities. In particular, the sequence of expected permanents p_n = E(perm(A^T A)) is hereby identified as the evaluations of the polynomials P_n at the weights X_i = t_i.
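The identification of p_n with cycle index evaluations can be checked directly for small n by substituting arbitrary weights for the t_i (the weights below are chosen arbitrarily; helper names are ours):

```python
from itertools import permutations
from math import factorial

X = [2, 3, 5, 7]   # arbitrary weights standing in for t_1, ..., t_4
n = 4

def cycle_counts(perm):
    """N_i(sigma) for i = 1..n, as a list."""
    seen, counts = [False] * len(perm), [0] * len(perm)
    for s in range(len(perm)):
        if not seen[s]:
            length, j = 0, s
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            counts[length - 1] += 1
    return counts

# Left side: P_n(X_1,...,X_n) = sum over sigma in S_n of prod_i X_i^{N_i(sigma)}.
P = 0
for perm in permutations(range(n)):
    term = 1
    for i, k in enumerate(cycle_counts(perm), start=1):
        term *= X[i - 1] ** k
    P += term

# Right side: the permanental recursion (6) with t_i := X_i.
p = [1]
for m in range(1, n + 1):
    p.append(sum((factorial(m - 1) // factorial(m - i)) * X[i - 1] * p[m - i]
                 for i in range(1, m + 1)))

assert P == p[n]
```

Already at n = 2 one sees the pattern: P_2 = X_1^2 + X_2 = 7, which the recursion reproduces.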

Summary and conclusion
We have introduced the notion of a random Gram matrix, whose entries are the dot products of pairs of independently sampled random vectors, and provided theory enabling the efficient computation of its expected determinant and expected permanent. We further proved generating function identities and recursions relating these expectations to the traces of powers of a second moment matrix. The expected coefficients of the characteristic and permanental polynomials have also been studied, with numerical experiments checking the theory. Some of the formulas found are the same as those studied in earlier work in an entirely different context [2].
We have observed empirically that as the number of columns in the sample matrix A increases, the standard deviation of the normalized coefficients of the characteristic and permanental polynomials decreases according to a power law. Although the empirical data presented in this paper were limited to the multinomial counting model, the theoretical relationships between the different quantities hold no matter which distribution is used. In future work, the theoretical rate of convergence should be formulated in terms of the representation and probability model used to generate the matrix A (e.g., trivially, when A = 0, the sampled values coincide immediately with the expected values).
Can the probabilistic results presented in this work be of any help in managing the complexity of computing the permanent? Already, there is a polynomial time algorithm [1] for computing the permanent of an n × n matrix of rank t, with t fixed. One way for our probabilistic methods to impact complexity considerations would be to find a distribution on t-vectors (t small) such that a given n × n permanent perm(H) is equal to, or well approximated by, the expected value of perm(A^T A). We have no ideas in this direction.
It is hoped that the theoretical observations we have made will prove useful in processing and comparing large amounts of numerical data, for example in algorithms that use permanental polynomials of large chemical graphs [6,7]. Moreover, the combinatorial relationships between traces of matrix powers, characteristic coefficients, expected permanents, and expected determinants may help us better understand how to use these quantities, create bounds for them, and illuminate what has made them so useful in applied numerical science.