Subset Glauber dynamics on graphs, hypergraphs and matroids of bounded tree-width

Motivated by the ‘subgraphs world’ view of the ferromagnetic Ising model, we analyse the mixing times of Glauber dynamics based on subset expansion expressions for classes of graph, hypergraph and matroid polynomials. With a canonical paths argument, we demonstrate that the chains defined within this framework mix rapidly upon graphs, hypergraphs and matroids of bounded tree-width. This extends known results on rapid mixing for the Tutte polynomial, adjacency-rank ($R_2$-)polynomial and interlace polynomial. In particular, Glauber dynamics for the $R_2$-polynomial was known to mix rapidly on trees, which led to hope of rapid mixing on a wider class of graphs. We show that Glauber dynamics for a very wide class of polynomials mixes rapidly on graphs of bounded tree-width, including many cases in which the Glauber dynamics does not mix rapidly for all graphs. This demonstrates that rapid mixing on trees or bounded tree-width graphs does not offer strong evidence towards rapid mixing on all graphs.


Introduction
We analyse a subset-sampling Markov chain on graphs (and later hypergraphs and matroids) that is derived from subset expansion graph functions, which include many well-known graph polynomials. We show that this chain mixes rapidly on graphs of constant tree-width.
An edge subset expansion formula for a graph function P is written as follows: for any graph G = (V, E),

$$P(G) = \sum_{S \subseteq E} w((V, S)) \tag{1}$$

for some graph function w, where (V, S) denotes the graph with vertex set V and edge set S. If the function w is non-negative, that is, w(G) ≥ 0 for all graphs G, we refer to (1) as an edge subset weighting for P and to w as its weight function. In fact, we shall need the weight function to be positive; from a statistical physics viewpoint, this results in a so-called 'soft-core model'. One example of such a function that is prominent in statistical physics, theoretical computer science, and discrete probability is the random cluster model partition function, which can be defined for any G = (V, E) and parameters q, µ as

$$Z_{RC}(G; q, \mu) = \sum_{S \subseteq E} q^{\kappa(S)} \mu^{|S|}, \tag{2}$$

where κ(S) is the number of components in (V, S). For more on the random cluster model, see the extensive treatise by Grimmett [33]. Notice that, if q, µ ≥ 0, then $w((V, S)) := q^{\kappa(S)} \mu^{|S|}$ provides an edge subset weighting for $Z_{RC}(G; q, \mu)$.
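As a small illustration of an edge subset expansion, the following sketch evaluates the random cluster partition function by brute force, summing the weight $q^{\kappa(S)} \mu^{|S|}$ over every edge subset S. It is only meant for tiny graphs (the sum has $2^{|E|}$ terms). The function names and the convention that vertices are labelled 0, ..., n−1 are assumptions made for this example.

```python
from itertools import combinations


def count_components(n, edges):
    """Number of connected components of the graph ({0..n-1}, edges), via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


def subset_expansion(n, edges, weight):
    """Sum weight(n, S) over all edge subsets S (exponential time)."""
    total = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            total += weight(n, subset)
    return total


def z_rc(n, edges, q, mu):
    """Random cluster partition function with weight q^kappa(S) * mu^|S|."""
    return subset_expansion(
        n, edges, lambda n_, s: q ** count_components(n_, s) * mu ** len(s)
    )
```

For a triangle with q = 2 and µ = 1, the eight subsets contribute 8 + 3·4 + 3·2 + 2, so `z_rc(3, [(0, 1), (1, 2), (0, 2)], 2, 1)` evaluates to 28.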
A common approach to approximating graph functions expressed by subset expansion formulae is through Markov chain Monte Carlo sampling (MCMC). A Markov chain with state space Ω = {S : S ⊆ E} (= $2^E$) and stationary distribution π(·) ∝ w((V, ·)) can be used to sample subsets of the edges. In this approach, an efficient approximation scheme relies on rapid convergence of the Markov chain to its stationary distribution, so-called rapid mixing. In our setting, a natural Markov chain is Glauber dynamics [28], the single bond flip chain, in which possible transitions are the addition or removal of one edge from the current subset (state), with transition probabilities designed to produce the desired stationary distribution. In addition to being a principal tool in the design of efficient approximation schemes for counting problems, MCMC has seen widespread use in computational physics, computational biology, machine learning, and statistics. There have been steady advances in our understanding of such random processes and in showing how quickly they generate good approximations of useful probability distributions in huge, complex data sets. See the lecture notes of Jerrum [40] or a survey by Randall [56] for an overview of the application of these techniques in theoretical computer science.
We postpone the precise statement of our main result, Theorem 1, as it requires a host of definitions, but here we give a cursory description. We show that when the weight function w of some subset expansion formula is strictly positive and λ-multiplicative (as defined in Subsection 2.1), then Glauber dynamics is rapidly mixing on graphs of bounded tree-width. Roughly speaking, the condition of λ-multiplicativity is that the weight function is multiplicative with respect to the operation of disjoint graph union as well as "nearly multiplicative" with respect to the operation of composition via small vertex cuts. For now, we remark that many important graph polynomials (e.g. $Z_{RC}(G; q, \mu)$) obey it. The results may be extended to cover hypergraphs and matroids, and polynomials expressed by vertex subset expansion rather than edge subset expansion. Our general approach is an extension of work by Ge and Štefankovič [26] (see also an earlier version [27]), which showed that the Markov chain for the (soft-core) random cluster model, i.e. weighted according to (2), mixes rapidly upon graphs of bounded tree-width.
Our work started with the $R_2$-polynomial; Ge and Štefankovič introduced both this polynomial and its associated Glauber dynamics in an attempt to devise an approximation scheme for #BIS, the problem of counting the number of independent sets in a bipartite graph. Their adjacency-rank polynomial is defined for any G = (V, E) and parameters q, µ as

$$R_2(G; q, \mu) := \sum_{S \subseteq E} q^{\mathrm{rk}_2(S)} \mu^{|S|}, \tag{3}$$

where $\mathrm{rk}_2(S)$ is the $\mathbb{F}_2$-rank of the adjacency matrix for (V, S). Using a combinatorial interpretation of $\mathrm{rk}_2$ applicable only to bipartite graphs, they showed that the edge subset Glauber dynamics (using the weighting in (3)) mixes rapidly on trees. They conjectured that the chain mixes rapidly on all bipartite graphs, cf. Conjecture 1 in [27]. Our initial objective was to show that the chain mixes rapidly not only on trees, but also on all bipartite graphs of bounded tree-width. We achieved this and more: indeed, we have shown that the $R_2$-polynomial fits in our framework without recourse to the combinatorial interpretation for bipartite graphs, and hence that the Markov chain for the $R_2$-polynomial mixes rapidly upon all graphs of bounded tree-width. However, in extending the approach of Ge and Štefankovič even further to all λ-multiplicative weight functions, we have highlighted its generality with regard to polynomials and its dependence on bounded tree-width. Since the proof approach works for such a great range of chains that do not mix rapidly in general, it seems that the approach will not extend to larger graph classes without significant alteration. Thus the fact that Glauber dynamics associated with the $R_2$-polynomial mixes rapidly on trees cannot be taken as support for the conjecture that it mixes rapidly on all graphs. Indeed, the conjectured rapid mixing of their chain on all bipartite graphs has since been disproved by Goldberg and Jerrum [30].

The structure of this paper is as follows. In Subsections 1.1 and 1.2, we present further background to the work, to graph polynomials and their computation, as well as extensions. In Section 2, we give the definitions that are necessary for a detailed description of the main theorem. We give the main theorem in Section 3 and then indicate some of its consequences. We present the proofs in Section 4. In Section 5, we extend our results to Glauber dynamics on vertex subsets, that is, on induced subgraphs. For edge subset Glauber dynamics, we outline an extension to hypergraphs in Section 6. We give an indication of how to extend the approach to matroids in Section 7.

the electronic journal of combinatorics 21(4) (2014), #P4.19

Context
The random cluster model has been widely studied both as given in (2), and as the Tutte polynomial of a graph [61]. Under a suitable transformation, $Z_{RC}(G; q, \mu)$ is equivalent to the Tutte polynomial, defined for any G = (V, E) and parameters x, y as

$$T(G; x, y) = \sum_{S \subseteq E} (x - 1)^{r(E) - r(S)} (y - 1)^{|S| - r(S)}, \tag{4}$$

where r(S) is the $\mathbb{F}_2$-rank of the incidence matrix for (V, S). A wealth of combinatorial and structural information can be obtained from evaluations of this function. Indeed, this polynomial has a remarkable universality property, which informally speaking says that it subsumes any graph invariant that can be computed by deletion and contraction of edges [54], cf. [64]. In addition, the Tutte polynomial specialises to several key univariate graph polynomials, including the chromatic polynomial of Birkhoff [6]. It specialises to the Jones polynomial in knot theory [42]. By its connection with the random cluster model, it also generalises the partition functions of the Ising [38] and Potts [55] models. Consult the monograph of Welsh [63] for more on these crucial connections. In addition to $Z_{RC}(G; q, \mu)$ and T(G; x, y), we shall highlight a few other specific polynomials from the literature, but for a broad account of the development of graph polynomials, consult the recent surveys by Makowsky [47] and by Ellis-Monaghan and Merino [23,24].

It was shown in 1990 by Jaeger, Vertigan and Welsh [39] that, in general, for fixed (rational) values of x and y, the evaluation of T(G; x, y) is #P-hard, except on a few special points and curves in the (x, y)-plane. As a result, there have been substantial efforts since then to pin down the approximation complexity of computing T(G; x, y). For large swaths of the (x, y)-plane, it is now known that the computation of T(G; x, y) either does not admit a fully polynomial-time randomised approximation scheme (FPRAS) unless RP = NP, or is at least as hard as #BIS (the problem of counting independent sets in bipartite graphs) under approximation-preserving reductions, cf. Goldberg and Jerrum [29]. The sole positive approximation result applicable to general graphs is the breakthrough FPRAS, using MCMC, by Jerrum and Sinclair [41] for the partition function of the ferromagnetic Ising model; this corresponds to computation of T(G; x, y) along the portion of the parabola (x − 1)(y − 1) = 2 with y > 1. Various approaches have led to efficient approximations in some regions of the Tutte plane for specific classes of graphs; cf. e.g. Alon, Frieze and Welsh [2], Karger [43], and Bordewich [12].
The polynomials and Markov chains that we capture in our framework are defined for all graphs; however, we obtain rapid mixing results only on classes of graphs of bounded (constant) tree-width. For brevity, we do not define tree-width here, but merely say that it is an essential concept in structural graph theory and parameterised complexity; see modern surveys on the topic by Bodlaender [11] and Hliněný et al. [35]. The restriction of tree-width is commonly used in graph algorithms to reduce the complexity of a computationally difficult problem, usually by way of dynamic programming. For example, it is already known that many of the polynomials covered here can be exactly evaluated efficiently for graphs of bounded tree-width. Independently, Andrzejak [3] and Noble [50] exhibited polynomial-time algorithms to compute the Tutte polynomial of graphs with bounded tree-width. Works of Makowsky and Mariño [48] and Noble [51] have significantly generalised this, in the former case, to a wide array of polynomials under the framework of monadic second order logic (MSOL), and, in the latter case, to the so-called U-polynomial [52], a polynomial that includes not only the Tutte polynomial but also a powerful type of knot invariant as special cases.
Even though many of the polynomials we refer to can be computed exactly in polynomial time for graphs of bounded tree-width, it remains of interest to show that their associated Glauber dynamics are rapidly mixing. There have been significant and concerted endeavours by researchers spanning physics, computer science and probability to determine the mixing characteristics of Glauber dynamics on many related Markov chains. Spin systems have been of particular interest; indeed, the main thrust of the work of Jerrum and Sinclair [41] was to tackle the partition function for the 'spins world' of the ferromagnetic Ising model (using a translation to the rapidly mixing 'subgraphs world'). For more on the connections among the 'spins world', the 'subgraphs world' and the 'random cluster world', see the recent work of Huber [37]. Much illuminating recent work on the mixing times of Glauber dynamics has been restricted to trees or tree-like graphs, cf. e.g. [5,19,21,31,49,59].

Extensions
The primary focus in our paper is to establish results for graph polynomials defined according to edge subset expansion; however, we can also adapt our methodology to polynomials defined according to vertex subset expansion, which may be viewed as the 'induced subgraphs world'. To our knowledge, this form of Markov chain has not been greatly examined. One of our motivations to consider vertex subset expansion is to cover the graph polynomial introduced by Arratia, Bollobás and Sorkin [4]: the bivariate interlace polynomial, defined for any graph G = (V, E) and parameters x, y as

$$q(G; x, y) := \sum_{S \subseteq V} (x - 1)^{\mathrm{rk}_2(S)} (y - 1)^{|S| - \mathrm{rk}_2(S)}, \tag{5}$$

where $\mathrm{rk}_2(S)$ is the $\mathbb{F}_2$-rank of the adjacency matrix for G[S], the subgraph of G induced by the vertices in S. This polynomial specialises to the independence polynomial and is intimately related to Martin polynomials [1]. Just as for the Tutte polynomial, computation of the bivariate interlace polynomial is #P-hard in almost the entire plane, as was shown by Bläser and Hoffmann [7]. The multivariate interlace polynomial, a generalisation of the interlace polynomial, can be evaluated efficiently for graphs of bounded tree-width, cf. Courcelle [18] and Bläser and Hoffmann [8]. Subject to a condition on the vertex subset weightings, which we have called vertex λ-multiplicativity, we can establish rapid mixing for vertex subset Glauber dynamics on graphs of constant tree-width.

We also observe that our framework can be extended in a natural way to hypergraphs. Though hypergraph polynomials and their corresponding Markov chains have not been studied nearly as widely as their graph counterparts, we point out some important examples. Goldberg and Jerrum [29] used a multivariate hypergraph Tutte polynomial to prove approximate counting hardness of the ferromagnetic (bivariate graph) Tutte polynomial. There is a significant body of research into Glauber dynamics for counting colourings and independent sets in hypergraphs [13,14,16,22]. Hypergraph variants of the multivariate chromatic polynomial and the edge elimination polynomial have been introduced by White [65]. The study of (counting) complexity for hypergraph parameters continues to grow in importance in relation to constraint satisfaction problems (CSPs). Much of this study involves a restriction of the input hypergraph's width. Unfortunately, there is no unique standard hypergraph analogue of tree-width: consult Hliněný et al. [35] for an overview of several competing notions of width for hypergraphs and the relationship with CSPs. We have made a choice of width parameter that gives the most straightforward extension from our graph framework.
Another possible extension is to matroids, a setting with close connections to the Tutte polynomial, cf. Welsh [62]. As an indication of this possibility, we show that (edge) subset Glauber dynamics associated with the Tutte polynomial for matroids mixes rapidly on matroids of bounded branch-width. The difficulty in dealing with matroids is that the definition of λ-multiplicativity we have used for graphs and hypergraphs involves vertex cuts, which are indefinable in a matroid. So although we have not formed a general condition akin to λ-multiplicativity for matroids, we show that the matroid rank function has suitable properties in matroids of bounded branch-width. We note that Hliněný [34] has shown the existence of a polynomial-time algorithm for the computation of the Tutte polynomial for matroids of bounded branch-width representable over a finite field.

λ-multiplicative weight functions
In this subsection, we describe the condition we require on our graph functions P. This condition prescribes that the weight function is multiplicative with respect to the operation of disjoint graph union as well as "nearly multiplicative" with respect to the operation of composition via small vertex cuts.
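We use the notation $\hat{\lambda} := \max\{\lambda, 1/\lambda\}$. For a graph G = (V, E) and a vertex cut K that separates vertex sets $V_1$ and $V_2$, we call a partition $(E_1, E_2)$ of E appropriate if every edge of $E_i$ has both endpoints in $V_i \cup K$, for i ∈ {1, 2}. For fixed λ > 0, we say that the weight function w is λ-multiplicative, if for any G = (V, E), any vertex cut K that separates sets $V_1$ and $V_2$, and any appropriate partition $(E_1, E_2)$,

$$\hat{\lambda}^{-|K|} \le \frac{w((V_1 \cup K, E_1))\, w((V_2 \cup K, E_2))}{w(G)} \le \hat{\lambda}^{|K|}. \tag{6}$$

As mentioned above, if w is λ-multiplicative, then it follows that w is multiplicative with respect to disjoint union (by taking K = ∅); furthermore, taking $V_2$ = ∅ implies that the addition or deletion of a few edges in the graph does not change w wildly.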

Examples of valid polynomials
In this subsection, we emphasise specific examples of edge subset weightings and justify that their weight functions are λ-multiplicative.
Let G = (V, E) be any graph, K be any vertex cut that separates vertex subsets $V_1$ and $V_2$, and $(E_1, E_2)$ be any appropriate partition. We define G′ to be the disjoint union of graphs $(V_1 \cup K, E_1)$ and $(V_2 \cup K, E_2)$. We could imagine forming G′ from G by splitting each vertex in K, taking incident edges in $E_1$ with one copy of the vertex and those in $E_2$ with the other. It is trivial to verify multiplicativity with respect to disjoint union for each of the weight functions considered below. Therefore, to establish λ-multiplicativity for these weight functions, it will suffice to verify that $\hat{\lambda}^{-|K|} \le w(G')/w(G) \le \hat{\lambda}^{|K|}$.
First, we observe that the partition function of the random cluster model for q, µ > 0 satisfies the condition. Recalling (2), the relevant weight function is $w((V, S)) := q^{\kappa(S)} \mu^{|S|}$. To handle the $\mu^{|S|}$ factor, note that the graphs G and G′ have the same number of edges. For the $q^{\kappa(S)}$ factor, the number of components in G′ can be at most κ(G) + |K| since G′ can be obtained by splitting |K| vertices of G. Thus, w is λ-multiplicative if we take λ := q.
This can also be seen in the context of the Tutte polynomial when x, y > 1. Recalling (4), the relevant weight function is $w((V, S)) := (x - 1)^{r(E) - r(S)} (y - 1)^{|S| - r(S)}$. As before, it is easy to take care of the $(x - 1)^{r(E)} (y - 1)^{|S|}$ factor. For the remaining $((x - 1)(y - 1))^{-r(S)}$ factor, it is enough to observe that the incidence matrix of G may be obtained from the incidence matrix of G′ as follows. The matrix for G′ has two rows for each of the vertices in K, one from $(V_1 \cup K, E_1)$ and one from $(V_2 \cup K, E_2)$. If we replace one of these two rows with the sum of the two rows, we do not alter the rank; if we then delete the other of the two rows, we change the rank by at most 1. Repeating this for each vertex in K, we obtain the incidence matrix for G, at a total change in the rank r of the incidence matrix of at most |K|. Thus, w is λ-multiplicative if we take λ := (x − 1)(y − 1).
Next, we see that the adjacency-rank polynomial of Ge and Štefankovič [27] satisfies the condition if q, µ > 0. Recalling (3), the relevant weight function is $w((V, S)) := q^{\mathrm{rk}_2(S)} \mu^{|S|}$. As before, it is simple to handle the $\mu^{|S|}$ factor. For the $q^{\mathrm{rk}_2(S)}$ factor, we note that the adjacency matrix of G may be formed from the adjacency matrix of G′ by |K| row additions, followed by |K| column additions and finally the deletion of |K| rows and |K| columns. Since we must delete both rows and columns, the rank $\mathrm{rk}_2$ of the adjacency matrix may change by up to 2|K|. Thus, in this case, w is λ-multiplicative when taking $\lambda := q^2$.

Now, consider the multivariate Tutte polynomial as formulated by Sokal [58], defined for any graph G = (V, E) and parameters q, $\mathbf{v} = \{v_e\}_{e \in E}$ by

$$Z(G; q, \mathbf{v}) = \sum_{S \subseteq E} q^{\kappa(S)} \prod_{e \in S} v_e. \tag{7}$$

Under this expansion, $w((V, S)) := q^{\kappa(S)} \prod_{e \in S} v_e$ is an edge subset weight function if q > 0 and $v_e > 0$ for any e ∈ E are fixed. We can handle the $q^{\kappa(S)}$ factor as we did for the random cluster model partition function. For the $\prod_{e \in S} v_e$ factor, observe that G and G′ have the same set of edges. Thus, w is λ-multiplicative when taking λ := q.
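To make the adjacency-rank argument of this subsection concrete, here is a minimal sketch that computes the $\mathbb{F}_2$-rank of an adjacency matrix by Gaussian elimination on integer bitmasks. The function names and the labelling of vertices by 0, ..., n−1 are assumptions made for this illustration, and the graph is assumed to be simple.

```python
def gf2_rank(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            # Clear that bit from every remaining row that contains it.
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def adjacency_rank(n, edges):
    """F_2-rank of the adjacency matrix of the simple graph ({0..n-1}, edges)."""
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return gf2_rank(rows)
```

For example, the path 0-1-2 has adjacency rank 2, whereas splitting its middle vertex (the cut K = {1}) produces two disjoint edges with adjacency rank 4. The change of 2 = 2|K| shows that the factor 2 in the bound above can be attained.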
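Last, we discuss the U-polynomial of Noble and Welsh [52], defined for any graph G = (V, E) and parameters y, $\mathbf{x} = \{x_i\}$ as

$$U(G; \mathbf{x}, y) = \sum_{S \subseteq E} (y - 1)^{|S| - r(S)} \prod_{i=1}^{|V|} x_i^{\kappa(i, S)}, \tag{8}$$

where κ(i, S) denotes the number of components of order i in (V, S). If y > 1 and $x_i > 0$ for all i, then $w((V, S)) := (y - 1)^{|S| - r(S)} \prod_{i=1}^{|V|} x_i^{\kappa(i, S)}$ gives an edge subset weighting. The $(y - 1)^{|S| - r(S)}$ factor can be handled as above. For the $\prod_{i=1}^{|V|} x_i^{\kappa(i, S)}$ factor, observe that $\sum_i |\kappa(i, G') - \kappa(i, G)|$, where κ(i, ·) counts the components of order i of a graph, is at most 3|K|, since, if we obtain G′ by splitting the vertices in K, each time we split a vertex we either change the size of a single component or split a single component into two smaller components. Thus, taking $\hat{x} := \max_i \max\{x_i, x_i^{-1}\}$ and $\hat{y} := \max\{y - 1, (y - 1)^{-1}\}$, we see that w is λ-multiplicative when taking $\lambda := \hat{y} \hat{x}^3$.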

Glauber dynamics for edge subsets
In this subsection, we define the Markov chain associated with the edge subset expansion formula for P. From the formulation in (1), the single bond flip chain M on a given graph G = (V, E) is defined as follows. We start with an arbitrary subset $X_0 \subseteq E$ and repeatedly generate $X_{t+1}$ from $X_t$ by running the following experiment.
1. Pick an edge e ∈ E uniformly at random and let $S = X_t \oplus \{e\}$.
By convention, we denote the state space of M by Ω (i.e. Ω = $2^E$) and its transition probability matrix by P. With standard arguments, it can be shown that M is a reversible Markov chain that has a unique stationary distribution π satisfying π(S) ∝ w((V, S)). Hence, we may use M as a Markov chain in MCMC sampling for the following problem: given a graph G = (V, E), output a subset S ⊆ E with probability w((V, S))/P(G).
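The following sketch simulates a single bond flip chain of this kind. The proposal step (pick a uniformly random edge and flip it) is as described above; the acceptance rule is an assumption made for this illustration, namely a lazy Metropolis filter that accepts the proposal S with probability ½ · min{1, w((V, S))/w((V, X_t))}. This rule is reversible with respect to π(S) ∝ w((V, S)), but it is not claimed to be the exact rule used in the paper. The function names and the representation of states as frozensets of edges are also hypothetical.

```python
import random


def single_bond_flip_step(state, edges, weight, rng):
    """One step of a lazy Metropolis single bond flip chain.

    state  : frozenset of edges, the current subset X_t
    edges  : list of all edges of G
    weight : strictly positive function on frozensets of edges, standing in for w((V, S))
    rng    : a random.Random instance
    """
    e = rng.choice(edges)               # pick an edge uniformly at random
    proposal = state ^ frozenset([e])   # S = X_t (+) {e}
    # ASSUMPTION: lazy Metropolis acceptance probability (not taken from the text).
    accept = 0.5 * min(1.0, weight(proposal) / weight(state))
    return proposal if rng.random() < accept else state


def run_chain(edges, weight, steps, seed=0):
    """Run the chain for a number of steps from the empty edge subset."""
    rng = random.Random(seed)
    state = frozenset()
    for _ in range(steps):
        state = single_bond_flip_step(state, edges, weight, rng)
    return state
```

For instance, `run_chain([(0, 1), (1, 2), (0, 2)], lambda S: 2.0 ** len(S), 1000)` returns one edge subset of the triangle, sampled approximately in proportion to $2^{|S|}$. Because the weight must be strictly positive, the ratio in the acceptance step is always well defined.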
The term rapidly mixing applies to a Markov chain that quickly converges to its stationary distribution. We make this precise here. The total variation distance $\|\nu - \nu'\|_{TV}$ between two probability distributions ν and ν′ is defined by $\|\nu - \nu'\|_{TV} = \frac{1}{2} \sum_{H \in \Omega} |\nu(H) - \nu'(H)|$. For ε > 0, the mixing time of a Markov chain M (with state space Ω, transition matrix P and stationary distribution π) is defined as

$$\tau(\varepsilon) := \max_{X_0 \in \Omega} \min\{t : \|P^t(X_0, \cdot) - \pi\|_{TV} \le \varepsilon\}.$$

In this paper, we shall say that a chain M mixes rapidly if, for any fixed ε, τ(ε) is (upper) bounded by a polynomial in the number of vertices of the input graph. This definition for rapid mixing is the one more commonly used in theoretical computer science, whereas often in statistical physics or discrete probability a stricter O(n log n) bound is mandated.
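For very small chains, these quantities can be computed exactly. The sketch below evaluates the total variation distance between two distributions and finds the mixing time by iterating the transition matrix from every starting state. It assumes the worst-case-over-starting-states reading of the mixing time, and it represents distributions and transition matrices as Python dictionaries; both choices, like the function names, are made only for this illustration.

```python
def total_variation(nu, nu2):
    """Total variation distance between two distributions given as dicts state -> probability."""
    support = set(nu) | set(nu2)
    return 0.5 * sum(abs(nu.get(h, 0.0) - nu2.get(h, 0.0)) for h in support)


def step_distribution(dist, P):
    """Push a distribution one step forward through the transition matrix P.

    P is a dict of dicts: P[a][b] is the probability of moving from a to b.
    """
    out = {}
    for a, pa in dist.items():
        for b, pab in P[a].items():
            out[b] = out.get(b, 0.0) + pa * pab
    return out


def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t such that every starting state is within eps of pi after t steps."""
    worst = 0
    for start in P:
        dist, t = {start: 1.0}, 0
        while total_variation(dist, pi) > eps:
            dist = step_distribution(dist, P)
            t += 1
            if t > t_max:
                raise RuntimeError("chain did not mix within t_max steps")
        worst = max(worst, t)
    return worst
```

The state space $2^E$ grows exponentially with the number of edges, so this exact computation is only a sanity check on toy instances; it is not a substitute for the analytic bounds developed in the paper.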

Results
We are now prepared to state the main theorem.
(where tw(G) denotes the tree-width of G).
In later sections, we describe extensions of this theorem to induced subgraphs (Theorem 6), to hypergraphs (Theorem 11), and to matroids (Theorem 13).
In Subsection 2.2, we noted some examples of polynomials that have λ-multiplicative weight functions. Thus, Theorem 1 implies that each of their associated Glauber dynamics on edge subsets is rapidly mixing upon graphs of bounded tree-width.
Corollary 2. Let G = (V, E) where |V| = n. In the following list, we state conditions on the parameters which guarantee rapid mixing of the single bond flip chain on G associated with the stated polynomial and weighting. We also state the mixing time bound.
1. For fixed q, µ > 0 and the weighting w of $Z_{RC}(G; q, \mu)$ given by (2), the mixing time satisfies Equivalently, for fixed x, y > 1 and the weighting w of T(G; x, y) given by (4), the mixing time satisfies
2. For fixed q, µ > 0 and the weighting w of $R_2(G; q, \mu)$ given by (3), the mixing time satisfies
3. For fixed q > 0 and $v_e > 0$ for all e and the weighting w of $Z(G; q, \mathbf{v})$ given by (7), the mixing time satisfies
4. For fixed y > 1 and $x_i > 0$ for all i and the weighting w of $U(G; \mathbf{x}, y)$ given by (8), the mixing time satisfies where $\hat{x} = \max_i \max\{x_i, x_i^{-1}\}$ and $\hat{y} = \max\{y - 1, (y - 1)^{-1}\}$.
Here, we point out that Ge and Štefankovič obtained part 1 above and showed part 2 above in the special case of trees. Parts 2-4 directly extend these findings, and our main theorem considerably broadens the scope of mixing time bounds for subset Glauber dynamics on graphs of bounded tree-width.

Proofs
Let us first give an outline of our arguments.
Although our main result is stated in terms of tree-width, we do not treat tree-width directly but instead use linear-width, a more restrictive width parameter introduced by Thomas [60]. This strategy was also employed by Ge and Štefankovič in the two specific cases mentioned above. For any graph G = (V, E), an ordering $(e_1, \ldots, e_m)$ of E has linear-width at most ℓ, if, for each i ∈ {2, ..., m}, there are at most ℓ vertices that are incident to both an edge in $\{e_1, \ldots, e_{i-1}\}$ and an edge in $\{e_i, \ldots, e_m\}$. The linear-width lw(G) of G = (V, E) is the smallest integer ℓ such that there is an ordering of E with linear-width at most ℓ. The motivation for using linear-width is that it implies an ordering of the edges which we can then use to define canonical paths between pairs of edge subsets. Then we show that λ-multiplicativity is the general condition under which we can bound the congestion of these canonical paths. The use of canonical paths is a standard technique for obtaining a bound on the mixing time of MCMC methods; see the lecture notes of Jerrum [40] for an expository account of this approach.
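The definition of linear-width translates directly into code. The sketch below computes the width of a given edge ordering and, by exhaustive search over all orderings, the linear-width of a tiny graph. The function names are hypothetical, and the brute-force search takes time factorial in the number of edges, so it is purely illustrative.

```python
from itertools import permutations


def ordering_linear_width(edge_order):
    """Smallest l such that the given edge ordering has linear-width at most l."""
    width = 0
    for i in range(1, len(edge_order)):
        # Vertices touched by the first i edges and by the remaining edges.
        prefix = {v for e in edge_order[:i] for v in e}
        suffix = {v for e in edge_order[i:] for v in e}
        width = max(width, len(prefix & suffix))
    return width


def linear_width(edges):
    """Linear-width of a graph by exhaustive search over edge orderings."""
    return min(ordering_linear_width(p) for p in permutations(edges))
```

For the path 0-1-2-3, the natural ordering `[(0, 1), (1, 2), (2, 3)]` has width 1, while the ordering `[(0, 1), (2, 3), (1, 2)]` has width 2, so the choice of ordering matters.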
The key property we require that relates the linear-width of G to the more well-studied parameters path-width pw(G) and tree-width tw(G) of G is the following set of inequalities, details of which can be found in Bodlaender [9], Chung and Seymour [17], Fomin and Thilikos [25], Ge and Štefankovič [26], and Korach and Solel [46]. For any graph G on n vertices,

$$\mathrm{pw}(G) \le \mathrm{lw}(G) \le \mathrm{pw}(G) + 1 \le (\mathrm{tw}(G) + 1)(\log_2 n + 1) + 1. \tag{9}$$

We follow a canonical paths strategy to bound the mixing time of M. Given G = (V, E), let $\sigma = (e_1, \ldots, e_m)$ be an ordering of E. Given I, F ∈ Ω, let I ⊕ F denote the symmetric difference of I and F, let $\sigma[I \oplus F] := (e_{i_1}, \ldots, e_{i_k})$ denote the restriction of σ to I ⊕ F (that is, $\{e_{i_1}, \ldots, e_{i_k}\} = I \oplus F$ and $i_1 < \cdots < i_k$), and let $\gamma_{\sigma, I \to F}$ denote the canonical path from I to F, defined as $\gamma_{\sigma, I \to F} := (H_0, \ldots, H_k)$, where $H_0 = I$, $H_j = H_{j-1} \oplus \{e_{i_j}\}$ for all j ∈ {1, ..., k} (and hence $H_k = F$).

To bound the mixing time of M, we will, for some appropriately chosen σ, bound the congestion $\varrho(\Gamma_\sigma)$ of the canonical paths, which is defined by

$$\varrho(\Gamma_\sigma) := \max_{\substack{(A, B) \in \Omega^2 \\ P(A, B) > 0}} \frac{1}{\pi(A) P(A, B)} \sum_{\substack{I, F \in \Omega \\ (A, B) \in \gamma_{\sigma, I \to F}}} |\gamma_{\sigma, I \to F}|\, \pi(I)\, \pi(F), \tag{10}$$

where $|\gamma_{\sigma, I \to F}|$ denotes the length of the path $\gamma_{\sigma, I \to F}$. The mixing time can then be bounded using the following inequality of Sinclair [57], see also Diaconis and Stroock [20]: for any set Γ of canonical paths,

$$\tau(\varepsilon) \le \varrho(\Gamma) \left( \log \frac{1}{\min_{I \in \Omega} \pi(I)} + \log \frac{1}{\varepsilon} \right). \tag{11}$$

The remainder of the section is devoted to showing the following.
Theorem 3 immediately implies a good mixing time bound for the Markov chain M and hence Theorem 1 follows.
Proof. Substitute the congestion bound of Theorem 3 into inequality (11).
In the proof of Theorem 3, we will need the following lemma.
Lemma 5. Suppose G = (V, E) has linear-width ℓ and let $\sigma = (e_1, \ldots, e_m)$ be an ordering of E with linear-width at most ℓ. Suppose I, F ∈ Ω and H is on $\gamma_{\sigma, I \to F}$. If w is λ-multiplicative for some λ > 0, then where
Proof. Since H is on $\gamma_{\sigma, I \to F}$, we may assume that $H = H_j$ for some j ∈ {0, ..., k}. Let $Q = \{e_1, e_2, \ldots, e_{i_j - 1}, e_{i_j}\}$ and $\bar{Q} = E \setminus Q$. We can partition V into three sets as follows. Let $V_1$ denote the set of vertices that are incident only to edges in Q; let $V_2$ denote the set of vertices that are incident only to edges in $\bar{Q}$; let K denote the set of remaining vertices, that is, the set of vertices incident to edges in both Q and $\bar{Q}$. Note that |K| is at most the linear-width ℓ.
No vertex $v_1$ of $V_1$ is adjacent to a vertex $v_2$ of $V_2$, as otherwise the edge between them would simultaneously be in Q and $\bar{Q}$. This implies that K is a vertex cut separating $V_1$ and $V_2$ with respect to G, and also with respect to the graphs (V, I), (V, F), (V, H), (V, C).

Furthermore, $(J \cap Q, J \cap \bar{Q})$ for J ∈ {I, F, H, C} are edge partitions that are appropriate for K. Therefore, by the fact that w is λ-multiplicative and |K| ≤ ℓ, w((V, J)) is within a factor $\hat{\lambda}^{\ell}$ of $w((V_1 \cup K, J \cap Q))\, w((V_2 \cup K, J \cap \bar{Q}))$ for J ∈ {I, F, H, C}.

Glauber dynamics for vertex subsets
Until now, we have considered edge subsets (subgraphs) and Glauber transitions which change one edge at a time. In this section, we modify our methods to treat vertex subsets (induced subgraphs) and transitions that involve one vertex at a time; each such transition can affect many edges, up to the maximum degree of G. We sketch how to obtain rapid mixing for this process upon graphs of bounded tree-width, still with only a modest condition on the base graph polynomials. A vertex subset expansion formula for P is written as follows: for any simple graph G = (V, E),

$$P(G) = \sum_{S \subseteq V} w(G[S]) \tag{17}$$

for some graph function w, where G[S] denotes the subgraph of G induced by S. If the function w is non-negative, we refer to (17) as a vertex subset weighting for P and to w as its weight function. Again, for our results to hold, aside from some other constraints, we need the weight function to be positive on all induced subgraphs. From the formulation in (17), we define the single site flip chain M on a given graph G = (V, E) as follows. We start with an arbitrary subset $X_0 \subseteq V$ and repeatedly generate $X_{t+1}$ from $X_t$ by running the following experiment.
1. Pick a vertex v ∈ V uniformly at random and let $S = X_t \oplus \{v\}$.
We denote the state space of M by Ω (i.e. Ω = $2^V$) and its transition probability matrix by P. It can be shown that M is a reversible Markov chain that has a unique stationary distribution π satisfying π(S) ∝ w(G[S]). Hence, we may use M as a Markov chain in MCMC sampling for the following problem.
Output: a subset S ⊆ V with probability w(G[S])/P(G).
We now describe the condition required of the weight function w in (17).For fixed λ > 0, we say that the weight function w is vertex λ-multiplicative, if for any G = (V, E) and K a vertex cut that separates sets V 1 and V 2 with respect to G, we have Note that, if w is vertex λ-multiplicative, then it follows that w is multiplicative with respect to disjoint union by taking K = ∅; furthermore, taking V 2 = ∅ gives that the addition of a few vertices does not change w wildly.
The main result of this section is the following.
Theorem 6.Let G = (V, E) where |V | = n.If w is vertex λ-multiplicative for some λ > 0, then the mixing time of M on G satisfies + log 1 ε .

A sketch of the proof
As before, we do not treat tree-width directly, but instead work with a different width parameter. For any graph G = (V, E), an ordering $(v_1, \ldots, v_n)$ of V has vertex-separation at most ℓ, if, for each i ∈ {2, ..., n}, there are at most ℓ vertices in $\{v_1, \ldots, v_{i-1}\}$ that are adjacent to a vertex in $\{v_i, \ldots, v_n\}$. The vertex-separation vs(G) of G = (V, E) is the smallest integer ℓ such that there is an ordering of V with vertex-separation at most ℓ.
It was shown by Kinnersley [45] that the vertex-separation of G satisfies vs(G) = pw(G), and so the inequalities in (9) remain relevant.
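As with linear-width, the vertex-separation of an ordering can be computed directly from its definition. The sketch below does this for a given vertex ordering and finds vs(G) for a tiny graph by trying all orderings. The function names are hypothetical, and the exhaustive search is factorial in the number of vertices, so it serves only as an illustration.

```python
from itertools import permutations


def ordering_vertex_separation(order, edges):
    """Smallest l such that the given vertex ordering has vertex-separation at most l."""
    adjacency = {v: set() for v in order}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    separation = 0
    for i in range(1, len(order)):
        later = set(order[i:])
        # Earlier vertices that still have a neighbour among the later ones.
        boundary = [v for v in order[:i] if adjacency[v] & later]
        separation = max(separation, len(boundary))
    return separation


def vertex_separation(vertices, edges):
    """Vertex-separation of a graph by exhaustive search over vertex orderings."""
    return min(ordering_vertex_separation(p, edges) for p in permutations(vertices))
```

For the star with centre 0 and leaves 1, 2, 3, the ordering `[1, 2, 3, 0]` has vertex-separation 3, whereas placing the centre first gives 1, which matches the path-width of a star.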
To bound the mixing time of M, we again follow a canonical paths argument. Given G = (V, E), let $\sigma = (v_1, \ldots, v_n)$ be an ordering of V. Given I, F ∈ Ω, let I ⊕ F denote the symmetric difference of I and F, let $\sigma[I \oplus F] := (v_{i_1}, \ldots, v_{i_k})$ denote the restriction of σ to I ⊕ F, and let $\gamma_{\sigma, I \to F}$ denote the canonical path from I to F, defined as $\gamma_{\sigma, I \to F} := (H_0, \ldots, H_k)$, where $H_0 = I$, $H_j = H_{j-1} \oplus \{v_{i_j}\}$ for all j ∈ {1, ..., k} (and hence $H_k = F$). By (11), our bound on the mixing time again follows from a bound on the congestion $\varrho(\Gamma_\sigma)$, which is defined analogously to (10).
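The canonical path construction is the same for edge orderings and vertex orderings: walk through σ and flip, in order, exactly those elements on which I and F differ. A minimal sketch follows; the function name and the use of frozensets for states are assumptions made for this example.

```python
def canonical_path(sigma, I, F):
    """Canonical path (H_0, ..., H_k) from I to F along the ordering sigma.

    sigma : sequence of ground-set elements (edges or vertices), each appearing once
    I, F  : sets of elements, the initial and final states
    """
    difference = set(I) ^ set(F)
    path = [frozenset(I)]
    for element in sigma:
        if element in difference:
            # H_j = H_{j-1} (+) {element}
            path.append(path[-1] ^ frozenset([element]))
    return path
```

For example, with `sigma = ['a', 'b', 'c']`, `I = {'a'}` and `F = {'b', 'c'}`, the path visits `{'a'}`, `{}`, `{'b'}` and `{'b', 'c'}`. The number of transitions always equals |I ⊕ F|, and the last state is F.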
Theorem 7 immediately implies a good mixing time bound for the Markov chain M and hence Theorem 6 also.
Proof. Substitute the congestion bound of Theorem 7 into inequality (11).
Proof of Theorem 6. Substitute the upper bound on vs(G) = pw(G) of (9) into Corollary 8.
We omit the proof of Theorem 7 as it is similar to that of Theorem 3, but give the details for the analogue of Lemma 5.
Lemma 9. Suppose G = (V, E) has vertex-separation ℓ and let $\sigma = (v_1, \ldots, v_n)$ be an ordering of V with vertex-separation at most ℓ. Suppose I, F ∈ Ω and H is on $\gamma_{\sigma, I \to F}$. If w is vertex λ-multiplicative for some λ > 0, then where
Proof. Since H is on $\gamma_{\sigma, I \to F}$, we may assume that $H = H_j$ for some j ∈ {0, ..., k}. Let $Q = \{v_1, \ldots, v_{i_j}\}$, and $\bar{Q} = V \setminus Q$. We can partition V into three sets as follows. Let $V_1$ denote the set of vertices $\bar{Q}$; let $V_2$ denote the subset of Q containing vertices adjacent only to other vertices of Q; and let K denote the set of remaining vertices, that is, the set of vertices of Q adjacent to vertices of $V_1$. Note that |K| is at most the vertex-separation ℓ. Clearly, K is a vertex cut separating $V_1$ and $V_2$ with respect to G and also with respect to the graphs Therefore, by the fact that w is vertex λ-multiplicative, and noting that By (19), it follows that whereby the lemma easily follows.

An example of a vertex subset chain
Recall the bivariate interlace polynomial $q(G; x, y)$ in (5), which, for fixed $x, y > 1$, is given by the vertex subset weighting […]. With arguments very similar to those given in Subsection 2.2, it is not difficult to verify that this weight function is vertex $\lambda$-multiplicative, as follows. For any graph $G = (V, E)$ and any vertex cut $K$ that separates sets $V_1$ and $V_2$, consider the graph $G'$ consisting of the disjoint union of […]. We note that the adjacency matrix of $G'$ may be formed from the adjacency matrix of $G$ by altering at most $|K|$ rows and $|K|$ columns, changing the rank $\mathrm{rk}_2$ of the adjacency matrix by up to $2|K|$. Thus, $w$ is vertex $\lambda$-multiplicative by taking $\lambda := (x - 1)^2/(y - 1)^2$. So, by Theorem 6, it follows that a natural Markov chain derived from the bivariate interlace polynomial (a chain which has not been studied extensively, as far as we are aware) mixes rapidly on graphs of bounded tree-width.
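To make the weighting concrete, the following sketch evaluates a weight of the form $(x-1)^{\mathrm{rk}_2(G[S])}(y-1)^{|S| - \mathrm{rk}_2(G[S])}$ for a vertex subset S. This is the standard form of the bivariate interlace polynomial of Arratia, Bollobás and Sorkin and is assumed here to agree with (5); the helper names `gf2_rank` and `interlace_weight` are hypothetical.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = []  # reduced rows, each with a distinct leading bit
    for row in rows:
        for b in basis:
            # Clear the leading bit of b from row whenever it is set.
            row = min(row, row ^ b)
        if row:
            basis.append(row)
    return len(basis)


def interlace_weight(adj, subset, x, y):
    """Assumed weight (x-1)^rank * (y-1)^(|S| - rank) of the induced subgraph G[S]."""
    verts = sorted(subset)
    index = {v: i for i, v in enumerate(verts)}
    rows = []
    for v in verts:
        mask = 0
        for u in adj[v]:
            if u in index:
                mask |= 1 << index[u]
        rows.append(mask)
    rank = gf2_rank(rows)
    return (x - 1) ** rank * (y - 1) ** (len(verts) - rank)


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(interlace_weight(path, {0, 1, 2}, 3, 4))
```

For x, y > 1 every such weight is strictly positive, which is the soft-core condition the framework relies on.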
Corollary 10. Let $G = (V, E)$ where $|V| = n$. If $x, y > 1$ are fixed, then for the single site flip chain on $G$ associated with the weighting $w$ of $q(G; x, y)$ given by (5), the mixing time satisfies […].
We believe that it would be of wider interest to study further properties of this single site flip chain on general graphs, in particular to compare it with known results on the random cluster, Potts and Ising models.

Subset Glauber dynamics for hypergraphs
In this section, we outline an extension of our results to hypergraphs. Recall that a hypergraph $H$ is a pair $(V, E)$ where $V$ is the vertex set and $E$ is a hyperedge set that consists of (arbitrary) vertex subsets. We let $\eta(H)$ denote the maximum size of a hyperedge in $H$. To each hypergraph $H = (V, E)$, we associate a graph $G_p(H)$, called the primal graph or Gaifman graph, defined by $V(G_p(H)) = V$ and $E(G_p(H)) = \{\{u, v\} \subseteq V \mid \{u, v\} \subseteq e \text{ for some } e \in E\}$.
We remark at the outset of this section that the generalisation to hypergraphs we give here could be seen as routine. We do not include any details of the proof. Nonetheless, we include the necessary elements for the statement of the hypergraph version of Theorem 1, even if these are straightforward generalisations of the graph versions. We also provide some justification for the hypergraph analogue of (9). It is important to observe that the running time of the hypergraph Markov chain includes an adjustment to take into account the maximum hyperedge size $\eta(H)$.
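As a small illustration of the primal graph, the sketch below builds it from a list of hyperedges and also reports the maximum hyperedge size. The representation (hyperedges as Python sets, the graph as an adjacency dictionary) and the function names are assumptions made for this example.

```python
from itertools import combinations


def primal_graph(vertices, hyperedges):
    """Adjacency sets of the primal (Gaifman) graph of a hypergraph.

    Two distinct vertices are adjacent whenever some hyperedge contains both.
    """
    adj = {v: set() for v in vertices}
    for e in hyperedges:
        for u, v in combinations(e, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def max_hyperedge_size(hyperedges):
    """The parameter eta(H): the largest number of vertices in a hyperedge."""
    return max((len(e) for e in hyperedges), default=0)


if __name__ == "__main__":
    edges = [{0, 1, 2}, {2, 3}]
    print(primal_graph(range(4), edges))
    print(max_hyperedge_size(edges))
```

Each hyperedge becomes a clique in the primal graph, which is why width parameters of the hypergraph can be read off from this graph.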
For a hypergraph polynomial $P$, a hyperedge subset expansion formula is written as follows: for any hypergraph $H = (V, E)$,
\[ P(H) = \sum_{S \subseteq E} w((V, S)) \tag{20} \]
for some hypergraph function $w$, where $(V, S)$ denotes the hypergraph with vertex set $V$ and hyperedge set $S$. Notice that if every hyperedge has order two, then this formula is the same as in (1). If the function $w$ is non-negative, that is, $w(H) \ge 0$ for all hypergraphs $H$, we refer to (20) as a hyperedge subset weighting for $P$ and to $w$ as its weight function.
For our purposes, we require the weight function to be positive.
For any hypergraph $H = (V, E)$, an ordering $(e_1, \ldots, e_m)$ of $E$ has linear-width at most $\ell$ if, for each $i \in \{2, \ldots, m\}$, there are at most $\ell$ vertices that are contained in both a hyperedge in $\{e_1, \ldots, e_{i-1}\}$ and a hyperedge in $\{e_i, \ldots, e_m\}$. The linear-width $\mathrm{lw}(H)$ of $H = (V, E)$ is the smallest integer $\ell$ such that there is an ordering of $E$ with linear-width at most $\ell$.
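The definition of linear-width translates directly into code. The sketch below computes the width of a given hyperedge ordering and, for very small instances only, the linear-width itself by brute force over all orderings; the function names are hypothetical and hyperedges are assumed to be given as sets of vertices.

```python
from itertools import permutations


def linear_width_of_ordering(ordered_hyperedges):
    """Largest number of vertices shared between a prefix and the matching suffix."""
    edges = [set(e) for e in ordered_hyperedges]
    width = 0
    for i in range(1, len(edges)):
        before = set().union(*edges[:i])   # vertices of e_1, ..., e_{i-1} (1-indexed)
        after = set().union(*edges[i:])    # vertices of e_i, ..., e_m (1-indexed)
        width = max(width, len(before & after))
    return width


def linear_width(hyperedges):
    """Brute-force minimum over all orderings; only feasible for tiny hypergraphs."""
    return min(linear_width_of_ordering(p) for p in permutations(hyperedges))


if __name__ == "__main__":
    path = [{0, 1}, {1, 2}, {2, 3}]
    print(linear_width_of_ordering(path))
    print(linear_width_of_ordering([path[0], path[2], path[1]]))
    print(linear_width(path))
```

The brute-force search takes time factorial in the number of hyperedges, so it serves only to check small examples against the definition.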
The tree-width $\mathrm{tw}(H)$ and path-width $\mathrm{pw}(H)$ of a given hypergraph $H$ are defined similarly to the tree- and path-width of graphs, cf. Hliněný et al. [35]. Since each hyperedge induces a clique in the primal graph, and all the vertices of a clique appear together in some bag of any tree decomposition, it follows that $\mathrm{tw}(H)$ is equal to the tree-width $\mathrm{tw}(G_p(H))$ of the primal graph associated with $H$ and, similarly, $\mathrm{pw}(H) = \mathrm{pw}(G_p(H))$. Noting that, just as for graphs, $\mathrm{lw}(H) \le \mathrm{pw}(H)$, we obtain
\[ \mathrm{lw}(H) \le \mathrm{pw}(H) \le (\mathrm{tw}(H) + 1)(\log_2 n + 1) + 1. \tag{21} \]
For a hypergraph $H = (V, E)$, a set $K \subseteq V$ is a vertex cut that separates sets $V_1$ and $V_2$ if $(V_1, K, V_2)$ is a partition of $V$ and there is no hyperedge of $E$ that contains both a vertex of $V_1$ and a vertex of $V_2$. A partition $(E_1, E_2)$ of $E$ is appropriate (for $K$) if $E_1$ has no hyperedge containing a vertex in $V_2$ and $E_2$ has no hyperedge containing a vertex in $V_1$.
For fixed $\lambda > 0$, we say that the weight function $w$ is $\lambda$-multiplicative if, for any $H = (V, E)$, any vertex cut $K$ that separates sets $V_1$ and $V_2$, and any appropriate partition $(E_1, E_2)$ of $E$, […] $w(H)\,\lambda^{|K|}$.
From the formulation in (20), the single bond flip chain M on a given hypergraph $H = (V, E)$ is defined as follows. We start with an arbitrary subset $X_0 \subseteq E$ and repeatedly generate $X_{t+1}$ from $X_t$ by running the following experiment.
1. Pick a hyperedge e ∈ E uniformly at random and let S = X t ⊕ {e}.
We denote the state space of M by $\Omega$ (i.e. $\Omega = 2^E$) and its transition probability matrix by $P$. It can be shown that M is a reversible Markov chain that has a unique stationary distribution $\pi$ satisfying $\pi(S) \propto w((V, S))$.
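A single bond flip chain of this kind can be simulated as follows. The acceptance step is not reproduced in this section, so the sketch assumes a lazy Metropolis rule: the proposal S is accepted with probability ½·min{1, w(S)/w(X_t)}. This is one standard choice that is reversible with stationary distribution proportional to w, and it is an assumption rather than a restatement of the chain analysed here. The weight function must be strictly positive, and all names are hypothetical.

```python
import random


def flip_chain_step(state, ground, weight, rng):
    """One step of a lazy Metropolis single bond flip chain (assumed acceptance rule)."""
    e = rng.choice(ground)              # pick a hyperedge uniformly at random
    proposal = state ^ {e}              # S = X_t XOR {e}
    accept = 0.5 * min(1.0, weight(proposal) / weight(state))
    return proposal if accept > rng.random() else state


def run_chain(ground, weight, steps, seed=0):
    """Run the chain from the empty set and return the final state."""
    rng = random.Random(seed)
    state = frozenset()
    for _ in range(steps):
        state = flip_chain_step(state, ground, weight, rng)
    return state


if __name__ == "__main__":
    # Toy positive weight mu^|S| with mu = 2 on a two-hyperedge hypergraph.
    ground = [frozenset({0, 1}), frozenset({1, 2, 3})]
    print(run_chain(ground, lambda s: 2.0 ** len(s), steps=1000))
```

With the toy weight 2^{|S|}, each hyperedge is present independently with probability 2/3 under the stationary distribution, which gives a simple sanity check for the simulation.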
We have obtained the following hypergraph generalisation of Theorem 1.
Theorem 11. Let $H = (V, E)$ be a hypergraph with $|V| = n$. If $w$ is $\lambda$-multiplicative for some $\lambda > 0$, then the mixing time of M on $H$ satisfies […].
The proof of this theorem closely follows the pattern set in the case of edge subset Glauber dynamics for graphs. We follow a canonical paths argument that uses an edge ordering by linear-width to obtain a mixing time bound in terms of linear-width, which we then combine with (21) and the fact that $|E| = O(n^{\eta(H)})$ to obtain Theorem 11. For brevity, we omit the full details; however, a rereading of Section 4 with the hypergraph definitions in mind will confirm that the same proof is valid.

Subset Glauber dynamics for matroids
In this section, we outline how to apply our methodology to the Tutte polynomial for matroids. We do not attempt to give the fullest framework for matroids, though it could be distilled from the proof sketches below. A significant difference in treating matroids compared to graphs or hypergraphs is that they are not defined with vertices. Since there are then no direct analogues of a vertex cut, this affects the choice of width parameter and the potential definition for matroidal λ-multiplicativity. Fortunately, there is an existing, well-studied notion of connectivity in matroids, which leads to a notion of matroidal path-width, which we have found useful here.
Recall that a matroid M is a pair (E, I), where E is a ground set of edges and I is a collection of subsets of E (referred to as the independent sets of M) satisfying the following properties:

• ∅ ∈ I;
• for any A′ ⊆ A ⊆ E, we have A ∈ I ⟹ A′ ∈ I; and
• for any A ⊆ E, all maximal members of I contained in A have the same cardinality.
For any A ⊆ E, the rank r(A) of A is the cardinality of a maximal member of I contained in A, and the connectivity λ(A) of A is equal to r(A) + r(E \ A) − r(E). Both r and λ are non-negative, submodular functions. For further reading on matroid theory, consult a standard reference, e.g. Oxley [53] or Welsh [62].
The Tutte polynomial naturally extends from graphs to matroids as follows: for any matroid $M = (E, I)$ and parameters $x, y$, it is defined by
\[ T(M; x, y) := \sum_{A \subseteq E} (x - 1)^{r(E) - r(A)} (y - 1)^{|A| - r(A)}. \tag{22} \]
For our purposes, we shall assume $x, y > 1$. Note that if $M$ is a graphic matroid and hence $M$ is the cycle matroid $M(G)$ for some graph $G$, then the expressions in (4) and (22) are equal.
For any matroid $M = (E, I)$, an ordering $(e_1, \ldots, e_m)$ of $E$ has path-width at most $\ell$ if, for each $i \in \{2, \ldots, m\}$, $\lambda(\{e_1, \ldots, e_{i-1}\}) \le \ell$. The path-width $\mathrm{pw}(M)$ of $M = (E, I)$ is the smallest integer $\ell$ such that there is an ordering of $E$ with path-width at most $\ell$. For more on this width parameter, consult Kashyap [44].
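The connectivity function and the path-width of an ordering can be evaluated from a rank oracle alone. The sketch below is illustrative: it uses the cycle matroid of a graph as the example, with the rank of an edge set computed as the size of a spanning forest via union-find. The function names are hypothetical, and ground-set elements are assumed to be distinct (no parallel edges).

```python
def graphic_rank(edges):
    """Rank of an edge set in the cycle matroid: the size of a spanning forest."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    rank = 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            rank += 1
    return rank


def connectivity(rank, ground, subset):
    """lambda(A) = r(A) + r(E minus A) - r(E)."""
    subset = set(subset)
    rest = [e for e in ground if e not in subset]
    return rank(subset) + rank(rest) - rank(ground)


def path_width_of_ordering(rank, ordering):
    """Largest connectivity of a proper non-empty prefix of the ordering."""
    return max((connectivity(rank, ordering, ordering[:i])
                for i in range(1, len(ordering))), default=0)


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(path_width_of_ordering(graphic_rank, triangle))
```

For a forest every prefix has connectivity zero, whereas a cycle forces connectivity one across any split, as the triangle example shows.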
The path-width of a matroid provides an upper bound on a more closely-studied matroidal width parameter, branch-width. For any matroid $M = (E, I)$, a branch-decomposition of $M$ is a cubic tree $T$ with exactly $|E|$ leaves, labelled in one-to-one correspondence with the elements of $E$; such a branch-decomposition has width at most $\ell$ if, for any edge $e$ of $T$, letting $X_e$ denote the subset of $E$ corresponding to the set of labels contained in one of the two components of $T \setminus e$, we have $\lambda(X_e) \le \ell$. The branch-width $\mathrm{bw}(M)$ of $M = (E, I)$ is the smallest integer $\ell$ such that there is a branch-decomposition of $M$ with width at most $\ell$. The branch-width of $M$ can be used to upper bound the path-width of $M$ in the following way.
Proposition 12. For any matroid $M = (E, I)$, $\mathrm{pw}(M) \le \mathrm{bw}(M) \log_2 |E|$.
Proof. Let $M = (E, I)$ be a matroid. We would like to exhibit an ordering of $E$ with path-width at most $\mathrm{bw}(M) \log_2 |E|$. Let $T = (V_T, E_T)$ be a branch-decomposition of $M$ with width at most $\mathrm{bw}(M)$. For any vertex $v \in V_T$, let $T_v$ be the subtree rooted at $v$ and let $E_v$ denote the subset of $E$ corresponding to the set of labels contained in $T_v$. (Note that $T_v$ has $(|T_v| + 1)/2$ leaves and hence the same quantity of labels.) To obtain an edge ordering, we perform a stack-based depth-first search (DFS) of $T$, such that smaller subtrees are explored first. Let $\sigma$ be the ordering of $E$ induced by the order in which their corresponding leaves in $T$ are explored by the DFS.³ We now show that $\sigma := (e_1, \ldots, e_m)$ has path-width at most $\mathrm{bw}(M) \log_2 m$. Let $i \in \{1, \ldots, m\}$ and suppose that $(v_1, \ldots, v_k)$ are the vertices of $T$ (ordered from bottom to top) on the stack just after the leaf corresponding to $e_i$ was explored. Note that the region of $T$ unexplored by the DFS up to this point is precisely $\bigcup_{j=1}^{k} T_{v_j}$. The exploration order used by DFS (prioritising smaller subtrees) implies, for $j \in \{1, \ldots$ […]. Since there are $m$ labels in $T$, we see that $k \le \log_2 m$. It then follows, using submodularity of $\lambda$ (together with its symmetry, $\lambda(A) = \lambda(E \setminus A)$) and the fact that the width of the branch-decomposition $T$ is at most $\mathrm{bw}(M)$, that
\[ \lambda(\{e_1, \ldots, e_i\}) = \lambda\Big(\bigcup_{j=1}^{k} E_{v_j}\Big) \le \sum_{j=1}^{k} \lambda(E_{v_j}) \le k \, \mathrm{bw}(M) \le \mathrm{bw}(M) \log_2 m. \]
It is worth noting here that matroidal path-width corresponds more closely to graphic linear-width than graphic path-width. Although tree-width can be defined for matroids, it is not a straightforward extension; furthermore, the parameter has not gained as much traction in matroid theory as has branch-width, cf. Hliněný and Whittle [36]. On the other hand, Hliněný and Whittle showed that a matroid's tree-width is bounded if and only if its branch-width is bounded.
From the formulation in (22), the single bond flip chain $M_{\mathrm{Tutte}}$ for the Tutte polynomial on a given matroid $M = (E, I)$ is defined as follows. For any $A \subseteq E$, let $w_{\mathrm{Tutte}}(A) := (x - 1)^{r(E) - r(A)} (y - 1)^{|A| - r(A)}$; in other words, $w_{\mathrm{Tutte}}$ is the weight function associated with $T(M; x, y)$. We start with an arbitrary subset $X_0 \subseteq E$ and repeatedly generate $X_{t+1}$ from $X_t$ by running the following experiment.
1. Pick an element e ∈ E uniformly at random and let S = X t ⊕ {e}.
We denote the state space of $M_{\mathrm{Tutte}}$ by $\Omega_{\mathrm{Tutte}}$ (i.e. $\Omega_{\mathrm{Tutte}} = 2^E$) and its transition probability matrix by $P_{\mathrm{Tutte}}$. It can be shown that $M_{\mathrm{Tutte}}$ is a reversible Markov chain that has a unique stationary distribution $\pi_{\mathrm{Tutte}}$ satisfying $\pi_{\mathrm{Tutte}}(S) \propto w_{\mathrm{Tutte}}(S)$.
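The Tutte weight function and the corresponding subset expansion can be checked on a small matroid. The sketch below takes a rank oracle as input and uses the uniform matroid U_{2,3} (the cycle matroid of a triangle), whose Tutte polynomial is x² + x + y, as a test case; the function names are hypothetical and the brute-force sum is exponential in |E|.

```python
from itertools import combinations


def tutte_weight(rank, ground, subset, x, y):
    """w_Tutte(A) = (x-1)^(r(E)-r(A)) * (y-1)^(|A|-r(A))."""
    r_a = rank(subset)
    return (x - 1) ** (rank(ground) - r_a) * (y - 1) ** (len(subset) - r_a)


def tutte_by_subset_expansion(rank, ground, x, y):
    """Sum of w_Tutte over all subsets of the ground set (tiny matroids only)."""
    return sum(tutte_weight(rank, ground, subset, x, y)
               for size in range(len(ground) + 1)
               for subset in combinations(ground, size))


if __name__ == "__main__":
    # Rank function of the uniform matroid U_{2,3}.
    def uniform_rank(subset):
        return min(len(subset), 2)

    ground = ("a", "b", "c")
    print(tutte_by_subset_expansion(uniform_rank, ground, 3, 2))
```

For x, y > 1 every summand is strictly positive, so the weights define a valid stationary distribution for the flip chain on all of 2^E.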
Combining the above inequalities with an adaptation of the canonical paths proof given in Section 4 (using appropriate substitutions, such as pw(M) in the place of lw(G) and Proposition 12 in the place of (9), together with the well-known fact that the connectivity function of a matroid is closed under the minor relation), we obtain the following matroidal analogue of Theorem 1. The full details of the proof are left to the reader.

Conclusion
In this work, we have studied a general framework of graph polynomials and Markov chains defined via subset expansion formulae for these polynomials. We have demonstrated that the system mixes rapidly for graphs of bounded tree-width. On a graph $G$ with $n$ vertices, we have shown a mixing time of order $n^{O(1)} e^{O(\mathrm{pw}(G))} = n^{O(\mathrm{tw}(G))}$. Our results apply to many of the most prominent and well-known polynomials in the field. The mixing times of our processes have, respectively, exponential and super-exponential dependencies upon path-width and tree-width. It would be interesting to investigate if this could be improved, in particular, to achieve something akin to fixed-parameter tractability in terms of tree-width. We consider it unlikely that this could be accomplished using only the approaches from this paper.
For all of our results, we need that the weight function is strictly positive for all (induced) subgraphs. Many of the classical enumeration polynomials such as the matching, independence, clique and chromatic polynomials are captured by the general polynomials that we mention as examples throughout this work. However, these are 'hard-core models', in which some (induced) subgraphs have a zero weighting, and hence are not included in our approach. Many of these are evaluations that fall at the boundary of the regions that we can handle. For example, the Tutte polynomial evaluated at the point (2, 1) counts the number of forests of the graph. We have shown rapid mixing at all fixed points (2, 1 + δ), for δ > 0, with a mixing time that depends on δ. It would be interesting to consider whether the chains associated with these boundary points mix rapidly for graphs of bounded tree-width.
After a presentation of these results in Zürich, Thore Husfeldt suggested that our approximation schemes may be of practical use for computing partition functions.The aforementioned dynamic programming algorithms for exact computation all require an explicit tree-width decomposition.Although a decomposition can theoretically be computed in linear time [10], the existing decomposition-generation algorithms have running times hindered by large constants and thus are impractical.On the other hand, an explicit decomposition is not a prerequisite for rapid mixing in our framework, so an efficient randomised approximation may turn out to be more practical than an exact deterministic algorithm.
Lastly, we have demonstrated, with appropriate modifications, several extensions of our results to other settings.We have adapted our framework to include induced subgraphs, hypergraphs and matroids, all using an appropriate choice of width parameter and "λ-multiplicativity".These results are indicative; this aspect of our work has not been fully explored and we hope that further generalisations of Theorem 1 are available.