Weisfeiler-Leman Indistinguishability of Graphons

The color refinement algorithm is mainly known as a heuristic method for graph isomorphism testing. It has surprising but natural characterizations in terms of, for example, homomorphism counts from trees and solutions to a system of linear equations. Grebík and Rocha (2021) have recently shown that color refinement and some of its characterizations generalize to graphons, a natural notion for the limit of a sequence of graphs. In particular, they show that these characterizations are still equivalent in the graphon case. The k-dimensional Weisfeiler-Leman algorithm (k-WL) is a more powerful variant of color refinement that colors k-tuples instead of single vertices, where the terms 1-WL and color refinement are often used interchangeably since they compute equivalent colorings. We show how to adapt the result of Grebík and Rocha to k-WL or, in other words, how k-WL and its characterizations generalize to graphons. In particular, we obtain characterizations in terms of homomorphism densities from multigraphs of bounded treewidth and linear equations. We give a simple example that parallel edges make a difference in the graphon case, which means that the equivalence between 1-WL and color refinement is lost. We also show how to define a variant of k-WL that corresponds to homomorphism densities from simple graphs of bounded treewidth.


Introduction
The color refinement algorithm is usually used as an efficient heuristic in graph isomorphism testing [12], even though it has further applications, e.g., in machine learning. It iteratively colors the vertices of a (simple) graph, where initially all vertices get the same color. Then, in every refinement round, two vertices v and w of the same color are assigned different colors if there is some color c such that v and w have a different number of neighbors of color c. If the color patterns computed for two graphs G and H do not match, G and H are said to be distinguished by color refinement.
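The refinement procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not an implementation taken from the literature: the function names, the adjacency-dictionary representation, and the choice to refine both graphs jointly with a shared palette (so that color names are comparable across graphs) are assumptions made for the example.

```python
from collections import Counter


def color_refinement(graphs):
    """Jointly refine the vertex colors of several graphs.

    Each graph is a dict mapping a vertex to the set of its neighbors.
    Returns one Counter per graph: how often each stable color occurs.
    """
    # Round 0: every vertex of every graph gets the same color.
    colors = [{v: 0 for v in g} for g in graphs]
    num_colors = 1
    while True:
        palette = {}  # signature -> color id, shared by all graphs
        new_colors = []
        for g, col in zip(graphs, colors):
            new = {}
            for v in g:
                # Old color plus the multiset of the neighbors' colors.
                sig = (col[v], tuple(sorted(col[w] for w in g[v])))
                new[v] = palette.setdefault(sig, len(palette))
            new_colors.append(new)
        colors = new_colors
        if len(palette) == num_colors:  # no class was split: stable
            break
        num_colors = len(palette)
    return [Counter(col.values()) for col in colors]


def distinguished(g, h):
    """True if the stable color histograms of g and h differ."""
    hist_g, hist_h = color_refinement([g, h])
    return hist_g != hist_h


def cycle(n, offset=0):
    """Cycle on n vertices named offset, ..., offset + n - 1."""
    return {offset + i: {offset + (i - 1) % n, offset + (i + 1) % n}
            for i in range(n)}


six_cycle = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star4 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
```

The 6-cycle and the disjoint union of two triangles are both 2-regular on six vertices, so no round splits any color class and the sketch does not distinguish them, whereas the path and the star on four vertices are separated after the first round.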
Indistinguishability by color refinement has various characterizations. A result of Dvořák states that two graphs G and H are not distinguished by color refinement if and only if the number of homomorphisms hom(T, G) from T to G equals the corresponding number hom(T, H) from T to H for every tree T [7], see also [5]. An older result due to Tinhofer [20, 19] states that G and H are not distinguished by color refinement if and only if they are fractionally isomorphic, i.e., there is a doubly stochastic matrix X such that AX = XB, where A and B are the adjacency matrices of G and H, respectively. A characterization that is more closely related to the color refinement algorithm itself is given by stable partitions of the vertex set V(G) of a graph G, which are partitions where all vertices in the same class have the same number of neighbors in every other class. The term equitable is also sometimes used for this but must not be confused with equitable partitions from Szemerédi's regularity lemma. The partition induced by the colors of color refinement is the coarsest stable partition, and graphs G and H are fractionally isomorphic if and only if their coarsest stable partitions have the same parameters, i.e., there is a bijection between the partitions that preserves the size of every class C and the number of neighbors a vertex in C has in every other class D [20]. This, in turn, is equivalent to there being some stable partitions of G and H with the same parameters [18]. We collect all these characterizations in Theorem 1. It is worth mentioning that fractional isomorphism can also be seen from the perspective of logic; it corresponds to equivalence in the logic C^2, the 2-variable fragment of first-order logic with counting quantifiers [13]. This, however, does not play a role in this paper, which is why we omit it.
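Theorem 1. Let G and H be graphs with adjacency matrices A and B, respectively. The following are equivalent:
1. hom(T, G) = hom(T, H) for every tree T.
2. Color refinement does not distinguish G and H.
3. The coarsest stable partitions of V(G) and V(H) have the same parameters.
4. There is a doubly stochastic matrix X such that AX = XB.
5. There are stable partitions of V(G) and V(H) with the same parameters.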
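To make the doubly stochastic characterization concrete, the following sketch checks the condition AX = XB with exact rational arithmetic. The helper names are hypothetical, and the example pair (the 6-cycle and two disjoint triangles) together with the uniform matrix X with all entries 1/6 is chosen here purely for illustration.

```python
from fractions import Fraction


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of a simple graph on 0, ..., n-1."""
    a = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = Fraction(1)
    return a


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_doubly_stochastic(x):
    """Non-negative entries, every row sum and every column sum equal to 1."""
    n = len(x)
    nonneg = all(e >= 0 for row in x for e in row)
    rows = all(sum(row) == 1 for row in x)
    cols = all(sum(x[i][j] for i in range(n)) == 1 for j in range(n))
    return nonneg and rows and cols


def is_fractional_isomorphism(a, x, b):
    """Check that x is doubly stochastic and satisfies a x = x b."""
    return is_doubly_stochastic(x) and matmul(a, x) == matmul(x, b)


A = adjacency(6, [(i, (i + 1) % 6) for i in range(6)])               # 6-cycle
B = adjacency(6, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])   # two triangles
uniform = [[Fraction(1, 6)] * 6 for _ in range(6)]
identity = [[Fraction(int(i == j)) for j in range(6)] for i in range(6)]
```

Since both graphs are 2-regular, every entry of both AX and XB equals 1/3 for the uniform matrix, so it is a fractional isomorphism. The identity matrix is doubly stochastic as well but fails the equation because A ≠ B.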
The k-dimensional Weisfeiler-Leman algorithm (k-WL) is a variant of color refinement that colors k-tuples of vertices instead of single vertices; here and throughout the paper, k is an integer with k ≥ 1. See [4] for an overview of the history of k-WL. Usually, no distinction is made between 1-WL and color refinement as they, in some sense, compute equivalent colorings. All of the previously described characterizations of color refinement generalize to k-WL. First of all, k-WL does not distinguish graphs G and H if and only if the number of homomorphisms hom(F, G) from F to G is equal to the corresponding number hom(F, H) from F to H for every graph F of treewidth at most k [7, 5]. The concept of fractional isomorphism generalizes via non-negative solutions to the following system L^k_iso(G, H) of linear equations, which has a variable X_π for every set π ⊆ V(G) × V(H) of size |π| ≤ k. Such a set π is called a partial isomorphism if the mapping it induces is injective and preserves (non-)adjacency. The equivalence of k-WL to precisely this system of linear equations is from [5], although it is already implicit in earlier work [13, 1, 11].
\[ \sum_{v \in V(G)} X_{\pi \cup \{(v,w)\}} = X_\pi \quad \text{for every } \pi \subseteq V(G) \times V(H) \text{ of size } |\pi| \le k - 1 \text{ and every } w \in V(H), \]
\[ \sum_{w \in V(H)} X_{\pi \cup \{(v,w)\}} = X_\pi \quad \text{for every } \pi \subseteq V(G) \times V(H) \text{ of size } |\pi| \le k - 1 \text{ and every } v \in V(G). \]

Stable partitions of the vertex set V(G) of a graph G easily generalize to stable partitions of V(G)^k. The coloring computed by k-WL on G induces the coarsest stable partition of V(G)^k, and two graphs G and H are not distinguished by k-WL if and only if the coarsest stable partitions of V(G)^k and V(H)^k have the same parameters, which again is equivalent to there being some stable partitions with the same parameters. See, for example, [11], where this is implicitly treated. Also note that equivalence in the logic C^2 generalizes to equivalence in C^{k+1}, the (k + 1)-variable fragment of first-order logic with counting quantifiers [4]. Let us state the generalization of Theorem 1 to k-WL as Theorem 2.
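Theorem 2. Let k ≥ 1 and let G and H be graphs. The following are equivalent:
1. hom(F, G) = hom(F, H) for every graph F of treewidth at most k.
2. k-WL does not distinguish G and H.
3. The coarsest k-stable partitions of V(G)^k and V(H)^k have the same parameters.
4. L^{k+1}_iso(G, H) has a non-negative real solution.
5. There are k-stable partitions of V(G)^k and V(H)^k with the same parameters.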
Graphons emerged in the theory of graph limits as limit objects of sequences of dense graphs; see the book of Lovász [16] for a detailed introduction to the theory of graph limits. Formally, a graphon is a symmetric measurable function W : [0, 1] × [0, 1] → [0, 1], although it can be quite useful to consider more general underlying spaces than the unit interval with the Lebesgue measure. Grebík and Rocha recently generalized Theorem 1 to graphons [9]. A substantial part of their work involves showing how to even state the characterizations of color refinement that are found in Theorem 1 for graphons. Note that graphs and, more generally, (vertex- and edge-)weighted graphs can be viewed as graphons by partitioning [0, 1] into one interval for each vertex, cf. [16, Section 7.1]. This means that Theorem 1 and also a variant for weighted graphs can in fact be recovered from their result. In this paper, we show how to combine their result with k-WL to obtain a variant of Theorem 2 for graphons.

In the remainder of the introduction, we get more formal with the goal of giving the reader a clear understanding of the results of this paper without going into too much detail. A reader interested in these details can then continue with the main part of the paper. In Section 1.1, we first state and explain the result of Grebík and Rocha, before we state and discuss our result and the structure of the main part of this paper in Section 1.2.

Fractional Isomorphism of Graphons
Let us briefly give a formal definition of graphs, homomorphisms, and color refinement. A (simple) graph is a pair G = (V, E), where V is a set of vertices and E ⊆ \binom{V}{2} a set of edges. We usually write V(G) := V and E(G) := E. A homomorphism from a graph F to a graph G is a mapping h : V(F) → V(G) such that uv ∈ E(F) implies h(u)h(v) ∈ E(G). The number of homomorphisms from F to G is denoted by hom(F, G), and t(F, G) := hom(F, G)/|V(G)|^{|V(F)|} is the homomorphism density of F in G.

Now, let us turn our attention to color refinement. The initial coloring of the vertices of a graph G is obtained by letting cr_{G,0}(v) := 1 for every vertex v ∈ V(G). Then, for every n ≥ 0, let
\[ \mathrm{cr}_{G,n+1}(v) := \bigl( \mathrm{cr}_{G,n}(v), \{\!\{ \mathrm{cr}_{G,n}(w) \mid vw \in E(G) \}\!\} \bigr) \]
for every v ∈ V(G). Here, {{·}} is used as the notation for a multiset. We say that color refinement does not distinguish two graphs G and H if
\[ \{\!\{ \mathrm{cr}_{G,n}(v) \mid v \in V(G) \}\!\} = \{\!\{ \mathrm{cr}_{H,n}(v) \mid v \in V(H) \}\!\} \]
for every n ≥ 0.

Instead of the unit interval with the Lebesgue measure, we follow Grebík and Rocha and, throughout the whole paper, let (X, B) denote a standard Borel space and µ a Borel probability measure on X; this has the advantage that we can later consider quotient spaces. We think of (X, B, µ) as atom free, i.e., that there is no singleton set of positive measure, but do not formally require it. A kernel is a (B ⊗ B)-measurable map W : X × X → [0, 1]. A symmetric kernel is called a graphon. Grebík and Rocha have shown the following generalization of Theorem 1 to graphons, whose characterizations we elaborate on one by one.

Theorem 3 ([9]). Let U, W : X × X → [0, 1] be graphons. The following are equivalent:
1. t(T, U) = t(T, W) for every tree T.
2. ν_U = ν_W.
3. W/C(W) and U/C(U) are isomorphic.
4. There is a Markov operator S : L^2(X, µ) → L^2(X, µ) such that T_U ∘ S = S ∘ T_W.
5. There are U- and W-invariant µ-relatively complete sub-σ-algebras C and D, respectively, such that U_C and W_D are weakly isomorphic.
For Characterization 1, the homomorphism density of a graph F in a graphon W : X × X → [0, 1] is defined as
\[ t(F, W) := \int_{X^{V(F)}} \prod_{uv \in E(F)} W(x_u, x_v) \, d\mu^{\otimes V(F)}(\bar{x}). \tag{1} \]
Note that this coincides with the previous definition for graphs, i.e., when viewing a graph G as a graphon W_G, we have t(F, G) = t(F, W_G) [16, (7.2)].

Characterization 2 generalizes color refinement to graphons and requires more formal precision than in the case of graphs. Grebík and Rocha first define the standard Borel space M of iterated degree measures, which can be seen as the space of colors used by color refinement; its elements are sequences α = (α_0, α_1, α_2, ...) of colors after 0, 1, 2, ... refinement rounds. Then, for a graphon W : X × X → [0, 1], they define the measurable function cr_W : X → M mapping every x ∈ X to such a sequence (α_0, α_1, α_2, ...). Then, the distribution on iterated degree measures (DIDM) ν_W defined by ν_W(A) := µ(cr_W^{-1}(A)) for every A ∈ B(M), i.e., as the push-forward of µ via cr_W, is a probability measure on the space M. Note the similarity between Characterization 2 and color refinement not distinguishing two graphs: the multisets used in the definition of color refinement indistinguishability can be seen as maps that send a color to a natural number stating how often it occurs in the graph. Intuitively, a DIDM does the same for a set of colors and a number in [0, 1].
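The agreement between the graph and the graphon definition of homomorphism density can be checked by brute force on small examples. The sketch below is only an illustration under assumptions: the function names are hypothetical, graphs are given as 0/1 adjacency matrices, and a graph is viewed as a step graphon in which every vertex carries measure 1/n, so that the integral becomes a finite weighted sum.

```python
from fractions import Fraction
from itertools import product


def hom(num_vertices, edges, adj):
    """Number of homomorphisms from F (vertices 0..num_vertices-1) to G."""
    n = len(adj)
    return sum(
        1
        for image in product(range(n), repeat=num_vertices)
        if all(adj[image[u]][image[v]] for u, v in edges)
    )


def density(num_vertices, edges, adj):
    """t(F, G) = hom(F, G) / |V(G)|^|V(F)|."""
    return Fraction(hom(num_vertices, edges, adj), len(adj) ** num_vertices)


def step_graphon_density(num_vertices, edges, adj):
    """The integral defining t(F, W_G), written as a sum over vertex choices."""
    n = len(adj)
    total = Fraction(0)
    for image in product(range(n), repeat=num_vertices):
        weight = Fraction(1, n ** num_vertices)  # measure of the product cell
        for u, v in edges:
            weight *= adj[image[u]][image[v]]
        total += weight
    return total


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
edge = (2, [(0, 1)])
path3 = (3, [(0, 1), (1, 2)])
```

For the triangle, an edge has 6 homomorphisms out of 9 vertex assignments and the path on three vertices has 12 out of 27, and both ways of computing the density agree.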
Characterization 3 generalizes the coarsest stable partitions of the vertex set V(G) of a graph G to the minimum W-invariant µ-relatively complete sub-σ-algebra C_W for a graphon W : X × X → [0, 1]. Let us break down this term bit by bit, starting with µ-relatively complete sub-σ-algebras of B. Let L^2(X, µ) := L^2(X, B, µ) denote the Hilbert space of all measurable real-valued functions f on X with ‖f‖_2 < ∞ modulo equality µ-almost everywhere. For a sub-σ-algebra C of B, we want to consider the subspace L^2(X, C, µ) of all C-measurable functions of L^2(X, µ). To make this statement formally precise, a sub-σ-algebra C ⊆ B is called µ-relatively complete if Z ∈ C for all Z ∈ B, Z_0 ∈ C with µ(Z △ Z_0) = 0. The set of all µ-relatively complete sub-σ-algebras of B is denoted by Θ(B, µ). As an example, the smallest µ-relatively complete sub-σ-algebra that includes {∅, X} corresponds to the trivial partition of the vertex set of a graph.

A kernel W : X × X → [0, 1] defines the operator T_W : L^2(X, µ) → L^2(X, µ) by setting
\[ (T_W f)(x) := \int_X W(x, y) f(y) \, d\mu(y) \]
for every f ∈ L^2(X, µ) and every x ∈ X. It is a well-defined Hilbert-Schmidt operator [16, Section 7.5], and if W is a graphon, then T_W is self-adjoint. In general, for an operator T on L^2(X, µ), a sub-σ-algebra C ∈ Θ(B, µ) is called T-invariant if T maps L^2(X, C, µ) into itself, and W-invariant if it is T_W-invariant. Grebík and Rocha show that, for a graphon W : X × X → [0, 1], the minimum W-invariant µ-relatively complete sub-σ-algebra C_W of B can be obtained by iterative applications of T_W when starting from {∅, X}. From this, they define a quotient graphon W/C_W. Formally, for every C ∈ Θ(B, µ), there is a corresponding quotient space, i.e., a standard Borel space (X/C, C′) with a Borel probability measure µ/C on X/C, and W/C_W is defined on the space X/C_W × X/C_W. Then, saying that two such quotient graphons are isomorphic corresponds to saying that two coarsest stable partitions have the same parameters. As a side note, in their proof, Grebík and Rocha show that every DIDM ν defines a kernel M × M → [0, 1]. They show that, for a graphon W : X × X → [0, 1] and its DIDM ν_W, this kernel on M × M is actually isomorphic to W/C_W. Intuitively, we can view this as a canonical representation of W on the space of all colors.
Characterization 5 is similar to Characterization 3. Just as the coarsest stable partitions of the vertex sets of two graphs have the same parameters if and only if there are some stable partitions with the same parameters, the minimum U- and W-invariant µ-relatively complete sub-σ-algebras can be replaced by some U- and W-invariant µ-relatively complete sub-σ-algebras C and D. Note that there is a subtle difference in the way Grebík and Rocha phrase Characterization 5 as they use the conditional expectation instead of the quotient spaces: W_C is defined as the conditional expectation of W given C × C. Intuitively, W_C is obtained by averaging over the color classes of C, while W/C is obtained by averaging over the color classes of C and then identifying all elements of a color class. Then, the resulting graphons are required to be weakly isomorphic, where two graphons U, W : X × X → [0, 1] are called weakly isomorphic if t(F, U) = t(F, W) for every simple graph F. This is the usual notion of isomorphism used for graphons, and two graphons are weakly isomorphic if and only if they have cut distance zero, cf. [16, Section 10.7].
Finally, Characterization 4 generalizes fractional isomorphisms. For standard Borel spaces (X, B) and (Y, D) with Borel probability measures µ and ν on X and Y, respectively, an operator S : L^2(X, µ) → L^2(Y, ν) is called a Markov operator if S ≥ 0, i.e., f ≥ 0 implies Sf ≥ 0, S 1_X = 1_Y, and S^* 1_Y = 1_X. Here, 1_X and 1_Y denote the all-one functions on X and Y, respectively, and S^* denotes the Hilbert adjoint of S, which is the unique operator S^* : L^2(Y, ν) → L^2(X, µ) satisfying ⟨Sf, g⟩ = ⟨f, S^*g⟩ for all f ∈ L^2(X, µ), g ∈ L^2(Y, ν). Markov operators are simply the infinite-dimensional analogue of doubly stochastic matrices. With this in mind, the connection of Characterization 4 to the graph case is immediate.

Weisfeiler-Leman Indistinguishability of Graphons
Let us first state the definition of k-WL, which is important as there actually are two non-equivalent definitions to be found in the literature. Following Grohe [10], we refer to these distinct definitions as k-WL and oblivious k-WL. Both k-WL and oblivious k-WL operate on k-tuples of vertices, but in terms of expressive power, k-WL is equivalent to oblivious (k + 1)-WL in the sense that they distinguish the same graphs. Hence, from an efficiency point of view, k-WL is more interesting as it needs less memory to achieve the same expressive power, but in our case, oblivious k-WL is more interesting as the connections to other characterizations are much cleaner, cf. the mismatch between the k in k-WL and the k + 1 in the system L^{k+1}_iso(G, H) of linear equations in Theorem 2 or the k + 1 in the logic C^{k+1}. The reason that the k in k-WL matches the k in "treewidth k" is just that one is subtracted from the bag width in the definition of treewidth.
Let us start with k-WL. Let G be a graph. The atomic type atp […] and, for every […] for every v ∈ V(G)^k. Here, v[w/j] denotes the k-tuple obtained from v by replacing the jth component by w; the k-tuple v[w/j] is usually called a j-neighbor of v. We say that k-WL does not distinguish graphs G and […] The colorings computed by 1-WL and color refinement induce the same partition and, in particular, 1-WL distinguishes two graphs if and only if color refinement does [10, Proposition V.4].

For oblivious k-WL, we also let owl^k_{G,0}(v) := atp_G(v), but then, for every n ≥ 0, we define […] for every v ∈ V(G)^k. We say that oblivious k-WL does not distinguish graphs G and […] As mentioned before, k-WL is equivalent to oblivious (k + 1)-WL in the sense that two graphs are distinguished by k-WL if and only if they are distinguished by oblivious (k + 1)-WL [10, Corollary V.7]. This equivalence becomes clearer when diving into the details of this paper: intuitively, given a tree decomposition of width k, we may dissect it into parts at bags of size k or at bags of size k + 1.
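For readers who prefer code, the following Python sketch shows one common way to formulate oblivious k-WL on finite graphs. It is an assumption-laden illustration rather than the exact definition used in this paper: the atomic type is encoded as the equality and adjacency pattern of the tuple, the refinement step pairs the old color with, for every position j, the multiset of colors of all j-neighbors, and all function names are hypothetical. Colors are compressed to integers with a palette shared by both graphs so that they stay comparable.

```python
from collections import Counter
from itertools import product


def atomic_type(g, v):
    """Equality and adjacency pattern of the tuple v in the graph g."""
    k = len(v)
    return tuple((v[i] == v[j], v[j] in g[v[i]])
                 for i in range(k) for j in range(k))


def oblivious_wl(graphs, k):
    """Jointly refine colors of k-tuples; return one color histogram per graph."""
    tuples = [list(product(sorted(g), repeat=k)) for g in graphs]
    palette = {}
    colors = [{v: palette.setdefault(atomic_type(g, v), len(palette)) for v in ts}
              for g, ts in zip(graphs, tuples)]
    num_colors = len(palette)
    while True:
        palette = {}  # signature -> color id, shared by all graphs
        new_colors = []
        for g, ts, col in zip(graphs, tuples, colors):
            new = {}
            for v in ts:
                # Old color, then one multiset of j-neighbor colors per position j.
                sig = (col[v],) + tuple(
                    tuple(sorted(col[v[:j] + (w,) + v[j + 1:]] for w in g))
                    for j in range(k))
                new[v] = palette.setdefault(sig, len(palette))
            new_colors.append(new)
        colors = new_colors
        if len(palette) == num_colors:  # stable coloring reached
            break
        num_colors = len(palette)
    return [Counter(col.values()) for col in colors]


def distinguished(g, h, k):
    """True if the stable histograms of k-tuple colors differ."""
    hist_g, hist_h = oblivious_wl([g, h], k)
    return hist_g != hist_h


def cycle(n, offset=0):
    """Cycle on n vertices named offset, ..., offset + n - 1."""
    return {offset + i: {offset + (i - 1) % n, offset + (i + 1) % n}
            for i in range(n)}


six_cycle = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}
```

Under these assumptions, the 6-cycle and two disjoint triangles are not separated for k = 2, matching the behavior of color refinement on simple graphs, while for k = 3 already the atomic types differ because only one of the graphs contains a triangle.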
Let us state our main theorem, Theorem 4, before explaining its characterizations one by one. As mentioned before, it is based on oblivious k-WL, so there is a mismatch by one when comparing it to Theorem 2.

Theorem 4. Let k ≥ 1 and U, W : X × X → [0, 1] be graphons. The following are equivalent:
1. t(F, U) = t(F, W) for every multigraph F of treewidth at most k − 1.
2. ν^k_U = ν^k_W.
3. There is a (permutation-invariant) Markov isomorphism R : […]
4. There is a (permutation-invariant) Markov operator S : […]
5. There are µ^{⊗k}-relatively complete sub-σ-algebras C, D of B^{⊗k} that are U-invariant and W-invariant, respectively, and a Markov isomorphism R : […]

First, let us examine Characterization 1, which uses multigraph homomorphism densities. A multigraph G = (V, E) is defined like a graph with the exception that E is a multiset of edges from \binom{V}{2}. For a graphon W : X × X → [0, 1], the definition (1) of the homomorphism density t(F, W) of F in W also makes sense for a multigraph F. We define the treewidth of a multigraph analogously to the case of simple graphs, i.e., we do not take the edge multiplicities into account. Note that, since the class of multigraphs of treewidth at most k − 1 is closed under taking disjoint unions, we could always assume the graphs in Characterization 1 to be connected. For example, in the case k = 2, it can also be phrased in terms of trees with parallel edges.
Two graphons U, W are weakly isomorphic, i.e., t(F, U) = t(F, W) for every graph F, if and only if t(F, U) = t(F, W) for every multigraph F [16, Corollary 10.36]. When restricting the treewidth, however, parallel edges do make a difference, cf. Figure 1: these weighted graphs have the same tree homomorphism densities as the coarsest stable partition of the graph on the left is the trivial partition, and the graph on the right is obtained by averaging the edge weights, cf. Characterization 5 of Theorem 3. However, already the multigraph C_2, i.e., two vertices connected by two parallel edges, distinguishes these weighted graphs. In other words, graphons that are not distinguished by oblivious 2-WL (in the sense of Theorem 4) are also not distinguished by color refinement (in the sense of Theorem 3), but the converse does not hold. Hence, while the difference between color refinement and 1-WL (corresponding to oblivious 2-WL) is usually neglected in the case of graphs, it is important to make a distinction in the more general case of graphons. Another way to phrase this is that color refinement and oblivious 2-WL are two different notions that coincide on the special case of simple graphs: if F is a multigraph and G a simple graph, then t(F, G) is unaffected if we merge parallel edges of F into single edges since they have to be mapped to the same edges of G anyway. That is, just as Theorem 1 can be recovered from Theorem 3, Theorem 2 can be recovered from Theorem 4.
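The effect of parallel edges can be reproduced numerically. Since Figure 1 is not available here, the sketch below uses a hypothetical pair of weighted graphs chosen to match the description: a triangle (a regular graph, so its coarsest stable partition is trivial) and the weighted graph obtained by averaging, which has the single edge weight 2/3. Weighted graphs are represented as symmetric matrices with uniform vertex measure; all names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product


def density(num_vertices, edges, w):
    """t(F, W) for a weighted graph w with uniform vertex weights.

    edges may contain repeated pairs, i.e., F may be a multigraph.
    """
    n = len(w)
    total = Fraction(0)
    for image in product(range(n), repeat=num_vertices):
        term = Fraction(1)
        for u, v in edges:
            term *= w[image[u]][image[v]]
        total += term
    return total / n ** num_vertices


# Hypothetical stand-ins for the two weighted graphs of the example.
triangle = [[Fraction(int(i != j)) for j in range(3)] for i in range(3)]
averaged = [[Fraction(2, 3)]]  # one vertex with a loop of weight 2/3

trees = {
    "edge": (2, [(0, 1)]),
    "path3": (3, [(0, 1), (1, 2)]),
    "star4": (4, [(0, 1), (0, 2), (0, 3)]),
    "path4": (4, [(0, 1), (1, 2), (2, 3)]),
}
double_edge = (2, [(0, 1), (0, 1)])  # the multigraph C_2
```

On these inputs, every listed tree has the same density in both weighted graphs, namely (2/3) raised to the number of edges, whereas the doubled edge has density 2/3 in the triangle (because squaring a 0/1 weight changes nothing) but 4/9 in the averaged graph.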
Characterization 2 generalizes oblivious k-WL. First, we define the standard Borel space M^k, which again can be seen as the space of colors used by oblivious k-WL. Also in this case, its elements α = (α_0, α_1, α_2, ...) are sequences of colors after 0, 1, 2, ... refinement rounds. Based on the definition (3) of oblivious k-WL for graphs, we define the measurable function i^k_W : X^k → M^k mapping an x ∈ X^k to a sequence (α_0, α_1, α_2, ...). In particular, α_0 corresponds to the "atomic type" of x, which also further explains why oblivious 2-WL distinguishes the weighted graphs in Figure 1: for the weighted graph on the right, α_0 always contains the edge weight of 2/3, which is nowhere to be found in the graph on the left. Hence, already the initial coloring distinguishes them. To continue, we then use i^k_W to define the k-WL distribution (k-WLD) ν^k_W as the push-forward of µ^{⊗k} via i^k_W, a probability measure on M^k which again corresponds to the multiset of colors computed by oblivious k-WL.
The operator T_W : L^2(X, µ) → L^2(X, µ) of a graphon W : X × X → [0, 1] plays an important role throughout Theorem 3, although it only becomes really apparent in the characterization via Markov operators. In Theorem 4, we replace this single operator by a whole family T^k_W of operators on the product space L^2(X^k, µ^{⊗k}) := L^2(X^k, B^{⊗k}, µ^{⊗k}). We define a set F^k of bi-labeled graphs that serve as building blocks to construct precisely the graphs of treewidth at most k − 1, and every such bi-labeled graph F ∈ F^k together with a graphon W : X × X → [0, 1] defines the graphon operator T_{F→W}. Then, T^k_W := (T_{F→W})_{F ∈ F^k} denotes the family of all these operators. Characterization 4 states that there is a Markov operator on the product space L^2(X^k, µ^{⊗k}) that "commutes" with all operators in the families T^k_U and T^k_W simultaneously. Moreover, this operator can be assumed to be permutation-invariant, i.e., reordering the k components of X^k yields the same operator, an assumption that is implicitly made in the system L^k_iso of linear equations as its variables are indexed by sets. Permutation invariance can be left out without changing the equivalence to the other characterizations, i.e., if there is a (not necessarily permutation-invariant) Markov operator S satisfying Characterization 4, then there also is a permutation-invariant one.
Characterizations 3 and 5 generalize (coarsest) stable partitions of V(G)^k. For a graphon W : X × X → [0, 1], we call a sub-σ-algebra C ∈ Θ(B^{⊗k}, µ^{⊗k}) W-invariant if it is T-invariant for every operator T in the family T^k_W. In the case k = 1, this conflicts with the definition of Grebík and Rocha, but it will always be clear from the context what we mean. We show that the minimum W-invariant µ^{⊗k}-relatively complete sub-σ-algebra C^k_W of B^{⊗k} can be obtained by iterative applications of the operators in T^k_W. Then, Characterization 3 states that there is a Markov isomorphism from one quotient space to the other that "commutes" with all operators in the families of quotient operators T^k_W/C^k_W and T^k_U/C^k_U simultaneously; intuitively, for a C ∈ Θ(B^{⊗k}, µ^{⊗k}) and an operator T on L^2(X, µ), its quotient operator T/C on L^2(X/C, µ/C) is defined by going from L^2(X/C, µ/C) to L^2(X, µ), applying T, and then going back to L^2(X/C, µ/C). A Markov operator is called a Markov embedding if it is an isometry, and a Markov isomorphism is a surjective Markov embedding. There is a one-to-one correspondence between Markov isomorphisms and measure-preserving almost bijections, cf. [9, Theorem E.3], but for ease of presentation, we stick to Markov isomorphisms.
Note that, in contrast to Theorem 3, there are no quotient graphons involved in Theorem 4, just quotient operators. The reason for this is that, unlike T_W, the operators in the family T^k_W are not integral operators. For our proof, this also means that we do not have a canonical representation of a graphon W : X × X → [0, 1] as a graphon M^k × M^k → [0, 1] (or as multiple such graphons). Instead, we define canonical representations of the operators in T^k_W on the space L^2(M^k, ν^k_W) by hand.

In Section 2, the preliminaries, we collect some more definitions and basics we need. Section 3 introduces bi-labeled graphs and graphon operators, which are the key to our main theorem. In particular, we define the set F^k of bi-labeled graphs from which we are able to construct precisely the multigraphs of treewidth at most k − 1. For a graphon W, this set of bi-labeled graphs defines the family of graphon operators T^k_W that takes the place of the usual integral operator T_W. Section 4 is the main section of this paper and closely follows Grebík and Rocha [9] in the definition of all notions in Theorem 4 and in its proof. In Section 5, we show that it is also possible to define a variant of k-WL, which we call simple k-WL, that leads to a variant of Theorem 4 where the characterization by multigraph homomorphism densities is replaced by simple graph homomorphism densities. This variant of Theorem 4, however, is less elegant and has an artificial touch to it. Most of the proofs are left out as they are largely analogous to the ones in Section 4. We draw some conclusions and discuss some open problems in Section 6.

Product Spaces
Recall that, throughout the whole paper, (X, B) denotes a standard Borel space, i.e., B is the Borel σ-algebra of a Polish space, and µ a Borel probability measure on X. We often consider the space (X^k, B^{⊗k}, µ^{⊗k}) with the product σ-algebra B^{⊗k} of B and the product measure µ^{⊗k} of µ for k ≥ 1. The product of a countable family of standard Borel spaces is again a standard Borel space [15, Section 12.B]. Moreover, for a countable family of standard Borel spaces, its product σ-algebra is actually equal to the Borel σ-algebra of the product topology of the underlying Polish spaces as Polish spaces are second countable [15, Section 11.A]. Hence, the product space (X^k, B^{⊗k}) is again a standard Borel space and B^{⊗k} is equal to the Borel σ-algebra of the product topology of the Polish space underlying (X, B). For simplicity, we identify the products X × X × X and (X × X) × X in the usual way. Then, also […, Section 18]. We treat higher-order products in the same way.
We often use the Tonelli-Fubini theorem, cf. [6, Theorem 4.4.5] and also [2, Theorem 18.3], which states that, for σ-finite measure spaces (X, S, µ) and (Y, T, ν) and a non-negative function f on X × Y that is measurable for S ⊗ T, we have
\[ \int_{X \times Y} f \, d(\mu \times \nu) = \int_X \int_Y f(x, y) \, d\nu(y) \, d\mu(x) = \int_Y \int_X f(x, y) \, d\mu(x) \, d\nu(y). \]
In particular, the functions x ↦ ∫_Y f(x, y) dν(y) and y ↦ ∫_X f(x, y) dµ(x) are measurable for S and T, respectively. If f is not necessarily non-negative but integrable with respect to µ × ν, then the same equations hold and the aforementioned functions are measurable on sets X′ and Y′ with µ(X \ X′) = 0 and ν(Y \ Y′) = 0, respectively.

Markov Operators
In general, for a measure space (X, S, µ) and 1 ≤ p ≤ ∞, the space 𝓛^p(X, µ) := 𝓛^p(X, S, µ) consists of all measurable real-valued functions f on X with ‖f‖_p < ∞, and L^p(X, µ) := L^p(X, S, µ) is obtained from 𝓛^p(X, µ) by identifying functions that are equal µ-almost everywhere. The space L^2(X, µ) plays a special role among these spaces as it is a Hilbert space with the inner product given by ⟨f, g⟩ := ∫_X f g dµ. Besides L^2(X, µ), the space L^∞(X, µ) also plays an important role in this paper. Note that, if µ is a probability measure, then we have ‖f‖_2 ≤ ‖f‖_∞ and, in particular, the inclusion L^∞(X, µ) ⊆ L^2(X, µ).
Given two normed linear spaces (X, ‖·‖_X) and (Y, ‖·‖_Y), […], then we just say that T is an operator on X. The operator norm of T is given by ‖T‖ := sup{‖T(x)‖_Y | ‖x‖_X ≤ 1} < ∞, and if ‖T‖ ≤ 1, then T is called a contraction. For probability spaces (X, S, µ) and (Y, T, ν) and an operator T : […] To clearly distinguish this from T being a contraction L^2(X, µ) → L^2(Y, ν), we sometimes use the term L^2-contraction for this. Observe that the composition of two contractions yields a contraction, and in particular, the composition of L^2- and L^∞-contractions yields an L^2- and an L^∞-contraction, respectively.

Quotient Spaces
Recall that a sub-σ-algebra C ⊆ B of B is called µ-relatively complete if Z ∈ C for all Z ∈ B, Z_0 ∈ C with µ(Z △ Z_0) = 0. Note that requiring Z ∈ C for every Z ∈ B with µ(Z) = 0 instead would yield an equivalent definition. The set of all µ-relatively complete sub-σ-algebras of B is denoted by Θ(B, µ) and clearly includes B itself. For a non-empty Φ ⊆ Θ(B, µ), we have ⋂Φ := ⋂_{C ∈ Φ} C ∈ Θ(B, µ) [9, Claim 5.4]. Hence, for a set 𝒳 ⊆ B, there is a smallest µ-relatively complete sub-σ-algebra including 𝒳, which we denote by ⟨𝒳⟩. Note that ⟨C⟩ = {A △ Z | A ∈ C, Z ∈ B with µ(Z) = 0} for a sub-σ-algebra C ⊆ B. Given C ∈ Θ(B, µ), we let L^2(X, C, µ) ⊆ L^2(X, µ) denote the subset of all functions that are C-measurable. It is a standard fact that, for C ∈ Θ(B, µ), the linear hull of […]

Let k ≥ 1 and consider L^2(X^k, µ^{⊗k}). Every permutation π : [k] → [k] induces a measure-preserving measurable map π : X^k → X^k by setting π(x_1, ..., x_k) := (x_{π(1)}, ..., x_{π(k)}) for all x_1, ..., x_k ∈ X, which allows us to consider its Koopman operator T_π on L^2(X^k, µ^{⊗k}). Clearly, the adjoint of T_π is given by T […] It is easy to see that this is the case if and only if π(C) ⊆ C for every permutation π : […]

Given a measure space (X, S, µ), a measurable space (Y, T), and a measurable function […] [6, Theorem 4.1.11]. The following claim states the existence of quotient spaces.

Claim 6 ([9, Theorem E.1]). Let C ∈ Θ(B, µ).
There is a standard Borel space (X/C, C′), a Borel probability measure µ/C on X/C, a measurable surjection q_C : X → X/C, and Markov operators […]

Claim 7 essentially states that the quotient space (X/C, C′) is unique up to sets of measure zero.

Claim 7 ([9, Corollary E.2]). Let (X, B) and (Y, D) be standard Borel spaces. Let µ be a Borel probability measure on X and f : X → Y be a measurable function. Let C ∈ Θ(B, µ) be the minimum µ-relatively complete sub-σ-algebra that makes f measurable. Then, for every […]

Of course, this notion depends on the underlying space (X, B, µ), i.e., if we consider (X^k, B^{⊗k}, µ^{⊗k}) as the underlying space, then all the operators mentioned before are trivially permutation invariant. However, since the intended underlying space is always clear from the context, we just use the term permutation invariant. It is not hard to prove that, if C ∈ Θ(B^{⊗k}, µ^{⊗k}) is permutation invariant, then so are S_C and I_C, i.e., […]

Quotient Operators
For C ∈ Θ(B, µ) and an operator T : L^2(X, µ) → L^2(X, µ), we use the conditional expectation to define the operators […] respectively. These definitions reflect the same concept of a quotient operator via different languages. The following lemma states some basic properties and shows how both definitions are related.

[…] This also immediately yields 2. For 3, we have […] by 6 and 4 of Claim 6 and Claim 5. For 4, we have […] by 4 and 6 of Claim 6.

For 5, assume that C is T-invariant. By Claim 5, the expectation […]

The following lemma is an application of the Mean Ergodic Theorem for Hilbert spaces to Markov operators [8, Theorem 8.6, Example 13.24] and is the essence of the proof of the direction "4 ⟹ 5" of Theorem 3 by Grebík and Rocha [9].

Proof. The proof of the existence of C, D ∈ Θ(B, µ) satisfying 1 to 4 uses the Mean Ergodic Theorem and is identical to the proof of Theorem 1.2, (4) ⇒ (5), in [9]; we leave it out here.

To prove 5, let […] ∘ T_2 and that D is T_2-invariant, which proves 5b. Now, we use 3 and the T_2-invariance of D to obtain […]

Graphon Operators
In this section, we present the key ingredient to Theorem 4. The key insight to go from color refinement to k-WL is, for a graphon W , to replace the operator T W on L 2 (X, µ) by a family T k W of operators on the product space L 2 (X k , µ ⊗k ).This idea is somewhat already present in the work of Grohe and Otto [11,Section 5.1], where they define a family of graphs and consider a matrix X such that X is a fractional isomorphism between all these graphs simultaneously.The graphon setting shows that the step of defining these graphs for the sake of them having the right adjacency matrix is rather artificial; the operators we define are not integral operators defined by a graphon.
The family T k W we define is closely related to oblivious k-WL and tree decompositions, or more precisely, tree-decomposed graphs.In Section 3.1, we follow the approach of [17] of using a set of bi-labeled graphs as building blocks that are then glued together to form larger graphs.From our set F k of bi-labeled graphs, we obtain precisely the multigraphs of treewidth at most k − 1.In Section 3.2, we adapt the concept of homomorphism matrices of bi-labeled graphs from [17] by defining the graphon operator of a bi-labeled graph and a graphon.The graphon operators of our building blocks then yield the family T k W .We show how this family is related to homomorphisms: on the level of bi-labeled graphs, we obtain all multigraphs of treewidth at most k − 1, while we obtain all homomorphism functions of multigraphs of treewidth at most k − 1 on the operator level.

Bi-Labeled Graphs
≥ 0 are vectors of vertices such that both the entries of a and the entries of b are pairwise distinct. When there is no risk of ambiguity, we sometimes just use the term graph to refer to a bi-labeled graph. The multigraph G is called the underlying graph of G, and the vectors a and b are called the vectors of input and output vertices, respectively. That is, a bi-labeled graph is a multigraph where additionally input and output labels are assigned to the vertices with every vertex having at most one label of each type. Note that one usually does not require that every vertex has at most one label of each type, cf. [17], but this is needed to ensure that graphon operators are well defined; the reason is that the diagonal in the product space (X k , B ⊗k , µ ⊗k ) has measure zero (as long as our standard Borel space is atom free), a problem which one does not face in the finite-dimensional case.
Two bi-labeled graphs G = (G, a, b) and denote the set of all (isomorphism types of) bi-labeled graphs with k input and ℓ output vertices, and let G k,ℓ ⊆ M k,ℓ be the subset whose underlying graphs are simple. Let The transpose of a bi-labeled graph , where F is obtained from the disjoint union of F 1 and F 2 by identifying vertices b 1,i and a 2,i for every i ∈ [m]. The Schur product of two bi-labeled graphs without output labels , where F is obtained from the disjoint union of F 1 and F 2 by identifying vertices a 1,i and a 2,i for every i ∈ [m]. One usually defines the Schur product for general bi-labeled graphs in M k,ℓ by also identifying output vertices, cf. [17]. This, however, can result in vertices with multiple input or output labels, which we do not allow by our definition of a bi-labeled graph as remarked earlier.
Treewidth is a graph parameter that measures how "tree-like" a graph is. To see how the concept is related to the bi-labeled graphs just introduced, let us first recall the usual definition of treewidth via tree decompositions. Formally, a tree decomposition of a multigraph G is a pair (T, β), where T is a tree and β : V (T ) → 2 V (G) is a mapping such that (1) for every vertex v ∈ V (G), the set {t ∈ V (T ) | v ∈ β(t)} is non-empty and connected in T , and (2) for every edge e ∈ E(G), there is a node t ∈ V (T ) such that both endpoints of e are in β(t). For every t ∈ V (T ), the set β(t) is called the bag at t. The width of the tree decomposition (T, β) is max{|β(t)| | t ∈ V (T )} − 1. The treewidth tw(G) of a multigraph G is the minimum of the widths of all tree decompositions of G. Note that treewidth is usually defined for simple graphs and not for multigraphs, but for us, ignoring the edge multiplicities as in the previous definition yields just the right notion for multigraphs. For the sake of completeness, note that path decompositions and pathwidth of a multigraph G can be defined analogously by only considering tree decompositions (T, β) where T is a path.
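The two conditions on a tree decomposition can be checked mechanically for a finite multigraph. The following is a minimal sketch, not taken from the paper: the function names and the encoding (bags as a dictionary from tree nodes to vertex sets, edges as a list that may contain repeats) are assumptions made for illustration. Edge multiplicities are ignored, in line with the remark above.

```python
def _connected(nodes, adj):
    """Return True if `nodes` is non-empty and induces a connected subgraph of the tree."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t in adj[s]:
            if t in nodes and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen == nodes


def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check whether (T, beta) is a tree decomposition of the multigraph (vertices, edges).

    tree_edges: list of distinct pairs of tree nodes; bags: dict tree node -> set of vertices.
    """
    adj = {t: set() for t in bags}
    for s, t in tree_edges:
        adj[s].add(t)
        adj[t].add(s)
    # T has to be a tree: connected with exactly |V(T)| - 1 edges.
    if len(tree_edges) != len(bags) - 1 or not _connected(bags, adj):
        return False
    # Every vertex lies in some bag, and the bags containing it are connected in T.
    for v in vertices:
        if not _connected([t for t in bags if v in bags[t]], adj):
            return False
    # Every edge is contained in some bag (parallel edges need no extra care).
    return all(any(u in bag and v in bag for bag in bags.values()) for u, v in edges)


def width(bags):
    """Width of a tree decomposition: maximum bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# A 4-cycle with one doubled edge and a decomposition into two bags of size 3.
cycle_vertices = [0, 1, 2, 3]
cycle_edges = [(0, 1), (0, 1), (1, 2), (2, 3), (3, 0)]
good_bags = {"s": {0, 1, 2}, "t": {0, 2, 3}}
bad_bags = {"s": {0, 1}, "t": {2, 3}}  # misses the edges (1, 2) and (3, 0)
```

With this encoding, `good_bags` together with the single tree edge `("s", "t")` passes the check and has width 2, while `bad_bags` fails because two edges are not covered by any bag.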
General tree decompositions are impractical to work with, and we rather use the following restricted form of a tree decomposition: a nice tree decomposition of a multigraph G is a triple (T, r, β) where (T, β) is a tree decomposition of G and r ∈ V (T ) a vertex of T , which we view as the root of T , such that
1. β(r) = ∅ and β(t) = ∅ for every leaf t of (T, r) and
2. every internal node s ∈ V (T ) of T is of one of the following three types:
Introduce node: s has exactly one child t with β(s) = β(t) ∪ {v} for some v ∉ β(t).
Forget node: s has exactly one child t with β(s) = β(t) \ {v} for some v ∈ β(t).
Join node: s has exactly two children t 1 and t 2 with β(s) = β(t 1 ) = β(t 2 ).
The width of (T, r, β) is the width of (T, β). It is well-known that every graph G has a nice tree decomposition of width tw(G).
Figure 3: The bi-labeled graphs I 3 2 , F 3 2 , and N 3 2 .
Nice tree decompositions can be interpreted in terms of bi-labeled graphs: The vertices with input labels (and also the vertices with output labels) form a bag. An introduce node adds a fresh vertex with an input label. A forget node removes an input label from a vertex. A join node glues the input vertices of a bi-labeled graph to the input vertices of another bi-labeled graph. Hence, a join node is just the Schur product of the two bi-labeled graphs. The behavior of introduce and forget nodes corresponds to the composition with certain bi-labeled graphs, which we call introduce and forget graphs for this reason.
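The gluing operations described above can be sketched for finite bi-labeled graphs as follows. This is an illustration under assumptions, not the paper's formalization: the class `BiLabeled`, the helper `_glue`, and the encoding of vertices as integers are hypothetical, and the example encoding of a neighbor graph (one input vertex loses its label and a fresh vertex takes the output label at the same position) is inferred from the informal description in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiLabeled:
    n: int          # vertices are 0, ..., n - 1
    edges: tuple    # multiset of edges, stored as a tuple of (u, v) pairs
    inputs: tuple   # pairwise distinct input vertices
    outputs: tuple  # pairwise distinct output vertices


def _glue(f1, f2, anchors1, anchors2):
    """Disjoint union of f1 and f2 in which anchors2[i] is identified with anchors1[i]."""
    assert len(anchors1) == len(anchors2)
    rename = dict(zip(anchors2, anchors1))
    fresh = f1.n
    for v in range(f2.n):
        if v not in rename:  # every non-identified vertex of f2 gets a new name
            rename[v] = fresh
            fresh += 1
    edges = f1.edges + tuple((rename[u], rename[v]) for u, v in f2.edges)
    return fresh, edges, rename


def compose(f1, f2):
    """Composition: identify the output vertices of f1 with the input vertices of f2."""
    n, edges, rename = _glue(f1, f2, f1.outputs, f2.inputs)
    return BiLabeled(n, edges, f1.inputs, tuple(rename[v] for v in f2.outputs))


def schur(f1, f2):
    """Schur product of graphs without output labels: identify the input vertices."""
    assert not f1.outputs and not f2.outputs
    n, edges, _ = _glue(f1, f2, f1.inputs, f2.inputs)
    return BiLabeled(n, edges, f1.inputs, ())


# Hypothetical encodings for k = 2.
all_one = BiLabeled(2, (), (0, 1), ())                 # two fresh input vertices
adjacency = BiLabeled(2, ((0, 1),), (0, 1), (0, 1))    # a single edge inside the bag
neighbor = BiLabeled(3, (), (0, 1), (0, 2))            # second label moves to a fresh vertex

edge = compose(adjacency, all_one)   # one edge between the two input vertices
double_edge = schur(edge, edge)      # the join of two copies yields parallel edges
moved = compose(neighbor, edge)      # the edge now ends at a forgotten vertex
```

Note that the Schur product of `edge` with itself produces a multigraph with two parallel edges, which is why terms built this way naturally describe multigraphs rather than simple graphs.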

Definition 10 (Introduce, Forget, and Neighbor Graphs).
be the set of all neighbor graphs. Neighbor graphs correspond to a forget node that is immediately followed by an introduce node for the very same label. Considering these neighbor graphs instead of individual introduce and forget graphs has the advantage that our bi-labeled graphs always have both k input and k output labels, which means that we can restrict ourselves to the space L 2 (X k , µ ⊗k ) later on. For our purposes, this is not a restriction as we can always add isolated vertices to a graph without affecting its homomorphism density in a graphon. Moreover, it is also not a restriction that the fresh vertex has to use the same label as the forgotten vertex since we may just inductively re-label the whole bi-labeled graph. By viewing bi-labeled graphs constructed from neighbor graphs by composition and the Schur product as tree decompositions, we are only halfway to our goal as we are missing a multigraph that is being decomposed. We rather have to view these bi-labeled graphs as tree-decomposed graphs, which we achieve by adding edges, but only between vertices in the same bag. Formally, we can add such an edge by the composition with an adjacency graph, a bi-labeled graph consisting just of a single edge and some isolated vertices.

Definition 11 (Adjacency Graphs).
be the set of all adjacency graphs.
Having defined the set N k of neighbor graphs and the set A k of adjacency graphs, we can formalize our view of tree-decomposed graphs as terms built from these bi-labeled graphs by composition and the Schur product. For the sake of brevity, we define F k := N k ∪ A k , and for simplicity, we additionally define the all-one graph for k ≥ 1. It introduces k fresh vertices with input labels and serves as the leaves of our tree decompositions; this is much simpler than using k individual introduce graphs.
Definition 12. Let k ≥ 1. For a set F ⊆ M k,k of bi-labeled graphs with k input and k output labels, let F •,• denote the smallest set of terms such that 1.
Similarly, let F • ⊆ F •,• be the smallest set of terms satisfying 1 and 2. For a term F ∈ F •,• , let [[F]] denote the bi-labeled graph obtained from evaluating it.
Note that, for a set F ⊆ M k,k and a term F ∈ F •,• , the bi-labeled graph [[F]] is well-defined as we always have [[F]] ∈ M k,0 . For the specific set F k of neighbor and adjacency graphs, a term F ∈ F k •,• is essentially a tree-decomposed graph, where the tree decomposition is rooted, the multigraph being decomposed is the bi-labeled graph underlying [[F]], and the bag at the root is given by the input vertices of [[F]]. As mentioned before, in terms of nice tree decompositions, the Schur product corresponds to a join node, composition with a neighbor graph corresponds to an introduce node followed by a forget node (when viewed from the root), and the composition with an adjacency graph adds an edge to a bag. The height h(F) of a term Then, the height of F corresponds to the height of the tree of the tree decomposition when viewing F as a tree-decomposed graph.
] is G with some additional isolated vertices: Note that a term fixes an ordering of the vertices of the graph, which we have to keep in mind in the following. First, pad the bag of every leaf to size k by adding k fresh isolated vertices. At an introduce node, add a forget node below that removes one of the isolated vertices. At a forget node, add an introduce node above adding a fresh isolated vertex. At a join node, re-order the vertices in one of the terms such that the original vertices of G are at the same positions in both terms and, then, identify every additional isolated vertex with the one at the same position in the other term.
Lemma 13 would be simpler if we included more graphs in F k : With individual introduce and forget graphs, we would not have to deal with isolated vertices. However, the price for this would be that we have to consider all product spaces L 2 (X 1 , µ ⊗1 ), . . . , L 2 (X k , µ ⊗k ) instead of just L 2 (X k , µ ⊗k ). Similarly, we could have included graphs in F k that allow re-labeling input vertices; then we would not have to inductively re-label whole terms. But also in this case, it pays off to keep the set F k as simple as possible. Let us briefly define these permutation graphs nevertheless since they come in handy when proving that the operators and sub-σ-algebras we define are permutation invariant. Formally, for k ≥ 1 and a permutation π : [k] → [k], we define the permutation graph Moreover, for a tuple a ∈ V (F ) k of vertices of a graph F , let π(a) := (a π(1) , . . . , a π(k) ). Then, for a bi-labeled graph (F, a, b) ∈ M k,ℓ , we have

Graphon Operators
Graphon operators generalize the homomorphism density t(F, W ) of a multigraph F in a graphon W : X × X → [0, 1] to bi-labeled graphs. To this end, let F = (F, a, b) ∈ M k,ℓ be a bi-labeled graph. To simplify notation, let t(F , W ) := t(F, W ) denote the homomorphism density of the underlying graph of F in W , i.e., we ignore both the input and output labels. Now, let us first take the input labels of F into account, that is, we view F as a multi-rooted multigraph and the homomorphism density becomes a function by not fixing the vertices that have an input label. Formally, the homomorphism function of for all x a1 , . . . , x a k ∈ X. The Tonelli-Fubini theorem immediately yields that Then, when taking both input and output labels of F into account, we obtain an operator T F →W instead of a function f F →W by, intuitively, "gluing" a given function f to the output vertices of F to obtain the function T F →W f . The point of this definition is that an application of T F →W to a homomorphism function f G→W yields the homomorphism function f F •G→W . Formally, the F -operator of W is the mapping for every f ∈ L 2 (X ℓ , µ ⊗ℓ ) and all x a1 , . . . , x a k ∈ X. Note that f F →W = T F →W 1 X ℓ as an element of L ∞ (X k , µ ⊗k ) and, in particular, The Tonelli-Fubini theorem and the Cauchy-Schwarz inequality allow us to verify that Equation (5) indeed yields a well-defined contraction. We stress that it is important that no vertex of F has multiple input or output labels.
Lemma 14. Let F ∈ M k,ℓ be a bi-labeled graph and W : X × X → [0, 1] be a graphon. Then, ): the measurability follows from the definition of the product σ-algebra and the measurability of W . Then, since W is bounded by 1 by definition, we get that it is a function in L ∞ (X V (F ) , µ ⊗V (F ) ). More precisely, its • ∞ -norm is at most W ∞ since F does not have loops, i.e., i ≠ j. Now, consider an f ∈ L 2 (X ℓ , µ ⊗ℓ ). Then, x → f (x b1 , . . . , x b ℓ ) is a function in L 2 (X V (F ) , µ ⊗V (F ) ): Again, the measurability of these functions follows from the definition of the product σ-algebra. Then, by the Tonelli-Fubini theorem, we get that the • 2 -norm of this function is just f 2 , which means that it is in L 2 (X V (F ) , µ ⊗V (F ) ). Note that, at this point, it is important that the entries of b are pairwise distinct.
Define the function g on X V (F ) by for every x ∈ X V (F ) . By the previous considerations, g ∈ L 2 (X V (F ) , µ ⊗V (F ) ) with Then, the function being integrated in (5), which is obtained from g by fixing x a1 , . . . , x a k ∈ X, is also measurable (see also [2, Theorem 18.1]). By the Tonelli-Fubini theorem, we have where is defined and finite for µ ⊗a -almost all x a1 , . . . , x a k ∈ X. Hence, for µ ⊗a -almost all x a1 , . . . , x a k ∈ X, we obtain a function in L 2 (X V (F )\a , µ ⊗V (F )\a ), to which the Cauchy-Schwarz inequality is applicable, from g by fixing x a1 , . . . , x a k . Again by the Tonelli-Fubini theorem and since the entries of a are pairwise distinct, T F →W f is a measurable function defined almost everywhere, and we get Hence, T F →W f is a function in L 2 (X k , µ ⊗k ). Now, for a function f ′ ∈ L 2 (X ℓ , µ ⊗ℓ ) such that f and f ′ are equal µ ⊗ℓ -almost everywhere, define g ′ analogously to g. Then, g and g ′ are equal µ ⊗V (F ) -almost everywhere and, with the previous considerations, another application of the Cauchy-Schwarz inequality and the Tonelli-Fubini theorem yields that Verifying the linearity of T F →W is straightforward, and as seen before, we have i.e., T F →W is bounded since F and W are fixed.
From the previous considerations, we may even assume that g is bounded by
Note that the definition of T F →W only depends on the isomorphism type of F , i.e., isomorphic bi-labeled graphs F and F ′ define the same operator T F →W = T F ′ →W . Moreover, if F does not have any edges, then the definition of T F →W is independent of W and we just write T F . We just have to be a bit careful since T F still depends on the standard Borel space (X, B) and the Borel probability measure µ.
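For a step graphon, i.e., a graphon that is constant on the blocks of a finite partition of X, the integrals behind the homomorphism density, the homomorphism function, and the F -operator reduce to finite sums over block assignments. The following sketch illustrates this special case only; the function names, the encoding of the graphon as a weight matrix `w` with block measures `p`, and the restriction to functions that are constant on blocks are assumptions made for illustration and are not part of the paper.

```python
from itertools import product
from math import prod


def apply_operator(n_vertices, edges, inputs, outputs, w, p, f):
    """Evaluate the F-operator of a step graphon on a block-constant function f.

    w[i][j] is the graphon value on block i x block j, p[i] the measure of block i,
    and f maps a tuple of len(outputs) block indices to a real number.
    Returns a dict from tuples of input blocks to values.
    """
    blocks = range(len(p))
    free = [v for v in range(n_vertices) if v not in inputs]
    result = {}
    for xa in product(blocks, repeat=len(inputs)):
        total = 0.0
        for xf in product(blocks, repeat=len(free)):
            x = dict(zip(inputs, xa))
            x.update(zip(free, xf))
            weight = prod(p[x[v]] for v in free)              # integrate out non-input vertices
            weight *= prod(w[x[u]][x[v]] for u, v in edges)   # one factor per edge, with multiplicity
            total += weight * f(tuple(x[b] for b in outputs))
        result[xa] = total
    return result


def hom_function(n_vertices, edges, inputs, w, p):
    """Homomorphism function: the operator of the graph without outputs applied to the constant 1."""
    return apply_operator(n_vertices, edges, inputs, (), w, p, lambda _: 1.0)


def hom_density(n_vertices, edges, w, p):
    """Homomorphism density: no vertex carries a label, so everything is integrated out."""
    return hom_function(n_vertices, edges, (), w, p)[()]


# The constant graphon 1/2 and a "complete bipartite" step graphon with two blocks.
w_half, p_half = [[0.5]], [1.0]
w_bip, p_bip = [[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5]

single = hom_density(2, [(0, 1)], w_half, p_half)
double = hom_density(2, [(0, 1), (0, 1)], w_half, p_half)
triangle = hom_density(3, [(0, 1), (1, 2), (0, 2)], w_bip, p_bip)
rooted_edge = hom_function(2, [(0, 1)], (0,), w_bip, p_bip)
```

In the constant graphon 1/2, a single edge and a doubled edge have different densities, which is a small instance of the observation that parallel edges matter for graphons.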

Let k ≥ 1 and π : [k] → [k] be a permutation. Then, T Pπ is equal to the Koopman operator
T π of the measure-preserving measurable map X k → X k induced by π.
The operator T F →W was defined such that the application to a homomorphism function f G→W yields the homomorphism function f F •G→W . The following lemma formalizes this by stating that the composition of bi-labeled graphs corresponds to the composition of graphon operators. Moreover, the analogous correspondence holds between the transpose and the Hilbert adjoint and between the Schur product and the point-wise product.
Proof. 1: We have ) by the Tonelli-Fubini theorem, which is applicable since the product being integrated is a function in L 1 (X V (F ) , µ ⊗V (F ) ) by the Cauchy-Schwarz inequality.
Moreover, the homomorphism density of F in T is defined as t(F, T) := ⟨1 X , f F→T ⟩.
As remarked above, given a term F ∈ F k , we can use the correspondence of bi-labeled graph operations to their operator counterparts to inductively compute the homomorphism function f [[F]]→W and, in particular, the homomorphism density t by Definition 17, the induction hypothesis, the definition of T [[F]]→W , and Lemma 16 3. For the second case of the inductive step by Definition 17, the induction hypothesis, the definition of T [[F]]→W , and Lemma 16 4.
As remarked above, an essential ingredient of the proof of Theorem 4 is the definition of families of L ∞ -contractions that replace T k W but still yield the same homomorphism functions. The following lemma gives a sufficient condition under which this is possible. Recall that a Markov embedding is a Markov operator that is an isometry. Unlike Markov operators in general, Markov embeddings are compatible with point-wise products of functions, cf. [8, Theorem 13.9, Remark 13.10]. This is crucial since we need the point-wise product of functions to get from bounded pathwidth to bounded treewidth homomorphism functions.
) and (X 2 , B 2 ) be standard Borel spaces with Borel probability measures µ 1 and µ 2 on X 1 and X 2 , respectively.Let T 1 and T 2 be families of L ∞ -contractions on L 2 (X 1 , µ 1 ) and L 2 (X 2 , µ 2 ), respectively, indexed by For the induction basis F = 1 k , we have For F = F • F ′ , where F ∈ F k , we have by the assumption and the induction hypothesis.Finally, for F = F 1 • F 2 , we use that I is a Markov embedding and, hence, satisfies .9].We have by the induction hypothesis.
An important application of Lemma 19 is to replace the family T k W by the quotient operators T k W /C for an appropriate C ∈ Θ(B ⊗k , µ ⊗k ). To this end, we call a C ∈ Θ(B ⊗k , µ ⊗k ) W -invariant if C is invariant for every operator in the family Proof. The last equation is just Lemma 18. By Lemma 8 4 and 5, we have , where I C is a Markov embedding by Claim 6 5. Therefore, Lemma 19 yields the first two equations.

Weisfeiler-Leman and Graphons
In Section 4.1 to Section 4.5, we closely follow Grebík and Rocha [9] to prove Theorem 4 and formally define all notions appearing in it. Many, but not all, of their proofs transfer without too many changes. In Section 4.1, we start off by showing that the minimum W -invariant µ ⊗k -relatively complete sub-σ-algebra C k W of B ⊗k for a graphon W can be obtained by iterative applications of the operators T k W . Section 4.2 defines the space M k , i.e., the space of all colors used by oblivious k-WL, and k-WL distributions, which generalize multisets of colors. In Section 4.3, we define the function owl k W : X k → M k and the k-WL distribution ν k W for a graphon W . In Section 4.4, we deviate from Grebík and Rocha [9]: They show that every distribution on iterative degree measures ν defines a graphon on the space M; this graphon for ν W is then isomorphic to the quotient graphon W/C W . Since the operators in T k W are not integral operators, we take the different route of showing that a k-WL distribution ν defines a family of operators T ν on L 2 (M k , ν); the family T ν k W then corresponds to T k W . These operators are essential in the proof of Theorem 4 in Section 4.5.
Section 4.6 shows that one can combine all k-WL distributions ν 1 W , ν 2 W , . . . of a graphon W into a single distribution to obtain a new characterization of weak isomorphism. Section 4.7 further explains how the characterization of Theorem 4 using Markov operators corresponds to the system L k iso of linear equations.

The Minimum W -Invariant Sub-σ-Algebra
For a family T = (T i ) i∈I of operators T i : L 2 (X, µ) → L 2 (X, µ), where i ∈ I, and a C ∈ Θ(B, µ), define Verifying that C k W is in fact the minimum W -invariant µ ⊗k -relatively complete sub-σ-algebra of B ⊗k is mostly analogous to [9, Proposition 5.13]. One difference is that the operators in T A k →W are multiplicative, which implies that a single initial application guarantees T A k →W -invariance for all subsequent sub-σ-algebras in the sequence. Moreover, we also verify that 5. C k W is the minimum W -invariant µ ⊗k -relatively complete sub-σ-algebra of B ⊗k , and 6. C k W,n is permutation invariant for every n ∈ N ∪ {∞}.
Proof. 1 and 2: Let C denote the minimum T A k →W -invariant µ ⊗k -relatively complete sub-σ-algebra of B ⊗k and D denote the µ ⊗k -relatively complete sub-σ-algebra of B ⊗k from 1. We prove that We have established C = D and it remains to prove that these are also equal to C k W,0 . We have ∅, X k ⊆ C and, hence, 3: Let D denote the µ ⊗k -relatively complete sub-σ-algebra of B ⊗k from 3, i.e., D is the minimum µ ⊗k -relatively complete sub-σ-algebra of B ⊗k that contains C k W,n and makes the maps Then, the claim follows as T A→W is multiplicative, cf. the proof of 1 and 2.
5: We first show that To this end, note that n∈N C k W,n is an algebra and the σ-algebra generated by it is C k W . Hence, from [6, Theorem 3.1.10], it easily follows that we can approximate every set in C k W by a set in n∈N C k W,n w.r.t. the measure of their symmetric difference. This implies that, for every A ∈ C k W , there is a sequence , linearity and continuity of T N then yields that L 2 (X k , C k W , µ ⊗k ) is T N -invariant.
6: First, recall that B ⊗k is permutation invariant. Moreover, if C ∈ Θ(B ⊗k , µ ⊗k ), then π(C) ∈ Θ(B ⊗k , µ ⊗k ) for every permutation π : [k] → [k]. This implies that, if X ⊆ B ⊗k is a set with π(X ) ⊆ X for every permutation π : [k] → [k], then X is permutation invariant. Hence, ∅, X k is permutation invariant, and it suffices to show that, for a permutation-invariant C ∈ Θ(B ⊗k , µ ⊗k ), both T A k →W (C) and T N k (C) are permutation-invariant. Then, induction yields that C k W,n is permutation invariant for every n ∈ N and, hence, also It remains to show that, for a permutation-invariant C ∈ Θ(B ⊗k , µ ⊗k ), both T A k →W (C) and T N k (C) are permutation-invariant. We prove the statement for T A k →W (C); the proof for T N k (C) is analogous. To this end, we show that, for an arbitrary C ∈ Θ(B ⊗k , µ ⊗k ), we have for every permutation π : To prove Equation (6), let π : [k] → [k] be a permutation and observe that As a side note, the analogous observation for Hence, for D ∈ Θ(B ⊗k , µ ⊗k ), we have

Weisfeiler-Leman Measures and Distributions
Before defining the mapping owl k W : X k → M k , we have to define the space M k , which can be seen as the space of all colors used by oblivious k-WL. To this end, we have to state some facts regarding spaces of measures first. For a separable metrizable space (X, T ), let P(X) denote the set of all Borel probability measures on X. Let C b (X) denote the set of bounded continuous real-valued functions on X. We endow P(X) with the topology generated by the maps µ → ∫ f dµ for f ∈ C b (X). Then, for (µ i ) i∈N with µ i ∈ P(X) and µ ∈ P(X), the Portmanteau theorem states that the following three are equivalent [15, Theorem 17.20]:

1. µ i → µ.
2. ∫ f dµ i → ∫ f dµ for every f ∈ C b (X).
3. ∫ f dµ i → ∫ f dµ for every f ∈ U d (X).
Here, U d (X) denotes the set of bounded d-uniformly continuous real-valued functions on X and may clearly be replaced by some uniformly dense subset. If (X, T ) is compact, which is the case for the spaces we define, then U d (X) = C b (X) = C(X), where C(X) denotes the set of continuous real-valued functions on X. The Borel σ-algebra B(P(X)) is then generated by the maps µ → µ(A) for A ∈ B(X) and also by the maps µ → ∫ f dµ for bounded Borel real-valued functions f [15, Theorem 17.24]. If (X, T ) is Polish, then so is P(X) [15, Theorem 17.23], which means that (P(X), B(P(X))) is again a standard Borel space for a standard Borel space (X, B).
It is a standard fact that a compact metrizable space K = (X, T ) is separable [15, Proposition 4.6]. Hence, if we let B denote the Borel σ-algebra generated by T , then (X, B) is a standard Borel space. The topological space P(X) is again compact metrizable [15, Theorem 17.22].
We are ready to define the space M k . One should pay attention to the connection to oblivious k-WL, cf. Section 1.2: Here, 2 ) is the space of possible "edge weights" of a tuple x ∈ X k , generalizing possible atomic types. Moreover, oblivious k-WL defines k multisets of colors in every refinement, which results in k probability measures on the previous space M k n in the following definition.

Definition 23 (The Spaces M k and P
2 ) and inductively define M k n := i≤n P k i and n be the natural projection.Finally, define As a product of a sequence of metrizable compact spaces, M k is metrizable [6, Proposition 2.4.4] and also compact by Tychonoff's Theorem [6,Theorem 2.2.8].Moreover, as M k is a product of a sequence of second-countable spaces, the Borel σ-algebra of M k and the product of the Borel σ-algebras of its factors are the same, cf.Section 2.1.
Note that the definition of P k , i.e., P k is well-defined. This condition expresses that α n+2 ∈ P k , which can be thought of as a coloring after n + 2 refinement rounds, is consistent with α n+1 for every n ∈ N, but it does not require that α 0 is consistent with α 1 . One could add the additional consistency condition that, for ij ∈ [k] 2 and u ∉ ij, the push-forward of (α 1 ) u via the projection to component ij is the Dirac measure of (α 0 ) ij , but this would introduce an inconsistency in the case k = 2 where there is no such u. For simplicity, we just leave this out; it does not cause any problems for us.
In terms of graphs, an element (α 0 , α 1 , . . . ) of M k can be thought of as a sequence of unfoldings of a graph, cf. [5], of heights 0, 1, 2, . . . . These unfoldings, however, do not have to be related in any way. The subspace P k contains those sequences in which each unfolding is a continuation of the previous one. These sequences can also be viewed as a single, infinite unfolding: By the Kolmogorov Consistency Theorem [15, Exercise 17.16], for all α ∈ P k and j ∈ [k], there is a unique measure µ α j ∈ P(M k ) such that (p ∞,n ) * µ α j = (α n+1 ) j for every n ∈ N. Moreover, one can verify that this mapping α → µ α j is continuous, cf. [9, Claim 6.2].
Proof. To prove that P k is closed, let α i → α with α i ∈ P k for every i ∈ N and α ∈ M k . Let j ∈ [k] and n ∈ N. By definition of the product topology, we have ((α i ) n+2 ) j → (α n+2 ) j , which yields Let α i → α with α i ∈ P k for every i ∈ N and α ∈ P k . To prove that µ αi j → µ α j , we observe that for every n ∈ N and every f ∈ C(M k n ). This already proves the claim as the set n∈N C(M k n ) • p ∞,n is uniformly dense in C(M k ) by the Stone-Weierstrass theorem [6, Theorem 2.4.11]; in particular, this set separates points by the definition of the product topology and the fact that every metrizable space is completely Hausdorff.
Lemma 24 implies that P k ∈ B(M k ) and that P k → R, α → ∫ f dµ α j is measurable for every bounded measurable real-valued function f on M k and every j ∈ [k], cf. the definition of P(M k ). This justifies the following definition of a k-WL distribution, which intuitively generalizes the concept of a multiset of colors with the additional constraints that, first, the non-consistent sequences α ∈ M k have measure zero and, second, it satisfies a variant of the Tonelli-Fubini theorem w.r.t. the measures given by the mappings

The Mapping owl k W
Having defined the compact metrizable space M k , we can finally define the mapping owl k W : X k → M k and the k-WL distribution ν k W for a graphon W . To this end, let us first recall that oblivious k-WL for a graph G initially colors a k-tuple v ∈ V (G) k by its atomic type, which includes the information of which vertices in v are equal and which are connected by an edge. In our case, this becomes somewhat simpler since we do not deal with the case that entries of a k-tuple x ∈ X k are equal; if our standard Borel space is atom free, such diagonal sets have measure zero in the product space and do not matter. Hence, we only include the information W (x i , x j ) for every ij ∈ [k] 2 . Notice the connection to the operators T A k →W : by definition, we have Let us also take a look at the substitution operation in the refinement rounds of oblivious k-WL. Fix x ∈ X k and j ∈ [k] in the following. Define x[/j] := (x 1 , . . . , x j−1 , x j+1 , . . . , x k ) ∈ X k−1 to be the tuple obtained from x by removing the jth component, and for y ∈ X, also x[y/j] := (x 1 , . . . , x j−1 , y, x j+1 , . . . , x k ) ∈ X k , which is the tuple obtained from x by replacing the jth component by y. The preimage of a set A ⊆ X k under the map x[•/j] : which we call the section of A determined by x[/j]. Note that, technically, A x[/j] also depends on j and not only on the (k − 1)-tuple x[/j] ∈ X k−1 , but we nevertheless stick to this notation. The mapping x[•/j] is measurable, i.e., we have A x[/j] ∈ B for every A ∈ B ⊗k [2, Theorem 18.1 (i)]. If we let p j : X k → X denote the projection to the jth component, which is measurable by definition of B ⊗k , then the mapping ] is measurable as the composition of measurable functions and we have To see the connection to the operators T N k , note that the definition of T N k j yields that for every f ∈ L 2 (X k , µ ⊗k ) and µ ⊗k -almost every x ∈ X k .
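For comparison with the graphon setting, the finite version of oblivious k-WL recalled above can be sketched as follows. This is an illustrative implementation under assumptions, not the paper's definition: the function names are hypothetical, graphs are given as dictionaries from vertices to neighbor sets, colors are kept as nested tuples so that they can be compared across graphs, and multisets are encoded as sorted tuples. Unlike in the graphon case, the atomic type here also records which entries of a tuple are equal.

```python
from collections import Counter
from itertools import product


def substitute(x, j, y):
    """The tuple x[y/j]: replace the j-th component (0-indexed) of x by y."""
    return x[:j] + (y,) + x[j + 1:]


def atomic_type(adj, v):
    """Equality and adjacency pattern of the entries of the tuple v."""
    k = len(v)
    return tuple(
        (v[i] == v[j], v[j] in adj[v[i]]) for i in range(k) for j in range(i + 1, k)
    )


def oblivious_kwl(adj, k, rounds):
    """Run `rounds` refinement rounds of oblivious k-WL and return the coloring of k-tuples."""
    vertices = sorted(adj)
    color = {v: atomic_type(adj, v) for v in product(vertices, repeat=k)}
    for _ in range(rounds):
        color = {
            v: (
                color[v],
                tuple(
                    # one multiset per position j, encoded as a sorted tuple
                    tuple(sorted(color[substitute(v, j, w)] for w in vertices))
                    for j in range(k)
                ),
            )
            for v in color
        }
    return color


def color_histogram(adj, k, rounds):
    """Multiset of colors; two graphs are distinguished if their histograms differ."""
    return Counter(oblivious_kwl(adj, k, rounds).values())


# A 6-cycle and the disjoint union of two triangles.
six_cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
```

On this pair of 2-regular graphs, the histograms for k = 2 coincide in every round, while k = 3 already separates the graphs because only one of them contains a triple of pairwise adjacent vertices. Since the nested colors grow quickly with the number of rounds, the sketch is only meant for small inputs.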

Definition 26 (The Mapping owl
An immediate consequence of Definition 26, which we often use, is that owl k W,m In particular, we use it to prove that the mapping owl k W,n is measurable for every n ∈ N ∪ {∞}, which actually is needed for everything in Definition 26 to be well defined.Lemma 27 states not only that owl k W,n is measurable but also that the minimum µ ⊗k -relatively complete sub-σ-algebra that makes it measurable is given by for every n ∈ N by induction on n.For the induction basis n = 0, we have The Borel σ-algebra ) is generated by the sets of the form ij∈( 2 ) ) by a generating set in the definition of D 0 , which yields that (Lemma 22 and Lemma 22 2) For the inductive step, let n ∈ N. We have to prove that k by definition and that the Borel σ-algebra Theorem 17.24].Hence, by definition of the product σ-algebra and since it suffices to check measurability of a function for a generating set [6, Theorem 4.
Again by [6, Theorem 4.1.6], this means that D n+1 is the smallest µ ⊗k -relatively complete sub-σ-algebra of B ⊗k containing and making the maps measurable, where the equalities hold µ ⊗k -almost everywhere, cf. also Equation (7).
To see that D n+1 ⊆ C k W,n+1 , we verify that C k W,n+1 contains the aforementioned sets and that the aforementioned maps are measurable for it.We have By the induction hypothesis, owl k W,n is C k W,n -measurable, and since -measurable, which is just what we wanted to prove.
It remains to verify that C k W,n+1 ⊆ D n+1 . By Lemma 22 3, it suffices to prove that D n+1 contains C k W,n and makes the functions . By the induction hypothesis, we have A ∈ D n . Since the preimage of a σ-algebra is a σ-algebra, we have , where the equality holds µ ⊗k -almost everywhere. Let j ∈ [k]. We know that D n+1 makes the map where, by definition, we have It is easy to see that the Borel σ-algebra B(M k ) is generated by the projections p ∞,n . Hence, by [6, Theorem 4.1.6], By Lemma 27, C k W is the minimum µ ⊗k -relatively complete sub-σ-algebra that makes owl k W measurable. Hence owl k W : X k → M k is a measurable and measure-preserving mapping from the measure space (X k , B ⊗k , µ ⊗k ) to (M k , B(M k ), ν k W ) and we can consider the Koopman operator In addition, the operator The following lemma can also be seen as a justification of the definition of a k-WLD. In particular, it shows that the Tonelli-Fubini-like requirement in Definition 25 actually stems from the Tonelli-Fubini theorem. In other words, the definition of a k-WLD is chosen such that it captures the essential properties of ν k W that make it possible to define the analogue of T k W on the space L 2 (M k , ν k W ). In the next section, we define these operators on L 2 (M k , ν) for an arbitrary k-WLD ν.

2: Let
be bounded and measurable.We have

Operators and Weisfeiler-Leman Measures
For a graphon W , the operator . However, we still lack that this Markov isomorphism "maps" the family To close this gap, we show that we can define a family This replaces the graphon M × M → [0, 1] defined by Grebík and Rocha [9].Let us begin with operators for neighbor graphs as this is the interesting case; in particular, it shows why we have the Tonelli-Fubini-like requirement in the definition of a k-WLD.
Proof. We show that the definition yields a well-defined contraction The definition of a k-WLD immediately yields that, if A ∈ B(M k ) with ν(A) = 0, then µ α j (A) = 0 for ν-almost every α ∈ M k . Hence, if a property holds ν-almost everywhere, it holds µ α j -almost everywhere for ν-almost every α ∈ M k . In particular, |f | ≤ f ∞ holds ν-almost everywhere, and hence, |f | ≤ f ∞ holds µ α j -almost everywhere for ν-almost every α ∈ M k . Thus, for ν-almost every α ∈ M k , we have that is, T N k j →ν f and T N k j →ν g are equal ν-almost everywhere. Here we used that the mapping T N k j →ν is linear, which follows directly from the linearity of the integral. Recall that P k → R, α → ∫ f dµ α j is measurable for every bounded measurable R-valued function f on M k by Lemma 24 and the definition of P(P k ). Since P k ∈ B(M k ) by Lemma 24 and ν(P k ) = 1, this combined with the previous considerations yields that We have Note that we again used that |f | ≤ f ∞ holds µ α j -almost everywhere for ν-almost every α ∈ M k in order to apply the Cauchy-Schwarz inequality.
The following lemma states that Lemma 30 is indeed the right definition.
. This already proves the claim as (Lemma 8 5 and Lemma 22 5) 3: We have
Defining the operators for adjacency graphs is much simpler. Intuitively, every α ∈ M k contains the values W (x i , x j ) for every ij ∈ [k] 2 at position 0.
Lemma 32. Let k ≥ 1, and let Proof. The mapping α → (α 0 ) ij is measurable by definition of the product σ-algebra Hence, ) is measurable as the product of measurable functions. Moreover, by definition of M k , the function α → (α 0 ) ij is bounded by 1, which immediately yields that is linear as a multiplicative operator.
Analogously to Lemma 31, one can verify that Lemma 32 is in fact the right definition.
Lemma 33. Let k ≥ 1 and W : X × X → [0, 1] be a graphon. For every We have for µ ⊗k -almost every x ∈ X k and every f ∈ L 2 (M k , ν k W ). 2 and 3: Analogous to the proof of 2 and 3 of Lemma 31, respectively.
For a k-WLD ν ∈ P(M k ), define the family of L ∞ -contractions T ν := (T F →ν ) F ∈F k . Lemma 31 3 and Lemma 33 3 can then be rephrased as the following corollary.
Corollary 34. Let k ≥ 1 and W : X × X → [0, 1] be a graphon. Then, Recall Definition 17, i.e., the homomorphism density of a term in a family of L ∞ -contractions. In particular, this definition applies to the family T ν k W of the k-WLD ν k W of a graphon W . Lemma 19 with the previous corollary yields that T ν k W and T k W /C k W give us the same homomorphism densities (and also functions), which are just the original homomorphism densities in W .
Proof.By Corollary 20, we have t(F, , where R k W is a Markov isomorphism by Corollary 28.Then, Lemma 19 yields t(F, A permutation π : [k] → [k] extends to a measurable bijection π : M k → M k as follows: We obtain a measurable bijection π : 2 ) .From there on, π inductively extends to a measurable bijection π : M k n → M k n by component-wise application and, then, to a measurable bijection π : P k n+1 → P k n+1 by setting π((µ j ) j∈[k] ) = (π * µ π(j) ) j∈ [k] for every (µ j ) j∈[k] ∈ P k n+1 .Finally, we obtain the measurable bijection π : in which case we can view the Koopman operator of π as an operator T π→ν : L 2 (M k , ν) → L 2 (M k , ν).The notation T π→ν avoids confusion with the Koopman operator of π when viewing it as a map X k → X k , which we denote just by T π .If we call a k-WLD ν ∈ P(M k ) permutation-invariant if it is π-invariant for every permutation π : [k] → [k], then Lemma 36 yields that the k-WLD ν k W of a graphon W is permutation invariant.
Lemma 36.Let k ≥ 1 and W : X ×X → [0, 1] be a graphon.For every permutation π : ) n for every x ∈ X k by induction on n ∈ N, which then implies the claim.For the base case, we have for every x ∈ X k .For the inductive step, the induction hypothesis yields (π(owl k W,n+1 (x))) i = (owl k W,n+1 (π(x))) i for every x ∈ X k and every i ≤ n.Moreover, we have for every x ∈ X k .

Homomorphism Functions and Weisfeiler-Leman Measures
For the proof of Theorem 4, Corollary 35 allows us to get from k-WLDs to homomorphism densities, but getting to the other characterizations from there is arguably the most involved part of the proof. As Grebík and Rocha have shown [9], the key tool needed for this is the Stone-Weierstrass theorem: It yields that the set of homomorphism functions on M k , which is yet to be defined, is dense in the set C(M k ) of continuous functions on M k . Then, the Portmanteau theorem implies that equal homomorphism densities already imply equal k-WLDs.
To apply the Stone-Weierstrass theorem, we have to define the homomorphism function of a term on the set M k . Recall that an α ∈ M k is a sequence α = (α 0 , α 1 , α 2 , . . . ) that, intuitively, corresponds to a sequence of unfoldings of heights 0, 1, 2, . . . of a graphon. However, as the components α 0 , α 1 , α 2 , . . . do not have to be consistent, cf. the definition of P k , using different components may lead to different functions. Hence, we define a whole set of functions for a single term by considering all ways in which we may use the components to define a homomorphism function. We could avoid this by defining homomorphism functions just on P k instead of M k ; this, however, would complicate things further down the road, which is why we just accept this small inconvenience. Note the similarity between the following definition and the operators defined in the previous section.
Definition 37. Let k ≥ 1. For every term F ∈ F k •,• and every n ∈ N with n ≥ h(F), we inductively define the set F F n of functions M k n → [0, 1] as the smallest set such that Moreover, for every term With a simple induction, one can verify that for every term F ∈ F k •,• and every n ∈ N ∪ {∞} with n ≥ h(F), the set F F n is non-empty and all functions in it are well-defined and continuous. Recall that, for a term F ∈ F k •,• and a k-WLD ν ∈ P(M k ), the operators T ν already define the homomorphism function f F→Tν ∈ L ∞ (M k , ν) by Definition 17. Note that the k-WLD ν satisfying ν(P k ) = 1 is the reason why we only have this single function f F→Tν . Then, it should come as no surprise that this single function is equal to all of the previously defined functions ν-almost everywhere.
Lemma 38. Let k ≥ 1 and ν ∈ P(M k ) be a k-WLD. Let F ∈ F k •,• be a term and n ∈ N with n ≥ h(F). Then, every function in F F n • p ∞,n is equal to f F→Tν ν-almost everywhere.
Proof.We prove the statement by induction on F and n.For the base case, we have n , where we have Since ν is a k-WLD, we have ν(P k ) = 1, which yields that for ν-almost every α ∈ M k .For the product almost everywhere by the inductive hypothesis.
Corollary 35 yields the following corollary to the previous lemma.
Corollary 39. Let k ≥ 1 and W : X × X → [0, 1] be a graphon. For every term F ∈ F k •,• and every function f ∈ F F , we have For every n ∈ N ∪ {∞}, define By induction, we can use the Stone-Weierstrass theorem and the Portmanteau theorem to show that the Stone-Weierstrass theorem is actually applicable to all of these sets and, in particular, to T k , cf. Proof. First, consider the case that n ∈ N. We trivially have We prove that T k n separates points of M k n by induction on n. For the base case n = 0, let β ≠ γ ∈ M k 0 . Then, there is an ij ∈ For the inductive step, assume that T k n separates points of For the remaining case, assume that β n+1 ≠ γ n+1 . Then, there is an ij ∈ n+1 is a function that separates β and γ. Having proven the statement for every n ∈ N, one can also easily see that it holds in the case n = ∞ from the definitions, cf. also the first case of the induction.

Measure Hierarchies
Theorem 4 implies that the sequence ν 1 W , ν 2 W , . . . of k-WLDs of a graphon W characterizes W up to weak isomorphism since every graph has some finite treewidth.Let us explore this a bit more in depth by combining all these k-WLDs into a single measure.
First, for ∞ > k ≥ ℓ ≥ 1, let p k,ℓ denote the projection from M k to M ℓ defined as follows: Inductively, define p k,ℓ : P k n → P ℓ n , which also directly extends to p k,ℓ : M k n → M ℓ n by applying the function component-wise.For n = 0, let p k,ℓ : P k 0 → P ℓ 0 be defined by p k,ℓ (( 2 ) .For the inductive step, p k,ℓ : . It is not hard to see that this is well-defined as every p k,ℓ is continuous.Finally, again by applying the function component-wise, p k,ℓ extends to a continuous function p k,ℓ : M k → M ℓ .Then, consider the inverse limit of the spaces M k and the projections p k+1,k for k ≥ 1 defined by where WL k denotes the set of all k-WLDs.Then, by the Kolmogorov Consistency Theorem [15, Exercise 17.16], for every ν ∈ WL, there is a unique ν ) by the Stone-Weierstrass theorem [6, Theorem 2.4.11],cf. also the proof of Lemma 24.Hence, we have One can show that, for every graphon W : X × X → [0, 1], the sequence (ν k W ) k≥1 of its k-WLDs is in WL and, hence, yields a measure ν ∞ W ∈ P(M ∞ ).Together, Lemma 41 and Lemma 43 imply that these measures induce the same topology on the space of graphons as multigraph homomorphism densities; note that this topology is different from the one induced by simple graph homomorphism densities, cf.Corollary 44.Let (W n ) n and W : X × X → [0, 1] be a sequence of graphons and a graphon, respectively.Then, the following are equivalent: Instead of permutation-invariant operators on all spaces L 2 (X 1 , µ ⊗1 ), . . ., L 2 (X k , µ ⊗k ), we only have a single permutation-invariant Markov operator S on L 2 (X k , µ ⊗k ).For an operator S on L 2 (X k , µ ⊗k ), defining . 
It is easy to see that (S↓ ) * = S * ↓ since the adjoint of a forget graph is the corresponding introduce graph and vice versa. Moreover, as long as S is permutation-invariant, this definition is independent of the specific pair of forget and introduce graphs, i.e., we have Lemma 46. Let k ≥ 1 and S be a permutation-invariant Markov operator on L 2 (X k , µ ⊗k ). Then, S↓ is a permutation-invariant Markov operator. Moreover, if where the last equality holds since µ is a probability measure. Since S * is also a Markov operator, we also obtain achieve this, we have to close the set of terms under Schur product, which may also introduce parallel edges if we have edges between input vertices, cf. Figure 7. To prevent this, we have to prevent edges from being added between input vertices in the first place. In the following, we show how Theorem 4 and its proof have to be adapted for simple graph homomorphism densities.
In particular, what we refer to as simple (oblivious) k-WL is introduced. Not surprisingly, the definitions become more similar to color refinement and the ones of Grebík and Rocha [9]. Only proofs that significantly differ from their counterpart in Section 4 are included. At the end of this section, we also briefly show how simple non-oblivious k-WL can be defined.
To prevent edges from being added between input vertices, we only allow certain combinations of adjacency and neighbor graphs; after a bunch of adjacency graphs connecting a vertex j to other vertices, we immediately follow up with a j-neighbor graph. Formally, for every (j, V ) in the set Then, let F sk := S k j,V | (j, V ) ∈ S k ⊆ G k,k be the set of all these bi-labeled graphs. We have to be a bit cautious as, in general, these graphs are not symmetric and, hence, their graphon operators are not self-adjoint; in general, the set F sk is not even closed under transposition. Note that, by definition, the S k j,V -graphon operator of a graphon W is given by for µ ⊗k -almost every x ∈ X k . Analogously to Lemma 13, one can observe that the underlying graphs of [[F]] for terms F ∈ F sk •,• are, again up to isolated vertices, precisely the simple graphs of treewidth at most k − 1. Basically, when constructing a term from a nice tree decomposition, we just add all missing edges when a vertex is forgotten. This way, every edge is added to the graph as the bag at the root node of a nice tree decomposition is the empty set.
For the sake of brevity, we write -invariant.In fact, it is easy to see that C sk W is also the minimum simply W -invariant µ ⊗k -relatively complete sub-σ-algebra of B ⊗k .
For a separable metrizable space (X, T ), let M ≤1 (X) denote the set of all measures of total mass at most 1. We endow M ≤1 (X) with a topology analogously to P(X), i.e., with the topology generated by the maps µ → ∫ f dµ for f ∈ C b (X). Then, for measures that all have the same total mass, the Portmanteau theorem is still applicable as we can scale them to have a total mass of one. Let P sk 0 := {1} be the one-point space and inductively define By the Kolmogorov Consistency Theorem [15, Exercise 17.16], for all α ∈ P k and (j, V ) ∈ S k , there is a unique measure µ α (j,V ) ∈ P(M k ) such that (p ∞,n ) * µ α (j,V ) = (α n+1 ) (j,V ) for every n ∈ N. Analogously to Lemma 24, the set P sk is closed in M sk and, for every (j, V ) ∈ S k , the mapping P sk → P(M sk ), α → µ α (j,V ) is continuous. To adapt the definition of k-WLD, we add a third requirement of absolute continuity and Radon-Nikodym derivatives, cf. the definition of distributions over iterated degree measures [9].
Proof.Let (j, V ) ∈ S k such that S = S k j,V .For x ∈ X k , let C x denote the minimum µ-relatively complete sub-σ-algebra that makes owl sk W •x[•/j] measurable.As seen in the proof of Lemma 48, we have = (T S→W • T owl sk W f )(x) for every f ∈ L ∞ (M sk , ν) and µ ⊗k -almost every x ∈ X k .As L ∞ (M sk , ν sk W ) is dense in L 2 (M sk , ν sk W ), this implies 1.From there on, 2 and 3 are analogous to Lemma 31 2 and 3, respectively.
For k ≥ 1 and a simple k-WL distribution ν ∈ P(M sk ), let T ν := (T S→ν ) S∈F sk . Then, for a graphon W : X × X → [0, 1], we have where the first equation is just Lemma 49 and the second equation follows from the first since R sk is a Markov isomorphism. As before, a permutation π : [k] → [k] naturally extends to a measurable bijection π : M sk → M sk , and the π-invariance, and more generally the permutation invariance, of a simple k-WLD can be defined analogously to Section 4.4. The analogous result to Lemma 36 holds as well; in particular, ν sk W is permutation invariant for a graphon W . Let C ∈ Θ(B ⊗k , µ ⊗k ) be simply W -invariant; recall that this definition is a bit quirky as it means that C is (T sk W ) C sk W -invariant. Corollary 20 can then be adapted to the also somewhat quirky statement that t(F, T Here, one has to observe that the all-one function distinguishes two measures if their total mass is different, which means that the Portmanteau theorem is still applicable in this case. From there, we obtain the following analogue to Lemma 41. Lemma 50. Let k ≥ 1. Let (W n ) n and W : X × X → [0, 1] be a sequence of graphons and a graphon, respectively. Then, ν sk Wn → ν sk W if and only if t(F, W n ) → t(F, W ) for every simple graph F of treewidth at most k − 1.
Since P(M sk ) is Hausdorff, this also means that the simple k-WLDs of two graphons are equal if and only if their treewidth k − 1 simple graph homomorphism densities are. With the Counting Lemma [16, Lemma 10.23], we also obtain the following additional corollary, which does not hold for k-WLDs as the Counting Lemma does not hold for multigraphs.
Corollary 51. Let k ≥ 1. The mapping W 0 → P(M sk ), W → ν sk W is continuous when W 0 is endowed with the cut distance.
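The failure of the Counting Lemma for multigraphs can be made concrete with a small numerical sketch. It rests on two standard facts that are assumed here rather than proven: random graphs G(n, 1/2) converge to the constant graphon 1/2 in the cut distance with probability 1, and a graph viewed as a graphon is {0, 1}-valued, so that W² = W pointwise. The helper names `random_graph_graphon` and `edge_density` are hypothetical and only serve this illustration.

```python
import random


def random_graph_graphon(n, seed=0):
    """Adjacency matrix of a G(n, 1/2) sample, read as a {0, 1}-valued step graphon."""
    rng = random.Random(seed)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = float(rng.random() < 0.5)
    return W


def edge_density(W, multiplicity=1):
    """Density of the multigraph with two vertices joined by `multiplicity` parallel edges."""
    n = len(W)
    return sum(w ** multiplicity for row in W for w in row) / n ** 2


W_n = random_graph_graphon(200)
single = edge_density(W_n, 1)   # close to 1/2
double = edge_density(W_n, 2)   # equal to `single`, since W_n is {0, 1}-valued

# In the constant graphon 1/2, the double edge has density (1/2)^2 = 1/4.
limit_double = edge_density([[0.5]], 2)
```

Under the stated assumptions, the single-edge density of the sample approaches the limit value 1/2, whereas the double-edge density also stays near 1/2 and does not approach 1/4. This is the sense in which convergence in the cut distance does not control multigraph homomorphism densities.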
Having outlined the necessary changes for simple graphs, we obtain the following variant of Theorem 4 for simple graph homomorphism densities.Note the quirky characterization via Markov operators, which is quite artificial in this case; this again stems from the fact that the family T sk W of operators is not closed under taking adjoints.

Conclusions
We have shown how oblivious k-WL and the work of Grebík and Rocha [9] can be married, or in other words, how oblivious k-WL and some of its characterizations generalize to graphons. In particular, we obtained that oblivious k-WL characterizes graphons in terms of their homomorphism densities from multigraphs of treewidth at most k − 1. This was made possible by using a special set of bi-labeled graphs as building blocks for the multigraphs of treewidth k − 1 and considering the graphon operators of these bi-labeled graphs. Additionally, we have shown how oblivious k-WL can be modified to obtain a characterization via simple graphs: simple oblivious k-WL corresponds to homomorphism densities from simple graphs of treewidth at most k − 1.
However, the characterizations obtained this way are less elegant as the set of bi-labeled graphs one uses as building blocks is not closed under transposition, i.e., the corresponding family of operators is not closed under taking Hilbert adjoints.
The original goal of this work was to define a k-WL distance of graphons and to prove that it yields the same topology as treewidth k homomorphism densities, cf. [3], where the result of Grebík and Rocha is used to prove such a result for the tree distance. However, this does not work out as hoped since multigraph homomorphism densities define a topology different from the one obtained by the cut distance, cf. [16, Exercise 10.26] or [14, Lemma C.2]. Moreover, the quirky characterization of simple k-WL via Markov operators, which stems from the non-symmetric bi-labeled graphs used as building blocks, is also not well-suited to define such a distance. Hence, it remains an open problem to define such a distance.

Figure 1: Two fractionally isomorphic weighted graphs that are distinguished by oblivious 2-WL.
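The phenomenon named in the caption can be checked by hand on a small pair of weighted graphs. The following sketch is illustrative only: the two weighted graphs `half` and `bip` and the helper `hom_density` are assumptions chosen for this example and are not necessarily the graphs drawn in Figure 1. Both graphs have constant degree 1/2 and hence the same tree homomorphism densities, yet the multigraph consisting of two parallel edges, whose underlying graph is a tree, has different densities in them.

```python
from itertools import product


def hom_density(num_vertices, edges, W):
    """Homomorphism density t(F, W) of a multigraph F in a weighted graph W.

    F has vertices 0, ..., num_vertices - 1 and `edges` is a list of pairs;
    repeating a pair encodes parallel edges. W is a symmetric matrix with
    entries in [0, 1], read as a step graphon with parts of equal measure.
    """
    n = len(W)
    total = 0.0
    for phi in product(range(n), repeat=num_vertices):
        weight = 1.0
        for u, v in edges:
            weight *= W[phi[u]][phi[v]]
        total += weight
    return total / n ** num_vertices


# Two hypothetical weighted graphs on two vertices, both of constant degree 1/2.
half = [[0.5, 0.5], [0.5, 0.5]]
bip = [[0.0, 1.0], [1.0, 0.0]]

single_edge = [(0, 1)]
path_two_edges = [(0, 1), (1, 2)]
double_edge = [(0, 1), (0, 1)]

tree_densities_agree = all(
    hom_density(k, F, half) == hom_density(k, F, bip)
    for k, F in [(2, single_edge), (3, path_two_edges)]
)
double_in_half = hom_density(2, double_edge, half)  # (1/2)^2 = 1/4
double_in_bip = hom_density(2, double_edge, bip)    # 1/2, since 1^2 = 1 and 0^2 = 0
```

The brute-force sum over all vertex maps is exponential in the number of vertices of F, which is acceptable for such tiny patterns but not meant as an efficient method.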

Definition 17 .
Lemma 14. When handling such families of operators, we often use notation like T F →W • T for an L ∞ -contraction T or T F →W /C for a C ∈ Θ(B ⊗k , µ ⊗k ) to denote the family obtained by applying the operation to every operator in the family; for these examples, we obtain the families (T F →W • T ) F ∈F and (T F →W /C) F ∈F . Moreover, if the graphs in F do not have any edges, we again abbreviate T F := (T F ) F ∈F . Recall that F k is the set of all neighbor and adjacency graphs with k input and output labels. Let us finally define the family T k W := T F k →W , which replaces the single operator T W in Theorem 4, our characterization of oblivious k-WL. Let us explore the connection between the family T k W and treewidth k − 1 homomorphism functions: Recall that the terms in F k correspond to the tree-decomposed multigraphs of treewidth at most k − 1 by Lemma 13. Given such a term F ∈ F k , we can use the correspondence of bi-labeled graph operations to their operator counterparts, cf. Lemma 16, to inductively compute the homomorphism function f [[F]]→W of [[F]] in a graphon W using the operators T k W .
Hence, the operators in T k W yield all homomorphism functions of multigraphs of treewidth at most k − 1 in W . An important part of the proof of Theorem 4 consists of defining different families of L ∞ -contractions indexed by F k that we may use instead of T k W and that still yield the same homomorphism functions. For example, we may replace T k W by the quotient operators T k W /C for an appropriate C ∈ Θ(B ⊗k , µ ⊗k ). This leads to the following definition. Let k ≥ 1 and T D, µ) for every i ∈ I . Then, T(C) ∈ Θ(B, µ), cf. Section 2.3, and C is called T-invariant if T(C) ⊆ C, which is equivalent to requiring that C is T i -invariant for every i ∈ I. Note that this operation is monotone, i.e., for all C, D ∈ Θ(B, µ) with C ⊆ D, we have T(C) ⊆ T(D). By definition, the family T k W consists of the two families T A k →W and T N k . The following definition uses these two individual families to define the sub-σ-algebra C k W of B ⊗k . Already at this point, one should notice the connection to oblivious k-WL, cf. Section 1.2: the operators in T A k →W capture the concept of atomic types while the operators in T N k correspond to the refinement rounds via j-neighbors used in oblivious k-WL. Definition 21. Let k ≥ 1 and W

[9, Proposition 7.5]. Lemma 40. Let k ≥ 1. For every n ∈ N ∪ {∞}, the set T k n is closed under multiplication, contains 1 M k n , and separates points of M k n .
sk W •x[•/j]µ-almost everywhere.Then, we have(T owl sk W • T S→ν sk W f )(x) = M sk dµ owl sk W (x) j,V dµ owl sk W (x) j,∅ • f d(owl sk W •x[•/j])* µ (Definition and Lemma 48 1) ν sk W ) = t(F, ((T sk W ) C sk W ) C ) = t(F, (T sk W ) C sk W /C) = t(F, T sk W ) = t([[F]], W )holds for every F ∈ F sk •,• .To prove this, one has to apply Lemma 19 twice this time: first, to get from T sk W to (T sk W ) C sk W and, second, to get from there to ((T sk W) C sk W ) C and (T sk W ) C sk W /C.For a term F ∈ F sk •,• and every n ∈ N with n ≥ h(F), the set F F n of functions M sk n → [0, 1] is defined similarly to Definition 37.More precisely, while we could just use the old definition, it can actually be simplified as the distinct cases for adjacency and neighbor graphs can be subsumed by the functionα → M sk n f d(α n+1 ) (j,V ) ∈ F S k j,V •F n+1 for every f ∈ F F n and every j ∈ [k].From there, we analogously obtain the set F F of continuous functions M sk → [0, 1].Lemma 38 and Corollary 39 adapt in a straight-forward fashion.For every n ∈ N ∪ {∞}, define T sk n := F∈ F sk •,•,h(F)≤n F F n and abbreviate T sk := T sk ∞ .Lemma 40 also adapts easily, i.e., for every n ∈ N ∪ {∞}, the set T sk n is closed under multiplication, contains 1 M sk n , and separates points of M sk n .

W• S * . 5 .Proof. 1 =⇒ 2 :
andS * • (T sk U ) C sk U = (T sk W ) C skW There are µ ⊗k -rel.comp.sub-σ-algebras C, D of B ⊗k that are simply U -invariant and simply W -invariant, respectively, and a Markov iso.R :L 2 (X k /D, µ ⊗k /D) → L 2 (X k /C, µ ⊗k /C) such that (T sk U ) C sk U /C • R = R • (T sk W )Follows from Lemma 50. 2 =⇒ 3: Analogous to Theorem 4 as we have both T sk U but this is already what we wanted to show.It remains to prove that By the inductive hypothesis and the Stone-Weierstrass theorem [6, Theorem 2.4.11], the linear hull of T k n is uniformly dense in C(M k n ).Since M k n is Hausdorff, it then follows from the Portmanteau theorem [15, Theorem 17.20] that there is an Note that this notation is justified as M ∞ is again a standard Borel space [15, Exercise 17.16].As a product of a sequence of metrizable compact spaces, k≥1 M k is metrizable [6, Proposition 2.4.4] and also compact by Tychonoff's Theorem [6, Theorem 2.2.8].Since p k+1,k is continuous, this implies that M ∞ is closed and, hence, a metrizable compact space.Let W ) for every multigraph F .While simple graph and multigraph homomorphism densities yield different topologies, two graphons are nevertheless weakly isomorphic if and only if they have the same multigraph homomorphism densities [16, Corollary 10.36].Since M ≤1 (M ∞ ) is Hausdorff, this yields the following corollary.Recall the system L k iso of linear equations from the introduction: two simple graphs G and H are not distinguished by oblivious k-WL if and only if L k iso (G, H) has a non-negative real solution.Let us take a closer look at L k iso (G, H) to see that it is much closer related to the characterization T k U • S = S •T k W from Theorem 4 than it might seem at first glance.The variables of L k iso (G, H), which are indexed by sets π ⊆ V (G)×V (H) of size |π| ≤ k, can be interpreted as permutation-invariant matrices on -invariant µ ⊗k -relatively complete sub-σ-algebra of B ⊗k .We now deviate a bit from the definition of W -invariance and 
call a C ∈ Θ(B ⊗k , µ ⊗k ) simply W -invariant if C is invariant for every operator in the family (T sk W ) C sk W , i.e., C is (T F →W ) C sk W -invariant for every F ∈ F sk . The reason for this is that, since T sk W is not closed under taking adjoints, C sk W might not be invariant under these adjoints. In contrast, C sk W is trivially both (T sk W ) C sk W -invariant and (T sk W ) *