12 research outputs found
Finite Model Theory and Proof Complexity Revisited: Distinguishing Graphs in Choiceless Polynomial Time and the Extended Polynomial Calculus
This paper extends prior work on the connections between logics from finite model theory and propositional/algebraic proof systems. We show that if all non-isomorphic graphs in a given graph class can be distinguished in the logic Choiceless Polynomial Time with counting (CPT), then they can also be distinguished in the bounded-degree extended polynomial calculus (EPC), and the refutations have roughly the same size as the resource consumption of the CPT-sentence. This makes it possible to transfer lower bounds for EPC to CPT and thus constitutes a new potential approach towards better understanding the limits of CPT. A super-polynomial EPC lower bound for a Ptime-instance of the graph isomorphism problem would separate CPT from Ptime and thus solve a major open question in finite model theory. Further, using our result, we provide a model-theoretic proof for the separation of bounded-degree polynomial calculus and bounded-degree extended polynomial calculus.
Revisiting Tree Isomorphism: AHU Algorithm with Primes Numbers
The AHU algorithm has been the state of the art since the 1970s for
determining in linear time whether two unordered rooted trees are isomorphic or
not. However, it has been criticized (by Campbell and Radford) for the way it
is written, which requires several (re)readings to be understood, and does not
facilitate its analysis. In this paper, we propose an alternative version of
the AHU algorithm, which addresses this issue by being designed to be clearer
to understand and implement, with the same theoretical complexity and equally
fast in practice. Whereas the key to the linearity of the original algorithm
lay in the careful sorting of lists of integers, we replace this step by the
multiplication of lists of prime numbers, and prove that this substitution
causes no loss in the final complexity of the new algorithm.
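
The idea of replacing sorted lists by products of primes can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the names `PrimeLabeler`, `class_of` and `isomorphic`, the nested-list tree encoding, and the trial-division prime generator are all assumptions made for illustration, and the sketch makes no attempt to preserve the linear running time claimed in the abstract. It only shows why a product of primes identifies the multiset of a node's child classes (unique factorization), regardless of child order.

```python
from itertools import count


def _primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small sketch)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


class PrimeLabeler:
    """Assigns one integer class per isomorphism type of rooted subtree."""

    def __init__(self):
        self._gen = _primes()
        self._prime_of_class = []    # class id -> prime reserved for that class
        self._class_of_product = {}  # product of child primes -> class id

    def class_of(self, tree):
        """`tree` is a (possibly empty) list of child trees."""
        product = 1  # the empty product identifies a leaf
        for child in tree:
            product *= self._prime_of_class[self.class_of(child)]
        if product not in self._class_of_product:
            # First time this multiset of child classes is seen: new class.
            self._class_of_product[product] = len(self._prime_of_class)
            self._prime_of_class.append(next(self._gen))
        return self._class_of_product[product]


def isomorphic(t1, t2):
    """Two unordered rooted trees are isomorphic iff their roots share a class."""
    labeler = PrimeLabeler()  # both trees must share the same tables
    return labeler.class_of(t1) == labeler.class_of(t2)
```

For example, `isomorphic([[], [[], []]], [[[], []], []])` compares two trees that differ only in the order of the root's children.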
Fine-grained Expressivity of Graph Neural Networks
Numerous recent works have analyzed the expressive power of message-passing
graph neural networks (MPNNs), primarily utilizing combinatorial techniques
such as the 1-dimensional Weisfeiler-Leman test (1-WL) for the graph
isomorphism problem. However, the graph isomorphism objective is inherently
binary, not giving insights into the degree of similarity between two given
graphs. This work resolves this issue by considering continuous extensions of
both 1-WL and MPNNs to graphons. Concretely, we show that the continuous
variant of 1-WL delivers an accurate topological characterization of the
expressive power of MPNNs on graphons, revealing which graphs these networks
can distinguish and the level of difficulty in separating them. We identify the
finest topology where MPNNs separate points and prove a universal approximation
theorem. Consequently, we provide a theoretical framework for graph and graphon
similarity combining various topological variants of classical
characterizations of the 1-WL. In particular, we characterize the expressive
power of MPNNs in terms of the tree distance, which is a graph distance based
on the concepts of fractional isomorphisms, and substructure counts via tree
homomorphisms, showing that these concepts have the same expressive power as
the 1-WL and MPNNs on graphons. Empirically, we validate our theoretical
findings by showing that randomly initialized MPNNs, without training, exhibit
competitive performance compared to their trained counterparts. Moreover, we
evaluate different MPNN architectures based on their ability to preserve graph
distances, highlighting the significance of our continuous 1-WL test in
understanding MPNNs' expressivity.
Gradual Weisfeiler-Leman: Slow and Steady Wins the Race
The classical Weisfeiler-Leman algorithm, also known as color refinement, is
fundamental for graph learning and central to successful graph kernels and graph neural
networks. Originally developed for graph isomorphism testing, the algorithm
iteratively refines vertex colors. On many datasets, the stable coloring is
reached after a few iterations and the optimal number of iterations for machine
learning tasks is typically even lower. This suggests that the colors diverge
too fast, defining a similarity that is too coarse. We generalize the concept
of color refinement and propose a framework for gradual neighborhood
refinement, which allows a slower convergence to the stable coloring and thus
provides a more fine-grained refinement hierarchy and vertex similarity. We
assign new colors by clustering vertex neighborhoods, replacing the original
injective color assignment function. Our approach is used to derive new
variants of existing graph kernels and to approximate the graph edit distance
via optimal assignments regarding vertex similarity. We show that in both
tasks, our method outperforms the original color refinement with only a moderate
increase in running time, advancing the state of the art.
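
For reference, the classical color refinement that this abstract generalizes can be sketched in a few lines of Python. This is a minimal illustration of the standard procedure with an injective recoloring step, not the gradual variant proposed in the paper; the function name `color_refinement` and the adjacency-dictionary input format are assumptions made here.

```python
def color_refinement(adj):
    """Return the stable coloring of a graph given as {vertex: neighbours}.

    Every vertex starts with color 0. In each round a vertex's new color is
    determined injectively by its old color together with the sorted multiset
    of its neighbours' colors. Classes can only split, so the coloring is
    stable as soon as a round does not increase the number of colors.
    """
    colors = {v: 0 for v in adj}
    while True:
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Injective renaming of signatures to small integers.
        palette = {
            sig: i for i, sig in enumerate(sorted(set(signatures.values())))
        }
        new_colors = {v: palette[signatures[v]] for v in adj}
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
```

On the path `{0: [1], 1: [0, 2], 2: [1]}` the two end vertices end up sharing one color and the middle vertex gets another. On a regular graph such as a triangle, all vertices keep the same color, which illustrates how quickly the stable coloring can be reached.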
Canonization for Bounded and Dihedral Color Classes in Choiceless Polynomial Time
In the quest for a logic capturing Ptime, the next natural classes of structures to consider are those with bounded color class size. We present a canonization procedure for graphs with dihedral color classes of bounded size in the logic of Choiceless Polynomial Time (CPT), which then captures Ptime on this class of structures. This is the first result of this form for non-abelian color classes.
The first step proposes a normal form which comprises a "rigid assemblage". This roughly means that the local automorphism groups form 2-injective 3-factor subdirect products. Structures with color classes of bounded size can be reduced to normal form in CPT in a canonization-preserving way.
In the second step, we show that for graphs in normal form with dihedral color classes of bounded size, the canonization problem can be solved in CPT. We also show the same statement for general ternary structures in normal form if the dihedral groups are defined over odd domains.
Detection of Common Subtrees with Identical Label Distribution
Frequent pattern mining is a relevant method to analyse structured data, like
sequences, trees or graphs. It consists of identifying characteristic
substructures of a dataset. This paper deals with a new type of patterns for
tree data: common subtrees with identical label distribution. Their detection
is far from obvious since the underlying isomorphism problem is
graph-isomorphism-complete. An elaborate search algorithm is developed and analysed
from both theoretical and numerical perspectives. Based on this, the
enumeration of patterns is performed through a new lossless compression scheme
for trees, called DAG-RW, whose complexity is investigated as well. The method
shows very good properties, both in terms of computation times and analysis of
real datasets from the literature. Compared to other substructures like
topological subtrees and labelled subtrees for which the isomorphism problem is
linear, the patterns found provide a more parsimonious representation of the
data.
Comment: 40 pages
WL meet VC
Recently, many works studied the expressive power of graph neural networks
(GNNs) by linking it to the 1-dimensional Weisfeiler-Leman algorithm
(1-WL). Here, the 1-WL is a well-studied
heuristic for the graph isomorphism problem, which iteratively colors or
partitions a graph's vertex set. While this connection has led to significant
advances in understanding and enhancing GNNs' expressive power, it does not
provide insights into their generalization performance, i.e., their ability to
make meaningful predictions beyond the training set. In this paper, we study
GNNs' generalization ability through the lens of Vapnik-Chervonenkis (VC)
dimension theory in two settings, focusing on graph-level predictions. First,
when no upper bound on the graphs' order is known, we show that the bitlength
of GNNs' weights tightly bounds their VC dimension. Further, we derive an upper
bound for GNNs' VC dimension using the number of colors produced by the
1-WL. Secondly, when an upper bound on the graphs' order is
known, we show a tight connection between the number of graphs distinguishable
by the 1-WL and GNNs' VC dimension. Our empirical study
confirms the validity of our theoretical findings.
Comment: arXiv admin note: text overlap with arXiv:2206.1116