Real-world graphs naturally exhibit hierarchical or cyclical structures that
are ill-suited to the typical Euclidean space. While there exist graph neural
networks that leverage hyperbolic or spherical spaces to learn representations
that embed such structures more accurately, these methods are confined to
the message-passing paradigm, making the models vulnerable to side effects
such as oversmoothing and oversquashing. More recent work has proposed global
attention-based graph Transformers that can easily model long-range
interactions, but their extensions towards non-Euclidean geometry remain
unexplored. To bridge this gap, we propose the Fully Product-Stereographic
Transformer, a generalization of Transformers towards operating entirely on the
product of constant curvature spaces. When combined with tokenized graph
Transformers, our model can learn the curvature appropriate for the input graph
in an end-to-end fashion, without the need for additional tuning over different
curvature initializations. We also provide a kernelized approach to
non-Euclidean attention, which enables our model to run in time and memory cost
linear in the number of nodes and edges while respecting the underlying
geometry. Experiments on graph reconstruction and node classification
demonstrate the benefits of generalizing Transformers to the non-Euclidean
domain.
Comment: 19 pages, 7 figures
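The abstract combines two ingredients: operations on a product of constant-curvature (κ-stereographic) spaces, and kernelized attention whose cost is linear in the number of nodes. The NumPy sketch below illustrates both under our own assumptions. It uses the standard κ-stereographic exponential/logarithmic maps at the origin and Möbius addition, and it swaps in the elu(x)+1 feature map of Katharopoulos et al. (2020) for linear attention computed in the tangent space at the origin of each product component. The helper names (`product_tangent_attention`, `linear_attention`, etc.), the shared projection weights, and the tangent-space design are hypothetical and are not the paper's formulation. Learning the curvatures end-to-end would additionally require an autodiff framework.

```python
import numpy as np

EPS = 1e-9

def tan_k(x, k):
    """Curvature-dependent tangent: tan (k > 0), tanh (k < 0), identity (k == 0)."""
    if k > 0:
        return np.tan(np.sqrt(k) * x) / np.sqrt(k)
    if k < 0:
        return np.tanh(np.sqrt(-k) * x) / np.sqrt(-k)
    return x

def arctan_k(y, k):
    """Inverse of tan_k."""
    if k > 0:
        return np.arctan(np.sqrt(k) * y) / np.sqrt(k)
    if k < 0:
        return np.arctanh(np.sqrt(-k) * y) / np.sqrt(-k)
    return y

def expmap0(v, k):
    """Exponential map at the origin of the k-stereographic model (row-wise)."""
    n = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), EPS)
    return tan_k(n, k) * v / n

def logmap0(y, k):
    """Logarithmic map at the origin (inverse of expmap0)."""
    n = np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), EPS)
    return arctan_k(n, k) * y / n

def mobius_add(x, y, k):
    """k-stereographic Mobius addition x (+)_k y; reduces to x + y when k == 0."""
    xy = np.sum(x * y, axis=-1, keepdims=True)
    x2 = np.sum(x * x, axis=-1, keepdims=True)
    y2 = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 - 2 * k * xy - k * y2) * x + (1 + k * x2) * y
    den = 1 - 2 * k * xy + k ** 2 * x2 * y2
    return num / den

def dist(x, y, k):
    """Geodesic distance d_k(x, y) = 2 * arctan_k(||(-x) (+)_k y||)."""
    return 2 * arctan_k(np.linalg.norm(mobius_add(-x, y, k), axis=-1), k)

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1; clamp avoids overflow in exp.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, kk, v):
    """Kernelized attention in O(N d^2): phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1)."""
    fq, fk = elu_plus_one(q), elu_plus_one(kk)
    kv = fk.T @ v                  # (d, d_v): keys and values summed once over N
    z = fq @ fk.sum(axis=0)        # (N,): per-query normalizer
    return (fq @ kv) / z[:, None]

def product_tangent_attention(x, curvatures, wq, wk, wv):
    """Hypothetical layer: split x into one block per curvature, log-map each
    block to the tangent space at the origin, run linear attention there,
    then exp-map the result back onto that component's manifold."""
    parts = np.split(x, len(curvatures), axis=-1)
    out = []
    for p, c in zip(parts, curvatures):
        t = logmap0(p, c)                              # manifold -> tangent space
        h = linear_attention(t @ wq, t @ wk, t @ wv)   # cost linear in N
        out.append(expmap0(h, c))                      # tangent space -> manifold
    return np.concatenate(out, axis=-1)

# Usage: hyperbolic x Euclidean x spherical product, 6 nodes, 4 dims per component.
rng = np.random.default_rng(0)
n, d, curvatures = 6, 4, [-1.0, 0.0, 1.0]
x = np.concatenate(
    [expmap0(0.1 * rng.standard_normal((n, d)), c) for c in curvatures], axis=-1
)
wq, wk, wv = (0.5 * rng.standard_normal((d, d)) for _ in range(3))
y = product_tangent_attention(x, curvatures, wq, wk, wv)
print(y.shape)  # (6, 12)
```

Because the attention weights are positive and normalized, each output row in tangent space is a convex combination of value rows, so the memory cost never involves an N×N matrix; the hyperbolic component also stays inside the ball of radius 1/√|κ| after the exponential map.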