Optimization and Learning over Riemannian Manifolds
Learning over smooth nonlinear spaces has found wide application. A principled approach to such problems is to endow the search space with a Riemannian manifold geometry, so that numerical optimization can be performed intrinsically. Recent years have seen a surge of interest in leveraging Riemannian optimization for nonlinearly constrained problems. This thesis investigates and improves on existing algorithms for Riemannian optimization, with a focus on unified analysis frameworks and generic strategies. To this end, the first chapter systematically studies the choice of Riemannian geometry and its impact on algorithmic convergence on the manifold of positive definite matrices. The second chapter considers stochastic optimization on manifolds and proposes a unified framework for analyzing and improving the convergence of Riemannian variance-reduction methods for nonconvex functions. The third chapter introduces a generic acceleration scheme based on the idea of extrapolation, which achieves the optimal convergence rate asymptotically while being empirically efficient.
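To make "optimization performed intrinsically" concrete, the following is a minimal sketch (illustrative only, not code from the thesis) of Riemannian gradient descent on the manifold of positive definite matrices under the affine-invariant metric: the Euclidean gradient G is converted to the Riemannian gradient XGX, and the iterate moves along a geodesic via the exponential map, so it never leaves the manifold.

    import jax
    import jax.numpy as jnp
    from jax.scipy.linalg import expm

    def spd_sqrt(X):
        # Matrix square root of an SPD matrix via eigendecomposition.
        w, V = jnp.linalg.eigh(X)
        return (V * jnp.sqrt(w)) @ V.T

    def spd_exp(X, U):
        # Exponential map at X under the affine-invariant metric:
        # Exp_X(U) = X^{1/2} expm(X^{-1/2} U X^{-1/2}) X^{1/2}.
        Xh = spd_sqrt(X)
        Xih = jnp.linalg.inv(Xh)
        return Xh @ expm(Xih @ U @ Xih) @ Xh

    def riemannian_gd(f, X0, lr=0.1, steps=100):
        # Gradient descent that stays on the SPD manifold by design.
        X = X0
        for _ in range(steps):
            G = jax.grad(f)(X)                  # Euclidean gradient
            G = 0.5 * (G + G.T)                 # symmetrize
            X = spd_exp(X, -lr * (X @ G @ X))   # geodesic step along -grad
        return X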
From Continuous Dynamics to Graph Neural Networks: Neural Diffusion and Beyond
Graph neural networks (GNNs) have demonstrated significant promise in modelling relational data and have been widely applied in various fields of interest. The key mechanism behind GNNs is so-called message passing, where information is iteratively aggregated to central nodes from their neighbourhoods. Such a scheme has been found to be intrinsically linked to a physical process known as heat diffusion, in that the propagation of GNNs naturally corresponds to the evolution of heat density. Analogizing message passing to heat dynamics makes it possible to fundamentally understand the power and pitfalls of GNNs, and consequently informs better model design. Recently, a plethora of works has emerged proposing GNNs inspired by the continuous-dynamics formulation, in an attempt to mitigate known limitations of GNNs such as oversmoothing and oversquashing. In this survey, we provide the first systematic and comprehensive review of studies that leverage the continuous perspective of GNNs. To this end, we introduce the foundational ingredients for adapting continuous dynamics to GNNs, along with a general framework for the design of graph neural dynamics. We then review and categorize existing works based on their driving mechanisms and underlying dynamics. We also summarize how the limitations of classic GNNs can be addressed under the continuous framework. We conclude by identifying multiple open research directions.
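To make the diffusion analogy concrete, consider the standard formulation used in the neural-diffusion literature (an illustrative sketch, not specific to any one surveyed model): node features $X(t)$ evolve under the graph heat equation, and a forward-Euler discretization recovers a message-passing update,

$$\frac{dX(t)}{dt} = -L X(t), \qquad X^{(k+1)} = X^{(k)} - \tau L X^{(k)} = \big((1-\tau) I + \tau \hat{A}\big) X^{(k)},$$

where $L = I - \hat{A}$ is the normalized graph Laplacian, $\hat{A}$ the normalized adjacency matrix, and $\tau$ the step size. Each discrete step aggregates features from node neighbourhoods, which is precisely the message-passing scheme.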
Generalized Bures-Wasserstein Geometry for Positive Definite Matrices
This paper proposes a generalized Bures-Wasserstein (BW) Riemannian geometry
for the manifold of symmetric positive definite matrices. We explore the
generalization of the BW geometry in three different ways: 1) by generalizing
the Lyapunov operator in the metric, 2) by generalizing the orthogonal
Procrustes distance, and 3) by generalizing the Wasserstein distance between
the Gaussians. We show that they all lead to the same geometry. The proposed
generalization is parameterized by a symmetric positive definite matrix $\mathbf{M}$ such that when $\mathbf{M} = \mathbf{I}$, we recover the BW geometry. We derive expressions for the distance, geodesic, exponential/logarithm maps, Levi-Civita connection, and sectional curvature under the generalized BW geometry. We also present applications and experiments that illustrate the efficacy of the proposed geometry.
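For reference, the standard BW geometry that the paper generalizes is given by the following well-known expressions (stated here for context; the paper's generalization introduces $\mathbf{M}$ into them):

$$d_{BW}^2(X, Y) = \operatorname{tr}(X) + \operatorname{tr}(Y) - 2\operatorname{tr}\big((X^{1/2} Y X^{1/2})^{1/2}\big), \qquad g_X(U, V) = \tfrac{1}{2}\operatorname{tr}\big(L_X[U]\, V\big),$$

where $L_X[U]$ is the Lyapunov operator, i.e., the solution of the Lyapunov equation $L_X[U]\,X + X\,L_X[U] = U$. Setting $\mathbf{M} = \mathbf{I}$ in the generalized geometry recovers these formulas.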
Rieoptax: Riemannian Optimization in JAX
We present Rieoptax, an open source Python library for Riemannian
optimization in JAX. We show that many differential geometric primitives, such
as Riemannian exponential and logarithm maps, are usually faster in Rieoptax
than in existing Python frameworks, both on CPU and GPU. We support a range of basic and advanced stochastic optimization solvers, including Riemannian stochastic gradient descent, stochastic variance reduction, and adaptive gradient methods. A distinguishing feature of the proposed toolbox is that we also support differentially private optimization on Riemannian manifolds.
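As an illustration of the kind of differential geometric primitive such a library provides, consider the exponential map on the unit sphere written as a pure JAX function, so it can be jit-compiled and batched. This is a generic sketch, not Rieoptax's actual API:

    import jax
    import jax.numpy as jnp

    @jax.jit
    def sphere_exp(x, u, eps=1e-12):
        # Exponential map on the unit sphere: follow the great circle
        # from point x along tangent vector u.
        norm_u = jnp.linalg.norm(u)
        return jnp.cos(norm_u) * x + jnp.sin(norm_u) * u / (norm_u + eps)

    # vmap turns the pointwise primitive into a batched one, part of
    # why JAX-based primitives run efficiently on GPU.
    batched_sphere_exp = jax.vmap(sphere_exp)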
Generalized energy and gradient flow via graph framelets
In this work, we provide a theoretical understanding of the framelet-based
graph neural networks through the perspective of energy gradient flow. By
viewing the framelet-based models as discretized gradient flows of some energy,
we show that they can induce both low-frequency- and high-frequency-dominated dynamics via separate weight matrices for the different frequency components. This substantiates their good empirical performance on both homophilic and heterophilic graphs. We then propose a generalized energy via framelet decomposition and show that its gradient flow leads to a novel graph neural network, which includes many existing models as special cases. We then explain how the proposed model generally leads to more flexible dynamics, thus potentially enhancing the representation power of graph neural networks.
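The gradient-flow perspective can be illustrated with the simplest energy (an illustrative special case, not the paper's framelet energy): the graph Dirichlet energy

$$E(X) = \tfrac{1}{2}\operatorname{tr}(X^\top L X), \qquad \dot{X}(t) = -\nabla E(X) = -L X(t),$$

whose gradient flow is exactly heat diffusion and is therefore low-frequency dominated, smoothing features over time. An energy that weights low- and high-frequency (framelet) components with separate matrices can instead amplify high frequencies, which is what makes such flows effective on heterophilic graphs.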
Generalized Laplacian Regularized Framelet GCNs
This paper introduces a novel Framelet Graph approach based on the p-Laplacian GNN. The two proposed models, named p-Laplacian undecimated framelet graph convolution (pL-UFG) and generalized p-Laplacian undecimated framelet graph convolution (pL-fUFG), combine the properties of the p-Laplacian with the expressive power of the multi-resolution decomposition of graph signals. The empirical study highlights the excellent performance of pL-UFG and pL-fUFG in different graph learning tasks, including node classification and signal denoising.
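For context, discrete p-Laplacian methods of this kind are typically built on the standard regularization functional from the graph p-Laplacian literature (stated here as an assumption about the models' setup, not a formula taken from the paper):

$$S_p(F) = \frac{1}{2} \sum_{(i,j) \in E} w_{ij} \left\| \frac{f_i}{\sqrt{d_i}} - \frac{f_j}{\sqrt{d_j}} \right\|^p,$$

where $w_{ij}$ are edge weights and $d_i$ node degrees. Setting $p = 2$ recovers standard Laplacian smoothing, while $p \neq 2$ adapts the implicit smoothing to the local variation of the graph signal.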
Exposition on over-squashing problem on GNNs: Current Methods, Benchmarks and Challenges
Graph-based message-passing neural networks (MPNNs) have achieved remarkable
success in both node and graph-level learning tasks. However, several
identified problems, including over-smoothing (OSM), limited expressive power,
and over-squashing (OSQ), still limit the performance of MPNNs. In particular, OSQ is the most recently identified of these problems; it arises when MPNNs gradually lose learning accuracy on tasks that require long-range dependencies between graph nodes. In this work, we provide an exposition on the OSQ problem by summarizing the different formulations of OSQ in the current literature, as well as three categories of approaches for addressing it. In addition, we discuss the alignment between OSQ and expressive power, and the trade-off between OSQ and OSM. Furthermore, we summarize the empirical methods used in existing works to verify the effectiveness of OSQ mitigation approaches, with illustrations of their computational complexities. Lastly, we list open questions that, to the best of our knowledge, merit further exploration of the OSQ problem, along with potential directions.
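One common formalization of OSQ in this literature (an illustration of the kind of result the summarized formulations build on, not a claim specific to this work) bounds the sensitivity of a node representation after $k$ message-passing layers to a distant input feature via a Jacobian estimate of the form

$$\left\| \frac{\partial h_u^{(k)}}{\partial x_v} \right\| \le c^{k} \big(\hat{A}^{k}\big)_{uv},$$

where $\hat{A}$ is the (augmented) normalized adjacency matrix and $c$ depends on the Lipschitz constants of the layers. Since $(\hat{A}^{k})_{uv}$ decays rapidly across structural bottlenecks, gradient signal from long-range nodes is exponentially "squashed".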
- …