Beyond scalar quasi-arithmetic means: Quasi-arithmetic averages and quasi-arithmetic mixtures in information geometry
We generalize quasi-arithmetic means beyond scalars by considering the
gradient map of a Legendre type real-valued function. The gradient map of a
Legendre type function is proven strictly comonotone with a global inverse. It
thus yields a generalization of strictly monotone and differentiable
functions generating scalar quasi-arithmetic means. Furthermore, the Legendre
transformation gives rise to pairs of dual quasi-arithmetic averages via the
convex duality. We study the invariance and equivariance properties under
affine transformations of quasi-arithmetic averages via the lens of dually flat
spaces of information geometry. We show how these quasi-arithmetic averages are
used to express points on dual geodesics and sided barycenters in the dual
affine coordinate systems. We then consider quasi-arithmetic mixtures and
describe several parametric and non-parametric statistical models which are
closed under the quasi-arithmetic mixture operation.
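As background for the generalization described above, the classical scalar quasi-arithmetic mean induced by a strictly monotone and differentiable generator f can be sketched as follows (a minimal illustration of the scalar case only; the function names are ours, and the paper's gradient-map generalization is not reproduced here):

```python
import numpy as np

def quasi_arithmetic_mean(f, f_inv, xs, weights=None):
    """Scalar quasi-arithmetic mean M_f(x_1..x_n) = f^{-1}(sum_i w_i f(x_i))
    for a strictly monotone and differentiable generator f."""
    xs = np.asarray(xs, dtype=float)
    if weights is None:
        weights = np.full(xs.size, 1.0 / xs.size)  # uniform weights
    return f_inv(np.dot(weights, f(xs)))

# f(x) = log(x) generates the geometric mean:
geo = quasi_arithmetic_mean(np.log, np.exp, [1.0, 4.0, 16.0])            # 4.0
# f(x) = x generates the ordinary arithmetic mean:
ari = quasi_arithmetic_mean(lambda x: x, lambda x: x, [1.0, 4.0, 16.0])  # 7.0
```

The paper replaces the scalar generator f by the gradient map of a Legendre-type function, which plays the same role in higher dimensions.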
Adaptive Preconditioned Gradient Descent with Energy
We propose an adaptive time step with energy for a large class of
preconditioned gradient descent methods, mainly applied to constrained
optimization problems. Our strategy relies on representing the usual descent
direction by the product of an energy variable and a transformed gradient, with
a preconditioning matrix, for example, to reflect the natural gradient induced
by the underlying metric in parameter space or to endow a projection operator
when linear equality constraints are present. We present theoretical results on
both unconditional stability and convergence rates for three respective classes
of objective functions. In addition, our numerical results shed light on the
excellent performance of the proposed method on several benchmark optimization
problems.
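The descent direction described above, a preconditioning matrix applied to the gradient, can be illustrated with a plain (non-adaptive) sketch on a quadratic objective; the paper's energy variable and adaptive time step are not reproduced here, and the Jacobi (diagonal) preconditioner is our choice for illustration:

```python
import numpy as np

def preconditioned_gd(A, b, eta=0.5, iters=500):
    """Minimize f(x) = 0.5 x^T A x - b^T x by gradient descent with a
    Jacobi preconditioner P = diag(A)^{-1}: a plain sketch of the
    transformed-gradient direction P @ grad f(x)."""
    x = np.zeros_like(b)
    P = 1.0 / np.diag(A)            # diagonal preconditioner
    for _ in range(iters):
        grad = A @ x - b            # gradient of the quadratic
        x = x - eta * P * grad      # preconditioned gradient step
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = preconditioned_gd(A, b)    # converges to the solution of A x = b
```

The preconditioner here merely rescales coordinates; the paper's point is that the same product structure can encode a natural-gradient metric or a projection onto linear equality constraints.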
A numerical approximation method for the Fisher-Rao distance between multivariate normal distributions
We present a simple method to approximate Rao's distance between multivariate
normal distributions based on discretizing curves joining normal distributions
and approximating Rao's distances between successive nearby normal
distributions on the curves by the square root of Jeffreys divergence, the
symmetrized Kullback-Leibler divergence. We consider experimentally the linear
interpolation curves in the ordinary, natural and expectation parameterizations
of the normal distributions, and compare these curves with a curve derived from
Calvo and Oller's isometric embedding of the Fisher-Rao n-variate normal
manifold into the cone of symmetric positive-definite
matrices [Journal of Multivariate Analysis 35.2 (1990): 223-242]. We report on
our experiments and assess the quality of our approximation technique by
comparing the numerical approximations with both lower and upper bounds.
Finally, we present several information-geometric properties of Calvo and
Oller's isometric embedding.
New Directions for Contact Integrators
Contact integrators are a family of geometric numerical schemes which
guarantee the conservation of the contact structure. In this work we review the
construction of both the variational and Hamiltonian versions of these methods.
We illustrate some of the advantages of geometric integration in the
dissipative setting by focusing on models inspired by recent studies in
celestial mechanics and cosmology.
Comment: To appear as Chapter 24 in GSI 2021, Springer LNCS 1282
Parameterized Wasserstein Hamiltonian Flow
In this work, we propose a numerical method to compute the Wasserstein
Hamiltonian flow (WHF), which is a Hamiltonian system on the probability
density manifold. Many well-known PDE systems can be reformulated as WHFs. We
use a parameterized function as the push-forward map to characterize the
solution of the WHF, and convert the PDE to a finite-dimensional ODE system,
which is a
Hamiltonian system in the phase space of the parameter manifold. We establish
error analysis results for the continuous-time approximation scheme in the
Wasserstein metric. For the numerical implementation, we use neural networks as
push-forward maps. We apply an effective symplectic scheme to solve the derived
Hamiltonian ODE system so that the method preserves some important quantities
such as the total energy. The computation is done by a fully deterministic
symplectic integrator without any neural network training. Thus, our method does not
involve direct optimization over network parameters and hence can avoid the
error introduced by stochastic gradient descent (SGD) methods, which is usually
hard to quantify and measure. The proposed algorithm is a sampling-based
approach that scales well to higher dimensional problems. In addition, the
method also provides an alternative connection between the Lagrangian and
Eulerian perspectives of the original WHF through the parameterized ODE
dynamics.
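A standard symplectic scheme of the kind the method relies on is Störmer-Verlet (leapfrog). This sketch, which is not the paper's specific implementation, shows the key property on a harmonic oscillator: the energy error stays bounded rather than drifting:

```python
import numpy as np

def leapfrog(grad_V, q, p, dt, steps):
    """Stormer-Verlet (leapfrog) integrator for a separable Hamiltonian
    H(q, p) = p^2 / 2 + V(q). Being symplectic, it keeps the energy
    error bounded over long time horizons."""
    p = p - 0.5 * dt * grad_V(q)          # initial half kick
    for _ in range(steps - 1):
        q = q + dt * p                    # drift
        p = p - dt * grad_V(q)            # full kick
    q = q + dt * p                        # final drift
    p = p - 0.5 * dt * grad_V(q)          # final half kick
    return q, p

# Harmonic oscillator V(q) = q^2 / 2, integrated to time t = 100.
q0, p0 = 1.0, 0.0
q, p = leapfrog(lambda q: q, q0, p0, dt=0.01, steps=10000)
E0 = 0.5 * (q0**2 + p0**2)                # initial energy
E  = 0.5 * (q**2 + p**2)                  # final energy, nearly preserved
```

In the paper's setting the same idea is applied to the Hamiltonian ODE in the parameter space of the push-forward map, so that quantities such as the total energy are approximately conserved without any training loop.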
On the convergence of orthogonalization-free conjugate gradient method for extreme eigenvalues of Hermitian matrices: a Riemannian optimization interpretation
In many applications, it is desired to obtain extreme eigenvalues and
eigenvectors of large Hermitian matrices by efficient and compact algorithms.
In particular, orthogonalization-free methods are preferred for large-scale
problems for finding eigenspaces of extreme eigenvalues without explicitly
computing orthogonal vectors in each iteration. For the top k eigenvalues,
the simplest orthogonalization-free method is to find the best rank-k
approximation to a positive semi-definite Hermitian matrix by algorithms
solving the unconstrained Burer-Monteiro formulation. We show that the
nonlinear conjugate gradient method for the unconstrained Burer-Monteiro
formulation is equivalent to a Riemannian conjugate gradient method on a
quotient manifold with the Bures-Wasserstein metric, thus its global
convergence to a stationary point can be proven. Numerical tests suggest that
it is efficient for computing the largest eigenvalues for large-scale
matrices if the largest eigenvalues are nearly distributed uniformly
- …
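The unconstrained Burer-Monteiro formulation mentioned above can be sketched with plain gradient descent (our illustration, not the paper's nonlinear conjugate gradient method; the step size, iteration count, and test matrix are chosen for demonstration):

```python
import numpy as np

def burer_monteiro_topk(A, k, eta=0.02, iters=5000, seed=0):
    """Plain gradient descent on the unconstrained Burer-Monteiro
    objective f(Y) = 0.25 * ||A - Y Y^T||_F^2 for a PSD matrix A.
    No orthogonalization is performed per iteration; at a generic
    minimum, Y Y^T is the best rank-k approximation of A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Y = 0.1 * rng.standard_normal((n, k))   # small random initialization
    for _ in range(iters):
        G = (Y @ Y.T - A) @ Y               # gradient of f at Y
        Y = Y - eta * G
    return Y

# Recover the two largest eigenvalues of a diagonal PSD matrix.
A = np.diag([5.0, 3.0, 1.0, 0.5])
Y = burer_monteiro_topk(A, k=2)
vals = np.sort(np.linalg.eigvalsh(Y @ Y.T))[::-1]   # approx [5, 3, 0, 0]
```

Any stationary point of this objective has Y spanning an invariant subspace of A, and the global minimum selects the top-k eigenspace, which is what makes the formulation suitable for extreme eigenvalue computation.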