30 research outputs found
Randomized Riemannian Preconditioning for Orthogonality Constrained Problems
Optimization problems with (generalized) orthogonality constraints are
prevalent across science and engineering. For example, in computational science
they arise in the symmetric (generalized) eigenvalue problem, in nonlinear
eigenvalue problems, and in electronic structures computations, to name a few
problems. In statistics and machine learning, they arise, for example, in
canonical correlation analysis and in linear discriminant analysis. In this
article, we consider using randomized preconditioning in the context of
optimization problems with generalized orthogonality constraints. Our proposed
algorithms are based on Riemannian optimization on the generalized Stiefel
manifold equipped with a non-standard preconditioned geometry, which requires
developing the geometric components needed for algorithms based on this
approach. Furthermore, we perform an asymptotic convergence analysis of the
preconditioned algorithms, which helps to
characterize the quality of a given preconditioner using second-order
information. Finally, for the problems of canonical correlation analysis and
linear discriminant analysis, we develop randomized preconditioners along with
corresponding bounds on the relevant condition number.
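As a rough illustration of the setting (not the article's preconditioned geometry), the following Python sketch runs gradient ascent for a generalized-eigenvalue trace objective with a retraction onto the generalized Stiefel manifold {X : X^T B X = I}; the preconditioner enters only as an operator applied to the search direction, and all names (retract_gen_stiefel, precond_ascent, M_inv) are illustrative.

    import numpy as np

    def retract_gen_stiefel(Y, B):
        """Map a full-rank Y back to the manifold via Y (Y^T B Y)^{-1/2}."""
        w, V = np.linalg.eigh(Y.T @ B @ Y)
        return Y @ (V * (1.0 / np.sqrt(w))) @ V.T

    def precond_ascent(A, B, M_inv, k, steps=200, lr=0.1, seed=0):
        """Maximize trace(X^T A X) s.t. X^T B X = I, preconditioning the direction."""
        rng = np.random.default_rng(seed)
        X = retract_gen_stiefel(rng.standard_normal((A.shape[0], k)), B)
        for _ in range(steps):
            egrad = 2.0 * (A @ X)                  # Euclidean gradient of trace(X^T A X)
            X = retract_gen_stiefel(X + lr * M_inv(egrad), B)
        return X

    # Toy usage, with the exact inverse of B standing in for a randomized preconditioner.
    n, k = 50, 3
    rng = np.random.default_rng(1)
    G = rng.standard_normal((n, n)); A = G + G.T
    C = rng.standard_normal((n, n)); B = C @ C.T + n * np.eye(n)
    B_inv = np.linalg.inv(B)
    X = precond_ascent(A, B, lambda g: B_inv @ g, k)
    print(np.linalg.norm(X.T @ B @ X - np.eye(k)))   # ~0: constraint holds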
Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds
SPIDER (Stochastic Path Integrated Differential EstimatoR) is an efficient
gradient estimation technique developed for non-convex stochastic optimization.
Although shown to attain nearly optimal computational complexity
bounds, SPIDER-type methods are limited to linear metric spaces. In this
paper, we introduce the Riemannian SPIDER (R-SPIDER) method as a novel
nonlinear-metric extension of SPIDER for efficient non-convex optimization on
Riemannian manifolds. We prove that for finite-sum problems with $n$
components, R-SPIDER converges to an $\epsilon$-accuracy stationary point
within $\mathcal{O}\big(\min\{n+\sqrt{n}/\epsilon^{2},\,1/\epsilon^{3}\}\big)$
stochastic gradient evaluations, which is sharper in magnitude than those of
prior Riemannian first-order methods. For online optimization, R-SPIDER is
shown to converge with $\mathcal{O}(1/\epsilon^{3})$ complexity which is,
to the best of our knowledge, the first non-asymptotic result for online
Riemannian optimization. In particular, for gradient-dominated functions, we
further develop a variant of R-SPIDER and prove its linear convergence rate.
Numerical results demonstrate the computational efficiency of the proposed
methods.
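A minimal sketch of the core idea, under simplifying assumptions (the unit sphere instead of a general manifold, tangent-space projection as the vector transport, a least-squares finite sum); it is not the paper's R-SPIDER, and the names and constants are illustrative.

    import numpy as np

    def proj_tangent(x, v):           # tangent space of the unit sphere at x
        return v - (x @ v) * x

    def retract(x, v):                # step along v, then renormalize
        y = x + v
        return y / np.linalg.norm(y)

    def rspider_sketch(A, b, epochs=20, inner=50, batch=8, lr=0.05, seed=0):
        """min_x (1/n) sum_i (a_i^T x - b_i)^2 subject to ||x|| = 1."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x = rng.standard_normal(d); x /= np.linalg.norm(x)
        for _ in range(epochs):
            # full Riemannian gradient anchors the recursive estimator
            v = proj_tangent(x, 2.0 / n * A.T @ (A @ x - b))
            for _ in range(inner):
                x_new = retract(x, -lr * v)
                idx = rng.choice(n, size=batch, replace=False)
                Ai, bi = A[idx], b[idx]
                g_new = proj_tangent(x_new, 2.0 / batch * Ai.T @ (Ai @ x_new - bi))
                g_old = proj_tangent(x, 2.0 / batch * Ai.T @ (Ai @ x - bi))
                # SPIDER-style recursive update; the correction term is moved to
                # the new tangent space by projection (a common simplification)
                v = g_new + proj_tangent(x_new, v - g_old)
                x = x_new
        return x

    # Toy usage
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 20)); b = rng.standard_normal(500)
    x = rspider_sketch(A, b)
    print(abs(np.linalg.norm(x) - 1.0))   # iterate stays on the sphere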
Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery
In this paper, we provide the first convergence guarantee for the tensor
train (TT) factorization approach. Specifically, to avoid the scaling
ambiguity and to
facilitate theoretical analysis, we optimize over the so-called left-orthogonal
TT format which enforces orthonormality among most of the factors. To ensure
the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for
optimizing those factors over the Stiefel manifold. We first delve into the TT
factorization problem and establish the local linear convergence of RGD.
Notably, the rate of convergence only experiences a linear decline as the
tensor order increases. We then study the sensing problem that aims to recover
a TT format tensor from linear measurements. Assuming the sensing operator
satisfies the restricted isometry property (RIP), we show that with a proper
initialization, which could be obtained through spectral initialization, RGD
also converges to the ground-truth tensor at a linear rate. Furthermore, we
expand our analysis to encompass scenarios involving Gaussian noise in the
measurements. We prove that RGD can reliably recover the ground truth at a
linear rate, with the recovery error exhibiting only polynomial growth in
relation to the tensor order. We conduct various experiments to validate our
theoretical findings.
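A minimal sketch of the Stiefel-manifold RGD building block the abstract relies on (tangent-space projection of the Euclidean gradient plus a QR retraction), shown on a single orthonormal factor and a subspace-fitting toy objective rather than the full left-orthogonal TT format; all names and constants are illustrative.

    import numpy as np

    def sym(M):
        return 0.5 * (M + M.T)

    def stiefel_rgd(Y, k, steps=300, lr=0.2, seed=0):
        """Dominant k-dimensional subspace: min_U -trace(U^T Y Y^T U), U^T U = I_k."""
        rng = np.random.default_rng(seed)
        U, _ = np.linalg.qr(rng.standard_normal((Y.shape[0], k)))
        for _ in range(steps):
            egrad = -2.0 * (Y @ Y.T) @ U              # Euclidean gradient
            rgrad = egrad - U @ sym(U.T @ egrad)      # project onto the tangent space
            Q, R = np.linalg.qr(U - lr * rgrad)       # QR retraction back to the manifold
            U = Q * np.sign(np.diag(R))               # fix the sign ambiguity of QR
        return U

    # Toy usage: a rank-3 data matrix, so a 3-dimensional subspace is sought.
    rng = np.random.default_rng(1)
    Y = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 200))
    U = stiefel_rgd(Y, k=3)
    print(np.linalg.norm(U.T @ U - np.eye(3)))        # ~0: orthonormality preserved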
Faster Randomized Methods for Orthogonality Constrained Problems
Recent literature has advocated the use of randomized methods for
accelerating the solution of various matrix problems arising throughout data
science and computational science. One popular strategy for leveraging
randomization is to use it as a way to reduce problem size. However, methods
based on this strategy lack sufficient accuracy for some applications.
Randomized preconditioning is another approach for leveraging randomization,
which provides higher accuracy. The main challenge in using randomized
preconditioning is the need for an underlying iterative method; thus,
randomized preconditioning has so far been applied almost exclusively to solving
regression problems and linear systems. In this article, we show how to expand
the application of randomized preconditioning to another important set of
problems prevalent across data science: optimization problems with
(generalized) orthogonality constraints. We demonstrate our approach, which is
based on the framework of Riemannian optimization and Riemannian
preconditioning, on the problem of computing the dominant canonical
correlations and on the Fisher linear discriminant analysis problem. For both
problems, we evaluate the effect of preconditioning on the computational costs
and asymptotic convergence, and demonstrate empirically the utility of our
approach.
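For context, here is a minimal sketch of the classical use of randomized preconditioning that the abstract contrasts with, namely sketch-and-precondition least squares: factor a sketched copy of the matrix and feed the preconditioned operator to an iterative solver. The article's extension to orthogonality-constrained problems is not reproduced here, and the sketch size and tolerances are illustrative.

    import numpy as np
    from scipy.sparse.linalg import lsqr

    rng = np.random.default_rng(0)
    n, d, s = 10000, 50, 400                    # tall least-squares problem, sketch size s
    A = rng.standard_normal((n, d)) @ np.diag(np.logspace(0, 4, d))   # ill-conditioned
    b = rng.standard_normal(n)

    S = rng.standard_normal((s, n)) / np.sqrt(s)   # dense Gaussian sketch, for clarity
    _, R = np.linalg.qr(S @ A)                     # R captures the geometry of A cheaply

    # LSQR on the preconditioned system A R^{-1}, which is well conditioned,
    # then undo the change of variables.
    y = lsqr(A @ np.linalg.inv(R), b, atol=1e-12, btol=1e-12)[0]
    x = np.linalg.solve(R, y)
    print(np.linalg.norm(A.T @ (A @ x - b)) / np.linalg.norm(A.T @ b))  # small relative residual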
Online Tensor Learning: Computational and Statistical Trade-offs, Adaptivity and Optimal Regret
We investigate a generalized framework for estimating latent low-rank tensors
in an online setting, encompassing both linear and generalized linear models.
This framework offers a flexible approach for handling continuous or
categorical variables. Additionally, we investigate two specific applications:
online tensor completion and online binary tensor learning. To address these
challenges, we propose the online Riemannian gradient descent algorithm, which
demonstrates linear convergence and the ability to recover the low-rank
component under appropriate conditions in all applications. Furthermore, we
establish a precise entry-wise error bound for online tensor completion.
Notably, our work represents the first attempt to incorporate noise in the
online low-rank tensor recovery task. Intriguingly, we observe a surprising
trade-off between computational and statistical aspects in the presence of
noise. Increasing the step size accelerates convergence but leads to higher
statistical error, whereas a smaller step size yields a statistically optimal
estimator at the expense of slower convergence. Moreover, we conduct regret
analysis for online tensor regression. Under the fixed step size regime, a
fascinating trilemma concerning the convergence rate, statistical error rate,
and regret is observed. With an optimal choice of step size, we achieve an
optimal regret of $\tilde{\mathcal{O}}(\sqrt{T})$. Furthermore, we extend our analysis to the
adaptive setting where the horizon T is unknown. In this case, we demonstrate
that by employing different step sizes, we can attain a statistically optimal
error rate along with a corresponding regret guarantee. To validate our theoretical
claims, we provide numerical results that corroborate our findings and support
our assertions.
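A minimal sketch of the step-size trade-off in the online setting, under heavy simplifications (a matrix rather than a tensor, plain factored stochastic updates rather than the paper's online Riemannian gradient descent); all names and constants are illustrative.

    import numpy as np

    def online_completion(stream, shape, rank, lr, seed=0):
        rng = np.random.default_rng(seed)
        n1, n2 = shape
        U = 0.1 * rng.standard_normal((n1, rank))
        V = 0.1 * rng.standard_normal((n2, rank))
        for i, j, y in stream:                 # one noisy entry per time step
            r = U[i] @ V[j] - y                # residual on the observed entry
            gU, gV = r * V[j], r * U[i]        # stochastic gradients for the two factors
            U[i] -= lr * gU                    # the step size trades speed for accuracy
            V[j] -= lr * gV
        return U, V

    # Toy usage: rank-2 ground truth observed one noisy entry at a time.
    rng = np.random.default_rng(1)
    n1, n2, rank, T = 60, 60, 2, 100000
    M = rng.standard_normal((n1, rank)) @ rng.standard_normal((rank, n2))
    def entry_stream():
        for _ in range(T):
            i, j = rng.integers(n1), rng.integers(n2)
            yield i, j, M[i, j] + 0.1 * rng.standard_normal()
    for lr in (0.1, 0.01):                     # larger step vs smaller step
        U, V = online_completion(entry_stream(), (n1, n2), rank, lr)
        print(lr, np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))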
Optimization and Learning over Riemannian Manifolds
Learning over smooth nonlinear spaces has found wide applications. A principled approach for addressing such problems is to endow the search space with a Riemannian manifold geometry so that numerical optimization can be performed intrinsically. Recent years have seen a surge of interest in leveraging Riemannian optimization for nonlinearly-constrained problems. This thesis investigates and improves on the existing algorithms for Riemannian optimization, with a focus on unified analysis frameworks and generic strategies. To this end, the first chapter systematically studies the choice of Riemannian geometries and their impact on algorithmic convergence, on the manifold of positive definite matrices. The second chapter considers stochastic optimization on manifolds and proposes a unified framework for analyzing and improving the convergence of Riemannian variance reduction methods for nonconvex functions. The third chapter introduces a generic acceleration scheme based on the idea of extrapolation, which achieves the optimal convergence rate asymptotically while being empirically efficient.
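To make the first chapter's point concrete, here is a small sketch (not taken from the thesis) showing how three standard notions of distance on positive definite matrices behave under a simple rescaling: the Euclidean distance changes while the Log-Euclidean and affine-invariant Riemannian distances do not, which is one way the choice of geometry shapes algorithmic behaviour.

    import numpy as np

    def spd_fun(A, f):
        """Apply a scalar function to a symmetric positive definite matrix."""
        w, V = np.linalg.eigh(A)
        return (V * f(w)) @ V.T

    def dist_euclidean(A, B):
        return np.linalg.norm(A - B)

    def dist_log_euclidean(A, B):
        return np.linalg.norm(spd_fun(A, np.log) - spd_fun(B, np.log))

    def dist_affine_invariant(A, B):
        Ainv_half = spd_fun(A, lambda w: 1.0 / np.sqrt(w))
        return np.linalg.norm(spd_fun(Ainv_half @ B @ Ainv_half, np.log))

    rng = np.random.default_rng(0)
    C = rng.standard_normal((5, 5))
    A = C @ C.T + 5 * np.eye(5)
    B = A + 0.5 * np.eye(5)
    for s in (1.0, 1e-3):   # rescaling moves the pair in Euclidean geometry only
        print(dist_euclidean(s * A, s * B),
              dist_log_euclidean(s * A, s * B),
              dist_affine_invariant(s * A, s * B))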
Riemannian Stochastic Gradient Method for Nested Composition Optimization
This work considers optimization of compositions of functions in a nested form
over Riemannian manifolds, where each function contains an expectation. This
type of problem is gaining popularity in applications such as policy
evaluation in reinforcement learning or model customization in meta-learning.
The standard Riemannian stochastic gradient methods for non-compositional
optimization cannot be directly applied, as stochastic approximation of the inner
functions creates bias in the gradients of the outer functions. For two-level
composition optimization, we present a Riemannian Stochastic Composition
Gradient Descent (R-SCGD) method that finds an approximate stationary point,
with expected squared Riemannian gradient smaller than $\epsilon$, in
$\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer
function and stochastic function and gradient oracles of the inner function.
Furthermore, we generalize the R-SCGD algorithms for problems with multi-level
nested compositional structures, with the same complexity of $\mathcal{O}(\epsilon^{-2})$
for the first-order stochastic oracle. Finally, the performance of the R-SCGD
method is numerically evaluated over a policy evaluation problem in
reinforcement learning.
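A minimal sketch of the inner-function tracking idea behind such composition methods, under simplifying assumptions (the unit sphere as the manifold, a linear-quadratic toy composition, projection used as the retraction and transport); it is not the paper's R-SCGD, and all names and constants are illustrative.

    import numpy as np

    def proj_tangent(x, v):        # tangent space of the unit sphere at x
        return v - (x @ v) * x

    def retract(x, v):             # step along v, then renormalize
        y = x + v
        return y / np.linalg.norm(y)

    def scgd_sphere(sample_A, c, d, steps=3000, lr=0.02, beta=0.1, seed=0):
        """min_x f(E[g_xi(x)]) with g_xi(x) = A_xi x, f(u) = ||u - c||^2, ||x|| = 1."""
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(d); x /= np.linalg.norm(x)
        u = sample_A() @ x                    # running estimate of the inner expectation
        for _ in range(steps):
            A1, A2 = sample_A(), sample_A()   # independent samples for the two levels
            u = (1.0 - beta) * u + beta * (A1 @ x)          # track E[g(x)] to reduce bias
            rgrad = proj_tangent(x, A2.T @ (2.0 * (u - c)))  # chain rule at the tracked u
            x = retract(x, -lr * rgrad)
        return x

    # Toy usage: A_xi = A_bar + noise, so the inner expectation is A_bar @ x.
    rng = np.random.default_rng(1)
    d, m = 10, 6
    A_bar = rng.standard_normal((m, d))
    c = rng.standard_normal(m)
    sample_A = lambda: A_bar + 0.1 * rng.standard_normal((m, d))
    x = scgd_sphere(sample_A, c, d)
    print(abs(np.linalg.norm(x) - 1.0))       # the iterate stays on the sphere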