138 research outputs found
Riemannian trust-region methods for strict saddle functions with complexity guarantees
The difficulty of minimizing a nonconvex function is in part explained by the
presence of saddle points. This slows down optimization algorithms and impacts
worst-case complexity guarantees. However, many nonconvex problems of interest
possess a favorable structure for optimization, in the sense that saddle points
can be escaped efficiently by appropriate algorithms. This strict saddle
property has been extensively used in data science to derive good properties
for first-order algorithms, such as convergence to second-order critical
points. However, the analysis and the design of second-order algorithms in the
strict saddle setting have received significantly less attention. In this
paper, we consider second-order trust-region methods for a class of strict
saddle functions defined on Riemannian manifolds. These functions exhibit
(geodesic) strong convexity around minimizers and negative curvature at saddle
points. We show that the standard trust-region method with exact subproblem
minimization finds an approximate local minimizer in a number of iterations
that depends logarithmically on the accuracy parameter, which significantly
improves known results for general nonconvex optimization. We also propose an
inexact variant of the algorithm that explicitly leverages the strict saddle
property to compute the most appropriate step at every iteration. Our bounds
for the inexact variant also improve over the general nonconvex case, and
illustrate the benefit of using strict saddle properties within optimization
algorithms.
Keywords: Riemannian optimization, strict saddle function, second-order method, complexity guarantees
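To make the mechanism concrete, below is a minimal Euclidean sketch of a trust-region step with exact subproblem minimization via an eigendecomposition. It ignores the Riemannian geometry and the "hard case" of the subproblem, and the constants are illustrative, not taken from the paper.

```python
import numpy as np

def trust_region_step(g, H, delta, tol=1e-10):
    """Exactly solve min_{||s|| <= delta} g@s + 0.5*s@H@s via the
    eigendecomposition of H (More-Sorensen style; the 'hard case'
    where g is orthogonal to the leftmost eigenvector is ignored)."""
    w, V = np.linalg.eigh(H)           # eigenvalues ascending, H = V diag(w) V^T
    gt = V.T @ g                       # gradient in the eigenbasis

    def step_norm(lam):                # ||-(H + lam*I)^{-1} g||
        return np.linalg.norm(gt / (w + lam))

    lam = max(0.0, -w[0]) + 1e-12      # smallest shift making H + lam*I definite
    if step_norm(lam) > delta:         # boundary case: find lam by bisection
        lo, hi = lam, lam + 1.0
        while step_norm(hi) > delta:
            hi *= 2.0
        while hi - lo > tol * (1.0 + hi):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if step_norm(mid) > delta else (lo, mid)
        lam = hi
    return -V @ (gt / (w + lam))

# toy usage: near a strict saddle, the step follows the negative curvature
g = np.array([1e-3, 1e-3])
H = np.diag([1.0, -1.0])
print(trust_region_step(g, H, delta=0.5))
```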
Direct search based on probabilistic descent in reduced spaces
Derivative-free algorithms seek the minimum value of a given objective
function without using any derivative information. The performance of these
methods often worsens as the dimension increases, a phenomenon predicted by
their worst-case complexity guarantees. Nevertheless, recent algorithmic
proposals have shown that incorporating randomization into otherwise
deterministic frameworks could alleviate this effect for direct-search methods.
The best guarantees and practical performance are obtained when employing a
random vector and its negative, which amounts to drawing directions in a random
one-dimensional subspace. Unlike for other derivative-free schemes, however,
the properties of these subspaces have not been exploited.
In this paper, we study a generic direct-search algorithm in which the
polling directions are defined using random subspaces. Complexity guarantees
for such an approach are derived thanks to probabilistic properties related to
both the subspaces and the directions used within these subspaces. By
leveraging results on random subspace embeddings and sketching matrices, we
show that better complexity bounds are obtained for randomized instances of our
framework. A numerical investigation confirms the benefit of randomization,
particularly when performed in subspaces, on problems of moderately large
dimension.
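As a rough illustration of the scheme described above, here is a toy direct-search loop that polls along a random unit direction and its negative (a random one-dimensional subspace). The sufficient-decrease constant and step-size updates are illustrative choices, not the paper's.

```python
import numpy as np

def direct_search(f, x0, alpha0=1.0, max_iter=500, seed=0):
    """Direct search with probabilistic descent: poll f along a random
    unit direction and its negative, expanding the step size on success
    and shrinking it on failure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx, alpha = f(x), alpha0
    for _ in range(max_iter):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)                # random direction on the sphere
        improved = False
        for s in (d, -d):                     # poll d and its negative
            ft = f(x + alpha * s)
            if ft < fx - 1e-4 * alpha**2:     # sufficient decrease test
                x, fx, improved = x + alpha * s, ft, True
                break
        alpha *= 2.0 if improved else 0.5
    return x, fx

# toy usage on a smooth problem of moderately large dimension
x, fx = direct_search(lambda z: np.sum(z**2), np.ones(50))
print(fx)
```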
Complexity analysis of regularization methods for implicitly constrained least squares
Optimization problems constrained by partial differential equations (PDEs)
naturally arise in scientific computing, as those constraints often model
physical systems or the simulation thereof. In an implicitly constrained
approach, the constraints are incorporated into the objective through a reduced
formulation. To this end, a numerical procedure is typically applied to solve
the constraint system, and efficient numerical routines with quantifiable cost
have long been developed. Meanwhile, the field of complexity in optimization,
which estimates the cost of an optimization algorithm, has received significant
attention in the literature, with most of the focus being on unconstrained or
explicitly constrained problems.
In this paper, we analyze an algorithmic framework based on quadratic
regularization for implicitly constrained nonlinear least squares. By
leveraging adjoint formulations, we can quantify the worst-case cost of our
method to reach an approximate stationary point of the optimization problem.
Our definition of such points exploits the least-squares structure of the
objective, leading to an efficient implementation. Numerical experiments
conducted on PDE-constrained optimization problems demonstrate the efficiency
of the proposed framework.
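A minimal sketch of the quadratic-regularization idea for nonlinear least squares appears below, in a Levenberg-Marquardt-like form. The implicit PDE constraint and the adjoint computations are abstracted into a residual function and its Jacobian, and all names and constants are illustrative.

```python
import numpy as np

def quad_reg_lsq(r, J, x0, sigma=1.0, tol=1e-8, max_iter=100):
    """Quadratic regularization for min 0.5*||r(x)||^2: each iteration
    solves (J^T J + sigma*I) s = -J^T r, shrinking sigma after accepted
    steps and inflating it after rejected ones."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        res, Jx = r(x), J(x)
        grad = Jx.T @ res                          # gradient of the objective
        if np.linalg.norm(grad) <= tol:
            break
        s = np.linalg.solve(Jx.T @ Jx + sigma * np.eye(x.size), -grad)
        if np.sum(r(x + s)**2) < np.sum(res**2):   # accept: decrease achieved
            x, sigma = x + s, max(0.5 * sigma, 1e-12)
        else:                                      # reject: regularize more
            sigma *= 4.0
    return x

# toy usage: fit y = p0 * exp(-p1 * t) to synthetic data
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
r = lambda p: p[0] * np.exp(-p[1] * t) - y
J = lambda p: np.stack([np.exp(-p[1] * t), -p[0] * t * np.exp(-p[1] * t)], axis=1)
print(quad_reg_lsq(r, J, np.array([1.0, 1.0])))    # approaches (2.0, 1.5)
```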
A Subsampling Line-Search Method with Second-Order Results
In many contemporary optimization problems such as those arising in machine
learning, it can be computationally challenging or even infeasible to evaluate
an entire function or its derivatives. This motivates the use of stochastic
algorithms that sample problem data, which can jeopardize the guarantees
obtained through classical globalization techniques in optimization such as a
trust region or a line search. Using subsampled function values is particularly
challenging for the latter strategy, which relies upon multiple evaluations. On
top of that, there has been increasing interest in nonconvex
formulations of data-related problems, such as training deep learning models.
For such instances, one aims at developing methods that converge to
second-order stationary points quickly, i.e., escape saddle points efficiently.
This is particularly delicate to ensure when one only accesses subsampled
approximations of the objective and its derivatives.
In this paper, we describe a stochastic algorithm based on negative curvature
and Newton-type directions that are computed for a subsampling model of the
objective. A line-search technique is used to enforce suitable decrease for
this model, and for a sufficiently large sample, a similar amount of reduction
holds for the true objective. By using probabilistic reasoning, we can then
obtain worst-case complexity guarantees for our framework, leading us to
discuss appropriate notions of stationarity in a subsampling context. Our
analysis encompasses the deterministic regime, and allows us to identify
sampling requirements for second-order line-search paradigms. As we illustrate
through real data experiments, these worst-case estimates need not be satisfied
for our method to be competitive with first-order strategies in practice.
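The sketch below illustrates one ingredient of such a framework: a backtracking (Armijo) line search evaluated on a random subsample of a finite-sum objective. It omits the negative-curvature and Newton-type directions of the paper, and the sample size is an arbitrary illustrative choice.

```python
import numpy as np

def subsampled_step(fi, gi, x, n, batch, rng, alpha=1.0):
    """One gradient step with a backtracking Armijo line search, where
    function and gradient values are estimated on a random subsample
    of the n terms of a finite-sum objective f = (1/n) sum_i fi."""
    S = rng.choice(n, size=batch, replace=False)             # draw the sample
    f_S = lambda z: np.mean([fi(z, i) for i in S])           # subsampled objective
    g = np.mean([gi(x, i) for i in S], axis=0)               # subsampled gradient
    fx = f_S(x)
    while f_S(x - alpha * g) > fx - 1e-4 * alpha * (g @ g):  # Armijo on the model
        alpha *= 0.5
        if alpha < 1e-12:
            break
    return x - alpha * g

# toy usage: least squares split into per-datum terms
rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)
fi = lambda z, i: 0.5 * (A[i] @ z - b[i])**2
gi = lambda z, i: (A[i] @ z - b[i]) * A[i]
x = np.zeros(5)
for _ in range(200):
    x = subsampled_step(fi, gi, x, n=100, batch=20, rng=rng)
print(0.5 * np.mean((A @ x - b)**2))
```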
Optimal quantization of the mean measure and applications to statistical learning
This paper addresses the case where data come as point sets, or more
generally as discrete measures. Our motivation is twofold: first, we intend to
approximate with a compactly supported measure the mean of the measure
generating process, which coincides with the intensity measure in the point
process framework, or with the expected persistence diagram in the framework of
persistence-based topological data analysis. To this aim we provide two
algorithms that we prove to be almost minimax optimal. Second, we build from the
estimator of the mean measure a vectorization map, which sends every measure
into a finite-dimensional Euclidean space, and investigate its properties
through a clustering-oriented lens. In a nutshell, we show that in a mixture of
measure generating processes, our technique yields a representation in
$\mathbb{R}^k$, for a suitable $k$, that guarantees a good clustering of
the data points with high probability. Interestingly, our results apply in the
framework of persistence-based shape classification via the ATOL procedure
described in \cite{Royer19}.
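As a toy stand-in for the quantization step (not the authors' almost minimax-optimal algorithms), one can pool the support points of the observed measures and run weighted Lloyd iterations to obtain a k-point approximation of the mean measure:

```python
import numpy as np

def quantize_mean_measure(measures, k, n_iter=50, seed=0):
    """Approximate the mean of a sample of discrete measures by a k-point
    measure: pool all support points with their masses as weights, then
    run weighted Lloyd (k-means) iterations."""
    rng = np.random.default_rng(seed)
    pts = np.vstack([p for p, _ in measures])           # pooled support points
    w = np.concatenate([m for _, m in measures])        # pooled masses
    centers = pts[rng.choice(len(pts), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((pts[:, None, :] - centers[None, :, :])**2).sum(-1)
        lab = d2.argmin(axis=1)                         # nearest center
        for j in range(k):
            mask = lab == j
            if mask.any():                              # weighted centroid update
                centers[j] = np.average(pts[mask], axis=0, weights=w[mask])
    mass = np.array([w[lab == j].sum() for j in range(k)]) / w.sum()
    return centers, mass

# toy usage: ten uniform measures on points drawn around two locations
rng = np.random.default_rng(1)
shift = np.array([[0.0, 0.0]] * 15 + [[3.0, 3.0]] * 15)
measures = [(shift + 0.1 * rng.standard_normal((30, 2)), np.ones(30) / 30)
            for _ in range(10)]
print(quantize_mean_measure(measures, k=2)[0])          # near (0,0) and (3,3)
```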
ATOL: Measure Vectorisation for Automatic Topologically-Oriented Learning
Robust topological information commonly comes in the form of a set of persistence diagrams, finite measures that are by nature difficult to affix to generic machine learning frameworks. We introduce a learnt, unsupervised measure vectorisation method and use it for reflecting underlying changes in topological behaviour in machine learning contexts. Relying on optimal measure quantisation results, the method is tailored to efficiently discriminate important plane regions where meaningful differences arise. We showcase the strength and robustness of our approach on a number of applications, from emulous and modern graph collections where the method reaches state-of-the-art performance to a geometric synthetic dynamical orbits problem. The proposed methodology comes with only high-level tuning parameters such as the total measure encoding budget, and we provide completely open-access software.
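A simplified sketch of the vectorisation idea follows: given learnt quantisation centers, each measure is mapped to $\mathbb{R}^k$ by accumulating its mass under a Gaussian contrast around each center. The actual ATOL contrast functions and their calibration differ; treat this as an illustrative stand-in, with all names hypothetical.

```python
import numpy as np

def atol_like_features(measures, centers, scale=1.0):
    """Vectorise each discrete measure into R^k: coordinate j accumulates
    the measure's mass weighted by a Gaussian contrast centered at the
    j-th quantisation center (a simplified stand-in for ATOL's contrasts)."""
    feats = []
    for pts, mass in measures:
        d2 = ((pts[:, None, :] - centers[None, :, :])**2).sum(-1)
        feats.append((mass[:, None] * np.exp(-d2 / (2.0 * scale**2))).sum(axis=0))
    return np.vstack(feats)                  # one k-dimensional row per measure

# hypothetical usage, reusing the quantiser sketched earlier for the centers:
# centers, _ = quantize_mean_measure(train_measures, k=8)
# X = atol_like_features(train_measures, centers)  # features for any classifier
```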