A unifying framework for the analysis of projection-free first-order methods under a sufficient slope condition
The analysis of projection-free first-order methods is often complicated by
the presence of different kinds of "good" and "bad" steps. In this article, we
propose a unifying framework for projection-free methods, aiming to simplify
the convergence analysis by getting rid of such a distinction between steps. The
main tool employed in our framework is the Short Step Chain (SSC) procedure,
which skips gradient computations in consecutive short steps until proper
stopping conditions are satisfied. This technique allows us to give a unified
analysis and convergence rates in the general smooth nonconvex setting, as well
as convergence rates under a Kurdyka-Lojasiewicz (KL) property, a setting that,
to our knowledge, has not been analyzed before for the projection-free methods
under study. In this context, we prove local convergence rates comparable to
those of projected gradient methods under the same conditions. Our analysis
relies on a sufficient slope condition, ensuring that the directions selected
by the methods have the steepest slope possible up to a constant among feasible
directions. This condition is satisfied, among others, by several Frank-Wolfe
(FW) variants on polytopes, and by some projection-free methods on convex sets
with smooth boundary.Comment: 36 pages, 4 figure
Avoiding bad steps in Frank Wolfe variants
The analysis of Frank Wolfe (FW) variants is often complicated by the
presence of different kinds of "good" and "bad" steps. In this article we aim
to simplify the convergence analysis of some of these variants by getting rid
of such a distinction between steps, and to improve existing rates by ensuring
a sizable decrease of the objective at each iteration. In order to do this, we
define the Short Step Chain (SSC) procedure, which skips gradient computations
in consecutive short steps until proper stopping conditions are satisfied. This
technique allows us to give a unified analysis and convergence rates in the
general smooth nonconvex setting, as well as a linear convergence rate under a
Kurdyka-Lojasiewicz (KL) property. While this setting has been widely studied
for proximal gradient type methods, to our knowledge, it has not been analyzed
before for the Frank Wolfe variants under study. An angle condition, ensuring
that the directions selected by the methods have the steepest slope possible up
to a constant, is used to carry out our analysis. We prove that this condition
is satisfied on polytopes by the away step Frank-Wolfe (AFW), the pairwise
Frank-Wolfe (PFW), and the Frank-Wolfe method with in-face directions (FDFW).
Comment: See arXiv:2008.09781 for an extended version of the paper.
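The frozen-gradient idea behind SSC can be sketched roughly as follows, using pairwise FW directions on the simplex. This is an illustrative simplification of ours, not the paper's actual procedure: it assumes a known smoothness constant L, and it reduces the stopping conditions to "stop after the first non-maximal step":

```python
# Rough sketch of the frozen-gradient idea behind SSC, with pairwise FW
# directions on the probability simplex. Illustrative simplification only:
# the short-step rule assumes a known smoothness constant L, and the
# chain stops after the first non-maximal step.

def ssc_pairwise_chain(g, x, L):
    """Chain pairwise steps using one frozen gradient g; ||e_s - e_v||^2 = 2."""
    x = list(x)
    while True:
        support = [i for i, xi in enumerate(x) if xi > 0.0]
        s = min(range(len(x)), key=lambda j: g[j])   # FW vertex
        v = max(support, key=lambda j: g[j])         # away vertex
        gap = g[v] - g[s]                            # slope along e_s - e_v
        if gap <= 1e-12:
            return x
        gamma = gap / (2.0 * L)                      # short step from smoothness
        if gamma < x[v]:
            x[s] += gamma; x[v] -= gamma             # non-maximal step: stop
            return x
        x[s] += x[v]; x[v] = 0.0                     # maximal (drop) step: continue

def run_pfw_ssc(grad, x, L, outer=200):
    for _ in range(outer):
        x = ssc_pairwise_chain(grad(x), x, L)
    return x

c = [0.6, 0.4, 0.0]
x = run_pfw_ssc(lambda z: [2.0 * (zj - cj) for zj, cj in zip(z, c)],
                [1/3, 1/3, 1/3], L=2.0)
```

The point of the chain is that maximal ("bad") steps consume no new gradient evaluations: each one removes a vertex from the support, so the inner loop terminates after at most n drop steps.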
Active set complexity of the Away-step Frank-Wolfe Algorithm
In this paper, we study active set identification results for the away-step
Frank-Wolfe algorithm in different settings. We first prove a local
identification property that we apply, in combination with a convergence
hypothesis, to get an active set identification result. We then prove, in the
nonconvex case, a novel convergence rate result and active set
identification for different stepsizes (under suitable assumptions on the set
of stationary points). By exploiting those results, we also give explicit
active set complexity bounds for both strongly convex and nonconvex objectives.
While we initially consider the probability simplex as feasible set, in the
appendix we show how to adapt some of our results to generic polytopes.
Comment: 23 pages
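The finite-time identification mechanism hinges on "drop" away-steps, which set a coordinate of the iterate exactly to zero. A hedged sketch of the away-step selection on the probability simplex, with exact line search for a toy quadratic (the objective, start point, and iteration budget are our own choices, not from the paper):

```python
# Sketch of the away-step Frank-Wolfe (AFW) method on the probability
# simplex with exact line search for a toy quadratic. The point is that
# maximal away-steps ("drop" steps) zero a coordinate exactly, which is
# how the active set is identified in finite time.

def afw_quadratic_simplex(c, x, iters=500):
    """Minimize ||x - c||^2 over the simplex with away steps."""
    n = len(x)
    for _ in range(iters):
        g = [2.0 * (xj - cj) for xj, cj in zip(x, c)]
        support = [i for i in range(n) if x[i] > 0.0]
        s = min(range(n), key=lambda j: g[j])            # FW vertex
        v = max(support, key=lambda j: g[j])             # away vertex
        gx = sum(gj * xj for gj, xj in zip(g, x))
        if gx - g[s] >= g[v] - gx:                       # compare the two slopes
            d = [-xj for xj in x]; d[s] += 1.0           # FW direction e_s - x
            gamma_max, drop_index = 1.0, None
        else:
            d = list(x); d[v] -= 1.0                     # away direction x - e_v
            gamma_max, drop_index = x[v] / (1.0 - x[v]), v
        slope = sum(gj * dj for gj, dj in zip(g, d))
        if slope >= -1e-12:
            break                                        # stationary point
        dd = sum(dj * dj for dj in d)
        gamma = min(-slope / (2.0 * dd), gamma_max)      # exact line search
        x = [xj + gamma * dj for xj, dj in zip(x, d)]
        if drop_index is not None and gamma == gamma_max:
            x[drop_index] = 0.0                          # drop step: v leaves support

# c lies outside the simplex; its projection is [0.7, 0.3, 0.0], so the
# face {x3 = 0} should be identified exactly after a drop step.
    return x

x = afw_quadratic_simplex([0.8, 0.4, -0.2], [1/3, 1/3, 1/3])
```

Note that strict complementarity holds at the solution of this toy instance (the multiplier of the dropped coordinate is positive), which is the kind of assumption under which finite identification can be expected.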
Inexact Direct-Search Methods for Bilevel Optimization Problems
In this work, we introduce new direct search schemes for the solution of
bilevel optimization (BO) problems. Our methods rely on a fixed accuracy black
box oracle for the lower-level problem, and deal both with smooth and
potentially nonsmooth true objectives. We thus provide the first analysis in
the literature of direct search schemes in these settings, giving convergence
guarantees to approximate stationary points, as well as complexity bounds in
the smooth case. We also propose the first adaptation of mesh adaptive direct
search schemes for BO. Some preliminary numerical results on a standard set of
bilevel optimization problems show the effectiveness of our new approaches.
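A basic single-level, exact-oracle direct search scheme with a sufficient-decrease test can be sketched as follows; the bilevel structure and the inexact lower-level oracle of the paper are not reproduced, and the objective and parameters are our own toy choices:

```python
# Minimal coordinate-wise direct search with a sufficient-decrease test.
# Illustrative single-level sketch; in the paper's bilevel setting f would
# be evaluated through a fixed-accuracy lower-level oracle instead.

def direct_search(f, x, alpha=1.0, min_alpha=1e-6, max_iter=100000):
    x = list(x)
    fx = f(x)
    for _ in range(max_iter):
        if alpha < min_alpha:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] += sign * alpha          # poll point along +/- e_i
                fy = f(y)
                if fy < fx - alpha * alpha:   # sufficient decrease (forcing term)
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            alpha *= 0.5                      # no poll point accepted: refine mesh
    return x

x = direct_search(lambda z: (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2, [0.0, 0.0])
```

The forcing term `alpha * alpha` is what makes the analysis go through without gradients: when no poll point achieves it, the step size certifies approximate stationarity at the current scale.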
Convergence analysis and active set complexity for some FW variants
The FW method, first introduced in 1956 by Marguerite Frank and Philip Wolfe, has recently been the subject of renewed interest thanks to its many applications in machine learning. In this thesis we prove convergence and active set identification properties for some popular variants of this method. While the classic FW method has a slow O(1/t) convergence rate even for strongly convex objectives, it has recently been proved that some FW variants on polytopes have faster convergence rates under a Hölderian error bound condition, which generalizes strong convexity. In this thesis we prove that for one of these variants this acceleration of the convergence rate extends to a class of non-polyhedral sets, including smooth strictly convex sets whose boundary satisfies a positive curvature property. We also prove that, under suitable assumptions, some FW variants on polytopes identify the active set in finite time. This result extends an analogous well-known result proved for projected gradient methods. To prove our result, however, we use a fundamentally different technique, relating the identification property to active set identification strategies with
Lagrange multipliers. Other minor results of the thesis include a proof of
finite time active set identification for the pairwise step FW variant, a
new proof for the projected gradient finite time active set identification
property with explicit estimates, and a generalization of some of the
convergence rate results to reflexive Banach spaces.
First and zeroth order optimization methods for data science
Recent data science applications using large datasets often need scalable optimization methods with low per-iteration cost and low memory requirements. This has led to a renewed interest in gradient descent methods, and in tailored variants for problems where gradient descent is impractical due, e.g., to nonsmoothness or stochasticity of the optimization objective. Applications include deep neural network training, adversarial attacks in machine learning, sparse signal recovery, cluster detection in networks, etc.
In this thesis, we focus on the theoretical analysis of some of these methods, as well as on the formulation and numerical testing of new methods with better complexity guarantees than existing ones under suitable conditions. The problems we consider have a continuous but sometimes constrained and not necessarily differentiable objective. The main contributions concern both some variants of the classic Frank-Wolfe (FW) method and direct search schemes. In particular, we prove new support identification properties for FW variants, with an application to a cluster detection problem in networks; we introduce a technique to provably speed up the convergence of FW variants; and we extend some direct search schemes to the stochastic nonsmooth setting, as well as to problems defined on Riemannian manifolds.
A weak tail-bound probabilistic condition for function estimation in stochastic derivative-free optimization
In this paper, we use tail bounds to define a tailored probabilistic
condition for function estimation that eases the theoretical analysis of
stochastic derivative-free optimization methods. In particular, we focus on the
unconstrained minimization of a potentially non-smooth function, whose values
can only be estimated via stochastic observations, and give a simplified
convergence proof for both a direct search and a basic trust-region scheme.
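The role of function estimation can be illustrated by a direct search loop that compares poll points through averaged noisy samples. This sketch of ours replaces the paper's tail-bound probabilistic condition with plain sample averaging, on a toy objective:

```python
import random

random.seed(0)  # make the illustrative run reproducible

# Direct search where f is only available through noisy observations,
# so every comparison uses an averaged estimate. Illustrative only: the
# paper's weak tail-bound condition on the estimates is replaced here
# by naive sample averaging.

def estimate(noisy_f, x, samples=200):
    """Average several stochastic observations to estimate f(x)."""
    return sum(noisy_f(x) for _ in range(samples)) / samples

def stochastic_direct_search(noisy_f, x, alpha=1.0, min_alpha=1e-2, max_iter=300):
    x = list(x)
    fx = estimate(noisy_f, x)
    for _ in range(max_iter):
        if alpha < min_alpha:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] += sign * alpha
                fy = estimate(noisy_f, y)
                if fy < fx - alpha * alpha:    # sufficient decrease on estimates
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            alpha *= 0.5
            fx = estimate(noisy_f, x)          # refresh the incumbent estimate
    return x

# Noisy observations of (x0 - 1)^2 + (x1 + 2)^2.
noisy = lambda z: (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2 + random.gauss(0.0, 0.01)
x = stochastic_direct_search(noisy, [0.0, 0.0])
```

The forcing term must eventually dominate the estimation error for the scheme to be reliable, which is precisely what a probabilistic accuracy condition on the estimates formalizes.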
Fast Cluster Detection in Networks by First Order Optimization
Cluster detection plays a fundamental role in the analysis of data. In this paper, we focus on the use of s-defective clique models for network-based cluster detection and propose a nonlinear optimization approach that efficiently handles those models in practice. In particular, we introduce an equivalent continuous formulation for the problem under analysis, and we analyze some tailored variants of the Frank-Wolfe algorithm that enable us to quickly find maximal s-defective cliques. The good practical behavior of those algorithmic tools, which is closely connected to their support identification properties, makes them very appealing in practical applications. The reported numerical results clearly show the effectiveness of the proposed approach.
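Continuous formulations of this kind are in the spirit of the classical Motzkin-Straus program, which ties the clique number omega(G) to the maximum of x^T A x over the simplex (the maximum equals 1 - 1/omega(G)). A hedged FW-type sketch for the plain clique case, on a toy graph of our own; the paper's s-defective formulation and tailored variants are different:

```python
# Frank-Wolfe-type ascent on the classical Motzkin-Straus program
# max x^T A x over the simplex, whose optimal value is 1 - 1/omega(G).
# Illustrative only: the paper uses a different, s-defective-clique
# formulation with tailored FW variants.

def quad(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def fw_ascent(A, x, iters=300):
    n = len(x)
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        s = max(range(n), key=lambda i: Ax[i])   # best vertex for ascent
        d = [-xi for xi in x]; d[s] += 1.0       # direction e_s - x
        # Exact line search: improvement along d is b*t + a*t^2 on [0, 1].
        q = quad(A, x)
        b = 2.0 * (Ax[s] - q)
        a = A[s][s] - 2.0 * Ax[s] + q
        candidates = [0.0, 1.0]
        if a < 0.0:
            candidates.append(min(1.0, max(0.0, -b / (2.0 * a))))
        t = max(candidates, key=lambda t: b * t + a * t * t)
        x = [xi + t * di for xi, di in zip(x, d)]
    return x

# Toy graph: triangle {0, 1, 2} plus the edge {2, 3}; clique number 3,
# so the optimal Motzkin-Straus value is 1 - 1/3 = 2/3.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
x = fw_ascent(A, [0.25] * 4)
```

The support of a good iterate concentrates on a large clique, which is the kind of support identification behavior the paper exploits algorithmically.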