A Coordinate Descent Primal-Dual Algorithm and Application to Distributed Asynchronous Optimization
Based on the idea of randomized coordinate descent of α-averaged
operators, a randomized primal-dual optimization algorithm is introduced, where
a random subset of coordinates is updated at each iteration. The algorithm
builds upon a variant of a recent (deterministic) algorithm proposed by Vũ
and Condat that includes the well-known ADMM as a particular case. The obtained
algorithm is used to solve a distributed optimization problem asynchronously. In a
network of agents, each having a separate cost function containing a
differentiable term, the agents seek to find a consensus on the minimum of the
aggregate objective. The method yields an algorithm where, at each iteration,
the agents of a random subset wake up, update their local estimates, exchange some data with
their neighbors, and go idle. Numerical results demonstrate the attractive
performance of the method. The general approach can be naturally adapted to
other situations where coordinate descent convex optimization algorithms are
used with a random choice of the coordinates.
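
As a rough illustration of the randomized coordinate idea (a minimal Python/NumPy sketch, not the primal-dual algorithm of the paper), one can iterate an averaged fixed-point operator, here a gradient step for a toy least-squares problem, and update only a random subset of coordinates at each iteration; the data, step size, and update probability below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
gamma = 1.0 / np.linalg.norm(A, 2) ** 2    # step size making T averaged
p = 0.3                                    # probability of updating a coordinate

x = np.zeros(20)
for _ in range(2000):
    Tx = x - gamma * A.T @ (A @ x - b)     # full evaluation of the averaged operator
    mask = rng.random(20) < p              # random subset of coordinates
    x = np.where(mask, Tx, x)              # only the selected coordinates are updated

print(np.linalg.norm(A.T @ (A @ x - b)))   # residual of the optimality condition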
Nonsmoothness in Machine Learning: specific structure, proximal identification, and applications
Nonsmoothness is often a curse for optimization; but it is sometimes a
blessing, in particular for applications in machine learning. In this paper, we
present the specific structure of nonsmooth optimization problems appearing in
machine learning and illustrate how to leverage this structure in practice, for
compression, acceleration, or dimension reduction. We pay special attention
to the presentation, keeping it concise and easily accessible, with both simple
examples and general results.
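
A minimal sketch of the kind of structure at play (assuming the ℓ1-norm as the nonsmooth regularizer, which is only one of the examples covered by the paper): the proximity operator of the ℓ1-norm, soft-thresholding, outputs exact zeros, and this identified sparsity pattern is what can be leveraged for compression or dimension reduction.

import numpy as np

def soft_threshold(x, tau):
    # proximity operator of tau * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(1)
x = rng.standard_normal(10)
y = soft_threshold(x, 0.8)
print(y)                                        # exact zeros appear in the output
print("identified support:", np.nonzero(y)[0])  # structure obtained at no extra cost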
A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm
We develop and analyze an asynchronous algorithm for distributed convex
optimization when the objective is the sum of smooth functions, local to each
worker, and a non-smooth function. Unlike many existing methods, our
distributed algorithm adapts to various levels of communication cost,
delays, machines' computational power, and functions' smoothness. A unique
feature is that the stepsizes depend neither on the communication delays nor on the number
of machines, which is highly desirable for scalability. We prove that the
algorithm converges linearly in the strongly convex case, and provide
guarantees of convergence for the non-strongly convex case. The obtained rates
are the same as those of the vanilla proximal gradient algorithm, over an introduced
epoch sequence that subsumes the delays of the system. We provide numerical
results on large-scale machine learning problems to demonstrate the merits of
the proposed method. To appear in SIAM Journal on Optimization.
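
The following toy simulation (a sketch under simplifying assumptions of our own, not the paper's exact scheme or analysis) conveys the flavor of such a delay-tolerant distributed proximal gradient method: workers hold possibly stale copies of the parameters, a randomly drawn worker sends the gradient of its local smooth loss evaluated at its stale copy, and a master takes a proximal gradient step for an ℓ1-regularized sum; the data, regularizer, and step size are illustrative.

import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(2)
n_workers, d = 4, 30
As = [rng.standard_normal((40, d)) for _ in range(n_workers)]
bs = [rng.standard_normal(40) for _ in range(n_workers)]
lam, gamma = 0.1, 1e-3

x = np.zeros(d)                                  # master parameters
stale = [x.copy() for _ in range(n_workers)]     # delayed copies held by the workers
grads = [np.zeros(d) for _ in range(n_workers)]  # last gradient received from each worker

for _ in range(5000):
    i = rng.integers(n_workers)                  # only worker i wakes up this round
    grads[i] = As[i].T @ (As[i] @ stale[i] - bs[i])   # gradient at its stale copy
    x = soft_threshold(x - gamma * sum(grads), gamma * lam)
    stale[i] = x.copy()                          # worker i receives the fresh point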
Proximal Gradient methods with Adaptive Subspace Sampling
Many applications in machine learning or signal processing involve nonsmooth
optimization problems. This nonsmoothness brings a low-dimensional structure to
the optimal solutions. In this paper, we propose a randomized proximal gradient
method harnessing this underlying structure. We introduce two key components:
i) a random subspace proximal gradient algorithm; ii) an identification-based
sampling of the subspaces. Their interplay brings a significant performance
improvement on typical learning problems in terms of dimensions explored.
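
As a coordinate-aligned sketch of these two components (assuming, for illustration only, subspaces spanned by coordinate axes, an ℓ1-regularized least-squares problem, and hand-picked sampling probabilities rather than the paper's general setting), a random subset of coordinates is drawn at each iteration, with coordinates in the currently identified support sampled more often, and only those coordinates are moved by the proximal gradient step.

import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(3)
A = rng.standard_normal((60, 40))
b = rng.standard_normal(60)
lam, gamma = 0.5, 1.0 / np.linalg.norm(A, 2) ** 2

x = np.zeros(40)
for _ in range(3000):
    in_support = x != 0                          # structure identified so far
    probs = np.where(in_support, 0.9, 0.1)       # identification-based sampling
    mask = rng.random(x.size) < probs            # random coordinate subspace
    full_step = soft_threshold(x - gamma * A.T @ (A @ x - b), gamma * lam)
    x = np.where(mask, full_step, x)             # update only the sampled coordinates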
Newton acceleration on manifolds identified by proximal-gradient methods
Proximal methods are known to identify the underlying substructure of
nonsmooth optimization problems. Moreover, in many interesting situations, the
output of a proximity operator comes with its structure at no additional cost,
and convergence is improved once it matches the structure of a minimizer.
However, it is impossible in general to know whether the current structure is
final or not; such highly valuable information has to be exploited adaptively.
To do so, we place ourselves in the case where a proximal gradient method can
identify manifolds of differentiability of the nonsmooth objective. Leveraging
this manifold identification, we show that Riemannian Newton-like methods can
be intertwined with the proximal gradient steps to drastically boost the
convergence. We prove the superlinear convergence of the algorithm when solving
some nondegenerate nonsmooth nonconvex optimization problems. We provide
numerical illustrations on optimization problems regularized by the ℓ1-norm
or the trace-norm.
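
A minimal sketch for ℓ1-regularized least squares (one assumed instance, not the paper's general manifold framework; sign changes and the safeguards that keep the iterate on the manifold are omitted): a proximal gradient step identifies a candidate support, then a Newton step is taken on the objective restricted to that support, where it is smooth.

import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(4)
A = rng.standard_normal((80, 50))
b = rng.standard_normal(80)
lam, gamma = 1.0, 1.0 / np.linalg.norm(A, 2) ** 2

x = np.zeros(50)
for _ in range(30):
    # proximal gradient step: its output carries the candidate support for free
    x = soft_threshold(x - gamma * A.T @ (A @ x - b), gamma * lam)
    S = np.nonzero(x)[0]
    if S.size == 0:
        continue
    # Newton step restricted to the identified support, where the objective
    # 0.5 * ||A x - b||^2 + lam * sign(x_S) . x_S is smooth
    AS = A[:, S]
    grad_S = AS.T @ (AS @ x[S] - b) + lam * np.sign(x[S])
    x[S] += np.linalg.solve(AS.T @ AS + 1e-8 * np.eye(S.size), -grad_S)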
On the Proximal Gradient Algorithm with Alternated Inertia
In this paper, we investigate the attractive properties of the proximal gradient algorithm with inertia. Notably, we show that using alternated inertia yields monotonically decreasing functional values, which contrasts with usual accelerated proximal gradient methods. We also provide convergence rates for the algorithm with alternated inertia, based on local geometric properties of the objective function. The results are put into perspective by discussions of several extensions and illustrations on common regularized problems.
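
A minimal sketch of the alternation (on an assumed ℓ1-regularized least-squares instance with an arbitrary inertial coefficient; the monotonicity and rate results are those of the paper, under its conditions, not claims about this toy choice): the extrapolation is applied only every other iteration of the proximal gradient method.

import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(5)
A = rng.standard_normal((60, 30))
b = rng.standard_normal(60)
lam, gamma = 0.5, 1.0 / np.linalg.norm(A, 2) ** 2

def F(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x_prev = x = np.zeros(30)
for k in range(200):
    y = x + 0.5 * (x - x_prev) if k % 2 == 0 else x   # inertia on alternate iterations only
    x_prev, x = x, soft_threshold(y - gamma * A.T @ (A @ y - b), gamma * lam)

print(F(x))   # final objective value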