518 research outputs found
Alternating direction method of multipliers with variable step sizes
The alternating direction method of multipliers (ADMM) is a flexible method
to solve a large class of convex minimization problems. Particular features are
its unconditional convergence with respect to the involved step size and its
direct applicability. This article deals with the ADMM with variable step sizes
and devises an adjustment rule for the step size relying on the monotonicity of
the residual and discusses proper stopping criteria. The numerical experiments
show significant improvements over established variants of the ADMM.
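For orientation, a minimal sketch of fixed-step ADMM on a lasso problem follows. It does not reproduce the article's variable-step rule, which the abstract does not spell out. The function names `soft_threshold` and `admm_lasso`, the penalty parameter `rho`, and the toy data are illustrative assumptions. The sketch only illustrates the property the abstract mentions: the iteration converges for any positive step size, although its speed depends on that choice.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal map of kappa * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho, iters=1000):
    """Scaled-form ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1 s.t. x = z.

    rho is the (fixed) step size / penalty parameter; hypothetical helper.
    """
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)
    # The x-update solves a linear system whose matrix never changes.
    lhs = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(lhs, Atb + rho * (z - u))  # smooth term
        z = soft_threshold(x + u, lam / rho)           # l1 term
        u = u + x - z                                  # scaled dual update
    return z

# Toy problem (assumed): with A = I the lasso solution is soft_threshold(b, lam).
A = np.eye(5)
b = np.array([3.0, -0.5, 1.2, 0.0, -2.0])
for rho in (0.1, 1.0, 10.0):
    print(rho, admm_lasso(A, b, lam=1.0, rho=rho))
```

All three step sizes reach the same minimizer here; a variable-step scheme aims to reduce the number of iterations needed, not to change the limit.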
Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming
Motivated by big data applications, first-order methods have been extremely
popular in recent years. However, naive gradient methods generally converge
slowly. Hence, much effort has been made to accelerate various first-order
methods. This paper proposes two accelerated methods for solving structured
linearly constrained convex programs, for which we assume a composite convex
objective.
The first method is the accelerated linearized augmented Lagrangian method
(LALM). At each update to the primal variable, it allows linearization to the
differentiable function and also the augmented term, and thus it enables easy
subproblems. Assuming merely weak convexity, we show that LALM attains
$O(1/t)$ convergence if parameters are kept fixed during all the iterations and
can be accelerated to $O(1/t^2)$ if the parameters are adapted, where $t$ is the
total number of iterations.
The second method is the accelerated linearized alternating direction method
of multipliers (LADMM). In addition to the composite convexity, it further
assumes two-block structure on the objective. Different from classic ADMM, our
method allows linearization to the objective and also augmented term to make
the update simple. Assuming strong convexity on one block variable, we show
that LADMM also enjoys $O(1/t^2)$ convergence with adaptive parameters. This
result is a significant improvement over that in [Goldstein et al., SIIMS'14],
which requires strong convexity on both block variables and no linearization to
the objective or augmented term.
Numerical experiments are performed on quadratic programming, image
denoising, and support vector machine. The proposed accelerated methods are
compared to nonaccelerated ones and also existing accelerated methods. The
results demonstrate the validity of acceleration and superior performance of
the proposed methods over existing ones.
Adaptive Restart of the Optimized Gradient Method for Convex Optimization
First-order methods with momentum such as Nesterov's fast gradient method are
very useful for convex optimization problems, but can exhibit undesirable
oscillations yielding slow convergence rates for some applications. An adaptive
restarting scheme can improve the convergence rate of the fast gradient method,
when the parameter of a strongly convex cost function is unknown or when the
iterates of the algorithm enter a locally strongly convex region. Recently, we
introduced the optimized gradient method, a first-order algorithm that has an
inexpensive per-iteration computational cost similar to that of the fast
gradient method, yet has a worst-case cost function rate that is twice as fast
as that of the fast gradient method and that is optimal for large-dimensional
smooth convex problems. Building upon the success of accelerating the fast
gradient method using adaptive restart, this paper investigates similar
heuristic acceleration of the optimized gradient method. We first derive a new
first-order method that resembles the optimized gradient method for strongly
convex quadratic problems with known function parameters, yielding a linear
convergence rate that is faster than that of the analogous version of the fast
gradient method. We then provide a heuristic analysis and numerical experiments
that illustrate that adaptive restart can accelerate the convergence of the
optimized gradient method. Numerical results also illustrate that adaptive
restart is helpful for a proximal version of the optimized gradient method for
nonsmooth composite convex functions.
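The restart idea described above can be sketched for Nesterov's fast gradient method as follows. This is a minimal illustration under stated assumptions, not the optimized gradient method studied in the paper: it uses the well-known gradient-based restart test (reset the momentum when the last step points uphill with respect to the gradient at the extrapolated point), and the function name `fast_gradient`, the diagonal quadratic, and the iteration count are hypothetical choices.

```python
import numpy as np

def fast_gradient(grad, x0, L, iters, restart=False):
    """Nesterov's fast gradient method with optional gradient-based restart."""
    x = np.array(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(iters):
        g = grad(y)
        x_new = y - g / L  # gradient step from the extrapolated point
        # Restart test: the step x_new - x makes an acute angle with the
        # gradient, i.e. the momentum is carrying the iterate uphill.
        if restart and g @ (x_new - x) > 0:
            t = 1.0  # reset the momentum sequence
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Assumed test problem: strongly convex quadratic with condition number 1e3.
d = np.logspace(-3, 0, 50)

def f(x):
    return 0.5 * np.sum(d * x * x)

def grad_f(x):
    return d * x

x0 = np.ones(50)
print("no restart:", f(fast_gradient(grad_f, x0, L=1.0, iters=3000)))
print("restart:   ", f(fast_gradient(grad_f, x0, L=1.0, iters=3000, restart=True)))
```

The restart variant needs no knowledge of the strong convexity parameter, which is the situation the abstract highlights.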
Damped Anderson acceleration with restarts and monotonicity control for accelerating EM and EM-like algorithms
The expectation-maximization (EM) algorithm is a well-known iterative method
for computing maximum likelihood estimates from incomplete data. Despite its
numerous advantages, a main drawback of the EM algorithm is its frequently
observed slow convergence which often hinders the application of EM algorithms
in high-dimensional problems or in other complex settings. To address the need
for more rapidly convergent EM algorithms, we describe a new class of
acceleration schemes that build on the Anderson acceleration technique for
speeding fixed-point iterations. Our approach is effective at greatly
accelerating the convergence of EM algorithms and is automatically scalable to
high dimensional settings. Through the introduction of periodic algorithm
restarts and a damping factor, our acceleration scheme provides faster and more
robust convergence when compared to un-modified Anderson acceleration while
also improving global convergence. Crucially, our method works as an
"off-the-shelf" method in that it may be directly used to accelerate any EM
algorithm without relying on the use of any model-specific features or
insights. Through a series of simulation studies involving five representative
problems, we show that our algorithm is substantially faster than the existing
state-of-the-art acceleration schemes.
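As background, plain Anderson acceleration of a fixed-point iteration x = g(x) can be sketched as below. This is the unmodified scheme only: the periodic restarts, damping factor, and monotonicity control that the paper adds are not included, and the function name `anderson`, the memory size `m`, and the toy linear map (standing in for an EM update) are assumptions made for illustration.

```python
import numpy as np

def anderson(g, x0, m=5, iters=100, tol=1e-10):
    """Plain Anderson acceleration for the fixed-point problem x = g(x)."""
    x = np.array(x0, dtype=float)
    f = g(x) - x              # fixed-point residual
    dX, dF = [], []           # histories of iterate and residual differences
    for _ in range(iters):
        if np.linalg.norm(f) < tol:
            break
        if dF:
            DX = np.column_stack(dX)
            DF = np.column_stack(dF)
            # Least-squares fit of the current residual by past differences.
            gamma, *_ = np.linalg.lstsq(DF, f, rcond=None)
            x_new = x + f - (DX + DF) @ gamma
        else:
            x_new = x + f     # first step: ordinary fixed-point update
        f_new = g(x_new) - x_new
        dX.append(x_new - x)
        dF.append(f_new - f)
        if len(dX) > m:       # keep only the m most recent differences
            dX.pop(0)
            dF.pop(0)
        x, f = x_new, f_new
    return x

# Toy contraction (assumed): g(x) = M x + b with a slowly converging mode.
M = np.diag([0.9, 0.5, 0.2, -0.3])
b = np.array([1.0, 2.0, 3.0, 4.0])
print(anderson(lambda x: M @ x + b, np.zeros(4)))
```

Because the scheme only calls `g`, it can wrap any fixed-point map without model-specific code, which is the "off-the-shelf" property the abstract emphasizes.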
On Quasi-Newton Forward--Backward Splitting: Proximal Calculus and Convergence
We introduce a framework for quasi-Newton forward--backward splitting
algorithms (proximal quasi-Newton methods) with a metric induced by diagonal
$\pm$ rank-$r$ symmetric positive definite matrices. This special type of
metric allows for a highly efficient evaluation of the proximal mapping. The
key to this efficiency is a general proximal calculus in the new metric. By
using duality, formulas are derived that relate the proximal mapping in a
rank-$r$ modified metric to the original metric. We also describe efficient
implementations of the proximity calculation for a large class of functions;
the implementations exploit the piece-wise linear nature of the dual problem.
Then, we apply these results to acceleration of composite convex minimization
problems, which leads to elegant quasi-Newton methods for which we prove
convergence. The algorithm is tested on several numerical examples and compared
to a comprehensive list of alternatives in the literature. Our quasi-Newton
splitting algorithm with the prescribed metric compares favorably against
state-of-the-art. The algorithm has extensive applications including signal
processing, sparse recovery, machine learning and classification to name a few. Comment: arXiv admin note: text overlap with arXiv:1206.115
Accelerated Proximal Point Method for Maximally Monotone Operators
This paper proposes an accelerated proximal point method for maximally
monotone operators. The proof is computer-assisted via the performance
estimation problem approach. The proximal point method includes various
well-known convex optimization methods, such as the proximal method of
multipliers and the alternating direction method of multipliers, and thus the
proposed acceleration has wide applications. Numerical experiments are
presented to demonstrate the accelerating behaviors.
Linear convergence of first order methods for non-strongly convex optimization
The standard assumption for proving linear convergence of first order methods
for smooth convex optimization is the strong convexity of the objective
function, an assumption which does not hold for many practical applications. In
this paper, we derive linear convergence rates of several first order methods
for solving smooth non-strongly convex constrained optimization problems, i.e.
involving an objective function with a Lipschitz continuous gradient that
satisfies some relaxed strong convexity condition. In particular, in the case
of smooth constrained convex optimization, we provide several relaxations of
the strong convexity conditions and prove that they are sufficient for getting
linear convergence for several first order methods such as projected gradient,
fast gradient and feasible descent methods. We also provide examples of
functional classes that satisfy our proposed relaxations of strong convexity
conditions. Finally, we show that the proposed relaxed strong convexity
conditions cover important applications, including solving linear systems,
linear programming, and dual formulations of linearly constrained convex
problems. Comment: 36 pages, 4 figures
Restarting Frank-Wolfe: Faster Rates Under Hölderian Error Bounds
Conditional Gradients (aka Frank-Wolfe algorithms) form a classical set of
methods for constrained smooth convex minimization due to their simplicity, the
absence of projection steps, and competitive numerical performance. While the
vanilla Frank-Wolfe algorithm only ensures a worst-case rate of
$O(1/\epsilon)$, various recent results have shown that for strongly
convex functions on polytopes, the method can be slightly modified to achieve
linear convergence. However, this still leaves a huge gap between sublinear
$O(1/\epsilon)$ convergence and linear $O(\log 1/\epsilon)$
convergence to reach an $\epsilon$-approximate solution. Here, we present a new
variant of Conditional Gradients, that can dynamically adapt to the function's
geometric properties using restarts and smoothly interpolates between the
sublinear and linear regimes. Similarly, the vanilla Frank-Wolfe method has a
$O(1/\sqrt{\epsilon})$ convergence rate when both constraint set and
objective function are strongly convex. We show that relaxing the strong
convexity assumption interpolates between various known rates and that similar
interpolated convergence rates are obtained when strong convexity of the
constraint set is relaxed to uniform convexity. Comment: Journal version
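For reference, the vanilla Frank-Wolfe iteration mentioned above can be sketched as follows. This is the baseline method with the standard 2/(k+2) step size, not the restarted variant proposed in the paper; the function name `frank_wolfe_simplex`, the choice of the probability simplex as the constraint set, and the quadratic objective are illustrative assumptions.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters):
    """Vanilla Frank-Wolfe over the probability simplex, step size 2/(k+2)."""
    x = np.array(x0, dtype=float)
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex the minimizer of
        # <g, s> is the vertex at the smallest gradient entry, so no
        # projection step is needed.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Assumed objective: f(x) = 0.5 * ||x - c||^2 with c inside the simplex.
c = np.array([0.2, 0.3, 0.5])
x = frank_wolfe_simplex(lambda x: x - c, np.array([1.0, 0.0, 0.0]), 2000)
print(x, 0.5 * np.sum((x - c) ** 2))
```

Each iterate is a convex combination of vertices, which is why the method stays feasible without projections; the price is the slow sublinear rate that restart schemes try to improve.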
Convolutional Dictionary Learning: Acceleration and Convergence
Convolutional dictionary learning (CDL or sparsifying CDL) has many
applications in image processing and computer vision. There has been growing
interest in developing efficient algorithms for CDL, mostly relying on the
augmented Lagrangian (AL) method or its variant, the alternating direction method of
multipliers (ADMM). When their parameters are properly tuned, AL methods have
shown fast convergence in CDL. However, the parameter tuning process is not
trivial due to its data dependence and, in practice, the convergence of AL
methods depends on the AL parameters for nonconvex CDL problems. To moderate
these problems, this paper proposes a new practically feasible and convergent
Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The
BPG-M-based CDL is investigated with different block updating schemes and
majorization matrix designs, and further accelerated by incorporating some
momentum coefficient formulas and restarting techniques. All of the methods
investigated incorporate a boundary artifacts removal (or, more generally,
sampling) operator in the learning model. Numerical experiments show that,
without needing any parameter tuning process, the proposed BPG-M approach
converges more stably to desirable solutions of lower objective values than the
existing state-of-the-art ADMM algorithm and its memory-efficient variant do.
Compared to the ADMM approaches, the BPG-M method using a multi-block updating
scheme is particularly useful in single-threaded CDL algorithms handling large
datasets, due to its lower memory requirement and no polynomial computational
complexity. Image denoising experiments show that, for relatively strong
additive white Gaussian noise, the filters learned by BPG-M-based CDL
outperform those trained by the ADMM approach. Comment: 21 pages, 7 figures, submitted to IEEE Transactions on Image
Processing
Convex-Set–Constrained Sparse Signal Recovery: Theory and Applications
Convex-set constrained sparse signal reconstruction facilitates flexible measurement models and accurate recovery. The objective function that we wish to minimize is a sum of a convex differentiable data-fidelity (negative log-likelihood (NLL)) term and a convex regularization term. We apply sparse signal regularization where the signal belongs to a closed convex set within the closure of the domain of the NLL. Signal sparsity is imposed using the ℓ1-norm penalty on the signal's linear transform coefficients.
First, we present a projected Nesterov's proximal-gradient (PNPG) approach that employs a projected Nesterov's acceleration step with restart and a duality-based inner iteration to compute the proximal mapping. We propose an adaptive step-size selection scheme to obtain a good local majorizing function of the NLL and reduce the time spent backtracking. We present an integrated derivation of the momentum acceleration and proofs of O(k^(-2)) objective function convergence rate and convergence of the iterates, which account for adaptive step size, inexactness of the iterative proximal mapping, and the convex-set constraint. The tuning of PNPG is largely application independent. Tomographic and compressed-sensing reconstruction experiments with Poisson generalized linear and Gaussian linear measurement models demonstrate the performance of the proposed approach.
We then address the problem of upper-bounding the regularization constant for the convex-set–constrained sparse signal recovery problem behind the PNPG framework. This bound defines the maximum influence the regularization term has on the signal recovery. We formulate an optimization problem for finding these bounds when the regularization term can be globally minimized and develop an alternating direction method of multipliers (ADMM) type method for their computation. Simulation examples show that the derived and empirical bounds match.
Finally, we show an application of the PNPG framework to X-ray computed tomography (CT) and outline a method for sparse image reconstruction from Poisson-distributed polychromatic X-ray CT measurements under the blind scenario where the material of the inspected object and the incident energy spectrum are unknown. To obtain a parsimonious mean measurement-model parameterization, we first rewrite the measurement equation by changing the integration variable from photon energy to mass attenuation, which allows us to combine the variations brought by the unknown incident spectrum and mass attenuation into a single unknown mass-attenuation spectrum function; the resulting measurement equation has the form of a Laplace integral. We apply a block coordinate-descent algorithm that alternates between an NPG image reconstruction step and a limited-memory BFGS with box constraints (L-BFGS-B) iteration for updating mass-attenuation spectrum parameters. Our NPG-BFGS algorithm is the first physical-model-based image reconstruction method for simultaneous blind sparse image reconstruction and mass-attenuation spectrum estimation from polychromatic measurements. Real X-ray CT reconstruction examples demonstrate the performance of the proposed blind scheme.