
    Alternating direction method of multipliers with variable step sizes

    The alternating direction method of multipliers (ADMM) is a flexible method for solving a large class of convex minimization problems. Particular features are its unconditional convergence with respect to the involved step size and its direct applicability. This article deals with the ADMM with variable step sizes: it devises an adjustment rule for the step size based on the monotonicity of the residual and discusses proper stopping criteria. Numerical experiments show significant improvements over established variants of the ADMM.
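    As a concrete illustration of ADMM with an adjustable step size, the sketch below applies the widely used residual-balancing heuristic to a lasso splitting; the paper's monotonicity-based adjustment rule and stopping criteria may differ in detail, and the problem, parameter values, and function names here are illustrative assumptions rather than the paper's method.

```python
# Minimal sketch: ADMM for the lasso  min 0.5*||Ax - b||^2 + lam*||x||_1
# (splitting x = z), with residual-balancing step-size adjustment.
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, mu=10.0, tau=2.0, tol=1e-6, max_iter=500):
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(max_iter):
        x = np.linalg.solve(AtA + rho * np.eye(n), Atb + rho * (z - u))
        z_old = z
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)
        u = u + x - z
        r = np.linalg.norm(x - z)              # primal residual
        s = rho * np.linalg.norm(z - z_old)    # dual residual
        if r < tol and s < tol:                # simple stopping criterion
            break
        if r > mu * s:                         # primal residual dominates: increase rho
            rho *= tau; u /= tau               # rescale the scaled dual variable
        elif s > mu * r:                       # dual residual dominates: decrease rho
            rho /= tau; u *= tau
    return z

A, b = np.random.randn(50, 100), np.random.randn(50)
x_hat = admm_lasso(A, b, lam=0.1)
```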

    Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming

    Motivated by big-data applications, first-order methods have been extremely popular in recent years. However, naive gradient methods generally converge slowly, so much effort has been made to accelerate various first-order methods. This paper proposes two accelerated methods for solving structured linearly constrained convex programs with a composite convex objective. The first method is the accelerated linearized augmented Lagrangian method (LALM). At each update of the primal variable, it linearizes the differentiable function and the augmented term, which yields easy subproblems. Assuming merely weak convexity, we show that LALM achieves O(1/t) convergence if the parameters are kept fixed throughout the iterations and can be accelerated to O(1/t^2) if the parameters are adapted, where t is the total number of iterations. The second method is the accelerated linearized alternating direction method of multipliers (LADMM). In addition to composite convexity, it further assumes a two-block structure on the objective. Unlike classic ADMM, our method linearizes the objective and the augmented term to keep the updates simple. Assuming strong convexity of one block variable, we show that LADMM also enjoys O(1/t^2) convergence with adaptive parameters. This result is a significant improvement over that in [Goldstein et al., SIIMS'14], which requires strong convexity of both block variables and no linearization of the objective or augmented term. Numerical experiments are performed on quadratic programming, image denoising, and support vector machines. The proposed accelerated methods are compared to non-accelerated ones and to existing accelerated methods. The results demonstrate the effectiveness of the acceleration and the superior performance of the proposed methods over existing ones.
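    The following sketch illustrates what a linearized augmented Lagrangian step looks like for a composite objective with a linear constraint; it shows the unaccelerated iteration only (the paper's adaptive parameters and acceleration are omitted), and all names, the example problem, and the parameter choices are illustrative assumptions.

```python
# Minimal sketch: linearized augmented Lagrangian method for
#   min_x f(x) + lam*||x||_1   s.t.  A x = b,   with f smooth.
# Linearizing f and the augmented term makes the x-update a single soft-threshold.
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def linearized_alm(gradf, Lf, A, b, lam, beta, x0, iters=500):
    eta = Lf + beta * np.linalg.norm(A, 2) ** 2   # step parameter >= L_f + beta*||A||^2
    x, lmbda = x0.copy(), np.zeros(A.shape[0])
    for _ in range(iters):
        g = gradf(x) + A.T @ (lmbda + beta * (A @ x - b))  # gradient of the smooth part
        x = soft(x - g / eta, lam / eta)                   # linearized prox step
        lmbda = lmbda + beta * (A @ x - b)                 # multiplier update
    return x

# Example: f(x) = 0.5*||x - d||^2 with a single sum-to-one linear constraint.
n = 50
d = np.random.randn(n)
A, b = np.ones((1, n)), np.array([1.0])
x = linearized_alm(lambda x: x - d, 1.0, A, b, lam=0.1, beta=1.0, x0=np.zeros(n))
```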

    Adaptive Restart of the Optimized Gradient Method for Convex Optimization

    First-order methods with momentum, such as Nesterov's fast gradient method, are very useful for convex optimization problems, but can exhibit undesirable oscillations yielding slow convergence rates for some applications. An adaptive restarting scheme can improve the convergence rate of the fast gradient method when the strong convexity parameter of the cost function is unknown or when the iterates of the algorithm enter a locally strongly convex region. Recently, we introduced the optimized gradient method, a first-order algorithm with an inexpensive per-iteration computational cost similar to that of the fast gradient method, yet with a worst-case cost-function convergence bound that is a factor of two better than that of the fast gradient method and that is optimal for large-dimensional smooth convex problems. Building upon the success of accelerating the fast gradient method using adaptive restart, this paper investigates similar heuristic acceleration of the optimized gradient method. We first derive a new first-order method that resembles the optimized gradient method for strongly convex quadratic problems with known function parameters, yielding a linear convergence rate that is faster than that of the analogous version of the fast gradient method. We then provide a heuristic analysis and numerical experiments illustrating that adaptive restart can accelerate the convergence of the optimized gradient method. Numerical results also illustrate that adaptive restart is helpful for a proximal version of the optimized gradient method for nonsmooth composite convex functions.
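    To make the restart idea concrete, here is a minimal sketch of gradient-based adaptive restart applied to Nesterov's fast gradient method; the optimized gradient method uses different momentum weights, so only the restart heuristic is illustrated, and the test problem and constants are assumptions.

```python
# Minimal sketch: Nesterov's fast gradient method with gradient-based
# adaptive restart (momentum is reset whenever it points "uphill").
import numpy as np

def fgm_restart(grad, L, x0, iters=500):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        g = grad(y)
        x_new = y - g / L                       # gradient step from the extrapolated point
        if g @ (x_new - x) > 0:                 # restart test: momentum opposes descent
            t, y = 1.0, x                       # reset momentum and restart from x
            x_new = y - grad(y) / L
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + (t - 1.0) / t_new * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

# Example: ill-conditioned quadratic 0.5*x'Qx, where momentum oscillations are typical.
Q = np.diag(np.linspace(1.0, 100.0, 200))
x_min = fgm_restart(lambda v: Q @ v, L=100.0, x0=np.ones(200))
```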

    Damped Anderson acceleration with restarts and monotonicity control for accelerating EM and EM-like algorithms

    The expectation-maximization (EM) algorithm is a well-known iterative method for computing maximum likelihood estimates from incomplete data. Despite its numerous advantages, a main drawback of the EM algorithm is its frequently observed slow convergence, which often hinders its application in high-dimensional problems or other complex settings. To address the need for more rapidly convergent EM algorithms, we describe a new class of acceleration schemes that build on the Anderson acceleration technique for speeding up fixed-point iterations. Our approach is effective at greatly accelerating the convergence of EM algorithms and is automatically scalable to high-dimensional settings. Through the introduction of periodic algorithm restarts and a damping factor, our acceleration scheme provides faster and more robust convergence than unmodified Anderson acceleration while also improving global convergence. Crucially, our method works as an "off-the-shelf" method in that it may be used directly to accelerate any EM algorithm without relying on any model-specific features or insights. Through a series of simulation studies involving five representative problems, we show that our algorithm is substantially faster than existing state-of-the-art acceleration schemes.
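    The sketch below shows damped Anderson acceleration with periodic restarts for a generic fixed-point map (an EM update is one such map); the paper's monotonicity control, which falls back to a plain EM step when the accelerated step degrades the likelihood, is not shown, and all parameter values and the toy example are illustrative.

```python
# Minimal sketch: damped Anderson acceleration with periodic restarts for a
# fixed-point map G (an EM update x -> G(x) is one such map).
import numpy as np

def anderson(G, x0, m=5, beta=0.5, restart_every=50, iters=200, tol=1e-10):
    x, x_prev, f_prev = x0.copy(), None, None
    dX, dF = [], []                              # histories of iterate / residual differences
    for k in range(iters):
        f = G(x) - x                             # fixed-point residual
        if np.linalg.norm(f) < tol:
            break
        if k % restart_every == 0:               # periodic restart: drop the history
            dX, dF, x_prev = [], [], None
        if x_prev is not None:
            dX.append(x - x_prev); dF.append(f - f_prev)
            dX, dF = dX[-m:], dF[-m:]            # keep at most m differences
        x_prev, f_prev = x, f
        if dF:
            DX, DF = np.column_stack(dX), np.column_stack(dF)
            gamma, *_ = np.linalg.lstsq(DF, f, rcond=None)
            x = x + beta * f - (DX + beta * DF) @ gamma   # damped Anderson step
        else:
            x = x + beta * f                     # plain damped fixed-point step
    return x

# Example: accelerate the contraction G(x) = cos(x), whose fixed point is ~0.739.
x_star = anderson(np.cos, x0=np.array([1.0]))
```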

    On Quasi-Newton Forward-Backward Splitting: Proximal Calculus and Convergence

    We introduce a framework for quasi-Newton forward-backward splitting algorithms (proximal quasi-Newton methods) with a metric induced by diagonal ± rank-r symmetric positive definite matrices. This special type of metric allows for a highly efficient evaluation of the proximal mapping. The key to this efficiency is a general proximal calculus in the new metric. By using duality, formulas are derived that relate the proximal mapping in a rank-r modified metric to the original metric. We also describe efficient implementations of the proximity calculation for a large class of functions; the implementations exploit the piecewise linear nature of the dual problem. We then apply these results to the acceleration of composite convex minimization problems, which leads to elegant quasi-Newton methods for which we prove convergence. The algorithm is tested on several numerical examples and compared to a comprehensive list of alternatives in the literature. Our quasi-Newton splitting algorithm with the prescribed metric compares favorably against the state of the art. The algorithm has extensive applications, including signal processing, sparse recovery, machine learning, and classification, to name a few.
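    A minimal sketch of the r = 0 (purely diagonal) case is given below: with a diagonal metric, the scaled l1 proximal mapping remains a coordinate-wise soft threshold, so a variable-metric forward-backward step stays as cheap as plain proximal gradient. The diagonal majorizer and the example problem are assumptions, not taken from the paper.

```python
# Minimal sketch: forward-backward splitting with a diagonal variable metric;
# the l1-prox in a diagonal metric is a per-coordinate soft threshold with
# thresholds lam / d_i.
import numpy as np

def scaled_soft_threshold(v, lam, d):
    # prox of lam*||.||_1 in the metric <x, diag(d) x>
    return np.sign(v) * np.maximum(np.abs(v) - lam / d, 0.0)

def diag_metric_fbs(gradf, diag_metric, lam, x0, iters=300):
    x = x0.copy()
    for _ in range(iters):
        d = np.maximum(diag_metric(x), 1e-8)      # safeguarded diagonal metric
        v = x - gradf(x) / d                      # forward (gradient) step in the metric
        x = scaled_soft_threshold(v, lam, d)      # backward (prox) step in the metric
    return x

# Example: sparse least squares  min 0.5*||Ax - b||^2 + lam*||x||_1,
# with a diagonal majorizer of A^T A (row sums of absolute values).
A, b, lam = np.random.randn(80, 120), np.random.randn(80), 0.1
AtA = A.T @ A
d_maj = np.sum(np.abs(AtA), axis=1)
x = diag_metric_fbs(lambda x: A.T @ (A @ x - b), lambda x: d_maj, lam, np.zeros(120))
```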

    Accelerated Proximal Point Method for Maximally Monotone Operators

    This paper proposes an accelerated proximal point method for maximally monotone operators. The proof is computer-assisted via the performance estimation problem approach. The proximal point method includes various well-known convex optimization methods, such as the proximal method of multipliers and the alternating direction method of multipliers, so the proposed acceleration has wide applications. Numerical experiments are presented to demonstrate the accelerated behavior.
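    As an illustrative stand-in, the sketch below runs an anchored (Halpern-type) resolvent iteration on a monotone linear operator; the paper's accelerated proximal point method has momentum coefficients derived via performance estimation and may differ, so this is only a hedged example of accelerating a resolvent iteration, with all names and data assumed.

```python
# Minimal sketch: anchored (Halpern-type) resolvent iteration for a maximally
# monotone linear operator M, i.e. for finding x with 0 in M x.
import numpy as np

def resolvent(M, lam, v):
    # J_{lam M}(v) = (I + lam*M)^{-1} v for a linear monotone operator M
    return np.linalg.solve(np.eye(M.shape[0]) + lam * M, v)

def anchored_ppm(M, x0, lam=1.0, iters=200):
    x, anchor = x0.copy(), x0.copy()
    for k in range(iters):
        t = 1.0 / (k + 2)                         # anchoring weight, decays to zero
        x = t * anchor + (1.0 - t) * resolvent(M, lam, x)
    return x

# Example: a rotation-like monotone (skew-symmetric) operator whose zero is the origin.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
x = anchored_ppm(M, x0=np.array([1.0, 1.0]))      # converges toward [0, 0]
```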

    Linear convergence of first order methods for non-strongly convex optimization

    The standard assumption for proving linear convergence of first-order methods for smooth convex optimization is the strong convexity of the objective function, an assumption which does not hold for many practical applications. In this paper, we derive linear convergence rates of several first-order methods for solving smooth non-strongly convex constrained optimization problems, i.e., problems involving an objective function with a Lipschitz continuous gradient that satisfies some relaxed strong convexity condition. In particular, in the case of smooth constrained convex optimization, we provide several relaxations of the strong convexity conditions and prove that they are sufficient for obtaining linear convergence of several first-order methods, such as projected gradient, fast gradient, and feasible descent methods. We also provide examples of functional classes that satisfy our proposed relaxations of the strong convexity conditions. Finally, we show that the proposed relaxed strong convexity conditions cover important applications, ranging from solving linear systems and linear programming to dual formulations of linearly constrained convex problems.
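    The following sketch illustrates the phenomenon analyzed in the paper on the simplest application it mentions, solving a linear system: gradient descent on a rank-deficient least-squares objective converges linearly even though the objective is not strongly convex. The dimensions and iteration counts are arbitrary, and the paper's general feasible-descent framework and constants are not reproduced.

```python
# Minimal sketch: gradient descent on f(x) = 0.5*||Ax - b||^2 with a
# rank-deficient A (f is not strongly convex) still shows geometric decrease
# of the objective, the behavior covered by the relaxed conditions.
import numpy as np

np.random.seed(0)
A = np.random.randn(30, 100) @ np.random.randn(100, 200)   # rank <= 30, 200 unknowns
b = A @ np.random.randn(200)                                # consistent right-hand side
L = np.linalg.norm(A, 2) ** 2                               # Lipschitz constant of the gradient
x = np.zeros(200)
for k in range(501):
    x = x - A.T @ (A @ x - b) / L                           # plain gradient step
    if k % 100 == 0:
        print(k, 0.5 * np.linalg.norm(A @ x - b) ** 2)      # decreases geometrically
```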

    Restarting Frank-Wolfe: Faster Rates Under Hölderian Error Bounds

    Conditional Gradients (a.k.a. Frank-Wolfe algorithms) form a classical set of methods for constrained smooth convex minimization due to their simplicity, the absence of projection steps, and competitive numerical performance. While the vanilla Frank-Wolfe algorithm only ensures a worst-case rate of O(1/ε), various recent results have shown that for strongly convex functions on polytopes, the method can be slightly modified to achieve linear convergence. However, this still leaves a huge gap between sublinear O(1/ε) convergence and linear O(log(1/ε)) convergence to reach an ε-approximate solution. Here, we present a new variant of Conditional Gradients that can dynamically adapt to the function's geometric properties using restarts and smoothly interpolates between the sublinear and linear regimes. Similarly, the vanilla Frank-Wolfe method has an O(1/√ε) convergence rate when both the constraint set and the objective function are strongly convex. We show that relaxing the strong convexity assumption interpolates between various known rates, and that similar interpolated convergence rates are obtained when strong convexity of the constraint set is relaxed to uniform convexity.
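    For reference, the sketch below shows the vanilla Frank-Wolfe (conditional gradient) iteration on an l1-ball, built around a linear minimization oracle instead of a projection; the paper's restart scheme and Hölderian error-bound analysis sit on top of such iterations and are not reproduced, and the example problem is an assumption.

```python
# Minimal sketch: vanilla Frank-Wolfe (conditional gradient) on the l1-ball,
# using a linear minimization oracle instead of a projection.
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    # linear minimization oracle: argmin_{||s||_1 <= radius} <g, s>
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

def frank_wolfe(grad, x0, radius=1.0, iters=500):
    x = x0.copy()
    for k in range(iters):
        s = lmo_l1_ball(grad(x), radius)        # call the oracle at the current gradient
        gamma = 2.0 / (k + 2.0)                 # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s       # convex combination stays feasible
    return x

# Example: constrained least squares  min 0.5*||Ax - b||^2  s.t.  ||x||_1 <= 1.
A, b = np.random.randn(40, 60), np.random.randn(40)
x = frank_wolfe(lambda x: A.T @ (A @ x - b), np.zeros(60), radius=1.0)
```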

    Convolutional Dictionary Learning: Acceleration and Convergence

    Convolutional dictionary learning (CDL, or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or its variant, the alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To mitigate these problems, this paper proposes a new, practically feasible, and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and is further accelerated by incorporating momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifact removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in single-threaded CDL algorithms handling large datasets, due to its lower memory requirement and the absence of polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach.
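    To convey the flavor of a block proximal gradient method with majorizers, momentum, and restarts, the sketch below applies these ingredients to plain (non-convolutional) dictionary learning; the convolutional structure, boundary handling, and the paper's specific majorization designs are omitted, and all names and constants are illustrative assumptions.

```python
# Minimal sketch: block proximal gradient with majorizers, momentum, and a
# restart on objective increase, for (non-convolutional) dictionary learning
#   min_{D, Z} 0.5*||X - D Z||_F^2 + lam*||Z||_1,   ||d_j||_2 <= 1.
import numpy as np

def soft(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def bpg_m_sketch(X, n_atoms, lam=0.1, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    Z = np.zeros((n_atoms, X.shape[1]))
    Z_old, t, obj = Z.copy(), 1.0, np.inf
    for _ in range(iters):
        # Z block: majorized proximal gradient step with momentum extrapolation
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Zbar = Z + (t - 1.0) / t_new * (Z - Z_old)
        LZ = np.linalg.norm(D.T @ D, 2) + 1e-12            # majorizer for the Z block
        Z_old, t = Z, t_new
        Z = soft(Zbar - D.T @ (D @ Zbar - X) / LZ, lam / LZ)
        # D block: majorized gradient step, then project each atom onto the unit ball
        LD = np.linalg.norm(Z @ Z.T, 2) + 1e-12            # majorizer for the D block
        D = D - (D @ Z - X) @ Z.T / LD
        D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)
        obj_new = 0.5 * np.linalg.norm(X - D @ Z) ** 2 + lam * np.abs(Z).sum()
        if obj_new > obj:                                  # restart momentum if objective rose
            Z_old, t = Z.copy(), 1.0
        obj = obj_new
    return D, Z

D, Z = bpg_m_sketch(np.random.randn(20, 200), n_atoms=30)
```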

    Convex-Set–Constrained Sparse Signal Recovery: Theory and Applications

    Convex-set constrained sparse signal reconstruction facilitates flexible measurement models and accurate recovery. The objective function that we wish to minimize is the sum of a convex differentiable data-fidelity (negative log-likelihood, NLL) term and a convex regularization term. We apply sparse signal regularization where the signal belongs to a closed convex set within the closure of the domain of the NLL. Signal sparsity is imposed using the l1-norm penalty on the signal's linear transform coefficients. First, we present a projected Nesterov's proximal-gradient (PNPG) approach that employs a projected Nesterov's acceleration step with restart and a duality-based inner iteration to compute the proximal mapping. We propose an adaptive step-size selection scheme to obtain a good local majorizing function of the NLL and reduce the time spent backtracking. We present an integrated derivation of the momentum acceleration and proofs of the O(k^(-2)) objective-function convergence rate and convergence of the iterates, which account for the adaptive step size, the inexactness of the iterative proximal mapping, and the convex-set constraint. The tuning of PNPG is largely application independent. Tomographic and compressed-sensing reconstruction experiments with Poisson generalized linear and Gaussian linear measurement models demonstrate the performance of the proposed approach. We then address the problem of upper-bounding the regularization constant for the convex-set constrained sparse signal recovery problem behind the PNPG framework. This bound defines the maximum influence the regularization term has on the signal recovery. We formulate an optimization problem for finding these bounds when the regularization term can be globally minimized, and develop an alternating direction method of multipliers (ADMM) type method for their computation. Simulation examples show that the derived and empirical bounds match. Finally, we show an application of the PNPG framework to X-ray computed tomography (CT) and outline a method for sparse image reconstruction from Poisson-distributed polychromatic X-ray CT measurements under the blind scenario, where the material of the inspected object and the incident energy spectrum are unknown. To obtain a parsimonious mean measurement-model parameterization, we first rewrite the measurement equation by changing the integration variable from photon energy to mass attenuation, which allows us to combine the variations brought by the unknown incident spectrum and mass attenuation into a single unknown mass-attenuation spectrum function; the resulting measurement equation has the Laplace integral form. We apply a block coordinate-descent algorithm that alternates between an NPG image reconstruction step and a limited-memory BFGS with box constraints (L-BFGS-B) iteration for updating the mass-attenuation spectrum parameters. Our NPG-BFGS algorithm is the first physical-model-based image reconstruction method for simultaneous blind sparse image reconstruction and mass-attenuation spectrum estimation from polychromatic measurements. Real X-ray CT reconstruction examples demonstrate the performance of the proposed blind scheme.
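    As a simplified illustration of the projected Nesterov proximal-gradient idea, the sketch below handles a nonnegativity constraint and an l1 penalty on the signal itself, for which the proximal mapping is closed form; PNPG's duality-based inner iteration for transform-domain sparsity, adaptive step size, and Poisson NLL are not reproduced, and the problem data are illustrative.

```python
# Minimal sketch: projected Nesterov proximal-gradient iteration for
#   min_{x >= 0} 0.5*||Ax - b||^2 + lam*||x||_1,
# with a restart of the momentum whenever the objective increases.
import numpy as np

def pnpg_like(A, b, lam, iters=300):
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
    x, x_old, t, f_old = np.zeros(A.shape[1]), np.zeros(A.shape[1]), 1.0, np.inf
    for _ in range(iters):
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = np.maximum(x + (t - 1.0) / t_new * (x - x_old), 0.0)   # projected extrapolation
        x_old, t = x, t_new
        v = y - A.T @ (A @ y - b) / L                              # gradient step
        x = np.maximum(v - lam / L, 0.0)        # prox of lam*||.||_1 + nonnegativity
        f = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * x.sum()
        if f > f_old:                           # restart momentum on objective increase
            x_old, t = x.copy(), 1.0
        f_old = f
    return x

A, b = np.abs(np.random.randn(60, 120)), np.abs(np.random.randn(60))
x_hat = pnpg_like(A, b, lam=0.1)
```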