32,537 research outputs found

    Generalized Framework for Nonlinear Acceleration

    Full text link
    Nonlinear acceleration algorithms improve the performance of iterative methods, such as gradient descent, using the information contained in past iterates. However, their efficiency is still not entirely understood even in the quadratic case. In this paper, we clarify the convergence analysis by giving general properties shared by several classes of nonlinear acceleration: Anderson acceleration (and variants), quasi-Newton methods (such as Broyden Type-I or Type-II, SR1, DFP, and BFGS), and Krylov methods (Conjugate Gradient, MINRES, GMRES). In particular, we propose a generic family of algorithms that contains all the previous methods and prove its optimal rate of convergence when minimizing quadratic functions. We also propose multi-secant updates for the quasi-Newton methods listed above. We provide MATLAB code implementing the algorithm.
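
    The following is a minimal sketch (in Python/NumPy, not the authors' MATLAB code) of one member of this family, Anderson acceleration of Type-II for a fixed-point map g; the memory size m and the small regularization are illustrative assumptions, not the paper's generic algorithm.

        import numpy as np

        def anderson_type2(g, x0, m=5, iters=50, reg=1e-10):
            # Minimal Anderson (Type-II) acceleration for x = g(x): keep the last
            # m residuals f_k = g(x_k) - x_k and mix the iterates with weights
            # obtained from a small constrained least-squares solve.
            x = np.asarray(x0, dtype=float)
            X, F = [], []                        # histories of g(x_k) and residuals
            for _ in range(iters):
                gx = g(x)
                f = gx - x                       # fixed-point residual
                X.append(gx); F.append(f)
                if len(F) > m:
                    X.pop(0); F.pop(0)
                Fm = np.column_stack(F)          # one residual per column
                k = Fm.shape[1]
                # Solve min_alpha ||Fm @ alpha|| subject to sum(alpha) = 1.
                alpha = np.linalg.solve(Fm.T @ Fm + reg * np.eye(k), np.ones(k))
                alpha /= alpha.sum()
                x = np.column_stack(X) @ alpha   # mixed (accelerated) iterate
            return x

    With g(x) = x - alpha * grad f(x), for instance, this accelerates plain gradient descent on f.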

    A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization

    Full text link
    Anderson acceleration (or Anderson mixing) is an efficient acceleration method for fixed point iterations $x_{t+1}=G(x_t)$; e.g., gradient descent can be viewed as iteratively applying the operation $G(x) \triangleq x-\alpha\nabla f(x)$. It is known that Anderson acceleration is quite efficient in practice and can be viewed as an extension of Krylov subspace methods to nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomials can achieve the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves the previous result $O(\kappa\ln\frac{1}{\epsilon})$ provided by (Toth and Kelley, 2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. In addition, if the hyperparameters (e.g., the Lipschitz smoothness parameter $L$) are not available, we propose a guessing algorithm for estimating them dynamically and prove a similar convergence rate. Finally, the experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms, e.g., vanilla gradient descent (GD) and Nesterov's Accelerated GD. Also, these algorithms combined with the proposed guessing algorithm (guessing the hyperparameters dynamically) achieve much better performance. Comment: To appear in AISTATS 202
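
    The Chebyshev ingredient can be illustrated on its own: for a quadratic whose Hessian spectrum lies in [mu, L], gradient descent with step sizes taken as the reciprocals of the Chebyshev roots on [mu, L] attains the accelerated $\sqrt{\kappa}$-type rate. The Python sketch below (with assumed names) shows only this classical schedule, not the paper's Anderson-Chebyshev method or its guessing algorithm.

        import numpy as np

        def chebyshev_gradient_descent(grad, x0, mu, L, K):
            # Gradient descent with Chebyshev step sizes for a quadratic whose
            # Hessian eigenvalues lie in [mu, L] (illustrative sketch only).
            k = np.arange(K)
            # Roots of the degree-K Chebyshev polynomial mapped onto [mu, L].
            roots = 0.5 * (L + mu) - 0.5 * (L - mu) * np.cos((2 * k + 1) * np.pi / (2 * K))
            x = np.asarray(x0, dtype=float)
            for t in 1.0 / roots:        # note: the ordering affects numerical stability
                x = x - t * grad(x)
            return x

        # Example: f(x) = 0.5 x^T A x - b^T x with known extreme eigenvalues.
        A = np.diag([1.0, 10.0, 100.0]); b = np.ones(3)
        x_hat = chebyshev_gradient_descent(lambda x: A @ x - b, np.zeros(3), mu=1.0, L=100.0, K=40)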

    Composite Optimization by Nonconvex Majorization-Minimization

    Full text link
    The minimization of a nonconvex composite function can model a variety of imaging tasks. A popular class of algorithms for solving such problems are majorization-minimization techniques, which iteratively approximate the composite nonconvex function by a majorizing function that is easy to minimize. Most techniques, e.g., gradient descent, utilize convex majorizers in order to guarantee that the majorizer is easy to minimize. In our work we consider a natural class of nonconvex majorizers for these functions, and show that these majorizers are still sufficient for a globally convergent optimization scheme. Numerical results illustrate that by applying this scheme, one can often obtain superior local optima compared to previous majorization-minimization methods, when the nonconvex majorizers are solved to global optimality. Finally, we illustrate the behavior of our algorithm for depth super-resolution from raw time-of-flight data. Comment: 38 pages, 12 figures, accepted for publication in SIIM
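
    A rough sketch of the generic majorization-minimization loop the paper builds on is given below (Python; the callable name is an assumption). The paper's point is that the surrogate returned at each step may itself be nonconvex, as long as it majorizes the composite objective and can be minimized globally.

        import numpy as np

        def majorize_minimize(minimize_majorizer, x0, iters=50, tol=1e-8):
            # Generic MM loop: minimize_majorizer(x_k) is assumed to return a
            # global minimizer of a surrogate that upper-bounds the objective
            # and touches it at x_k.
            x = np.asarray(x0, dtype=float)
            for _ in range(iters):
                x_new = minimize_majorizer(x)
                if np.linalg.norm(x_new - x) <= tol * (1.0 + np.linalg.norm(x)):
                    break
                x = x_new
            return x

        # With the classical convex quadratic majorizer of an L-smooth term,
        #   f(x_k) + <g_k, x - x_k> + (L/2) * ||x - x_k||^2,
        # the step reduces to plain gradient descent:
        #   minimize_majorizer = lambda x: x - grad(x) / L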

    Supervised Descent Method for Solving Nonlinear Least Squares Problems in Computer Vision

    Full text link
    Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved with nonlinear optimization methods. It is generally accepted that second order descent methods are the most robust, fast, and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, second order descent methods have two main drawbacks: (1) the function might not be analytically differentiable and numerical approximations are impractical, and (2) the Hessian may be large and not positive definite. To address these issues, this paper proposes generic descent maps, which are average "descent directions" and rescaling factors learned in a supervised fashion. Using generic descent maps, we derive a practical algorithm - Supervised Descent Method (SDM) - for minimizing Nonlinear Least Squares (NLS) problems. During training, SDM learns a sequence of descent maps that minimize the NLS objective. In testing, SDM minimizes the NLS objective using the learned descent maps without computing the Jacobian or the Hessian. We prove the conditions under which the SDM is guaranteed to converge. We illustrate the effectiveness and accuracy of SDM in three computer vision problems: rigid image alignment, non-rigid image alignment, and 3D pose estimation. In particular, we show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code has been made available at www.humansensing.cs.cmu.edu/intraface. Comment: 15 pages. In submission to TPAM
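
    The core training loop of a supervised descent cascade can be sketched as repeated ridge regressions from features to parameter updates; the Python/NumPy sketch below uses assumed names and is not the released IntraFace code.

        import numpy as np

        def train_sdm(features, x_init, x_true, stages=4, reg=1e-3):
            # features(x) returns the feature vector extracted at parameters x
            # (e.g. descriptors around current landmark estimates). Each stage
            # learns a linear descent map (with bias) by ridge regression from
            # features to the desired update x_true - x.
            maps = []
            X = np.array(x_init, dtype=float)              # (n_samples, d) current estimates
            for _ in range(stages):
                Phi = np.stack([features(x) for x in X])   # (n_samples, p) features
                Phi1 = np.hstack([Phi, np.ones((len(Phi), 1))])
                dX = x_true - X                            # supervised targets
                A = Phi1.T @ Phi1 + reg * np.eye(Phi1.shape[1])
                W = np.linalg.solve(A, Phi1.T @ dX)        # learned descent map
                maps.append(W)
                X = X + Phi1 @ W                           # apply it before the next stage
            return maps

        def apply_sdm(maps, features, x0):
            # At test time no Jacobian or Hessian is needed: just apply the maps.
            x = np.array(x0, dtype=float)
            for W in maps:
                x = x + np.append(features(x), 1.0) @ W
            return x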

    Gauss-Newton Optimization for Phase Recovery from the Bispectrum

    Full text link
    Phase recovery from the bispectrum is a central problem in speckle interferometry which can be posed as an optimization problem minimizing a weighted nonlinear least-squares objective function. We look at two different formulations of the phase recovery problem from the literature, both of which can be minimized with respect to either the recovered phase or the recovered image. Previously, strategies for solving these formulations have been limited to first-order optimization methods such as gradient descent or quasi-Newton methods. This paper explores Gauss-Newton optimization schemes for the problem of phase recovery from the bispectrum. We implement efficient Gauss-Newton optimization schemes for all the formulations. For the two formulations which optimize with respect to the recovered image, we also extend the schemes to projected Gauss-Newton in order to enforce element-wise lower and upper bounds on the pixel intensities of the recovered image. We show that our efficient Gauss-Newton schemes result in better image reconstructions with no or limited additional computational cost compared to previously implemented first-order optimization schemes for phase recovery from the bispectrum. MATLAB implementations of all methods and simulations are made publicly available in the BiBox repository on GitHub. Comment: 13 pages, 4 figures, 2 table
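
    For reference, a generic (optionally damped) Gauss-Newton step for a nonlinear least-squares objective looks as follows; this Python sketch is illustrative and is not the BiBox implementation, whose weighting and projections are specific to the bispectrum formulations.

        import numpy as np

        def gauss_newton(residual, jacobian, x0, iters=20, damping=0.0, bounds=None):
            # Minimize 0.5 * ||r(x)||^2 with the Gauss-Newton approximation
            # H ~ J^T J; a small damping gives a Levenberg-Marquardt flavor.
            x = np.asarray(x0, dtype=float)
            for _ in range(iters):
                r = residual(x)
                J = jacobian(x)
                H = J.T @ J + damping * np.eye(x.size)
                x = x + np.linalg.solve(H, -J.T @ r)
                if bounds is not None:                         # crude projected variant:
                    x = np.clip(x, bounds[0], bounds[1])       # element-wise box constraints
            return x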

    The proximal point method revisited

    Full text link
    In this short survey, I revisit the role of the proximal point method in large scale optimization. I focus on three recent examples: a proximally guided subgradient method for weakly convex stochastic approximation, the prox-linear algorithm for minimizing compositions of convex functions and smooth maps, and Catalyst generic acceleration for regularized Empirical Risk Minimization. Comment: 11 pages, submitted to SIAG/OPT Views and New
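
    The outer loop that these examples build on is the proximal point iteration itself; here is a toy Python sketch in which the inner subproblem is handed to a generic solver (the function names and the inner solver choice are assumptions).

        import numpy as np
        from scipy.optimize import minimize

        def proximal_point(f, x0, lam=1.0, iters=20):
            # x_{k+1} = argmin_x  f(x) + (1 / (2 * lam)) * ||x - x_k||^2,
            # with the subproblem solved (approximately) by a generic solver.
            x = np.asarray(x0, dtype=float)
            for _ in range(iters):
                prox_obj = lambda z, xk=x: f(z) + np.sum((z - xk) ** 2) / (2.0 * lam)
                x = minimize(prox_obj, x).x
            return x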

    Optimizing Schroedinger functionals using Sobolev gradients: Applications to Quantum Mechanics and Nonlinear Optics

    Full text link
    In this paper we study the application of the Sobolev gradients technique to the problem of minimizing several Schrödinger functionals related to timely and difficult nonlinear problems in Quantum Mechanics and Nonlinear Optics. We show that these gradients act as preconditioners over traditional choices of descent directions in minimization methods and show a computationally inexpensive way to obtain them using a discrete Fourier basis and a Fast Fourier Transform. We show that the Sobolev preconditioning provides a great convergence improvement over traditional techniques for finding solutions with minimal energy as well as stationary states, and suggest a generalization of the method using arbitrary linear operators. Comment: 19 pages with 5 postscript figure
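
    The FFT-based preconditioning can be sketched in one routine: the Sobolev (H^1) gradient solves (I - Laplacian) g_S = g, which in a discrete Fourier basis is a division by 1 + |k|^2. The Python sketch below assumes a uniform periodic 1D grid; the grid, norm, and boundary conditions are illustrative assumptions.

        import numpy as np

        def sobolev_gradient_1d(g, length=2.0 * np.pi):
            # Convert an L^2 gradient g, sampled on a uniform periodic grid of
            # total length `length`, into an H^1 gradient by solving
            # (I - d^2/dx^2) g_S = g with the FFT: division by 1 + k^2 damps
            # the high-frequency components that slow plain descent.
            n = g.size
            k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
            gS_hat = np.fft.fft(g) / (1.0 + k ** 2)
            return np.real(np.fft.ifft(gS_hat))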

    TMAC: A Toolbox of Modern Async-Parallel, Coordinate, Splitting, and Stochastic Methods

    Full text link
    TMAC is a toolbox written in C++11 that implements algorithms based on a set of modern methods for large-scale optimization. It covers a variety of optimization problems, which can be both smooth and nonsmooth, convex and nonconvex, as well as constrained and unconstrained. The algorithms implemented in TMAC, such as the coordinate update method and operator splitting method, are scalable as they decompose a problem into simple subproblems. These algorithms can run in a multi-threaded fashion, either synchronously or asynchronously, to take advantage of all the cores available. The TMAC architecture mimics how a scientist writes down an optimization algorithm. Therefore, it is easy for one to obtain a new algorithm by making simple modifications, such as adding a new operator or a new splitting, while maintaining the multicore parallelism and other features. The package is available at https://github.com/uclaopt/TMAC
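
    For orientation, the kind of update TMAC parallelizes can be written serially in a few lines; the Python sketch below is forward-backward splitting for the LASSO and is not TMAC's C++ API (see the repository for that).

        import numpy as np

        def forward_backward_lasso(A, b, lam, step, iters=500):
            # Serial forward-backward splitting for min 0.5*||Ax - b||^2 + lam*||x||_1.
            # Convergence requires step <= 1 / ||A^T A||; a toolbox like TMAC runs
            # updates of this kind coordinate-wise and asynchronously across threads.
            x = np.zeros(A.shape[1])
            for _ in range(iters):
                z = x - step * (A.T @ (A @ x - b))                        # forward (gradient) step
                x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # backward (prox) step
            return x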

    Smoothing $\mathcal{L}^2$ gradients in iterative regularization

    Full text link
    Connected with the rise of interest in inverse problems is the development and analysis of regularization methods, which are a necessity due to the ill-posedness of inverse problems. Tikhonov-type regularization methods are very popular in this regard. However, their direct implementation for large-scale linear or non-linear problems is a non-trivial task. In such scenarios, iterative regularization methods usually serve as a better alternative. In this paper we propose a new iterative regularization method which uses descent directions, different from the usual gradient direction, that enable a smoother and more effective recovery than the latter. This is achieved by transforming the original noisy gradient, via a smoothing operator, into a smoother gradient that is more robust to the noise present in the data. It is also shown that this technique is very beneficial when dealing with data having large noise levels. To illustrate the computational efficiency of this method we apply it to numerically solve some classical integral inverse problems, including image deblurring and tomography problems, and compare the results with certain standard regularization methods, such as Tikhonov, TV, CGLS, etc. Comment: Comments are welcome. arXiv admin note: text overlap with arXiv:1906.0547
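
    The idea can be sketched with a Landweber-type iteration in which the L^2 gradient is passed through a smoothing operator before the update; in the Python sketch below a Gaussian filter stands in for the paper's smoothing operator, and all names are assumptions.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def smoothed_landweber(A, y, x0, step, sigma=1.0, iters=100):
            # Landweber-type iteration for A x = y (x may be an image) where the
            # usual gradient A^T (A x - y) is smoothed before the descent step,
            # damping the noise amplified by the ill-posed operator.
            x = np.asarray(x0, dtype=float)
            for _ in range(iters):
                g = A.T @ (A @ x.ravel() - y)                   # standard L^2 gradient
                g = gaussian_filter(g.reshape(x.shape), sigma)  # smoothed descent direction
                x = x - step * g
            return x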

    Detecting and correcting the loss of independence in nonlinear conjugate gradient

    Full text link
    It is well known that search directions in nonlinear conjugate gradient (CG) can sometimes become nearly dependent, causing a dramatic slow-down in the convergence rate. We provide a theoretical analysis of this loss of independence. The analysis applies to the case of a strictly convex objective function and is motivated by older work of Nemirovsky and Yudin. Loss of independence can affect several of the well-known variants of nonlinear CG, including Fletcher-Reeves, Polak-Ribière (nonnegative variant), and Hager-Zhang. Based on our analysis, we propose a relatively inexpensive computational test for detecting loss of independence. We also propose a method for correcting it when it is detected, which we call "subspace optimization." Although the correction method is somewhat expensive, our experiments show that in some cases, usually the most ill-conditioned ones, it yields a method much faster than any of these three variants. Even though our theory covers only strongly convex objective functions, we provide computational results to indicate that the detection and correction mechanisms may also hold promise for nonconvex optimization.
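
    As an illustration of where such a test plugs in, here is Fletcher-Reeves nonlinear CG in Python with a crude angle-based restart heuristic; the paper's detection test and its "subspace optimization" correction are different, and all thresholds here are assumptions.

        import numpy as np

        def fletcher_reeves_cg(f, grad, x0, iters=200, dep_tol=0.99):
            # Fletcher-Reeves nonlinear CG with a simple dependence check: when
            # the new direction is nearly parallel to the previous one, restart
            # with steepest descent instead of the CG direction.
            x = np.asarray(x0, dtype=float)
            g = grad(x)
            d = -g
            for _ in range(iters):
                t, fx, slope = 1.0, f(x), g @ d        # backtracking (Armijo) line search
                while f(x + t * d) > fx + 1e-4 * t * slope and t > 1e-12:
                    t *= 0.5
                x = x + t * d
                g_new = grad(x)
                beta = (g_new @ g_new) / (g @ g)       # Fletcher-Reeves coefficient
                d_new = -g_new + beta * d
                cosine = abs(d_new @ d) / (np.linalg.norm(d_new) * np.linalg.norm(d) + 1e-16)
                if cosine > dep_tol:                   # nearly dependent search directions
                    d_new = -g_new                     # crude correction: restart
                g, d = g_new, d_new
            return x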