
12,411 research outputs found

A Three-Term Conjugate Gradient Method with Sufficient Descent Property for Unconstrained Optimization

Author: Hager W. W.
Hiroshi Yabe
John A. Ford
Yabe H.
Yasushi Narushima
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 20/01/2011

Conjugate gradient methods are widely used for solving large-scale unconstrained optimization problems because they do not require storing matrices. In this paper, we propose a general form of three-term conjugate gradient methods that always generates a sufficient descent direction. We give a sufficient condition for the global convergence of the proposed general method. Moreover, we present a specific three-term conjugate gradient method based on the multi-step quasi-Newton method. Finally, some numerical results for the proposed method are given.
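
The abstract does not state the search direction itself, so the following is only a hedged illustration of how a third term can enforce sufficient descent. The symbols are assumptions for this sketch: g_k is the gradient at the current iterate, d_{k-1} the previous direction, \beta_k any conjugate gradient parameter, and p_k any vector with g_k^T p_k \neq 0. It should not be read as the paper's exact formula.

```latex
% Assumed notation (not taken from the listing):
%   g_k = \nabla f(x_k), d_{k-1} = previous search direction,
%   \beta_k = any CG parameter, p_k = any vector with g_k^{\top} p_k \neq 0.
\begin{align*}
d_k &= -g_k + \beta_k d_{k-1}
       - \beta_k \frac{g_k^{\top} d_{k-1}}{g_k^{\top} p_k}\, p_k, \\
g_k^{\top} d_k &= -\|g_k\|^2 + \beta_k\, g_k^{\top} d_{k-1}
       - \beta_k \frac{g_k^{\top} d_{k-1}}{g_k^{\top} p_k}\, g_k^{\top} p_k
       = -\|g_k\|^2 .
\end{align*}
```

Under these assumptions the third term cancels the contribution of \beta_k d_{k-1} along g_k, so the descent identity holds for any choice of \beta_k and independently of the line search.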

University of Essex Research Repository

Crossref

Adaptive Momentum for Neural Network Optimization

Author: Rashidi Zana
Publication date: 11/05/2020

In this thesis, we develop a novel and efficient algorithm for optimizing neural networks, inspired by a recently proposed geodesic optimization algorithm. Our algorithm, which we call Stochastic Geodesic Optimization (SGeO), uses an adaptive coefficient on top of Polyak's Heavy Ball method, effectively controlling the amount of weight put on the previous update to the parameters based on the change of direction in the optimization path. Experimental results on strongly convex functions with Lipschitz gradients and deep autoencoder benchmarks show that SGeO reaches lower errors than established first-order methods and competes well, with lower or similar errors, against a recent second-order method called K-FAC (Kronecker-Factored Approximate Curvature). We also incorporate a Nesterov-style lookahead gradient into our algorithm (SGeO-N) and observe notable improvements. We believe that our research will open up new directions for high-dimensional neural network optimization, where combining the efficiency of first-order methods and the effectiveness of second-order methods proves a promising avenue to explore.
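
As a rough sketch of the idea described above, the snippet below runs Polyak's Heavy Ball update with a momentum coefficient that depends on how well the previous update aligns with the current descent direction. The coefficient rule, the function name `heavy_ball_adaptive`, and the test problem are all assumptions made for illustration; the actual SGeO rule is not given in this listing.

```python
import numpy as np

def heavy_ball_adaptive(grad, x0, lr=0.05, beta_max=0.5, n_steps=300):
    """Heavy Ball with a direction-dependent momentum coefficient.

    The coefficient rule is a hypothetical stand-in, not the SGeO rule.
    """
    x = np.asarray(x0, dtype=float)
    prev_update = np.zeros_like(x)
    for _ in range(n_steps):
        g = grad(x)
        denom = np.linalg.norm(g) * np.linalg.norm(prev_update)
        # Cosine between the steepest-descent direction and the previous update.
        cos = float(-(g @ prev_update) / denom) if denom > 0 else 0.0
        # Hypothetical rule: keep more momentum when the path keeps its direction.
        beta = beta_max * max(cos, 0.0)
        update = -lr * g + beta * prev_update
        x = x + update
        prev_update = update
    return x

# Strongly convex quadratic f(x) = 0.5 * x^T A x with minimizer at the origin.
A = np.diag([1.0, 10.0])
x_min = heavy_ball_adaptive(lambda x: A @ x, x0=[5.0, 5.0])
```

The step size and `beta_max` are chosen so that `lr < 2 * (1 - beta_max) / L` with `L = 10`, a conservative condition under which Heavy Ball with a varying coefficient stays stable on this problem.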

YorkSpace

Convergence of Gradient Descent for Low-Rank Matrix Approximation

Author: Dai W
Pitaval R-A
Tirkkonen O
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

This paper provides a proof of global convergence of gradient search for low-rank matrix approximation. Such approximations have recently been of interest for large-scale problems, as well as for dictionary learning for sparse signal representations and matrix completion. The proof is based on interpreting the problem as an optimization on the Grassmann manifold and on the Fubini-Study distance on this space.

CiteSeerX

Spiral - Imperial College Digital Repository

Do optimization methods in deep learning applications matter?

Author: Kiran Mariam
Ozyildirim Buse Melis
Publication venue: eScholarship, University of California
Publication date: 28/02/2020

With advances in deep learning, exponential data growth, and increasing model complexity, developing efficient optimization methods is attracting much research attention. Several implementations favor Conjugate Gradient (CG) and Stochastic Gradient Descent (SGD) as practical and elegant solutions for achieving quick convergence; however, these optimization processes also present many limitations in learning across deep learning applications. Recent research is exploring higher-order optimization functions as better approaches, but these present very complex computational challenges for practical use. Comparing first- and higher-order optimization functions, our experiments reveal that Levenberg-Marquardt (LM) achieves significantly better convergence but suffers from very large processing time, increasing the training complexity of both classification and reinforcement learning problems. Our experiments compare off-the-shelf optimization functions (CG, SGD, LM and L-BFGS) in standard CIFAR, MNIST, CartPole and FlappyBird experiments. The paper presents arguments on which optimization functions to use and, further, which functions would benefit from parallelization efforts to improve pretraining time and learning rate convergence.

arXiv.org e-Print Archive

eScholarship - University of California

CoCoA: A General Framework for Communication-Efficient Distributed Optimization

Author: Forte Simone
Jaggi Martin
Jordan Michael I.
Ma Chenxin
Smith Virginia
Takac Martin
Publication date: 21/06/2017

The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Repository for Publications and Research Data

On Algorithms Based on Joint Estimation of Currents and Contrast in Microwave Tomography

Author: Barrière Paul-André
Goussard Yves
Idier Jérôme
Laurin Jean-Jacques
Publication date: 01/01/2009

This paper deals with improvements to the contrast source inversion method, which is widely used in microwave tomography. First, the method is reviewed and weaknesses of both the criterion form and the optimization strategy are underlined. Then, two new algorithms are proposed. Both are based on the same criterion, which is similar to but more robust than the one used in contrast source inversion. The first technique keeps the main characteristics of the contrast source inversion optimization scheme but is based on a better exploitation of the conjugate gradient algorithm. The second technique is based on a preconditioned conjugate gradient algorithm and performs simultaneous updates of sets of unknowns that are normally processed sequentially. Both techniques are shown to be more efficient than the original contrast source inversion. Comment: 12 pages, 12 figures, 5 tables

arXiv.org e-Print Archive

PolyPublie

Computation of Ground States of the Gross-Pitaevskii Functional via Riemannian Optimization

Author: Danaila Ionut
Protas Bartosz
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2017

In this paper we combine concepts from Riemannian Optimization and the theory of Sobolev gradients to derive a new conjugate gradient method for direct minimization of the Gross-Pitaevskii energy functional with rotation. The conservation of the number of particles constrains the minimizers to lie on a manifold corresponding to the unit $L^2$ norm. The idea developed here is to transform the original constrained optimization problem to an unconstrained problem on this (spherical) Riemannian manifold, so that fast minimization algorithms can be applied as alternatives to more standard constrained formulations. First, we obtain Sobolev gradients using an equivalent definition of an $H^1$ inner product which takes into account rotation. Then, the Riemannian gradient (RG) steepest descent method is derived based on projected gradients and retraction of an intermediate solution back to the constraint manifold. Finally, we use the concept of the Riemannian vector transport to propose a Riemannian conjugate gradient (RCG) method for this problem. It is derived at the continuous level based on the "optimize-then-discretize" paradigm instead of the usual "discretize-then-optimize" approach, as this ensures robustness of the method when adaptive mesh refinement is performed in computations. We evaluate various design choices inherent in the formulation of the method and conclude with recommendations concerning selection of the best options. Numerical tests demonstrate that the proposed RCG method outperforms the simple gradient descent (RG) method in terms of rate of convergence. While on simple problems a Newton-type method implemented in the Ipopt library exhibits a faster convergence than the RCG approach, the two methods perform similarly on more complex problems requiring the use of mesh adaptation. At the same time the RCG approach has far fewer tunable parameters. Comment: 28 pages, 13 figures
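
The projected-gradient-plus-retraction step described above can be sketched on a much simpler problem. The example below is an assumption-laden stand-in: it replaces the Gross-Pitaevskii energy with a finite-dimensional quadratic x^T A x, uses the ordinary Euclidean gradient rather than a Sobolev gradient, and the function name `riemannian_gd_sphere` is hypothetical. It only illustrates the steepest descent (RG) variant, not the RCG method.

```python
import numpy as np

def riemannian_gd_sphere(A, x0, step=0.1, n_steps=500):
    """Minimize x^T A x over the unit sphere (toy stand-in for the energy)."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)                # start on the constraint manifold
    for _ in range(n_steps):
        egrad = 2.0 * (A @ x)                # Euclidean gradient of x^T A x
        rgrad = egrad - (x @ egrad) * x      # project onto the tangent space at x
        y = x - step * rgrad                 # intermediate point, off the sphere
        x = y / np.linalg.norm(y)            # retraction: renormalize to unit norm
    return x

# Symmetric test matrix; the constrained minimum is its smallest eigenvalue.
A = np.diag([1.0, 2.0, 3.0])
x_star = riemannian_gd_sphere(A, x0=[1.0, 1.0, 1.0])
energy = float(x_star @ A @ x_star)
```

On this toy problem the iterate stays on the unit sphere by construction, and `energy` approaches the smallest eigenvalue of `A`, which plays the role of the ground-state energy in this analogy.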

arXiv.org e-Print Archive

HAL - Normandie Université

HAL Descartes

Hal-Diderot