
    Gradient Descent: The Ultimate Optimizer

    Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as the learning rate. There exist many techniques for automated hyperparameter optimization, but they typically introduce even more hyperparameters to control the hyperparameter optimization process. We propose to instead learn the hyperparameters themselves by gradient descent, and furthermore to learn the hyper-hyperparameters by gradient descent as well, and so on ad infinitum. As these towers of gradient-based optimizers grow, they become significantly less sensitive to the choice of top-level hyperparameters, hence decreasing the burden on the user to search for optimal values.
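
    As a rough illustration of the core idea, the sketch below adapts the learning rate itself by gradient descent on a least-squares problem, using the hand-derived hypergradient for plain SGD. It is a minimal one-level toy, not the paper's implementation: the paper obtains hypergradients by automatic differentiation and stacks such optimizers to arbitrary depth, and all values here (data, kappa, step counts) are illustrative.

```python
# One level of hyperparameter learning: the learning rate alpha is itself
# updated by gradient descent. For plain SGD, the hypergradient is
# dL(w_t)/dalpha = -g_{t-1} . g_t  (dot product of consecutive gradients).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)

def grad(w):
    return X.T @ (X @ w - y) / len(y)    # least-squares gradient

w = np.zeros(5)
alpha = 1e-3       # learning rate (to be adapted)
kappa = 1e-4       # hyper-learning rate for alpha (illustrative value)
prev_g = np.zeros(5)

for t in range(200):
    g = grad(w)
    alpha -= kappa * (-prev_g @ g)       # gradient step on alpha
    w -= alpha * g                       # ordinary SGD step on w
    prev_g = g

print("final loss:", 0.5 * np.mean((X @ w - y) ** 2), "alpha:", alpha)
```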

    Online Hyperparameter Meta-Learning with Hypergradient Distillation

    Many gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization, which can be regarded as hyperparameters. Although such hyperparameters can be optimized with existing gradient-based hyperparameter optimization (HO) methods, those methods suffer from the following issues: unrolled differentiation does not scale well to high-dimensional hyperparameters or long horizons, Implicit Function Theorem (IFT) based methods are restrictive for online optimization, and short-horizon approximations suffer from short-horizon bias. In this work, we propose a novel HO method that overcomes these limitations by approximating the second-order term with knowledge distillation. Specifically, we parameterize a single Jacobian-vector product (JVP) for each HO step and minimize its distance from the true second-order term. Our method allows online optimization and scales to both the hyperparameter dimension and the horizon length. We demonstrate its effectiveness on two different meta-learning methods and three benchmark datasets.
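
    The toy sketch below illustrates only the distillation idea: the expensive second-order term is a Jacobian-vector product v -> J v of the inner gradient, and a cheap parameterized map A is trained to match it on sampled vectors. The ridge-style inner problem, dimensions, and step sizes are assumptions for illustration, not the paper's architecture or algorithm.

```python
# Distill a Jacobian-vector product: train a surrogate matrix A so that
# A @ v matches the true second-order term J @ v on randomly sampled v.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
lam = 0.3
J = X.T @ X / 50 + lam * np.eye(6)    # Jacobian of a ridge-style inner gradient

A = np.zeros((6, 6))                  # distilled surrogate for v -> J @ v
for step in range(500):
    v = rng.normal(size=6)
    err = A @ v - J @ v               # distance to the true JVP on this v
    A -= 0.05 * 2 * np.outer(err, v)  # gradient step on ||A v - J v||^2

print("max abs error:", np.abs(A - J).max())
```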

    Exploring the Optimized Value of Each Hyperparameter in Various Gradient Descent Algorithms

    In recent years, various gradient descent algorithms, including plain gradient descent, gradient descent with momentum, adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp), and adaptive moment estimation (Adam), have been applied to optimize the parameters of deep learning models, achieving higher accuracy or lower error. These algorithms require setting the values of several hyperparameters, such as the learning rate and momentum coefficients, and both the convergence speed and the solution accuracy depend on those values. This study therefore proposes an analytical framework that uses mathematical models to analyze the mean error of each objective function under the various gradient descent algorithms; the suitable value of each hyperparameter can then be determined by minimizing this mean error. Principles for setting hyperparameter values are generalized from the analysis results for model optimization. Experimental results show that the proposed method yields faster convergence and lower errors. (Comment: in Chinese language)
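
    For reference, the sketch below lists the update rules of the algorithms named in the abstract and the hyperparameters each one exposes. The default values shown are common conventions, not the analytically derived optima from the paper.

```python
# Update rules and their hyperparameters: learning rate lr, momentum /
# decay coefficients (beta, rho, b1, b2), and the numerical-stability eps.
import numpy as np

def sgd(w, g, lr=0.01):
    return w - lr * g

def momentum(w, g, v, lr=0.01, beta=0.9):
    v = beta * v + g                        # velocity accumulates gradients
    return w - lr * v, v

def adagrad(w, g, s, lr=0.01, eps=1e-8):
    s = s + g ** 2                          # per-coordinate sum of squares
    return w - lr * g / (np.sqrt(s) + eps), s

def rmsprop(w, g, s, lr=0.001, rho=0.9, eps=1e-8):
    s = rho * s + (1 - rho) * g ** 2        # exponential moving average
    return w - lr * g / (np.sqrt(s) + eps), s

def adam(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g               # first-moment estimate
    v = b2 * v + (1 - b2) * g ** 2          # second-moment estimate
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias correction
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```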

    A local Bayesian optimizer for atomic structures

    A local optimization method based on Bayesian Gaussian processes is developed and applied to atomic structures. The method is applied to a variety of systems, including molecules, clusters, bulk materials, and molecules at surfaces, and compares favorably to standard optimization algorithms such as conjugate gradient or BFGS in all cases. It relies on predicting surrogate potential energy surfaces, which are fast to optimize and are gradually improved as the calculation proceeds. The method includes a few hyperparameters, the optimization of which may lead to further improvements in computational speed. (Comment: 10 pages, 5 figures)
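
    A minimal sketch of the surrogate-surface idea follows, assuming a cheap stand-in for the expensive energy evaluation: fit a Gaussian process to the points evaluated so far, minimize the inexpensive surrogate mean, then evaluate the true energy at the proposed point and refit. It is not the authors' implementation; a real version would also use forces, a physically motivated prior, and safeguards against bad surrogate steps.

```python
# Iteratively optimize a GP surrogate of an expensive energy function.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def true_energy(x):                    # stand-in for an expensive calculation
    return np.sum((x - 1.0) ** 2) + 0.1 * np.sum(np.sin(5 * x))

x = np.zeros(3)
X_seen, y_seen = [x.copy()], [true_energy(x)]

for it in range(15):
    gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=1.0),
                                  alpha=1e-8, normalize_y=True)
    gp.fit(np.array(X_seen), np.array(y_seen))
    # minimize the cheap surrogate mean, starting from the latest point
    res = minimize(lambda z: gp.predict(z.reshape(1, -1))[0], x)
    x = res.x
    X_seen.append(x.copy())
    y_seen.append(true_energy(x))      # one expensive evaluation per iteration

print("best energy found:", min(y_seen))
```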

    Hyperparameter optimization with approximate gradient

    Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial for model accuracy and computationally challenging. In this work we propose an algorithm for the optimization of continuous hyperparameters using inexact gradient information. An advantage of this method is that hyperparameters can be updated before the model parameters have fully converged. We also give sufficient conditions for the global convergence of this method, based on regularity conditions on the involved functions and summability of errors. Finally, we validate the empirical performance of this method on the estimation of regularization constants of L2-regularized logistic regression and kernel ridge regression. Empirical benchmarks indicate that our approach is highly competitive with state-of-the-art methods. (Comment: Proceedings of the International Conference on Machine Learning, ICML)
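
    The sketch below shows the flavor of inexact-hypergradient optimization in the ridge-regression setting mentioned in the abstract: the inner problem is solved only approximately, the linear system in the implicit-function-theorem hypergradient is solved with truncated conjugate gradient, and the regularization constant is updated anyway. Data, iteration counts, and step sizes are illustrative assumptions, not the authors' code.

```python
# Approximate hypergradient for the ridge regularization constant lam.
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
X_tr, X_val = rng.normal(size=(80, 10)), rng.normal(size=(40, 10))
w_true = rng.normal(size=10)
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=80)
y_val = X_val @ w_true + 0.5 * rng.normal(size=40)

lam, w = 1.0, np.zeros(10)
for outer in range(30):
    # inexact inner solve: a few gradient steps on the ridge objective
    for _ in range(20):
        w -= 0.005 * (X_tr.T @ (X_tr @ w - y_tr) + lam * w)
    # implicit-function-theorem hypergradient:
    #   dG/dlam = -w^T H^{-1} grad_val,  with H = X_tr^T X_tr + lam * I
    grad_val = X_val.T @ (X_val @ w - y_val)
    H = X_tr.T @ X_tr + lam * np.eye(10)
    q, _ = cg(H, grad_val, maxiter=5)        # truncated CG: inexact solve
    hypergrad = -w @ q
    lam = max(lam - 0.1 * hypergrad, 1e-6)   # keep lam positive

print("learned lam:", lam)
```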

    CPMLHO: Hyperparameter Tuning via Cutting Plane and Mixed-Level Optimization

    The hyperparameter optimization of a neural network can be expressed as a bilevel optimization problem, in which the hyperparameters are updated automatically and the hypergradient is approximated via the best-response function. Finding the best-response function is very time-consuming. In this paper we propose CPMLHO, a new hyperparameter optimization method that uses the cutting plane method and a mixed-level objective function. Cutting planes are added to the inner level to constrain the space of the response function. To obtain a more accurate hypergradient, the mixed-level objective flexibly adjusts the loss function by combining the training and validation losses. Experimental results show that, compared to existing methods, our method automatically updates the hyperparameters during training and finds superior hyperparameters with higher accuracy and faster convergence.
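
    The sketch below illustrates only the mixed-level ingredient in heavily simplified form: the hypergradient of a regularization hyperparameter is taken on a weighted mix of validation and training loss, using a crude one-step best-response approximation instead of the paper's cutting-plane-constrained inner problem. The data, the mixing weight mu, and all step sizes are illustrative assumptions.

```python
# One-step approximate hypergradient on a mixed train/validation objective.
import numpy as np

rng = np.random.default_rng(1)
X_tr, X_val = rng.normal(size=(60, 8)), rng.normal(size=(30, 8))
w_star = rng.normal(size=8)
y_tr = X_tr @ w_star + 0.3 * rng.normal(size=60)
y_val = X_val @ w_star + 0.3 * rng.normal(size=30)

def grad_loss(X, y, w):
    return X.T @ (X @ w - y) / len(y)

lam, mu, eta = 1.0, 0.5, 0.05      # hyperparameter, mixing weight, inner step
w = np.zeros(8)
for step in range(100):
    # inner step on the lam-regularized training loss
    w_new = w - eta * (grad_loss(X_tr, y_tr, w) + lam * w)
    # mixed-level outer gradient, chained through dw_new/dlam = -eta * w
    outer_grad = grad_loss(X_val, y_val, w_new) + mu * grad_loss(X_tr, y_tr, w_new)
    hypergrad = outer_grad @ (-eta * w)
    lam = max(lam - 0.5 * hypergrad, 0.0)
    w = w_new

print("lam after training:", round(lam, 3))
```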