Gradient Descent: The Ultimate Optimizer
Working with any gradient-based machine learning algorithm involves the
tedious task of tuning the optimizer's hyperparameters, such as the learning
rate. There exist many techniques for automated hyperparameter optimization,
but they typically introduce even more hyperparameters to control the
hyperparameter optimization process. We propose to instead learn the
hyperparameters themselves by gradient descent, and furthermore to learn the
hyper-hyperparameters by gradient descent as well, and so on ad infinitum. As
these towers of gradient-based optimizers grow, they become significantly less
sensitive to the choice of top-level hyperparameters, hence decreasing the
burden on the user to search for optimal values.
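To make the idea concrete, here is a minimal sketch of one level of this construction on a toy quadratic, assuming plain NumPy: the learning rate alpha is itself updated by gradient descent through the parameter update, using the standard one-step hypergradient. The objective, step sizes, and iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy objective f(w) = 0.5 * ||w - target||^2, with gradient w - target.
target = np.array([3.0, -1.0])
grad = lambda w: w - target

w = np.zeros(2)
alpha = 0.01    # learning rate: a hyperparameter, itself learned below
kappa = 0.001   # step size for alpha: the (hyper-)hyperparameter

prev_g = grad(w)
for _ in range(100):
    w = w - alpha * prev_g                 # ordinary gradient step on w
    g = grad(w)
    # Hypergradient: d f(w_{t+1}) / d alpha = -grad(w_t) . grad(w_{t+1}),
    # so a descent step on alpha adds kappa * grad(w_t) . grad(w_{t+1}).
    alpha = alpha + kappa * np.dot(prev_g, g)
    prev_g = g

print("w ->", w, "  learned alpha ->", alpha)
```

The paper's point is that the same trick can then be applied to kappa, and to the step size above it, and that the resulting tower is far less sensitive to whatever value sits at the top.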
Online Hyperparameter Meta-Learning with Hypergradient Distillation
Many gradient-based meta-learning methods assume a set of parameters that do
not participate in inner-optimization, which can be considered as
hyperparameters. Although such hyperparameters can be optimized using the
existing gradient-based hyperparameter optimization (HO) methods, they suffer
from the following issues. Unrolled differentiation methods do not scale well
to high-dimensional hyperparameters or horizon length, Implicit Function
Theorem (IFT) based methods are restrictive for online optimization, and short
horizon approximations suffer from short horizon bias. In this work, we propose
a novel HO method that can overcome these limitations, by approximating the
second-order term with knowledge distillation. Specifically, we parameterize a
single Jacobian-vector product (JVP) for each HO step and minimize the distance
from the true second-order term. Our method allows online optimization and also
is scalable to the hyperparameter dimension and the horizon length. We
demonstrate the effectiveness of our method on two different meta-learning
methods and three benchmark datasets.
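For readers unfamiliar with the second-order term being approximated, the following NumPy sketch computes a one-step unrolled hypergradient for a ridge-style regularizer, with the cross-derivative of the training gradient with respect to the hyperparameter obtained by a finite difference. The data, loss, and step sizes are placeholders; the paper's contribution is to replace exactly this kind of per-step Jacobian-vector product with a cheap distilled surrogate, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(50, 5)), rng.normal(size=50)
X_va, y_va = rng.normal(size=(20, 5)), rng.normal(size=20)

def grad_train(w, lam):
    # gradient of 0.5*||X_tr w - y_tr||^2 + 0.5*lam*||w||^2 with respect to w
    return X_tr.T @ (X_tr @ w - y_tr) + lam * w

def grad_val(w):
    return X_va.T @ (X_va @ w - y_va)

w, lam, alpha, eps = rng.normal(size=5), 0.1, 1e-3, 1e-6

g = grad_train(w, lam)
w_next = w - alpha * g                       # one inner (unrolled) step

# Second-order cross term d(grad_train)/d(lam), here via finite differences;
# this is the quantity whose JVP the proposed method learns to imitate.
d_g_d_lam = (grad_train(w, lam + eps) - g) / eps

# One-step hypergradient of the validation loss with respect to lam.
hypergrad = grad_val(w_next) @ (-alpha * d_g_d_lam)
print("approximate d L_val / d lam =", hypergrad)
```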
Exploring the Optimized Value of Each Hyperparameter in Various Gradient Descent Algorithms
In recent years, various gradient descent algorithms, including gradient
descent, gradient descent with momentum, adaptive gradient (AdaGrad),
root-mean-square propagation (RMSProp), and adaptive moment estimation (Adam),
have been applied to the parameter optimization of several deep learning
models, achieving higher accuracy or lower error. These optimization algorithms
may require setting the values of several hyperparameters, such as the learning
rate and momentum coefficients. Furthermore, the convergence speed and solution
accuracy may be influenced by the values of these hyperparameters.
Therefore, this study proposes an analytical framework to use mathematical
models for analyzing the mean error of each objective function based on various
gradient descent algorithms. Moreover, the suitable value of each
hyperparameter could be determined by minimizing the mean error. The principles
of hyperparameter value setting have been generalized based on analysis results
for model optimization. The experimental results show that faster convergence
and lower errors can be obtained by the proposed method.
Comment: in Chinese language
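As a rough numerical stand-in for the analytical framework (which is derived mathematically in the paper), the sketch below computes the mean error of plain gradient descent on a known quadratic for a grid of learning rates and picks the value that minimizes it; the objective, error measure, and grid are assumptions made for illustration.

```python
import numpy as np

# Mean error of plain gradient descent on f(w) = 0.5 * a * w**2 over T steps,
# evaluated for each candidate learning rate; the minimizer is the "suitable"
# hyperparameter value in the sense of the abstract.
a, w0, T = 4.0, 1.0, 50
candidate_lrs = np.linspace(0.01, 0.49, 49)   # stable range is 0 < lr < 2/a

def mean_error(lr):
    w, errors = w0, []
    for _ in range(T):
        w = w - lr * a * w         # gradient step, f'(w) = a * w
        errors.append(abs(w))      # distance to the optimum w* = 0
    return np.mean(errors)

best_lr = min(candidate_lrs, key=mean_error)
print("learning rate minimizing the mean error:", best_lr)   # expect ~1/a
```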
A local Bayesian optimizer for atomic structures
A local optimization method based on Bayesian Gaussian Processes is developed
and applied to atomic structures. The method is applied to a variety of systems
including molecules, clusters, bulk materials, and molecules at surfaces. The
approach is seen to compare favorably to standard optimization algorithms like
conjugate gradient or BFGS in all cases. The method relies on prediction of
surrogate potential energy surfaces, which are fast to optimize, and which are
gradually improved as the calculation proceeds. The method includes a few
hyperparameters, the optimization of which may lead to further improvements of
the computational speed.
Comment: 10 pages, 5 figures
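The pattern can be illustrated on a one-dimensional toy surface, assuming scikit-learn for the Gaussian process and SciPy for minimizing the surrogate; the actual method operates on full atomic coordinates with forces and tuned priors, so this is only the general surrogate loop, not the published optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def energy(x):
    # Toy 1-D "potential energy surface"; in practice this call would be an
    # expensive electronic-structure evaluation.
    return np.sin(3.0 * x) + 0.5 * x**2

X, y = [[-1.0]], [energy(-1.0)]              # starting structure and energy

for _ in range(8):
    gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=0.5),
                                  normalize_y=True).fit(X, y)
    # Minimize the cheap surrogate mean, starting from the best point so far.
    x0 = X[int(np.argmin(y))]
    res = minimize(lambda x: gp.predict(np.atleast_2d(x))[0], x0)
    # Evaluate the true energy at the surrogate minimum and refine the model.
    X.append([float(res.x[0])])
    y.append(energy(float(res.x[0])))

print("best coordinate:", X[int(np.argmin(y))], " energy:", min(y))
```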
Hyperparameter optimization with approximate gradient
Most models in machine learning contain at least one hyperparameter to
control for model complexity. Choosing an appropriate set of hyperparameters is
both crucial in terms of model accuracy and computationally challenging. In
this work we propose an algorithm for the optimization of continuous
hyperparameters using inexact gradient information. An advantage of this method
is that hyperparameters can be updated before model parameters have fully
converged. We also give sufficient conditions for the global convergence of
this method, based on regularity conditions of the involved functions and
summability of errors. Finally, we validate the empirical performance of this
method on the estimation of regularization constants of L2-regularized logistic
regression and kernel Ridge regression. Empirical benchmarks indicate that our
approach is highly competitive with respect to state-of-the-art methods.
Comment: Proceedings of the International Conference on Machine Learning (ICML).
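A minimal NumPy sketch of the central idea, taking a hyperparameter step from an approximate hypergradient before the inner problem has converged, is given below for plain L2-regularized least squares (a simplification of the regularized models used in the paper). The data, step sizes, and number of inner steps are placeholder assumptions, and the paper's algorithm and its convergence conditions are considerably more careful.

```python
import numpy as np

rng = np.random.default_rng(1)
X_tr, y_tr = rng.normal(size=(80, 10)), rng.normal(size=80)
X_va, y_va = rng.normal(size=(40, 10)), rng.normal(size=40)

w, lam = np.zeros(10), 1.0
inner_lr, outer_lr = 1e-3, 1e-2

for _ in range(50):
    # A few inexact inner steps on the ridge training loss (not run to convergence).
    for _ in range(5):
        w -= inner_lr * (X_tr.T @ (X_tr @ w - y_tr) + lam * w)

    # Approximate hypergradient at the current, only partially converged w,
    # using the implicit-function-theorem formula for the ridge inner problem:
    #   dL_val/dlam ~= -grad_val(w)^T (X_tr^T X_tr + lam*I)^{-1} w
    g_val = X_va.T @ (X_va @ w - y_va)
    H = X_tr.T @ X_tr + lam * np.eye(10)
    hypergrad = -g_val @ np.linalg.solve(H, w)

    lam = max(1e-6, lam - outer_lr * hypergrad)   # keep the constant positive

print("estimated regularization constant:", lam)
```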
CPMLHO: Hyperparameter Tuning via Cutting Plane and Mixed-Level Optimization
Hyperparameter optimization of a neural network can be expressed as a bilevel
optimization problem. The bilevel formulation is used to update the
hyperparameters automatically, and the hypergradient is approximated through
the best-response function. Finding the best-response function, however, is
very time-consuming. In this paper we propose CPMLHO, a new hyperparameter
optimization method using a cutting-plane method and a mixed-level objective
function. The cutting plane is added to the inner level to constrain the space
of the response function. To obtain a more accurate hypergradient, the
mixed-level objective can flexibly adjust the loss function by combining the
losses on the training set and the validation set. Compared to existing
methods, the experimental results show that our method can automatically update
the hyperparameters during training and find better hyperparameters with higher
accuracy and faster convergence.
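Of the two ingredients, the mixed-level objective is the simpler to sketch: a weighted combination of the training-set and validation-set losses for the outer problem. The sketch below only illustrates that idea under an assumed linear model and a fixed mixing weight; it is not CPMLHO's exact formulation, and the cutting-plane constraint on the inner level is not shown.

```python
import numpy as np

def mixed_level_loss(w, X_tr, y_tr, X_va, y_va, beta=0.5):
    """Weighted combination of training and validation losses for the outer
    (hyperparameter) problem. The squared loss and the weight beta are
    assumptions for illustration, not the paper's exact objective."""
    loss_tr = 0.5 * np.mean((X_tr @ w - y_tr) ** 2)
    loss_va = 0.5 * np.mean((X_va @ w - y_va) ** 2)
    return beta * loss_tr + (1.0 - beta) * loss_va
```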