Hyper-parameter tuning for the (1 + (λ, λ)) GA
It is known that the (1 + (λ, λ)) Genetic Algorithm (GA) with self-adjusting parameter choices achieves a linear expected optimization time on OneMax if its hyper-parameters are suitably chosen. However, it is not well understood how the hyper-parameter settings influence the overall performance of the (1 + (λ, λ)) GA. Analyzing such multi-dimensional dependencies precisely is at the edge of what running time analysis can offer. To take a step forward on this question, we present an in-depth empirical study of the self-adjusting (1 + (λ, λ)) GA and its hyper-parameters. We show, among many other results, that a 15% reduction of the average running time is possible with a slightly different setup, which allows non-identical offspring population sizes in the mutation and crossover phases, and more flexibility in the choice of mutation rate and crossover bias --- a generalization which may be of independent interest. We also find indications that the parametrization of mutation rate and crossover bias derived by theoretical means for the static variant of the (1 + (λ, λ)) GA extends to the non-static case.
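The self-adjusting scheme this abstract builds on can be sketched concretely: each generation creates λ mutants with mutation rate λ/n, then λ crossover offspring biased toward the best mutant with bias 1/λ, and adapts λ by a one-fifth success rule. The Python sketch below implements that standard scheme on OneMax; it is an illustration, not the generalized setup studied in the paper, and the function name and update strength F = 1.5 are illustrative choices.

```python
import random

def onemax(x):
    return sum(x)

def self_adjusting_ollga(n, F=1.5, seed=0):
    """Sketch of the self-adjusting (1 + (lambda, lambda)) GA on OneMax.

    lambda is adapted by a one-fifth success rule: divide by F on success,
    multiply by F**(1/4) on failure. Returns the number of evaluations used.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    lam, evals = 1.0, 0
    while onemax(x) < n:
        k = max(1, round(lam))
        p = k / n                        # mutation rate p = lambda / n
        # mutation phase: draw one strength ell ~ Bin(n, p), shared by all
        ell = sum(rng.random() < p for _ in range(n))
        best_mut = None
        for _ in range(k):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] ^= 1                # flip ell distinct bits
            evals += 1
            if best_mut is None or onemax(y) > onemax(best_mut):
                best_mut = y
        # crossover phase: bias c = 1/lambda toward the best mutant
        c = 1 / k
        best = x
        for _ in range(k):
            y = [b if rng.random() < c else a for b, a in zip(best_mut, x)]
            evals += 1
            if onemax(y) >= onemax(best):
                best = y
        # one-fifth success rule on lambda
        if onemax(best) > onemax(x):
            lam = max(lam / F, 1.0)
        else:
            lam = min(lam * F ** 0.25, n)
        x = best
    return evals
```

Selection is elitist (the crossover winner is only accepted if it is at least as good as the parent), so the loop terminates once OneMax is solved.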
Assessing hyper parameter optimization and speedup for convolutional neural networks
The increased processing power of graphical processing units (GPUs) and the availability of large image datasets have fostered a renewed interest in extracting semantic information from images. Promising results for complex image categorization problems have been achieved using deep learning, with neural networks comprised of many layers. Convolutional neural networks (CNNs) are one such architecture that provides more opportunities for image classification. Advances in CNNs enable the development of training models using large labelled image datasets, but the hyper-parameters need to be specified, which is challenging and complex due to the large number of parameters. A substantial amount of computational power and processing time is required to determine the optimal hyper-parameters for a model yielding good results. This article provides a survey of hyper-parameter search and optimization methods for CNN architectures.
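As a point of reference for the kind of method such a survey covers, random search is among the simplest: sample configurations from the search space and keep the best. The sketch below is purely illustrative; the parameter space and the stand-in objective `fake_validation_loss` (which replaces actually training a CNN) are invented for the example, not taken from the article.

```python
import math
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Minimal random-search sketch for hyper-parameter optimization.

    `space` maps parameter names to functions that draw a sample
    from that parameter's distribution given an RNG.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: sample(rng) for name, sample in space.items()}
        score = objective(cfg)           # lower is better
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical CNN hyper-parameter space (names are illustrative).
space = {
    "learning_rate": lambda r: 10 ** r.uniform(-4, -1),  # log-uniform
    "batch_size":    lambda r: r.choice([32, 64, 128]),
    "num_filters":   lambda r: r.choice([16, 32, 64]),
}

# Stand-in for the validation loss of a CNN trained with `cfg`:
# a smooth function with a best learning rate near 10**-2.5.
def fake_validation_loss(cfg):
    return (math.log10(cfg["learning_rate"]) + 2.5) ** 2 + cfg["num_filters"] / 64

best_cfg, best_loss = random_search(fake_validation_loss, space)
```

Sampling the learning rate log-uniformly rather than uniformly is the usual choice, since plausible values span several orders of magnitude.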
Multi-fidelity optimization via surrogate modelling
This paper demonstrates the application of correlated Gaussian process based approximations to optimization where multiple levels of analysis are available, using an extension to the geostatistical method of co-kriging. An exchange algorithm is used to choose which points of the search space to sample within each level of analysis. The derivation of the co-kriging equations is presented in an intuitive manner, along with a new variance estimator to account for varying degrees of computational ‘noise’ in the multiple levels of analysis. A multi-fidelity wing optimization is used to demonstrate the methodology.
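The co-kriging idea can be illustrated with a stripped-down autoregressive model in the style of Kennedy and O'Hagan: the high-fidelity response is approximated as a scaled low-fidelity prediction plus a GP-modelled discrepancy. The sketch below is a simplification (posterior means only, hand-picked kernel lengths, least-squares scaling, no exchange algorithm or variance estimator), not the paper's co-kriging derivation; the two analysis functions are invented test functions.

```python
import numpy as np

def rbf(a, b, length):
    # squared-exponential kernel on 1-D inputs
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))

def gp_predict(x_train, y_train, x_test, length, noise=1e-6):
    """Posterior mean of plain GP regression with an RBF kernel."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    return rbf(x_test, x_train, length) @ np.linalg.solve(K, y_train)

# Autoregressive multi-fidelity model: f_hi(x) ~ rho * f_lo(x) + delta(x),
# with the discrepancy delta modelled by a second GP.
def f_lo(x):   # cheap, low-fidelity analysis (invented)
    return np.sin(8 * x)

def f_hi(x):   # expensive, high-fidelity analysis (invented)
    return 0.8 * np.sin(8 * x) + 0.2 * x

x_lo = np.linspace(0, 1, 25)   # many cheap samples
x_hi = np.linspace(0, 1, 5)    # few expensive samples

rho = np.polyfit(f_lo(x_hi), f_hi(x_hi), 1)[0]   # scale between fidelities
delta = f_hi(x_hi) - rho * f_lo(x_hi)            # discrepancy at hi points

x_test = np.linspace(0, 1, 101)
pred_lo = gp_predict(x_lo, f_lo(x_lo), x_test, length=0.15)
pred_hi = rho * pred_lo + gp_predict(x_hi, delta, x_test, length=0.5)
```

The point of the construction is that five expensive samples alone cannot resolve the oscillation, but they suffice to calibrate the scale and the slowly varying discrepancy once the cheap model captures the shape.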
Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients
Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all need for tuning, while automatically reducing learning rates over time on stationary problems and permitting learning rates to grow appropriately on non-stationary tasks. Here, we extend the idea in three directions: addressing proper minibatch parallelization, including reweighted updates for sparse or orthogonal gradients, and improving robustness on non-smooth loss functions, in the process replacing the diagonal Hessian estimation procedure, which may not always be available, with a robust finite-difference approximation. The final algorithm integrates all these components, has linear complexity and is hyper-parameter free.
Comment: Published at the First International Conference on Learning Representations (ICLR-2013). Public reviews are available at http://openreview.net/document/c14f2204-fd66-4d91-bed4-153523694041#c14f2204-fd66-4d91-bed4-15352369404
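The core of the adaptive-rate idea is a per-parameter step size of the form eta = gbar**2 / (vbar * hbar), built from leaky running averages of the gradient, its square, and a curvature estimate, with the averaging memory tau itself adapted to how consistent recent gradients are. The single-parameter sketch below follows that recipe, using a same-sample finite-difference curvature estimate as the abstract suggests; it is an illustration of the idea under invented constants, not the paper's full algorithm.

```python
import random

def vsgd_sketch(grad, theta0, steps=300, tau0=10.0, seed=0):
    """Sketch of a vSGD-style adaptive learning rate for one parameter.

    `grad(theta, z)` returns a stochastic gradient at theta for noise
    sample z; z is reused for both evaluations of the finite difference,
    mimicking a same-minibatch curvature estimate.
    """
    rng = random.Random(seed)
    theta = theta0
    gbar, vbar, hbar, tau = 0.0, 1.0, 1.0, tau0
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)          # one "minibatch" of noise
        g = grad(theta, z)
        # finite-difference curvature (replaces a diagonal Hessian)
        eps = 1e-4 * (abs(theta) + 1.0)
        h_est = abs(grad(theta + eps, z) - g) / eps
        # leaky running averages with memory tau
        gbar += (g - gbar) / tau
        vbar += (g * g - vbar) / tau
        hbar += (max(h_est, 1e-8) - hbar) / tau
        # adaptive rate: large when gradients agree, small when noisy
        eta = gbar * gbar / (vbar * hbar + 1e-12)
        theta -= eta * g
        # adapt memory: short when gradients are consistent, long otherwise
        tau = max(2.0, (1.0 - gbar * gbar / max(vbar, 1e-12)) * tau + 1.0)
    return theta
```

Because gbar**2 <= vbar holds for these running averages, eta never exceeds 1/hbar, which keeps the update stable; near an optimum the gradient becomes noise-dominated, gbar**2/vbar shrinks, and the step size automatically anneals.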