Hyper-parameter tuning for the (1 + (λ, λ)) GA
It is known that the (1 + (λ, λ)) Genetic Algorithm (GA) with self-adjusting parameter choices achieves a linear expected optimization time on OneMax if its hyper-parameters are suitably chosen. However, it is not well understood how the hyper-parameter settings influence the overall performance of the (1 + (λ, λ)) GA. Analyzing such multi-dimensional dependencies precisely is at the edge of what running time analysis can offer. To take a step forward on this question, we present an in-depth empirical study of the self-adjusting (1 + (λ, λ)) GA and its hyper-parameters. We show, among many other results, that a 15% reduction of the average running time is possible with a slightly different setup, which allows non-identical offspring population sizes in the mutation and crossover phases, and more flexibility in the choice of mutation rate and crossover bias --- a generalization which may be of independent interest. We also find indications that the parametrization of mutation rate and crossover bias derived by theoretical means for the static variant of the (1 + (λ, λ)) GA extends to the non-static case.
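The self-adjusting scheme this abstract builds on can be sketched concretely: each generation creates λ mutants with mutation rate λ/n, then λ crossover offspring biased toward the best mutant with bias 1/λ, and adapts λ by a one-fifth success rule. The Python sketch below implements that standard scheme on OneMax; it is an illustration, not the generalized setup studied in the paper, and the function name and update strength F = 1.5 are illustrative choices.

```python
import random

def onemax(x):
    return sum(x)

def self_adjusting_ollga(n, F=1.5, seed=0):
    """Sketch of the self-adjusting (1 + (lambda, lambda)) GA on OneMax.

    lambda is adapted by a one-fifth success rule: divide by F on success,
    multiply by F**(1/4) on failure. Returns the number of evaluations used.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    lam, evals = 1.0, 0
    while onemax(x) < n:
        k = max(1, round(lam))
        p = k / n                        # mutation rate p = lambda / n
        # mutation phase: draw one strength ell ~ Bin(n, p), shared by all
        ell = sum(rng.random() < p for _ in range(n))
        best_mut = None
        for _ in range(k):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] ^= 1                # flip ell distinct bits
            evals += 1
            if best_mut is None or onemax(y) > onemax(best_mut):
                best_mut = y
        # crossover phase: bias c = 1/lambda toward the best mutant
        c = 1 / k
        best = x
        for _ in range(k):
            y = [b if rng.random() < c else a for b, a in zip(best_mut, x)]
            evals += 1
            if onemax(y) >= onemax(best):
                best = y
        # one-fifth success rule on lambda
        if onemax(best) > onemax(x):
            lam = max(lam / F, 1.0)
        else:
            lam = min(lam * F ** 0.25, n)
        x = best
    return evals
```

Selection is elitist (the crossover winner is only accepted if it is at least as good as the parent), so the loop terminates once OneMax is solved.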
Assessing hyper parameter optimization and speedup for convolutional neural networks
The increased processing power of graphical processing units (GPUs) and the availability of large image datasets have fostered a renewed interest in extracting semantic information from images. Promising results for complex image categorization problems have been achieved using deep learning, with neural networks comprised of many layers. Convolutional neural networks (CNNs) are one such architecture that provides more opportunities for image classification. Advances in CNNs enable the development of training models using large labelled image datasets, but the hyper-parameters need to be specified, which is challenging and complex due to the large number of parameters. A substantial amount of computational power and processing time is required to determine the optimal hyper-parameters for a model yielding good results. This article provides a survey of hyper-parameter search and optimization methods for CNN architectures.
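As a point of reference for the kind of method such a survey covers, random search is among the simplest: sample configurations from the search space and keep the best. The sketch below is purely illustrative; the parameter space and the stand-in objective `fake_validation_loss` (which replaces actually training a CNN) are invented for the example, not taken from the article.

```python
import math
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Minimal random-search sketch for hyper-parameter optimization.

    `space` maps parameter names to functions that draw a sample
    from that parameter's distribution given an RNG.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: sample(rng) for name, sample in space.items()}
        score = objective(cfg)           # lower is better
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical CNN hyper-parameter space (names are illustrative).
space = {
    "learning_rate": lambda r: 10 ** r.uniform(-4, -1),  # log-uniform
    "batch_size":    lambda r: r.choice([32, 64, 128]),
    "num_filters":   lambda r: r.choice([16, 32, 64]),
}

# Stand-in for the validation loss of a CNN trained with `cfg`:
# a smooth function with a best learning rate near 10**-2.5.
def fake_validation_loss(cfg):
    return (math.log10(cfg["learning_rate"]) + 2.5) ** 2 + cfg["num_filters"] / 64

best_cfg, best_loss = random_search(fake_validation_loss, space)
```

Sampling the learning rate log-uniformly rather than uniformly is the usual choice, since plausible values span several orders of magnitude.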
Multi-fidelity optimization via surrogate modelling
This paper demonstrates the application of correlated Gaussian process based approximations to optimization where multiple levels of analysis are available, using an extension to the geostatistical method of co-kriging. An exchange algorithm is used to choose which points of the search space to sample within each level of analysis. The derivation of the co-kriging equations is presented in an intuitive manner, along with a new variance estimator to account for varying degrees of computational ‘noise’ in the multiple levels of analysis. A multi-fidelity wing optimization is used to demonstrate the methodology.
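The co-kriging idea can be illustrated with a stripped-down autoregressive model in the style of Kennedy and O'Hagan: the high-fidelity response is approximated as a scaled low-fidelity prediction plus a GP-modelled discrepancy. The sketch below is a simplification (posterior means only, hand-picked kernel lengths, least-squares scaling, no exchange algorithm or variance estimator), not the paper's co-kriging derivation; the two analysis functions are invented test functions.

```python
import numpy as np

def rbf(a, b, length):
    # squared-exponential kernel on 1-D inputs
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))

def gp_predict(x_train, y_train, x_test, length, noise=1e-6):
    """Posterior mean of plain GP regression with an RBF kernel."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    return rbf(x_test, x_train, length) @ np.linalg.solve(K, y_train)

# Autoregressive multi-fidelity model: f_hi(x) ~ rho * f_lo(x) + delta(x),
# with the discrepancy delta modelled by a second GP.
def f_lo(x):   # cheap, low-fidelity analysis (invented)
    return np.sin(8 * x)

def f_hi(x):   # expensive, high-fidelity analysis (invented)
    return 0.8 * np.sin(8 * x) + 0.2 * x

x_lo = np.linspace(0, 1, 25)   # many cheap samples
x_hi = np.linspace(0, 1, 5)    # few expensive samples

rho = np.polyfit(f_lo(x_hi), f_hi(x_hi), 1)[0]   # scale between fidelities
delta = f_hi(x_hi) - rho * f_lo(x_hi)            # discrepancy at hi points

x_test = np.linspace(0, 1, 101)
pred_lo = gp_predict(x_lo, f_lo(x_lo), x_test, length=0.15)
pred_hi = rho * pred_lo + gp_predict(x_hi, delta, x_test, length=0.5)
```

The point of the construction is that five expensive samples alone cannot resolve the oscillation, but they suffice to calibrate the scale and the slowly varying discrepancy once the cheap model captures the shape.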
Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients
Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all need for tuning, while automatically reducing learning rates over time on stationary problems and permitting learning rates to grow appropriately on non-stationary tasks. Here, we extend the idea in three directions: addressing proper minibatch parallelization, including reweighted updates for sparse or orthogonal gradients, and improving robustness on non-smooth loss functions, in the process replacing the diagonal Hessian estimation procedure, which may not always be available, with a robust finite-difference approximation. The final algorithm integrates all these components, has linear complexity and is hyper-parameter free.
Comment: Published at the First International Conference on Learning Representations (ICLR-2013). Public reviews are available at http://openreview.net/document/c14f2204-fd66-4d91-bed4-153523694041#c14f2204-fd66-4d91-bed4-15352369404
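The core of the adaptive-rate idea is a per-parameter step size of the form eta = gbar**2 / (vbar * hbar), built from leaky running averages of the gradient, its square, and a curvature estimate, with the averaging memory tau itself adapted to how consistent recent gradients are. The single-parameter sketch below follows that recipe, using a same-sample finite-difference curvature estimate as the abstract suggests; it is an illustration of the idea under invented constants, not the paper's full algorithm.

```python
import random

def vsgd_sketch(grad, theta0, steps=300, tau0=10.0, seed=0):
    """Sketch of a vSGD-style adaptive learning rate for one parameter.

    `grad(theta, z)` returns a stochastic gradient at theta for noise
    sample z; z is reused for both evaluations of the finite difference,
    mimicking a same-minibatch curvature estimate.
    """
    rng = random.Random(seed)
    theta = theta0
    gbar, vbar, hbar, tau = 0.0, 1.0, 1.0, tau0
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)          # one "minibatch" of noise
        g = grad(theta, z)
        # finite-difference curvature (replaces a diagonal Hessian)
        eps = 1e-4 * (abs(theta) + 1.0)
        h_est = abs(grad(theta + eps, z) - g) / eps
        # leaky running averages with memory tau
        gbar += (g - gbar) / tau
        vbar += (g * g - vbar) / tau
        hbar += (max(h_est, 1e-8) - hbar) / tau
        # adaptive rate: large when gradients agree, small when noisy
        eta = gbar * gbar / (vbar * hbar + 1e-12)
        theta -= eta * g
        # adapt memory: short when gradients are consistent, long otherwise
        tau = max(2.0, (1.0 - gbar * gbar / max(vbar, 1e-12)) * tau + 1.0)
    return theta
```

Because gbar**2 <= vbar holds for these running averages, eta never exceeds 1/hbar, which keeps the update stable; near an optimum the gradient becomes noise-dominated, gbar**2/vbar shrinks, and the step size automatically anneals.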