Practical recommendations for gradient-based training of deep architectures
Learning algorithms for artificial neural networks, and for Deep Learning in
particular, may seem to involve many bells and whistles, known as
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradients and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when many
hyper-parameters are allowed to be adjusted. Overall, it describes elements of
the practice used to successfully and efficiently train and debug large-scale,
and often deep, multi-layer neural networks. It closes with open questions
about the training difficulties observed with deeper architectures.
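
As a minimal illustration of the kind of hyper-parameters the chapter
discusses, the sketch below runs plain mini-batch SGD with momentum on a
synthetic logistic-regression task. It is not code from the chapter; the
hyper-parameter values, the variable names, and the task itself are all
illustrative assumptions.

```python
# Illustrative sketch only: the hyper-parameter values below are arbitrary,
# not recommendations from the chapter.
import numpy as np

# Commonly tuned hyper-parameters in gradient-based training.
learning_rate = 0.1   # step size of each parameter update
momentum = 0.9        # exponential averaging of past gradients
batch_size = 32       # number of examples per gradient estimate
num_epochs = 20       # passes over the training set

# Synthetic binary-classification data (assumed task, for demonstration).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = (X @ true_w + 0.1 * rng.normal(size=1000) > 0).astype(float)

w = np.zeros(10)          # model parameters (logistic-regression weights)
velocity = np.zeros(10)   # momentum buffer

for epoch in range(num_epochs):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        probs = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))    # forward pass
        grad = X[idx].T @ (probs - y[idx]) / len(idx)  # gradient of the log-loss
        velocity = momentum * velocity - learning_rate * grad
        w += velocity                                  # gradient-based update
```

Each of these knobs interacts with the others (e.g., larger batches typically
tolerate larger learning rates), which is the kind of joint adjustment the
chapter's recommendations address.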
The Impact of Asynchrony on Parallel Model-Based EAs
In a parallel EA one can strictly adhere to the generational clock and wait
for all evaluations in a generation to finish. However, this idle time limits
the throughput of the algorithm and wastes computational resources.
Alternatively, an EA can be made asynchronously parallel. However, EAs using
classic recombination and selection operators (GAs) are known to suffer from an
evaluation-time bias, which also influences the performance of the approach.
Model-Based Evolutionary Algorithms (MBEAs) are more scalable than classic GAs
by virtue of capturing the structure of a problem in a model. If this model is
learned through linkage learning based on the population, the learned model may
also capture biases. Thus, if an asynchronous parallel MBEA is also affected by
an evaluation-time bias, this could result in learned models that are less
suited to solving the problem, reducing performance. Therefore, in this work,
we study the presence and impact of evaluation-time biases on MBEAs in an
asynchronous parallelization setting, and compare this to the biases in GAs. We
find that a modern MBEA, GOMEA, is unaffected by evaluation-time biases, while
the more classical MBEA, ECGA, is affected, much like GAs are.
Comment: 9 pages, 3 figures, 3 tables, submitted to GECCO 202
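
The synchronous/asynchronous distinction in the abstract can be made concrete
with a small sketch. The code below is not the paper's implementation (it
implements neither GOMEA nor ECGA); `fitness`, `mutate`, the OneMax objective,
and the solution-dependent evaluation time are all illustrative assumptions,
chosen so that the evaluation-time bias has something to act on.

```python
# Minimal sketch contrasting generational (synchronous) and steady-state
# (asynchronous) parallel evaluation in an EA. Not the paper's code.
import random
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def fitness(x):
    # Assumed objective: OneMax. The sleep makes evaluation time depend on
    # the solution, which is the source of the evaluation-time bias.
    time.sleep(0.005 * sum(x))
    return sum(x)

def mutate(x):
    y = list(x)
    y[random.randrange(len(y))] ^= 1  # flip one bit
    return y

random.seed(0)
population = [[random.randint(0, 1) for _ in range(20)] for _ in range(16)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Synchronous: the generational clock waits until every evaluation in
    # the generation is done, leaving workers idle near the end.
    scores = list(pool.map(fitness, population))

    # Asynchronous (steady-state): act on whichever evaluation finishes
    # first, so no worker idles -- but solutions that evaluate quickly
    # re-enter selection sooner, which is the bias the paper studies.
    pending = {}
    for i, parent in enumerate(population):
        child = mutate(parent)
        pending[pool.submit(fitness, child)] = (i, child)

    for _ in range(64):  # fixed evaluation budget
        done, _ = wait(pending, return_when=FIRST_COMPLETED)
        future = done.pop()
        i, child = pending.pop(future)
        if future.result() >= scores[i]:    # per-slot replacement
            population[i], scores[i] = child, future.result()
        next_child = mutate(population[i])  # immediately refill the worker
        pending[pool.submit(fitness, next_child)] = (i, next_child)

print(max(scores))
```

In a GA-style loop like this one, quickly evaluated solutions receive
disproportionately many reproduction opportunities; the paper's question is
whether a model learned from such a population inherits that bias.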