563 research outputs found
Limited Evaluation Cooperative Co-evolutionary Differential Evolution for Large-scale Neuroevolution
Many real-world control and classification tasks involve a large number of
features. When artificial neural networks (ANNs) are used for modeling these
tasks, the network architectures tend to be large. Neuroevolution is an
effective approach for optimizing ANNs; however, there are two bottlenecks that
make their application challenging in case of high-dimensional networks using
direct encoding. First, classic evolutionary algorithms tend not to scale well
for searching large parameter spaces; second, the network evaluation over a
large number of training instances is in general time-consuming. In this work,
we propose an approach called the Limited Evaluation Cooperative
Co-evolutionary Differential Evolution algorithm (LECCDE) to optimize
high-dimensional ANNs.
The proposed method aims to optimize the pre-synaptic weights of each
post-synaptic neuron in different subpopulations using a Cooperative
Co-evolutionary Differential Evolution algorithm, and employs a limited
evaluation scheme where fitness evaluation is performed on a relatively small
number of training instances based on fitness inheritance. We test LECCDE on
three datasets with various sizes, and our results show that cooperative
co-evolution significantly improves the test error comparing to standard
Differential Evolution, while the limited evaluation scheme facilitates a
significant reduction in computing time
Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients
While neuroevolution (evolving neural networks) has a successful track record
across a variety of domains from reinforcement learning to artificial life, it
is rarely applied to large, deep neural networks. A central reason is that
while random mutation generally works in low dimensions, a random perturbation
of thousands or millions of weights is likely to break existing functionality,
providing no learning signal even if some individual weight changes were
beneficial. This paper proposes a solution by introducing a family of safe
mutation (SM) operators that aim within the mutation operator itself to find a
degree of change that does not alter network behavior too much, but still
facilitates exploration. Importantly, these SM operators do not require any
additional interactions with the environment. The most effective SM variant
capitalizes on the intriguing opportunity to scale the degree of mutation of
each individual weight according to the sensitivity of the network's outputs to
that weight, which requires computing the gradient of outputs with respect to
the weights (instead of the gradient of error, as in conventional deep
learning). This safe mutation through gradients (SM-G) operator dramatically
increases the ability of a simple genetic algorithm-based neuroevolution method
to find solutions in high-dimensional domains that require deep and/or
recurrent neural networks (which tend to be particularly brittle to mutation),
including domains that require processing raw pixels. By improving our ability
to evolve deep neural networks, this new safer approach to mutation expands the
scope of domains amenable to neuroevolution
Training binary neural networks without floating point precision
The main goal of this work is to improve the efficiency of training binary
neural networks, which are low latency and low energy networks. The main
contribution of this work is the proposal of two solutions comprised of
topology changes and strategy training that allow the network to achieve near
the state-of-the-art performance and efficient training. The time required for
training and the memory required in the process are two factors that contribute
to efficient training.Comment: 74 pages, Master's thesi
Correspondence between neuroevolution and gradient descent
We show analytically that training a neural network by stochastic mutation or
"neuroevolution" of its weights is equivalent, in the limit of small mutations,
to gradient descent on the loss function in the presence of Gaussian white
noise. Averaged over independent realizations of the learning process,
neuroevolution is equivalent to gradient descent on the loss function. We use
numerical simulation to show that this correspondence can be observed for
finite mutations. Our results provide a connection between two distinct types
of neural-network training, and provide justification for the empirical success
of neuroevolution
- …