Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
We present a canonical way to turn any smooth parametric family of
probability distributions on an arbitrary search space $X$ into a
continuous-time black-box optimization method on $X$, the
information-geometric optimization (IGO) method. Invariance as a design
principle minimizes the number of arbitrary choices. The resulting IGO
flow conducts the natural gradient ascent of an adaptive, time-dependent,
quantile-based transformation of the objective function. It makes no
assumptions on the objective function to be optimized.
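A minimal sketch of such quantile-based weights in Python. It assumes minimization, a sample of N points, and the truncation selection function w(q) = 1 if q <= q0 else 0; the choice of w, the cutoff q0, and the function names are illustrative assumptions. Each sample's weight depends only on its rank among the sampled f-values (tied samples share the average weight of the ranks they span), so nothing about f is used beyond comparisons.

```python
from typing import List, Sequence


def selection(q: float, q0: float) -> float:
    """Illustrative truncation selection w(q): keep the best q0 fraction."""
    return 1.0 if q <= q0 else 0.0


def igo_weights(fvalues: Sequence[float], q0: float = 0.25) -> List[float]:
    """Rank-based weights for minimization.

    Sample i gets (1/N) times the mean of w((k + 1/2) / N) over the ranks k
    it occupies; samples with equal f-values share that range of ranks.
    """
    n = len(fvalues)
    weights = []
    for fi in fvalues:
        # Number of samples strictly better (smaller f) than sample i.
        rk_lo = sum(1 for fj in fvalues if fj < fi)
        # Number of samples at least as good as sample i (counts ties, including i).
        rk_hi = sum(1 for fj in fvalues if fj <= fi)
        ranks = range(rk_lo, rk_hi)
        avg_w = sum(selection((k + 0.5) / n, q0) for k in ranks) / len(ranks)
        weights.append(avg_w / n)
    return weights
```

For example, `igo_weights([3.0, 1.0, 2.0, 4.0], q0=0.5)` gives `[0.0, 0.25, 0.25, 0.0]`. Replacing each f-value by a strictly increasing function of it, such as `10 * v**3 + 7`, leaves the weights unchanged, which is the invariance under increasing transformations of the objective claimed in this abstract.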
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way
to derive new ones. The cross-entropy method is recovered in a particular case,
and can be extended into a smoothed, parametrization-independent maximum
likelihood update (IGO-ML). For Gaussian distributions on $\mathbb{R}^d$, IGO
is related to natural evolution strategies (NES) and recovers a version of the
CMA-ES algorithm. For Bernoulli distributions on $\{0,1\}^d$, we recover the
PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm
for optimization on $\{0,1\}^d$. All these algorithms are unified under a
single information-geometric optimization framework.
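The Bernoulli case can be made concrete with a short calculation. In the mean parametrization theta_j = P(x_j = 1), the Fisher information is diagonal with entries 1/(theta_j (1 - theta_j)), so the natural gradient of ln P_theta(x) is simply x - theta, and one IGO step of size dt moves theta toward the weighted samples. The sketch below assumes the weights have already been computed (for instance from ranks); the function name is hypothetical and this is an illustration, not a reference implementation.

```python
from typing import List, Sequence


def igo_bernoulli_step(
    theta: Sequence[float],
    samples: Sequence[Sequence[int]],
    weights: Sequence[float],
    dt: float,
) -> List[float]:
    """One IGO step for independent Bernoulli distributions on {0,1}^d.

    In the mean parametrization the natural gradient of ln P_theta(x) is
    x - theta, so the update is theta + dt * sum_i w_i * (x_i - theta).
    If dt * sum(weights) <= 1, the result is a convex combination of theta
    and the samples, hence it stays in [0, 1]^d.
    """
    return [
        t + dt * sum(w * (x[j] - t) for w, x in zip(weights, samples))
        for j, t in enumerate(theta)
    ]
```

With weights summing to 1 and dt = 1, the update reduces to theta = sum_i w_i x_i, a weighted frequency estimate as in the cross-entropy method. With all weight on the single best sample and dt = alpha, it becomes PBIL's rule theta <- (1 - alpha) theta + alpha x_best.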
Thanks to its intrinsic formulation, the IGO method achieves invariance under
reparametrization of the search space $X$, under a change of parameters of the
probability distributions, and under increasing transformations of the
objective function.
Theory strongly suggests that IGO algorithms have minimal loss in diversity
during optimization, provided the initial diversity is high. First experiments
using restricted Boltzmann machines confirm this insight. Thus IGO seems to
provide, from information theory, an elegant way to spontaneously explore
several valleys of a fitness landscape in a single run. Comment: Final published version
Denoising Autoencoders for fast Combinatorial Black Box Optimization
Estimation of Distribution Algorithms (EDAs) require flexible probability
models that can be efficiently learned and sampled. Autoencoders (AE) are
generative stochastic networks with these desired properties. We integrate a
special type of AE, the Denoising Autoencoder (DAE), into an EDA and evaluate
the performance of DAE-EDA on several combinatorial optimization problems with
a single objective. We assess the number of fitness evaluations as well as the
required CPU times. We compare the results with those of the Bayesian
Optimization Algorithm (BOA) and RBM-EDA, another EDA based on a
generative neural network that has proven competitive with BOA. For the
considered problem instances, DAE-EDA is considerably faster than BOA and
RBM-EDA, sometimes by orders of magnitude. The number of fitness evaluations is
higher than for BOA, but competitive with RBM-EDA. These results show that DAEs
can be useful tools for problems with low but non-negligible fitness evaluation
costs. Comment: corrected typos and small inconsistencies
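The EDA loop with a DAE as its probability model can be sketched in pure Python. Several details are assumptions made for this sketch, not taken from the paper: one hidden layer of sigmoid units with untied weights trained by per-sample SGD on a cross-entropy loss, bit-flip corruption, truncation selection of the better half, a fresh DAE each generation, and sampling by corrupting a selected solution, reconstructing it, and drawing each bit from the reconstructed probabilities. All names and hyperparameters are hypothetical, and the paper's architecture and sampling procedure may differ.

```python
import math
import random
from typing import Callable, List, Sequence

Bits = List[int]


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x: Bits, p: float, rng: random.Random) -> Bits:
    """Bit-flip noise: flip each bit independently with probability p."""
    return [1 - xi if rng.random() < p else xi for xi in x]


class TinyDAE:
    """One-hidden-layer denoising autoencoder with sigmoid units (illustrative)."""

    def __init__(self, n: int, h: int, rng: random.Random) -> None:
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(h)]
        self.b1 = [0.0] * h
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(h)] for _ in range(n)]
        self.b2 = [0.0] * n

    def forward(self, x: Sequence[float]):
        hid = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        out = [sigmoid(sum(w * hj for w, hj in zip(row, hid)) + b) for row, b in zip(self.w2, self.b2)]
        return hid, out

    def train_step(self, clean: Bits, noisy: Bits, lr: float) -> None:
        """One SGD step on the cross-entropy between reconstruction(noisy) and clean."""
        hid, out = self.forward(noisy)
        # Gradient of the loss w.r.t. the output pre-activations.
        d_out = [o - c for o, c in zip(out, clean)]
        # Backpropagate to the hidden pre-activations before w2 is modified.
        d_hid = [
            sum(self.w2[i][j] * d_out[i] for i in range(len(d_out))) * hid[j] * (1.0 - hid[j])
            for j in range(len(hid))
        ]
        for i, d in enumerate(d_out):
            for j, hj in enumerate(hid):
                self.w2[i][j] -= lr * d * hj
            self.b2[i] -= lr * d
        for j, d in enumerate(d_hid):
            for k, xk in enumerate(noisy):
                self.w1[j][k] -= lr * d * xk
            self.b1[j] -= lr * d


def dae_eda(fitness: Callable[[Bits], float], n: int, pop_size: int = 60,
            generations: int = 15, hidden: int = 10, noise: float = 0.1,
            epochs: int = 5, lr: float = 0.1, seed: int = 0) -> Bits:
    """Maximize fitness over {0,1}^n; returns the best solution seen."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Truncation selection: keep the better half of the population.
        selected = sorted(pop, key=fitness, reverse=True)[: max(1, pop_size // 2)]
        dae = TinyDAE(n, hidden, rng)
        for _ in range(epochs):
            for x in selected:
                dae.train_step(x, corrupt(x, noise, rng), lr)
        # Sample: corrupt a selected solution, reconstruct, draw bits from the output.
        pop = []
        for _ in range(pop_size):
            _, probs = dae.forward(corrupt(rng.choice(selected), noise, rng))
            pop.append([1 if rng.random() < p else 0 for p in probs])
        best = max(pop + [best], key=fitness)
    return best
```

For a quick check on OneMax, `dae_eda(sum, 20)` maximizes the number of ones. Retraining a small DAE from scratch each generation keeps the sketch simple; reusing weights across generations is an equally plausible design.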
Learning Dynamic Boltzmann Distributions as Reduced Models of Spatial Chemical Kinetics
Finding reduced models of spatially-distributed chemical reaction networks
requires an estimation of which effective dynamics are relevant. We propose a
machine learning approach to this coarse graining problem, where a maximum
entropy approximation is constructed that evolves slowly in time. The dynamical
model governing the approximation is expressed as a functional, allowing a
general treatment of spatial interactions. In contrast to typical machine
learning approaches which estimate the interaction parameters of a graphical
model, we derive Boltzmann-machine-like learning algorithms to estimate
directly the functionals dictating the time evolution of these parameters. By
incorporating analytic solutions from simple reaction motifs, an efficient
simulation method is demonstrated for systems ranging from toy problems to
basic biologically relevant networks. The broadly applicable nature of our
approach to learning spatial dynamics suggests promising applications to
multiscale methods for spatial networks, as well as to further problems in
machine learning.
Riemann-Theta Boltzmann Machine
A general Boltzmann machine with continuous visible and discrete integer-valued
hidden states is introduced. Under mild assumptions about the connection
matrices, the probability density function of the visible units can be solved
for analytically, yielding a novel parametric density function involving a
ratio of Riemann-Theta functions. The conditional expectation of a hidden state
for given visible states can also be calculated analytically, yielding a
derivative of the logarithmic Riemann-Theta function. The conditional
expectation can be used as activation function in a feedforward neural network,
thereby increasing the modelling capacity of the network. Both the Boltzmann
machine and the derived feedforward neural network can be successfully trained
via standard gradient- and non-gradient-based optimization techniques. Comment: 29 pages, 11 figures, final version published in Neurocomputing
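The link between the conditional expectation and the derivative of the logarithmic theta function can be checked numerically in the simplest case. This sketch assumes a single integer-valued hidden unit h whose conditional distribution given the visible units is p(h | v) proportional to exp(-q h^2 / 2 + h x), with q > 0 and x standing for an affine function of v; the names q, x, and the truncation k are illustrative, and the paper's sign and scaling conventions may differ. With Omega = i q / (2 pi) and z = x / (2 pi i), the one-dimensional Riemann-Theta series sum_n exp(2 pi i (n^2 Omega / 2 + n z)) becomes the real series below, so E[h | v] = d/dx log theta.

```python
import math
from typing import List, Tuple


def _log_terms(x: float, q: float, k: int) -> List[Tuple[int, float]]:
    # Exponents of the truncated series sum_{n=-k}^{k} exp(-q n^2 / 2 + n x).
    return [(n, -0.5 * q * n * n + n * x) for n in range(-k, k + 1)]


def log_theta(x: float, q: float, k: int = 30) -> float:
    """Log of the truncated real theta series, computed with log-sum-exp."""
    terms = _log_terms(x, q, k)
    m = max(t for _, t in terms)
    return m + math.log(sum(math.exp(t - m) for _, t in terms))


def expected_hidden(x: float, q: float, k: int = 30) -> float:
    """E[h] under p(h) proportional to exp(-q h^2 / 2 + h x), h in [-k, k]."""
    terms = _log_terms(x, q, k)
    m = max(t for _, t in terms)
    weights = [(n, math.exp(t - m)) for n, t in terms]
    z = sum(w for _, w in weights)
    return sum(n * w for n, w in weights) / z
```

A central finite difference of `log_theta` in x should agree with `expected_hidden`. Because the derivative of E[h] with respect to x is the variance of h, the activation is increasing in x, and for large q it approaches a staircase that rounds x / q to the nearest integer.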
q-Gaussian based Smoothed Functional Algorithm for Stochastic Optimization
The q-Gaussian distribution results from maximizing certain generalizations
of Shannon entropy under some constraints. The importance of q-Gaussian
distributions stems from the fact that they exhibit power-law behavior, and
also generalize Gaussian distributions. In this paper, we propose a Smoothed
Functional (SF) scheme for gradient estimation using the q-Gaussian distribution,
and an optimization algorithm based on this scheme.
Convergence results for the algorithm are presented. The performance of the proposed
algorithm is demonstrated through simulations on a queuing model. Comment: 5 pages, 1 figure
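A one-dimensional sketch of a q-Gaussian smoothed-functional gradient estimate follows. It assumes 0 < q < 1 and the q-Gaussian density G(eta) proportional to (1 - (1 - q) eta^2 / (3 - q))^(1/(1 - q)), whose support is |eta| < sqrt((3 - q)/(1 - q)). Smoothing J as E[J(theta + beta eta)] and integrating by parts gives the gradient E[2 eta / (beta (3 - q)(1 - (1 - q) eta^2 / (3 - q))) J(theta + beta eta)], whose weight tends to the Gaussian SF weight eta / beta as q -> 1. Sampling uses the fact that, for q < 1, this density is a rescaled symmetric Beta(m + 1, m + 1) with m = 1/(1 - q). Subtracting J(theta) is a variance-reduction choice made for this sketch (unbiased because the weight is odd in eta), not necessarily the paper's scheme; names and sample sizes are hypothetical.

```python
import math
import random
from typing import Callable


def sample_q_gaussian(q: float, rng: random.Random) -> float:
    """Draw from G(eta) ~ (1 - (1-q) eta^2/(3-q))^(1/(1-q)) for 0 < q < 1."""
    m = 1.0 / (1.0 - q)
    a = math.sqrt((3.0 - q) / (1.0 - q))  # half-width of the support
    # If B ~ Beta(m+1, m+1), then a * (2B - 1) has density ~ (1 - eta^2/a^2)^m.
    return a * (2.0 * rng.betavariate(m + 1.0, m + 1.0) - 1.0)


def q_sf_gradient(J: Callable[[float], float], theta: float, beta: float,
                  q: float, n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the derivative of the q-Gaussian-smoothed J at theta."""
    c = (1.0 - q) / (3.0 - q)
    j0 = J(theta)  # baseline; E[weight] = 0, so subtracting it keeps the estimate unbiased
    total = 0.0
    for _ in range(n_samples):
        eta = sample_q_gaussian(q, rng)
        # 1 - c * eta^2 > 0 almost surely, since |eta| < a = 1 / sqrt(c).
        weight = 2.0 * eta / (beta * (3.0 - q) * (1.0 - c * eta * eta))
        total += weight * (J(theta + beta * eta) - j0)
    return total / n_samples
```

For J(theta) = theta^2 the smoothed function is theta^2 plus a constant, so the estimate at theta = 1 should be close to 2. Such an estimate could then drive a stochastic-approximation update of theta.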