Deriving and Improving CMA-ES with Information Geometric Trust Regions
CMA-ES is one of the most popular stochastic search algorithms.
It performs favourably in many tasks without the need for extensive
parameter tuning. The algorithm has many beneficial properties,
including automatic step-size adaptation, efficient covariance updates
that incorporate the current samples as well as the evolution
path, and its invariance properties. Its update rules are composed
of well-established heuristics, and the theoretical foundations of
some of these rules are also well understood. In this paper we
will fully derive all CMA-ES update rules within the framework of
expectation-maximisation-based stochastic search algorithms using
information-geometric trust regions. We show that the use of the trust
region results in similar updates to CMA-ES for the mean and the
covariance matrix, while allowing for the derivation of an improved
update rule for the step-size. Our new algorithm, Trust-Region Covariance
Matrix Adaptation Evolution Strategy (TR-CMA-ES), is
fully derived from first-order optimization principles and performs
favourably compared to the standard CMA-ES algorithm.
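To make the kind of update rules discussed above concrete, here is a minimal (mu/mu_w, lambda)-style sampling-and-recombination step in Python. It is a sketch under assumed settings: the recombination weights, the learning rate c_mu, the sphere objective and all function names are illustrative, and the evolution path, step-size adaptation and the trust-region machinery of TR-CMA-ES are omitted.

import numpy as np

def sphere(x):
    return float(np.dot(x, x))

def simple_cma_step(mean, cov, sigma, f=sphere, lam=12, rng=np.random.default_rng(0)):
    """One illustrative step: sample, rank by fitness, recombine."""
    n = mean.size
    mu = lam // 2
    # Log-rank recombination weights (a common, but here assumed, choice).
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    # Sample lam offspring from N(mean, sigma^2 * cov).
    A = np.linalg.cholesky(cov)
    x = mean + sigma * rng.standard_normal((lam, n)) @ A.T
    best = np.argsort([f(xi) for xi in x])[:mu]   # indices of the mu best offspring
    y = (x[best] - mean) / sigma                  # selected steps in sigma units
    new_mean = mean + sigma * w @ y               # weighted recombination of the mean
    c_mu = 0.3                                    # assumed rank-mu learning rate
    new_cov = (1 - c_mu) * cov + c_mu * sum(wi * np.outer(yi, yi) for wi, yi in zip(w, y))
    return new_mean, new_cov

Iterating simple_cma_step from, say, mean = 3 * np.ones(5) and cov = np.eye(5) with a fixed sigma moves the mean towards the optimum of the sphere function; full CMA-ES and TR-CMA-ES would additionally adapt sigma.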
Convergence of the Continuous Time Trajectories of Isotropic Evolution Strategies on Monotonic C^2-composite Functions
Information-Geometric Optimization (IGO) has been introduced as a unified
framework for stochastic search algorithms. Given a parametrized family of
probability distributions on the search space, IGO turns an arbitrary
optimization problem on the search space into an optimization problem on the
parameter space of the probability distribution family and defines a natural
gradient ascent on this space. From the natural gradients defined over the
entire parameter space we obtain continuous time trajectories which are the
solutions of an ordinary differential equation (ODE). Via discretization, the
IGO naturally defines an iterated gradient ascent algorithm. Depending on the
chosen distribution family, the IGO recovers several known algorithms such as
the pure rank-\mu update CMA-ES. Consequently, the continuous time
IGO-trajectory can be viewed as an idealization of the original algorithm. In
this paper we study the continuous time trajectories of the IGO given the
family of isotropic Gaussian distributions. These trajectories are a
deterministic continuous time model of the underlying evolution strategy in the
limit of the population size going to infinity and the change rates going to zero. On functions
that are the composite of a monotone and a convex-quadratic function, we prove
the global convergence of the solution of the ODE towards the global optimum.
We extend this result to composites of monotone and twice continuously
differentiable functions and prove local convergence towards local optima.
Comment: PPSN - 12th International Conference on Parallel Problem Solving from Nature, 2012
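As a sketch of the flow the abstract refers to, written in the notation of the general IGO literature (the symbols below are an assumption, not quoted from the paper): for a family $P_\theta$ with densities $p_\theta$, Fisher information matrix $I(\theta)$ and a quantile-based monotone rewriting $W^f_{\theta^t}$ of the objective $f$, the IGO-ODE is the natural-gradient flow
$$
\frac{\mathrm{d}\theta^t}{\mathrm{d}t}
= I^{-1}(\theta^t) \int_X W^f_{\theta^t}(x)\,\nabla_{\theta}\ln p_{\theta}(x)\Big|_{\theta=\theta^t}\, P_{\theta^t}(\mathrm{d}x),
$$
and for the isotropic Gaussian family the parameter is $\theta^t = (m^t, \sigma^t)$ with $P_{\theta^t} = \mathcal{N}(m^t, (\sigma^t)^2 I_d)$. Time discretization together with a Monte Carlo estimate of the integral from a finite population yields the iterated gradient-ascent algorithm mentioned above.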
Cooperative Coevolution for Non-Separable Large-Scale Black-Box Optimization: Convergence Analyses and Distributed Accelerations
Given the ubiquity of non-separable optimization problems in the real world, in
this paper we analyze and extend the large-scale version of the well-known
cooperative coevolution (CC), a divide-and-conquer optimization framework, on
non-separable functions. First, we reveal empirical reasons of why
decomposition-based methods are preferred or not in practice on some
non-separable large-scale problems, which have not been clearly pointed out in
many previous CC papers. Then, we formalize CC as a continuous game model via
simplification, but without losing its essential property. Different from
previous evolutionary game theory for CC, our new model provides a much simpler
but useful viewpoint to analyze its convergence, since only the pure Nash
equilibrium concept is needed and more general fitness landscapes can be
explicitly considered. Based on convergence analyses, we propose a hierarchical
decomposition strategy for better generalization, as for any decomposition
there is a risk of getting trapped in a suboptimal Nash equilibrium. Finally,
we use powerful distributed computing to accelerate it under the multi-level
learning framework, which combines the fine-tuning ability from decomposition
with the invariance property of CMA-ES. Experiments on a set of
high-dimensional functions validate both its search performance and scalability
(w.r.t. CPU cores) on a cluster computing platform with 400 CPU cores.
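As a purely illustrative sketch of the divide-and-conquer cycle that CC performs, here is a hypothetical round-robin loop in Python; the group split, the coupled test function and the crude random-search subspace optimizer are all assumptions, and the paper's distributed, multi-level, CMA-ES-based version is considerably more elaborate.

import numpy as np

def random_search(sub_f, y, iters=200, step=0.3, rng=np.random.default_rng(1)):
    """Crude stand-in for the subspace optimizer (e.g. a CMA-ES instance)."""
    best, best_f = np.array(y, dtype=float), sub_f(y)
    for _ in range(iters):
        cand = best + step * rng.standard_normal(best.shape)
        fc = sub_f(cand)
        if fc < best_f:
            best, best_f = cand, fc
    return best

def cc_round_robin(f, x0, groups, inner_opt=random_search, cycles=20):
    """Optimize one variable group at a time while the rest stays fixed."""
    x = np.array(x0, dtype=float)
    for _ in range(cycles):
        for idx in groups:
            def sub_f(y, idx=idx):
                z = x.copy()
                z[idx] = y          # vary only this group's coordinates
                return f(z)
            x[idx] = inner_opt(sub_f, x[idx])
    return x

def chain(z):
    # Non-separable quadratic: neighbouring coordinates are coupled.
    return float(z[0]**2 + np.sum((z[1:] - z[:-1])**2))

groups = [np.arange(0, 5), np.arange(5, 10)]
x_final = cc_round_robin(chain, np.ones(10), groups)

Because each group is optimized as a best response to the frozen remainder, a fixed point of this cycle (with exact subspace optimization) is precisely a pure Nash equilibrium of the induced game, which is why a poor decomposition can leave the loop stuck at a suboptimal equilibrium.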