A Covariance Matrix Adaptation Evolution Strategy for Direct Policy Search in Reproducing Kernel Hilbert Space
The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient derivative-free optimization algorithm. It optimizes a black-box objective function over a well-defined parameter space. In some problems, such parameter spaces are defined using function approximation in which feature functions are manually defined, so the performance of those techniques depends strongly on the quality of the chosen features. Hence, enabling CMA-ES to optimize over a richer and more general function class has long been desired. Specifically, we consider modeling the input space for black-box optimization in reproducing kernel Hilbert spaces (RKHS). This modeling leads to a functional optimization problem whose domain is a function space, which enables us to optimize over a very rich function class. In addition, we propose CMA-ES-RKHS, a generalized CMA-ES framework that performs black-box functional optimization in the RKHS. A search distribution, represented as a Gaussian process, is adapted by updating both its mean function and covariance operator. Adaptive representation of the function and covariance operator is achieved with sparsification techniques. We evaluate CMA-ES-RKHS on a simple functional optimization problem and on benchmark reinforcement learning (RL) domains. For an application in RL, we model policies for MDPs in RKHS and express the cumulative return objective as a functional of RKHS policies, which can be optimized via CMA-ES-RKHS. This formulation results in a black-box functional policy search framework.
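As a rough illustration of black-box optimization over a function represented in an RKHS, the sketch below writes a function as a finite Gaussian-kernel expansion and tunes its weights with a plain elite-averaging evolution strategy. It is not CMA-ES-RKHS: there is no covariance adaptation, no Gaussian-process search distribution, and no sparsification. All names (`rkhs_function`, `simple_es`, the kernel centres, the toy target `math.sin`) are assumptions made for the example.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def rkhs_function(centres, weights, bandwidth=1.0):
    """Return f(x) = sum_i w_i * k(c_i, x), a finite kernel expansion."""
    def f(x):
        return sum(w * gaussian_kernel(c, x, bandwidth)
                   for c, w in zip(centres, weights))
    return f


def loss(weights, centres, inputs, target):
    """Black-box objective: mean squared error between f and the target."""
    f = rkhs_function(centres, weights)
    return sum((f(x) - target(x)) ** 2 for x in inputs) / len(inputs)


def simple_es(centres, inputs, target, iterations=50, population=20,
              elite=5, sigma=0.3, seed=0):
    """Simplified evolution strategy with a fixed step size (assumption:
    stands in for CMA-ES, which would also adapt a covariance)."""
    rng = random.Random(seed)
    mean = [0.0] * len(centres)
    best_weights, best_loss = list(mean), loss(mean, centres, inputs, target)
    for _ in range(iterations):
        # Sample candidates around the current mean.
        candidates = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(population)]
        # Rank by objective value only; no gradients are used.
        candidates.sort(key=lambda w: loss(w, centres, inputs, target))
        top = candidates[:elite]
        # Move the mean to the average of the elite candidates.
        mean = [sum(w[i] for w in top) / elite for i in range(len(mean))]
        top_loss = loss(top[0], centres, inputs, target)
        if top_loss < best_loss:
            best_weights, best_loss = list(top[0]), top_loss
    return best_weights, best_loss


# Hypothetical toy problem: fit sin(x) on a small grid.
centres = [-2.0, -1.0, 0.0, 1.0, 2.0]
inputs = [0.5 * i for i in range(-6, 7)]
weights, final_loss = simple_es(centres, inputs, math.sin)
```

The returned loss is the best value seen, so it is never worse than the loss of the all-zero starting function.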
Learning to Race through Coordinate Descent Bayesian Optimisation
In the automation of many kinds of processes, the observable outcome can
often be described as the combined effect of an entire sequence of actions, or
controls, applied throughout its execution. In these cases, strategies to
optimise control policies for individual stages of the process might not be
applicable, and instead the whole policy might have to be optimised at once. On
the other hand, evaluating the policy's performance might also be
costly, so it is desirable to find a solution with as few interactions with
the real system as possible. We consider the problem of optimising control
policies to allow a robot to complete a given race track within a minimum
amount of time. We assume that the robot has no prior information about the
track or its own dynamical model, just an initial valid driving example.
Localisation is only applied to monitor the robot and to provide an indication
of its position along the track's centre axis. We propose a method for finding
a policy that minimises the time per lap while keeping the vehicle on the track
using a Bayesian optimisation (BO) approach over a reproducing kernel Hilbert
space. We apply an algorithm to search more efficiently over high-dimensional
policy-parameter spaces with BO, by iterating over each dimension individually,
in a sequential coordinate descent-like scheme. Experiments demonstrate the
performance of the algorithm against other methods in a simulated car racing
environment.
Comment: Accepted as a conference paper for the 2018 IEEE International
Conference on Robotics and Automation (ICRA).
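The coordinate descent-like scheme mentioned in the abstract, which iterates over one dimension at a time, can be sketched as follows. This is only a minimal illustration: the one-dimensional Bayesian optimisation step is replaced by a plain grid search, and the function name, bounds, and quadratic test objective are all assumptions, not the paper's setup.

```python
def coordinate_descent(objective, x0, lower, upper, sweeps=3, grid_points=61):
    """Minimise `objective` by optimising one coordinate at a time.

    Assumption: a 1-D grid search stands in for the per-dimension
    Bayesian optimisation step described in the abstract.
    """
    x = list(x0)
    for _ in range(sweeps):
        for d in range(len(x)):
            best_val, best_coord = objective(x), x[d]
            for i in range(grid_points):
                cand = lower + (upper - lower) * i / (grid_points - 1)
                trial = x[:d] + [cand] + x[d + 1:]
                val = objective(trial)
                if val < best_val:
                    best_val, best_coord = val, cand
            # Fix this coordinate at its best value before moving on.
            x[d] = best_coord
    return x


# Hypothetical stand-in for a lap-time objective: a separable quadratic.
optimum = [0.5, -1.0, 2.0]


def toy_cost(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, optimum))


solution = coordinate_descent(toy_cost, [0.0, 0.0, 0.0], -3.0, 3.0)
```

Each inner search evaluates the objective only along a single axis, which is what keeps the per-step search space small when the full policy-parameter space is high-dimensional.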
Hilbert Space Embeddings of POMDPs
A nonparametric approach for policy learning for POMDPs is proposed. The
approach represents distributions over the states, observations, and actions as
embeddings in feature spaces, which are reproducing kernel Hilbert spaces.
Distributions over states given the observations are obtained by applying the
kernel Bayes' rule to these distribution embeddings. Policies and value
functions are defined on the feature space over states, which leads to a
feature space expression for the Bellman equation. Value iteration may then be
used to estimate the optimal value function and associated policy. Experimental
results confirm that the correct policy is learned using the feature space
representation.
Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty
in Artificial Intelligence (UAI2012).
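The basic building block behind such distribution embeddings is the empirical kernel mean embedding, which represents a sample by the average of kernel functions centred at its points. The sketch below shows only that building block, together with the biased squared maximum mean discrepancy between two samples. It does not implement the kernel Bayes' rule or value iteration from the abstract, and the function names and Gaussian kernel choice are assumptions.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_embedding(sample, bandwidth=1.0):
    """Empirical embedding mu(x) = (1/n) * sum_i k(x_i, x)."""
    def mu(x):
        return sum(gaussian_kernel(s, x, bandwidth) for s in sample) / len(sample)
    return mu


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared RKHS distance between the
    embeddings of the two samples."""
    def avg_k(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return avg_k(xs, xs) + avg_k(ys, ys) - 2.0 * avg_k(xs, ys)


# Hypothetical samples from two well-separated distributions.
near_zero = [0.0, 0.1, 0.2]
near_three = [3.0, 3.1, 3.2]
same = mmd_squared(near_zero, near_zero)
different = mmd_squared(near_zero, near_three)
```

Identical samples give a distance of zero, while the two separated samples give a strictly positive value.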