Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments
In the NIPS 2017 Learning to Run challenge, participants were tasked with
building a controller for a musculoskeletal model to make it run as fast as
possible through an obstacle course. Top participants were invited to describe
their algorithms. In this work, we present eight solutions that used deep
reinforcement learning approaches, based on algorithms such as Deep
Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region
Policy Optimization. Many solutions use similar relaxations and heuristics,
such as reward shaping, frame skipping, discretization of the action space,
symmetry, and policy blending. However, each of the eight teams implemented
different modifications of the known algorithms. Comment: 27 pages, 17 figures
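Of the heuristics listed in this abstract, frame skipping is the simplest to illustrate: the controller picks an action once and the environment repeats it for several simulation steps, summing the rewards. The following is a minimal sketch under stated assumptions, not any team's implementation. `CountingEnv`, `FrameSkip`, and the `reset`/`step` interface returning `(observation, reward, done)` are hypothetical names chosen for illustration.

```python
class CountingEnv:
    """Hypothetical toy environment: reward 1.0 per step, ends after `horizon` steps."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done


class FrameSkip:
    """Repeat each chosen action for up to `skip` steps and sum the rewards."""

    def __init__(self, env, skip=4):
        if skip < 1:
            raise ValueError("skip must be at least 1")
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total = 0.0
        for _ in range(self.skip):
            obs, reward, done = self.env.step(action)
            total += reward
            if done:
                # Stop repeating as soon as the episode ends.
                break
        return obs, total, done
```

With `skip=4`, the policy is queried a quarter as often, which shortens the effective horizon at the cost of coarser control.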
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
Reinforcement learning (RL) algorithms for real-world robotic applications
need a data-efficient learning process and the ability to handle complex,
unknown dynamical systems. These requirements are handled well by model-based
and model-free RL approaches, respectively. In this work, we aim to combine the
advantages of these two types of methods in a principled manner. By focusing on
time-varying linear-Gaussian policies, we enable a model-based algorithm based
on the linear quadratic regulator (LQR) that can be integrated into the
model-free framework of path integral policy improvement (PI2). We can further
combine our method with guided policy search (GPS) to train arbitrary
parameterized policies such as deep neural networks. Our simulation and
real-world experiments demonstrate that this method can solve challenging
manipulation tasks with comparable or better performance than model-free
methods while maintaining the sample efficiency of model-based methods. A video
presenting our results is available at
https://sites.google.com/site/icml17pilqr
Comment: Paper accepted to the International Conference on Machine Learning
(ICML) 2017
Paired Comparisons-based Interactive Differential Evolution
We propose Interactive Differential Evolution (IDE) based on paired
comparisons for reducing user fatigue and evaluate its convergence speed in
comparison with Interactive Genetic Algorithms (IGA) and tournament IGA. The user
interface and convergence performance are the two key factors in reducing Interactive
Evolutionary Computation (IEC) user fatigue. Unlike IGA and conventional IDE,
users of the proposed IDE and tournament IGA do not need to compare all
individuals with each other; they compare only pairs of individuals, which greatly
decreases user fatigue. In this paper, we design a pseudo-IEC user and evaluate
the other factor, IEC convergence performance, using IEC simulators. We show that
our proposed IDE converges significantly faster than IGA and tournament IGA,
i.e., the proposed method is superior to the others in terms of both user interface and
convergence performance.
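The paired-comparison idea fits differential evolution naturally, because its selection step already compares one target vector against one trial vector. The following is a minimal sketch of that loop, not the authors' implementation. The simulated user `pseudo_user_prefers` (who prefers the candidate closer to a hidden target), the function `paired_comparison_de`, and the parameter values `f`, `cr`, and `pop_size` are all assumptions made for illustration; the abstract does not specify how its pseudo-IEC user is built.

```python
import random


def pseudo_user_prefers(a, b, target):
    """Hypothetical simulated user: prefers the candidate closer to a hidden target."""
    def dist(x):
        return sum((xi - ti) ** 2 for xi, ti in zip(x, target))
    return dist(a) <= dist(b)


def paired_comparison_de(target, pop_size=8, generations=30, f=0.8, cr=0.9, seed=0):
    """DE/rand/1/bin where every selection is a single pairwise user judgment."""
    rng = random.Random(seed)
    dim = len(target)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    comparisons = 0
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current target vector.
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = [
                a[k] + f * (b[k] - c[k])
                if (rng.random() < cr or k == j_rand)
                else pop[i][k]
                for k in range(dim)
            ]
            # The user sees only two candidates: the trial and the current individual.
            comparisons += 1
            if pseudo_user_prefers(trial, pop[i], target):
                pop[i] = trial
    return pop, comparisons
```

Each generation asks the user for exactly `pop_size` pairwise judgments, so the total user effort is `pop_size * generations` comparisons, and no judgment ever involves more than two candidates.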
A Practical Guide to Robust Optimization
Robust optimization is a young and active research field that has been mainly
developed in the last 15 years. Robust optimization is very useful for
practice, since it is tailored to the information at hand, and it leads to
computationally tractable formulations. It is therefore remarkable that
real-life applications of robust optimization are still lagging behind; there
is much more potential for real-life applications than has been exploited
hitherto. The aim of this paper is to help practitioners to understand robust
optimization and to successfully apply it in practice. We provide a brief
introduction to robust optimization, and also describe important do's and
don'ts for using it in practice. We use many small examples to illustrate our
discussions.
GPS-ABC: Gaussian Process Surrogate Approximate Bayesian Computation
Scientists often express their understanding of the world through a
computationally demanding simulation program. Analyzing the posterior
distribution of the parameters given observations (the inverse problem) can be
extremely challenging. The Approximate Bayesian Computation (ABC) framework is
the standard statistical tool to handle these likelihood-free problems, but
it requires a very large number of simulations. In this work we develop two
new ABC sampling algorithms that significantly reduce the number of simulations
necessary for posterior inference. Both algorithms use confidence estimates for
the acceptance probability in the Metropolis-Hastings step to adaptively choose the
number of necessary simulations. Our GPS-ABC algorithm stores the information
obtained from every simulation in a Gaussian process which acts as a surrogate
function for the simulated statistics. Experiments on a challenging realistic
biological problem illustrate the potential of these algorithms.
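To see why simulation cost matters, it helps to look at the plain ABC Metropolis-Hastings baseline, which spends one fresh simulation on every proposal. The following is a minimal sketch of that baseline, not of GPS-ABC or of the adaptive scheme the abstract describes. The toy `simulate` function, the uniform prior on [-10, 10], the tolerance `eps`, and the name `abc_mcmc` are assumptions made for illustration.

```python
import random


def simulate(theta, rng, n=20):
    """Hypothetical simulator: summary statistic is the mean of n Gaussian draws."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def abc_mcmc(observed, steps=2000, eps=0.2, step_size=0.5, seed=0):
    """ABC-MCMC with a uniform prior on [-10, 10] and a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = observed  # start the chain near the observed statistic
    samples, n_sims = [], 0
    for _ in range(steps):
        proposal = theta + rng.gauss(0.0, step_size)
        # With a flat prior and a symmetric proposal, the Metropolis-Hastings
        # ratio is 1 inside the prior support, so a proposal is accepted
        # exactly when its simulated statistic lands within eps of the data.
        if -10.0 <= proposal <= 10.0:
            n_sims += 1
            if abs(simulate(proposal, rng) - observed) < eps:
                theta = proposal
        samples.append(theta)
    return samples, n_sims
```

Here `n_sims` grows with the chain length, because every in-support proposal triggers a new run of the simulator. Reusing the information from earlier simulations, as the abstract's surrogate approach aims to do, targets exactly this cost.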