
    Model-Based Reinforcement Learning with Continuous States and Actions

    Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging, and approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models of the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model of the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy over the entire state space. We apply the resulting controller to the underpowered pendulum swing-up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
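
    As a rough illustration of the model-learning step described above, the sketch below fits a GP to observed (state, control, next-state) data, standing in for the unknown transition dynamics. The kernel choice, toy data, and all names are assumptions of this sketch, not the authors' code.

```python
# Sketch: learn a GP model of unknown transition dynamics x' = f(x, u)
# from observed (state, control, next-state) data, as in GPDP's
# model-learning step. Kernel choice and the toy data are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))       # columns: state x, control u
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]      # stand-in for the true dynamics
y += 0.05 * rng.standard_normal(50)            # observation noise

gp_dyn = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp_dyn.fit(X, y)

# The learned model predicts the next state with uncertainty; the DP
# recursion is then run against this model instead of the true dynamics.
mean, std = gp_dyn.predict(np.array([[0.3, -0.2]]), return_std=True)
```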

    Approximate Dynamic Programming with Gaussian Processes

    In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and strongly depends on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate globally optimal closed-loop policy. In GPDP, value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal state feedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes. A binary classifier selects one Gaussian process to predict the optimal control signal. We show that GPDP is able to yield an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing-up, a complex nonlinear control problem.
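
    The following is a minimal sketch of the GPDP idea of modeling value functions in the Bellman recursion with Gaussian processes, on a placeholder one-dimensional problem. The dynamics, cost, horizon, and grids are illustrative assumptions, not the paper's benchmark.

```python
# Sketch of GPDP's core step: at each stage of the dynamic programming
# recursion, fit a GP to the value function over a finite set of support
# states and use its predictions in the Bellman backup. Dynamics, cost,
# horizon, and grids below are placeholder assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

states = np.linspace(-2.0, 2.0, 20).reshape(-1, 1)   # support states
controls = np.linspace(-1.0, 1.0, 11)
f = lambda x, u: 0.9 * x + 0.2 * u                   # (learned) dynamics
cost = lambda x, u: x**2 + 0.1 * u**2

V = np.zeros(len(states))                            # terminal value function
for _ in range(10):                                  # finite-horizon recursion
    gp_V = GaussianProcessRegressor().fit(states, V)     # GP model of V
    Q = np.array([[cost(x[0], u) + gp_V.predict(np.array([[f(x[0], u)]]))[0]
                   for u in controls] for x in states])
    V = Q.min(axis=1)                                # Bellman backup
    policy = controls[Q.argmin(axis=1)]              # optimal controls at supports
```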

    An Experimental Evaluation of Bayesian Optimization on Bipedal Locomotion

    The design of gaits and corresponding control policies for bipedal walkers is a key challenge in robot locomotion. Even when a viable controller parametrization already exists, finding near-optimal parameters can be daunting. The use of automatic gait optimization methods greatly reduces the need for human expertise and time-consuming design processes. Many different approaches to automatic gait optimization have been suggested to date. However, no extensive comparison among them has yet been performed. In this paper, we present some common methods for automatic gait optimization in bipedal locomotion and analyze their strengths and weaknesses. We experimentally evaluated these gait optimization methods on a bipedal robot, in more than 1800 experimental evaluations. In particular, we analyzed Bayesian optimization in different configurations, including various acquisition functions.
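
    For reference, the sketch below computes two acquisition functions of the kind compared in such studies, expected improvement and a confidence-bound rule, from a GP surrogate's predictive mean and standard deviation. The formulas are the standard ones; the minimization convention is an assumption of this sketch.

```python
# Sketch of two standard acquisition functions, computed from a GP
# surrogate's predictive mean mu and standard deviation sigma; both are
# written for minimization, which is an assumption of this sketch.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_y):
    """EI for minimization: E[max(best_y - f(x), 0)] under the GP."""
    z = (best_y - mu) / np.maximum(sigma, 1e-9)
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def lower_confidence_bound(mu, sigma, kappa=2.0):
    """Confidence-bound rule for minimization: query smaller values first."""
    return mu - kappa * sigma
```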

    Multi-Task Policy Search for Robotics

    Learning policies that generalize across multiple tasks is an important and challenging research topic in reinforcement learning and robotics. Training individual policies for every single potential task is often impractical, especially for continuous task variations, requiring more principled approaches to share and transfer knowledge among similar tasks. We present a novel approach for learning a nonlinear feedback policy that generalizes across multiple tasks. The key idea is to define a parametrized policy as a function of both the state and the task, which allows learning a single policy that generalizes across multiple known and unknown tasks. Applications of our novel approach to reinforcement and imitation learning in real-robot experiments are shown.
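
    A minimal sketch of the key idea, assuming a simple random-feature parametrization: the policy takes the task descriptor as an extra input alongside the state, so a single weight vector serves all tasks. All names and dimensions are illustrative, not the paper's parametrization.

```python
# Sketch of a task-conditioned policy pi(x, task) -> u: the task
# descriptor is appended to the state, so a single weight vector
# generalizes across tasks. Feature map and dimensions are assumptions.
import numpy as np

class MultiTaskPolicy:
    def __init__(self, state_dim, task_dim, n_features=50, seed=0):
        rng = np.random.default_rng(seed)
        # Random nonlinear features over the joint (state, task) input.
        self.W = rng.standard_normal((n_features, state_dim + task_dim))
        self.theta = np.zeros(n_features)        # learned policy weights

    def __call__(self, x, task):
        z = np.concatenate([x, task])            # condition on the task
        phi = np.sin(self.W @ z)                 # nonlinear feature map
        return phi @ self.theta                  # control signal

pi = MultiTaskPolicy(state_dim=4, task_dim=2)
u = pi(np.zeros(4), np.array([0.5, -0.5]))       # same policy, another task
```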

    Bayesian Gait Optimization for Bipedal Locomotion

    One of the key challenges in robotic bipedal locomotion is finding gait parameters that optimize a desired performance criterion, such as speed, robustness, or energy efficiency. Typically, gait optimization requires extensive robot experiments and specific expert knowledge. We propose to apply data-driven machine learning to automate and speed up the process of gait optimization. In particular, we use Bayesian optimization to efficiently find gait parameters that optimize the desired performance metric. As a proof of concept, we demonstrate that Bayesian optimization is near-optimal in a classical stochastic optimal control framework. Moreover, we validate our approach to Bayesian gait optimization on a low-cost and fragile real bipedal walker and show that good walking gaits can be found efficiently by Bayesian optimization.
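
    The sketch below shows the shape of such a Bayesian optimization loop: fit a GP to the gait parameters and costs observed so far, pick the next gait by an acquisition rule, run one trial, and repeat. The cost function and the LCB acquisition here are stand-in assumptions; real trials would run on the walker.

```python
# Sketch of a Bayesian gait-optimization loop: fit a GP to observed
# (parameters, cost) pairs, pick the next gait by an LCB acquisition
# rule, run one trial, repeat. The cost function below is a synthetic
# stand-in for a real walking experiment.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

def gait_cost(p):                                    # placeholder robot trial
    return float(np.sum((p - 0.3)**2) + 0.01 * rng.standard_normal())

candidates = rng.uniform(0.0, 1.0, size=(500, 3))    # gait-parameter pool
P = [rng.uniform(0.0, 1.0, 3) for _ in range(3)]     # initial random gaits
C = [gait_cost(p) for p in P]

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(P), np.array(C))
    mu, sd = gp.predict(candidates, return_std=True)
    p_next = candidates[np.argmin(mu - 2.0 * sd)]    # LCB acquisition
    P.append(p_next)
    C.append(gait_cost(p_next))                      # one (simulated) trial

best_gait = P[int(np.argmin(C))]
```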

    Multi-modal filtering for non-linear estimation

    Multi-modal densities appear frequently in time series and practical applications. However, they are not well represented by common state estimators, such as the Extended Kalman Filter and the Unscented Kalman Filter, which additionally suffer from the fact that uncertainty is often not captured sufficiently well. This can result in incoherent and divergent tracking performance. In this paper, we address these issues by devising a non-linear filtering algorithm where densities are represented by Gaussian mixture models, whose parameters are estimated in closed form. The resulting method exhibits superior performance on nonlinear benchmarks.
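
    A compact sketch of the idea for a linear one-dimensional model: each mixture component gets a closed-form prediction and Kalman update, and component weights are rescaled by the measurement likelihood, preserving the multi-modality that a single-Gaussian filter would collapse. All numbers and the model are illustrative assumptions.

```python
# Sketch of one Gaussian-sum filter step for a linear 1-D model: each
# mixture component is predicted and updated in closed form (a Kalman
# step), and the weights are rescaled by the measurement likelihood,
# which preserves multi-modality. All numbers are illustrative.
import numpy as np
from scipy.stats import norm

w = np.array([0.5, 0.5])                 # component weights
m = np.array([-1.0, 1.0])                # component means (two modes)
v = np.array([0.2, 0.2])                 # component variances
a, q = 0.95, 0.05                        # dynamics x' = a*x + N(0, q)
h, r = 1.0, 0.1                          # measurement z = h*x + N(0, r)
z = 0.8                                  # observed measurement

m_pred, v_pred = a * m, a**2 * v + q     # closed-form prediction
S = h**2 * v_pred + r                    # innovation variances
w = w * norm.pdf(z, loc=h * m_pred, scale=np.sqrt(S))
w = w / w.sum()                          # reweight by likelihood
K = v_pred * h / S                       # per-component Kalman gains
m = m_pred + K * (z - h * m_pred)        # closed-form mean update
v = (1.0 - K * h) * v_pred               # closed-form variance update
```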

    Feedback Error Learning for Rhythmic Motor Primitives

    Rhythmic motor primitives can be used to learn a variety of oscillatory behaviors from demonstrations or reward signals, e.g., hopping, walking, running, and ball-bouncing. However, such rhythmic motor primitives frequently lead to failures unless a stabilizing controller ensures their functionality, e.g., a balance controller for a walking gait. As an ideal oscillatory behavior requires the stabilizing controller only in exceptional cases, e.g., to prevent failures, we devise an online learning approach that reduces the dependence on the stabilizing controller. Inspired by related approaches in model learning, we employ the stabilizing controller's output as a feedback error learning signal for adapting the gait. We demonstrate the resulting approach in two scenarios: rhythmic arm movements and gait adaptation of an underactuated biped.
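
    A toy sketch of the adaptation rule, under assumed basis functions and a stand-in plant: the stabilizing controller's output acts as the error signal that trains the feedforward primitive weights, so the feedback contribution shrinks as the primitive improves. Gains and dynamics below are illustrative assumptions.

```python
# Sketch of feedback error learning for a rhythmic primitive: the
# stabilizing controller's output u_fb is used as the error signal that
# adapts the feedforward weights theta, reducing reliance on feedback
# over time. Basis, gains, and the plant are illustrative assumptions.
import numpy as np

n_basis, lr = 10, 0.1
centers = np.linspace(0.0, 2.0 * np.pi, n_basis, endpoint=False)
theta = np.zeros(n_basis)                    # primitive weights

def basis(phase):                            # von Mises bases over the phase
    psi = np.exp(2.5 * (np.cos(phase - centers) - 1.0))
    return psi / psi.sum()

def stabilizer(x, x_des):                    # e.g. a balance/PD controller
    return 5.0 * (x_des - x)

phase, x = 0.0, 0.0
for step in range(1000):                     # online adaptation loop
    x_des = np.sin(phase)                    # desired rhythmic trajectory
    u_ff = basis(phase) @ theta              # learned feedforward command
    u_fb = stabilizer(x, x_des)              # corrective feedback command
    theta += lr * u_fb * basis(phase)        # feedback error learning update
    x += 0.1 * (u_ff + u_fb - x)             # toy first-order plant
    phase = (phase + 0.1) % (2.0 * np.pi)
```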

    A Survey on Policy Search for Robotics

    Policy search is a subfield of reinforcement learning which focuses on finding good parameters for a given policy parametrization. It is well suited to robotics as it can cope with high-dimensional state and action spaces, one of the main challenges in robot learning. We review recent successes of both model-free and model-based policy search in robot learning. Model-free policy search is a general approach to learning policies based on sampled trajectories. We classify model-free methods by their policy evaluation strategy, policy update strategy, and exploration strategy, and present a unified view of existing algorithms. Learning a policy is often easier than learning an accurate forward model, and hence model-free methods are used more frequently in practice. However, for each sampled trajectory it is necessary to interact with the robot, which can be time-consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning. For both model-free and model-based policy search methods, we review their respective properties and their applicability to robotic systems.
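
    As a concrete instance of the model-free class surveyed here, the sketch below runs a REINFORCE-style update for a linear-Gaussian policy on a toy linear system. The task, baseline, and step sizes are illustrative assumptions, not any specific algorithm from the survey.

```python
# Sketch of model-free policy search: sample trajectories with the
# current stochastic policy, score them by return, and shift the
# parameters along a REINFORCE-style gradient estimate. The task and
# all constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = np.zeros(2), 0.3              # linear policy: u = theta @ x + noise

def rollout(theta):
    x, ret, grad = np.array([1.0, 0.0]), 0.0, np.zeros(2)
    for _ in range(30):
        u = theta @ x + sigma * rng.standard_normal()
        grad += (u - theta @ x) / sigma**2 * x     # d log pi / d theta
        ret += -(x @ x + 0.1 * u * u)              # negative quadratic cost
        x = np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])
    return ret, grad

for it in range(200):                        # model-free policy search loop
    samples = [rollout(theta) for _ in range(10)]
    b = np.mean([r for r, _ in samples])     # baseline reduces variance
    theta += 1e-3 * np.mean([(r - b) * g for r, g in samples], axis=0)
```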
