124 research outputs found

Multi-Task Policy Search for Robotics

Authors: Deisenroth MP, Englert P, Fox D, Peters J
Publication date: 01/01/2014

© 2014 IEEE. Learning policies that generalize across multiple tasks is an important and challenging research topic in reinforcement learning and robotics. Training individual policies for every single potential task is often impractical, especially for continuous task variations, requiring more principled approaches to share and transfer knowledge among similar tasks. We present a novel approach for learning a nonlinear feedback policy that generalizes across multiple tasks. The key idea is to define a parametrized policy as a function of both the state and the task, which allows learning a single policy that generalizes across multiple known and unknown tasks. Applications of our novel approach to reinforcement and imitation learning in real-robot experiments are shown.
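The key idea in this abstract, a single policy that takes both the state and the task as input, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `rbf_policy`, the radial-basis-function form, and the hand-picked centers and weights are hypothetical and are not taken from the paper.

```python
import math

def rbf_policy(state, task, centers, weights, lengthscale=1.0):
    """Hypothetical policy u = pi(state, task): an RBF network over the joint input."""
    # Concatenate state and task so one parameter set covers all tasks.
    z = list(state) + list(task)
    u = 0.0
    for center, weight in zip(centers, weights):
        sq_dist = sum((zi - ci) ** 2 for zi, ci in zip(z, center))
        u += weight * math.exp(-0.5 * sq_dist / lengthscale ** 2)
    return u

# Illustrative parameters only: 1-D state, 1-D task descriptor.
centers = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
weights = [0.5, -0.5, 1.0, -1.0]

u_task_a = rbf_policy([0.2], [0.0], centers, weights)  # task seen in training
u_task_b = rbf_policy([0.2], [0.5], centers, weights)  # task between the centers
```

Because the task enters as an ordinary input, the same parameters produce a control signal for a task value that lies between the centers, which is the sense in which such a policy can be queried for unseen tasks.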

TUbiblio

Crossref

Spiral - Imperial College Digital Repository

MPG.PuRe

Multiple-Target Reinforcement Learning with a Single Policy

Authors: Deisenroth MP, Fox D
Publication date: 31/07/2011

Spiral - Imperial College Digital Repository

Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning

Authors: Deisenroth MP, Fox D, Rasmussen CE
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2011

In recent years, there has been substantial progress in robust manipulation in unstructured environments. The long-term goal of our work is to move away from precise but very expensive robotic systems and to develop affordable, potentially imprecise, self-adaptive manipulator systems that can interactively perform tasks such as playing with children. In this paper, we demonstrate how a low-cost off-the-shelf robotic system can learn closed-loop policies for a stacking task in only a handful of trials, from scratch. Our manipulator is inaccurate and provides no pose feedback. For learning a controller in the work space of a Kinect-style depth camera, we use a model-based reinforcement learning technique. Our learning method is data efficient, reduces model bias, and deals with several noise sources in a principled way during long-term planning. We present a way of incorporating state-space constraints into the learning process and analyze the learning gain by exploiting the sequential structure of the stacking task.

CiteSeerX

Spiral - Imperial College Digital Repository

Multi-Task Policy Search

Authors: Deisenroth MP, Englert P, Fox D, Peters J
Publication date: 31/12/2013

Learning policies that generalize across multiple tasks is an important and challenging research topic in reinforcement learning and robotics. Training individual policies for every single potential task is often impractical, especially for continuous task variations, requiring more principled approaches to share and transfer knowledge among similar tasks. We present a novel approach for learning a nonlinear feedback policy that generalizes across multiple tasks. The key idea is to define a parametrized policy as a function of both the state and the task, which allows learning a single policy that generalizes across multiple known and unknown tasks. Applications of our novel approach to reinforcement and imitation learning in real-robot experiments are shown.

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces

Authors: Deisenroth Marc P., Hanebeck Uwe D., Ohtsuka Toshiyuki, Weissel Florian
Publication date: 07/05/2013

A novel online-computation approach to optimal control of nonlinear, noise-affected systems with continuous state and control spaces is presented. In the proposed algorithm, system noise is explicitly incorporated into the control decision. This leads to superior results compared to state-of-the-art nonlinear controllers that neglect this influence. The solution of an optimal nonlinear controller for a corresponding deterministic system is employed to find a meaningful state space restriction. This restriction is obtained by means of approximate state prediction using the noisy system equation. Within this constrained state space, an optimal closed-loop solution for a finite decision-making horizon (prediction horizon) is determined within an adaptively restricted optimization space. Interleaving stochastic dynamic programming and value function approximation yields a solution to the considered optimal control problem. The enhanced performance of the proposed discrete-time controller is illustrated by means of a scalar example system. Nonlinear model predictive control is applied to address approximate treatment of infinite-horizon problems by the finite-horizon controller.

KITopen

Bayesian Optimization with Dimension Scheduling: Application to Biological Systems

Authors: Baroukh C, Chachuat B, Deisenroth MP, Misener R, Ulmasov D
Publication date: 31/12/2015

Bayesian Optimization (BO) is a data-efficient method for global black-box optimization of an expensive-to-evaluate fitness function. BO typically assumes that its own computational cost is cheap, whereas experiments are time-consuming or costly. In practice, this allows us to optimize ten or fewer critical parameters in up to 1,000 experiments. But experiments may be less expensive than BO methods assume: in some simulation models, we may be able to conduct multiple thousands of experiments in a few hours, and the computational burden of BO is no longer negligible compared to experimentation time. To address this challenge we introduce a new Dimension Scheduling Algorithm (DSA), which reduces the computational burden of BO for many experiments. The key idea is that DSA optimizes the fitness function only along a small set of dimensions at each iteration. This DSA strategy (1) reduces the necessary computation time, (2) finds good solutions faster than the traditional BO method, and (3) can be parallelized straightforwardly. We evaluate the DSA in the context of optimizing parameters of dynamic models of microalgae metabolism and show faster convergence than traditional BO.
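The scheduling idea described above, updating only a small subset of dimensions per iteration, can be sketched as follows. This is a simplified illustration and not the DSA itself: uniform random proposals stand in for the Gaussian-process model and acquisition step that BO would use, the objective is minimized purely by convention, and all names (`dimension_scheduled_search`, `sphere`) are hypothetical.

```python
import random

def dimension_scheduled_search(f, x0, bounds, n_iters=200, subset_size=2, seed=0):
    """Minimize f by perturbing only a random subset of dimensions per iteration.

    Assumption: random proposals replace the GP-based proposal step of real BO.
    """
    rng = random.Random(seed)
    best_x, best_y = list(x0), f(x0)
    dims = list(range(len(x0)))
    for _ in range(n_iters):
        # Schedule: choose which dimensions are active in this iteration.
        active = rng.sample(dims, subset_size)
        candidate = list(best_x)
        for d in active:
            lo, hi = bounds[d]
            candidate[d] = rng.uniform(lo, hi)
        y = f(candidate)
        # Inactive dimensions keep their best-known values.
        if y < best_y:
            best_x, best_y = candidate, y
    return best_x, best_y

def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)

x0 = [2.0] * 5
bounds = [(-3.0, 3.0)] * 5
best_x, best_y = dimension_scheduled_search(sphere, x0, bounds)
```

Each iteration searches a two-dimensional slice of the five-dimensional space, so any surrogate model fitted per iteration would only need to cover the active dimensions, which is where the computational saving claimed in the abstract would come from.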

Spiral - Imperial College Digital Repository

Safe Trajectory Sampling in Model-Based Reinforcement Learning

Authors: Bekiroglu Y, Deisenroth MP, Hadjivelichkov D, Kanoulas D, Luo Y, Zwane S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2023

Model-based reinforcement learning aims to learn a policy to solve a target task by leveraging a learned dynamics model. This approach, paired with principled handling of uncertainty, allows for data-efficient policy learning in robotics. However, the physical environment has feasibility and safety constraints that need to be incorporated into the policy before it is safe to execute on a real robot. In this work, we study how to enforce the aforementioned constraints in the context of model-based reinforcement learning with probabilistic dynamics models. In particular, we investigate how trajectories sampled from the learned dynamics model can be used on a real robot, while fulfilling user-specified safety requirements. We present a model-based reinforcement learning approach using Gaussian processes where safety constraints are taken into account without simplifying Gaussian assumptions on the predictive state distributions. We evaluate the proposed approach on different continuous control tasks with varying complexity and demonstrate how our safe trajectory-sampling approach can be directly used on a real robot without violating safety constraints.

UCL Discovery

Finite-Horizon Optimal State Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle

Authors: Brunn D, Deisenroth MP, Hanebeck UD, Ohtsuka T, Weissel F
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

In this paper, an approach to the finite-horizon optimal state-feedback control problem of nonlinear, stochastic, discrete-time systems is presented. Starting from the dynamic programming equation, the value function will be approximated by means of Taylor series expansion up to second-order derivatives. Moreover, the problem will be reformulated, such that a minimum principle can be applied to the stochastic problem. Employing this minimum principle, the optimal control problem can be rewritten as a two-point boundary-value problem to be solved at each time step of a shrinking horizon. To avoid numerical problems, the two-point boundary-value problem will be solved by means of a continuation method. Thus, the curse of dimensionality of dynamic programming is avoided, and good candidates for the optimal state-feedback controls are obtained. The proposed approach will be evaluated by means of a scalar example system. © 2006 IEEE
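The second-order approximation of the value function mentioned in this abstract can be written out generically. The notation below is assumed for illustration and is not taken from the paper: $J_{k+1}$ is the value function at the next time step, $\bar{x}$ is the expansion point, taken here to be the mean of the successor state, and $\Sigma$ is the covariance of that state.

```latex
J_{k+1}(x) \approx J_{k+1}(\bar{x})
  + \nabla J_{k+1}(\bar{x})^{\top} (x - \bar{x})
  + \tfrac{1}{2} (x - \bar{x})^{\top} \nabla^{2} J_{k+1}(\bar{x}) \, (x - \bar{x})

\mathbb{E}\left[ J_{k+1}(x) \right] \approx J_{k+1}(\bar{x})
  + \tfrac{1}{2} \operatorname{tr}\left( \nabla^{2} J_{k+1}(\bar{x}) \, \Sigma \right)
```

Under these assumptions the first-order term vanishes in expectation because $\mathbb{E}[x - \bar{x}] = 0$, so the noise affects the approximated expected cost only through the Hessian and the covariance.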

CiteSeerX

Crossref

KITopen

Spiral - Imperial College Digital Repository

MPG.PuRe

Rare germline variants in DNA repair genes and the angiogenesis pathway predispose prostate cancer patients to develop metastatic disease

Authors: A Liberzon, A McKenna, AA Al Olama, AG Vinuesa de, AH Ramos, AJ Vickers, B Sharma, C Cybulski, C Deisenroth, CC Pritchard, Christopher A. Haiman, Clara Cieza-Borrella, CM Ewing, CM Gay, D Leongamornlert, D Li, Daniel A. Leongamornlert, David V. Conti, DC Koboldt, E Castro, E Ruark, Edward J. Saunders, FR Schumacher, G Jun, Ian Whitmore, J Karar, J Li, J Niewiarowska, JB Hjelmborg, JI Jun, JL Beebe-Dimmer, K Yumoto, Koveela Govindasami, L Finney, LA Mucci, M Kircher, M Lek, M Martin, M Mongiat, MA Quintana, Mark N. Brook, Martina Mijuskovic, P Dell’Oglio, PC Sham, R Bhati, R Lozano, R Na, RA Eeles, RD Wood, Rosalind A. Eeles, S Carbon, S Lee, S Purcell, Sarah Wakerell, SI Cunha, SM Gogarten, SN Hart, T Walsh, T Wei, The 1000 Genomes Project Consortium, Tokhir Dadaev, X Chang, X Liu, X Zhan, X Zheng, Y Gong, Z Kote-Jarai, Zsofia Kote-Jarai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2018

Background: Prostate cancer (PrCa) demonstrates a heterogeneous clinical presentation ranging from largely indolent to lethal. We sought to identify a signature of rare inherited variants that distinguishes between these two extreme phenotypes. Methods: We sequenced germline whole exomes from 139 aggressive (metastatic, age of diagnosis < 60) and 141 non-aggressive (low clinical grade, age of diagnosis ≥ 60) PrCa cases. We conducted rare variant association analyses at gene and gene set levels using SKAT and Bayesian risk index techniques. GO term enrichment analysis was performed for genes with the highest differential burden of rare disruptive variants. Results: Protein truncating variants (PTVs) in specific DNA repair genes were significantly overrepresented among patients with the aggressive phenotype, with BRCA2, ATM and NBN the most frequently mutated genes. Differential burden of rare variants was identified between metastatic and non-aggressive cases for several genes implicated in angiogenesis, conferring both deleterious and protective effects. Conclusions: Inherited PTVs in several DNA repair genes distinguish aggressive from non-aggressive PrCa cases. Furthermore, inherited variants in genes with roles in angiogenesis may be potential predictors for risk of metastases. If validated in a larger dataset, these findings have potential for future clinical application.

Crossref

Institute of Cancer Research Repository

St George's Online Research Archive

A Bayesian Nonparametric Approach to Modeling Motion Patterns

Authors: A. Girard, A. Raftery, Albert S. Huang, B. D. Ziebart, C. E. Rasmussen, C. J. Paciorek, C. M. Bishop, C. Tay, D. Ashbrook, D. Hsu, D. Patterson, E. Fox, E. Meeds, E. Snelson, Finale Doshi-Velez, H. Dia, H. Kurniawati, J. Ko, J. Letchner, J. M. Joseph, J. Pineau, Joshua Joseph, L. Csató, L. Liao, M. L. Puterman, M. P. Deisenroth, Nicholas Roy, P. Boyle, R. He, S. A. Miller, S. Duane, S. Ross, W. Meiring
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2010

The most difficult, and often most essential, aspect of many interception and tracking tasks is constructing motion models of the targets to be found. Experts can often provide only partial information, and fitting parameters for complex motion patterns can require large amounts of training data. Specifying how to parameterize complex motion patterns is in itself a difficult task. In contrast, nonparametric models are very flexible and generalize well with relatively little training data. We propose modeling target motion patterns as a mixture of Gaussian processes (GPs) with a Dirichlet process (DP) prior over mixture weights. The GP provides a flexible representation for each individual motion pattern, while the DP assigns observed trajectories to particular motion patterns. Both automatically adjust the complexity of the motion model based on the available data. Our approach outperforms several parametric models on a helicopter-based car-tracking task using data collected from the greater Boston area.
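How a Dirichlet process prior can assign a new trajectory either to an existing motion pattern or to a new one is often illustrated with the Chinese restaurant process. The sketch below shows only that prior, under assumed names (`crp_assignment_probs`, `alpha`). In a full mixture of Gaussian processes these prior weights would be combined with each pattern's GP likelihood for the trajectory, which is not shown, and the paper's exact inference procedure may differ.

```python
def crp_assignment_probs(cluster_sizes, alpha):
    """Prior probabilities of joining each existing cluster or opening a new one.

    cluster_sizes: number of trajectories already assigned to each motion pattern.
    alpha: DP concentration parameter (larger values favor new patterns).
    """
    n = sum(cluster_sizes)
    # Existing patterns are weighted by how many trajectories they already hold.
    probs = [size / (n + alpha) for size in cluster_sizes]
    # The final entry is the probability of creating a new motion pattern.
    probs.append(alpha / (n + alpha))
    return probs

# Two existing patterns holding 3 and 1 trajectories, concentration alpha = 1.
probs = crp_assignment_probs([3, 1], alpha=1.0)
```

With these illustrative numbers the prior weights are 3/5 for the first pattern, 1/5 for the second, and 1/5 for a new pattern, so the number of patterns can grow as more trajectories are observed.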

DSpace@MIT

Crossref