68 research outputs found
Sample-Efficient Deep Reinforcement Learning for Continuous Control
Reinforcement learning (RL) is a powerful, generic approach to discovering optimal policies
in complex sequential decision-making problems. Recently, with flexible function approximators such as neural networks, RL has greatly expanded its realm of applications, from
playing computer games with pixel inputs, to mastering the game of Go, to learning parkour
movements by simulated humanoids. However, common RL approaches are known
to be sample-intensive, making them difficult to apply to real-world problems such
as robotics. This thesis makes several contributions toward developing RL algorithms for
learning in the wild, where sample-efficiency and stability are critical. The key contributions
include Normalized Advantage Functions (NAF), extending Q-learning for continuous action problems; Interpolated Policy Gradient (IPG), unifying prior policy gradient algorithm
variants through theoretical analyses on bias and variance; and Temporal Difference Models
(TDM), interpreting a parameterized Q-function as a generalized dynamics model for novel
temporally abstracted model-based planning. Importantly, this thesis highlights that these
algorithms can be seen as bridging gaps between branches of RL – model-based with model-free, and on-policy with off-policy. The proposed algorithms not only achieve substantial
improvements over the prior approaches, but also provide novel perspectives on how to mix
different branches of RL effectively to gain the best of both worlds. NAF has subsequently
been shown to be able to train two 7-DoF robot arms to open doors using only 2.5 hours of
real-world experience, making it one of the first demonstrations of deep RL approaches on
real robots.
- Cambridge-Tuebingen PhD Fellowship in Machine Learning
- Google Focused Research Award
- NSER
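The key idea behind NAF can be sketched compactly: the Q-function is parameterized so that its maximizing action is available in closed form, which is what makes Q-learning tractable with continuous actions. Below is a minimal NumPy illustration; the toy linear "networks", dimensions, and weights are invented for the example (the thesis uses deep networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "networks" for a 3-d state and 2-d action (illustrative only).
W_v  = rng.normal(size=(3,))        # V(s)  = W_v . s
W_mu = rng.normal(size=(2, 3))      # mu(s) = W_mu s
W_l  = rng.normal(size=(3, 3))      # produces entries of lower-triangular L(s)

def naf_q(s, a):
    """NAF decomposition: Q(s,a) = V(s) - 0.5 (a - mu)^T P(s) (a - mu),
    with P(s) = L(s) L(s)^T positive definite, so argmax_a Q(s,a) = mu(s)."""
    v  = W_v @ s
    mu = W_mu @ s
    raw = W_l @ s
    # 2x2 lower-triangular L with positive diagonal (exp transform) -> P is PD.
    L = np.array([[np.exp(raw[0]), 0.0],
                  [raw[1],         np.exp(raw[2])]])
    P = L @ L.T
    d = a - mu
    return v - 0.5 * d @ P @ d

s = rng.normal(size=3)
mu = W_mu @ s
# The greedy action is mu(s) by construction: any perturbation lowers Q.
assert all(naf_q(s, mu) >= naf_q(s, mu + rng.normal(size=2)) for _ in range(100))
```

Because the advantage term is a negative-definite quadratic in the action, the greedy policy never requires an inner optimization loop, which is the property that lets standard Q-learning machinery carry over to continuous control.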
Particle Gibbs for Infinite Hidden Markov Models
This is the final version of the article. It first appeared from Curran Associates via http://papers.nips.cc/paper/5968-particle-gibbs-for-infinite-hidden-markov-models

Infinite Hidden Markov Models (iHMMs) are an attractive, nonparametric generalization of the classical Hidden Markov Model which can automatically infer the number of hidden states in the system. However, due to the infinite-dimensional nature of the transition dynamics, performing inference in the iHMM is difficult. In this paper, we present an infinite-state Particle Gibbs (PG) algorithm to resample state trajectories for the iHMM. The proposed algorithm uses an efficient proposal optimized for iHMMs and leverages ancestor sampling to improve the mixing of the standard PG algorithm. Our algorithm demonstrates significant convergence improvements on synthetic and real-world data sets.
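The core move, a conditional SMC sweep with ancestor sampling, can be sketched on a toy finite HMM. Everything below is an illustrative stand-in (a fixed 2-state model with hand-picked parameters, multinomial resampling, a bootstrap proposal); the paper itself works with the infinite-state model and an optimized proposal:

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny fixed-parameter HMM (2 hidden states, binary observations), standing in
# for an iHMM slice whose parameters have been instantiated.
T = np.array([[0.9, 0.1],     # transition matrix P(x_t | x_{t-1})
              [0.2, 0.8]])
E = np.array([[0.8, 0.2],     # emission probs P(y | x)
              [0.3, 0.7]])
y = np.array([0, 0, 1, 1, 0])

def csmc_ancestor(ref, n_particles=20):
    """One conditional SMC sweep: the last particle follows the reference
    path `ref`, but its ancestry is resampled (ancestor sampling)."""
    Tn = len(y)
    X = np.zeros((Tn, n_particles), dtype=int)
    W = np.zeros((Tn, n_particles))
    A = np.zeros((Tn, n_particles), dtype=int)   # ancestor indices
    # t = 0: sample from a uniform prior, pin the reference particle.
    X[0] = rng.integers(0, 2, size=n_particles)
    X[0, -1] = ref[0]
    W[0] = E[X[0], y[0]]
    W[0] /= W[0].sum()
    for t in range(1, Tn):
        # Multinomial resampling for the free particles.
        A[t, :-1] = rng.choice(n_particles, size=n_particles - 1, p=W[t-1])
        # Ancestor sampling for the reference particle: weight each candidate
        # ancestor by its transition probability into the reference state.
        aw = W[t-1] * T[X[t-1], ref[t]]
        A[t, -1] = rng.choice(n_particles, p=aw / aw.sum())
        X[t, :-1] = [rng.choice(2, p=T[X[t-1, a]]) for a in A[t, :-1]]
        X[t, -1] = ref[t]
        W[t] = E[X[t], y[t]]
        W[t] /= W[t].sum()
    # Draw one trajectory by tracing ancestry back from a sampled index.
    k = rng.choice(n_particles, p=W[-1])
    path = np.zeros(Tn, dtype=int)
    for t in range(Tn - 1, -1, -1):
        path[t] = X[t, k]
        k = A[t, k]
    return path

traj = np.array([0, 0, 0, 0, 0])
for _ in range(10):           # a few PG iterations, conditioning each time
    traj = csmc_ancestor(traj)
print(traj)
```

Ancestor sampling is what breaks path degeneracy: without it, the reference trajectory's early states are rarely replaced and the Gibbs chain mixes slowly.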
MuProp: Unbiased Backpropagation for Stochastic Neural Networks
This is the final version of the article. It first appeared from the International Conference on Learning Representations via http://arxiv.org/abs/1511.05176v3

Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm. Stochastic neural networks combine the power of large parametric functions with that of graphical models, which makes it possible to learn very complex distributions. However, as backpropagation is not directly applicable to stochastic networks that include discrete sampling operations within their computational graph, training such networks remains difficult. We present MuProp, an unbiased gradient estimator for stochastic networks, designed to make this task easier. MuProp improves on the likelihood-ratio estimator by reducing its variance using a control variate based on the first-order Taylor expansion of a mean-field network. Crucially, unlike prior attempts at using backpropagation for training stochastic networks, the resulting estimator is unbiased and well behaved. Our experiments on structured output prediction and discrete latent variable modeling demonstrate that MuProp yields consistently good performance across a range of difficult tasks.
- ALTA
- Jesus College Cambridge
- Cambridge-Tuebingen PhD Fellowship
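The estimator is easy to state for a single stochastic unit: subtract the first-order Taylor expansion of the loss around the mean-field value as a control variate, then add back that control variate's exact gradient, so the estimator stays unbiased. A toy NumPy sketch for one Bernoulli unit (the loss `f` and all constants are invented for illustration; the paper applies this through full mean-field networks):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(z):            # toy downstream loss on a binary sample
    return (z - 2.0) ** 2

def df(z):           # its derivative, used for the Taylor control variate
    return 2.0 * (z - 2.0)

theta = 0.3                        # logit parameter of a single Bernoulli unit
mu = 1.0 / (1.0 + np.exp(-theta))  # mean-field value E[z]

n = 500_000
z = (rng.random(n) < mu).astype(float)

# d log p(z | theta) / d theta for a Bernoulli with logit theta
score = z - mu

# Plain likelihood-ratio (REINFORCE) estimator.
g_lr = f(z) * score

# MuProp: center f(z) by its Taylor expansion around mu, then add back the
# exact gradient of the control variate, f'(mu) * dmu/dtheta.
taylor = f(mu) + df(mu) * (z - mu)
g_muprop = (f(z) - taylor) * score + df(mu) * mu * (1.0 - mu)

# Exact gradient of E[f(z)] = mu f(1) + (1 - mu) f(0) w.r.t. theta.
g_true = (f(1.0) - f(0.0)) * mu * (1.0 - mu)

assert abs(g_lr.mean() - g_true) < 0.02       # both estimators are unbiased
assert abs(g_muprop.mean() - g_true) < 0.02
assert g_muprop.var() < g_lr.var()            # the control variate cuts variance
```

The added term `df(mu) * mu * (1.0 - mu)` is where backpropagation enters: it is the deterministic gradient through the mean-field forward pass, which compensates exactly for the subtracted Taylor term in expectation.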
- …