Search CORE

207 research outputs found

Continuous deep q-learning with model-based acceleration

Author: Gu S
Levine S
Lillicrap T
Sutskever I
Publication venue: 33rd International Conference on Machine Learning, ICML 2016
Publication date: 02/03/2016

This is the version of record. It originally appeared on arXiv at http://arxiv.org/abs/1603.00748. Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
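
The NAF idea in the abstract above can be illustrated with a minimal sketch. It assumes the commonly described parameterization in which the Q-function is a state value plus a quadratic advantage term, A(s, a) = -1/2 (a - mu)^T P (a - mu) with P = L L^T, so that the greedy action is simply mu. The function name and all numbers are hypothetical stand-ins for network outputs, not the authors' code.

```python
import numpy as np

def naf_q_value(action, mu, L, v):
    """Q(s, a) = V(s) + A(s, a), with a quadratic advantage peaked at mu.

    mu, L and v stand in for per-state network outputs (hypothetical here).
    """
    P = L @ L.T                      # positive semi-definite by construction
    diff = action - mu
    advantage = -0.5 * diff @ P @ diff
    return v + advantage

# Hypothetical outputs for a single state with a 2-D action space.
mu = np.array([0.2, -0.1])           # greedy action
L = np.array([[1.0, 0.0],
              [0.5, 2.0]])           # lower-triangular factor of P
v = 1.5                              # state value V(s)

q_at_mu = naf_q_value(mu, mu, L, v)
q_elsewhere = naf_q_value(np.array([0.5, 0.3]), mu, L, v)
print(q_at_mu, q_elsewhere)
```

Because the advantage is never positive and is zero at mu, the maximum of Q over actions equals V(s), which is what makes a Q-learning target computable in closed form for continuous actions under this assumption.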

Apollo (Cambridge)

Q-Prop: Sample-efficient policy gradient with an off-policy critic

Author: Ghahramani Z
Gu S
Levine S
Lillicrap T
Turner RE
Publication venue: 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings
Publication date: 01/01/2017

Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is their high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, on OpenAI Gym's MuJoCo continuous control environments
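
The control-variate idea mentioned in the abstract above can be sketched generically. This is not Q-Prop itself: it only shows how subtracting a first-order Taylor expansion with a known mean reduces the variance of a Monte Carlo estimate without changing its expectation. The function `f`, the Gaussian sampling distribution, and all constants are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 0.3, 0.2, 10_000
x = rng.normal(mu, sigma, size=n)    # samples, e.g. actions around a mean

def f(x):
    """Hypothetical nonlinear quantity whose expectation we want."""
    return np.sin(x)

def taylor_g(x):
    """First-order Taylor expansion of f around mu; E[g(X)] = f(mu) exactly."""
    return np.sin(mu) + np.cos(mu) * (x - mu)

# Plain Monte Carlo: average f over the samples.
plain_terms = f(x)

# Control variate: average the residual f - g, then add back the known mean of g.
cv_terms = f(x) - taylor_g(x) + np.sin(mu)

plain_estimate, cv_estimate = plain_terms.mean(), cv_terms.mean()
plain_var, cv_var = plain_terms.var(), cv_terms.var()
print(plain_estimate, cv_estimate, plain_var, cv_var)
```

Both estimators target the same expectation; the second has much lower per-sample variance because the residual `f - g` only contains second-order and higher terms.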

arXiv.org e-Print Archive

Apollo (Cambridge)

MPG.PuRe

Practical application of suspension criteria scenarios: radiotherapy.

Author: Horton P
Lamm Inger-Lena
Lehmann W
Lillicrap S
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2013

In 2007, the European Commission (EC) commissioned a group of experts to undertake the revision of Report RP91 'Criteria for Acceptability of Radiological (including Radiotherapy) and Nuclear Medicine Installations' written in 1997. The revised draft report was submitted to the EC in 2010, which issued it for public consultation. The EC commissioned the same group of experts to consider the comments of the public consultation for further improvement of the revised report. The EC intends to publish the final report under its Radiation Report Series as RP162. This paper presents a selection of practical applications of suspension criteria scenarios in radiotherapy, mostly in brachytherapy, with special emphasis on the critical roles and responsibilities of qualified radiotherapy staff (radiation oncologists, medical physicists and radiotherapy technicians)

Lund University Publications

Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

Author: Ghahramani Z
Gu S
Levine S
Lillicrap T
Schölkopf B
Turner RE
Publication venue: Advances in Neural Information Processing Systems
Publication date: 01/01/2017

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixing of off-policy gradient estimates with on-policy samples contribute to improvements in empirical performance. The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on the state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks

arXiv.org e-Print Archive

Apollo (Cambridge)

MPG.PuRe

Somatic mosaicism and female-to-female transmission in a kindred with hemophilia B (factor IX deficiency).

Author: D. P. Lillicrap
K. V. Deugau
S. A. Taylor
Publication venue: 'Proceedings of the National Academy of Sciences'

Crossref

High fidelity progressive reinforcement learning for agile maneuvering UAVs

Author: Abbeel P.
Faust A.
Kim H. J.
Lillicrap T. P.
Uzun S.
Yuksek B.
Zhang T.
Publication venue: 'American Institute of Aeronautics and Astronautics (AIAA)'
Publication date: 05/01/2020

In this work, we present a high fidelity model based progressive reinforcement learning method for control system design for an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for doing software-in-the-loop (SIL), hardware-in-the-loop (HIL) and integrated flight testing within photo-realistic virtual reality (VR) environment. Through progressive learning with the high fidelity agent and environment models, the guidance and control policies build agile maneuvering based on fundamental control laws. First, we provide insight on development of high fidelity mathematical models using frequency domain system identification. These models are later used to design reinforcement learning based adaptive flight control laws allowing the vehicle to be controlled over a wide range of operating conditions covering model changes on operating conditions such as payload, voltage and damage to actuators and electronic speed controllers (ESCs). We later design outer flight guidance and control laws. Our current work and progress is summarized in this work

Crossref

CERES Research Repository (Cranfield Univ.)

Groundwater investigations to support irrigated agriculture at La Grange, Western Australia: 2013–18 results

Author: Gardiner Peter S
George Richard J, Dr
Lillicrap Adam M
Paul Robert J
Raper Gregory Paul
Wright Nicholas J
Publication venue: 'Pharma Research Library'
Publication date: 01/12/2019

The Broome Sandstone aquifer is the main aquifer and groundwater resource in the La Grange area, near Broome in the West Kimberley, Western Australia. Land use is dominated by cattle grazing on pastoral stations, dispersed mining and tourism. Irrigated agriculture has developed at a small scale, with about 470 hectares under cultivation in 2014. Groundwater abstraction is licensed under the La Grange groundwater allocation plan (Department of Water 2010) and managed by the Department of Water and Environmental Regulation. The La Grange groundwater allocation area is split into the La Grange North subarea and La Grange South subarea, with groundwater allocation limits of 35 gigalitres per year (GL/y) and 15GL/y, respectively. The volume of water licensed, committed and requested as of October 2016 was 13.15GL/y. The Department of Agriculture and Food, Western Australia (DAFWA), now part of DPIRD, conducted the four-year La Grange project to help determine the level of irrigated agriculture the aquifer can sustain. This report describes the methods, data analyses and outcomes of a project designed to give a better understanding of the hydrogeological processes of the Broome Sandstone aquifer at La Grange, the interactions between all of its users, and its environmental and cultural assets. As part of the project, DPIRD coordinated development of a bore monitoring network and developed a water balance model to run irrigation scenarios

Department of Agriculture and Food, Western Australia (DAFWA): Research Library

A Decision Support System to Predict Acute Fish Toxicity

Author: Braunbeck Thomas
Connors Kristin A.
Embry Michelle
Lillicrap Adam David
Madsen Anders L.
Moe S. Jannicke
Schirmer Kristin
Scholz Stefan
Wolf Raoul
Publication date: 01/01/2022

We present a decision support system using a Bayesian network to predict acute fish toxicity from multiple lines of evidence. Fish embryo toxicity testing has been proposed as an alternative to using juvenile or adult fish in acute toxicity testing for hazard assessments of chemicals. The European Chemicals Agency has recommended the development of a so-called weight-of-evidence approach for strengthening the evidence from fish embryo toxicity testing. While weight-of-evidence approaches in the ecotoxicology and ecological risk assessment community in the past have been largely qualitative, we have developed a Bayesian network for using fish embryo toxicity data in a quantitative approach. The system enables users to efficiently predict the potential toxicity of a chemical substance based on multiple types of evidence including physical and chemical properties, quantitative structure-activity relationships, toxicity to algae and daphnids, and fish gill cytotoxicity. The system is demonstrated on three chemical substances of different levels of toxicity. It is considered a promising step towards a probabilistic weight-of-evidence approach to predict acute fish toxicity from fish embryo toxicity.

Norwegian Geotechnical Institute (NGI) Digital Archive