207 research outputs found
Continuous deep Q-learning with model-based acceleration
This is the version of record. It originally appeared on arXiv at http://arxiv.org/abs/1603.00748.
Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
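The continuous Q-learning variant named above rests on a specific decomposition: the Q-function is split into a state value term and a quadratic advantage term whose maximizing action is available in closed form, which is what makes greedy action selection tractable in continuous action spaces. Below is a minimal PyTorch-style sketch of that decomposition, assuming the standard NAF form Q(x,u) = V(x) - 1/2 (u - mu(x))^T P(x) (u - mu(x)) with P(x) = L(x)L(x)^T; the module name NAFHead, the layer sizes and the tanh squashing are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class NAFHead(nn.Module):
    """Illustrative sketch of a normalized advantage function Q-network."""
    def __init__(self, state_dim, action_dim, hidden_dim=200):
        super().__init__()
        self.action_dim = action_dim
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.value = nn.Linear(hidden_dim, 1)        # state value V(x)
        self.mu = nn.Linear(hidden_dim, action_dim)  # greedy action mu(x)
        self.l_entries = nn.Linear(hidden_dim, action_dim * (action_dim + 1) // 2)

    def forward(self, state, action):
        h = self.trunk(state)
        V = self.value(h)
        mu = torch.tanh(self.mu(h))
        # Build a lower-triangular L with positive diagonal, then P = L L^T.
        L = torch.zeros(state.size(0), self.action_dim, self.action_dim)
        rows, cols = torch.tril_indices(self.action_dim, self.action_dim)
        L[:, rows, cols] = self.l_entries(h)
        diag = torch.arange(self.action_dim)
        L[:, diag, diag] = L[:, diag, diag].exp()
        P = L @ L.transpose(1, 2)
        # The quadratic advantage is <= 0 and is maximized exactly at action == mu,
        # so the greedy action needed for the Q-learning target is simply mu(x).
        d = (action - mu).unsqueeze(-1)
        A = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)
        return V + A, mu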
Q-Prop: Sample-efficient policy gradient with an off-policy critic
Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is their high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, on OpenAI Gym's MuJoCo continuous control environments.
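The control variate named in the abstract is the first-order Taylor expansion of the learned critic around the deterministic policy action: its value is subtracted from the Monte Carlo advantage inside the likelihood-ratio term, and its expectation is added back analytically through the critic gradient, as in a deterministic policy gradient update. The numpy sketch below shows only the residual-advantage computation; the function and argument names, and the scalar eta switch, are simplified assumptions rather than the released Q-Prop code.

import numpy as np

def qprop_residual_advantage(advantages, actions, mu, dq_da, eta=1.0):
    # advantages: Monte Carlo advantage estimates A_hat(s, a), shape (N,)
    # actions:    sampled actions a ~ pi(.|s), shape (N, d)
    # mu:         deterministic policy means mu(s), shape (N, d)
    # dq_da:      critic gradient dQ_w/da evaluated at (s, mu(s)), shape (N, d)
    # eta:        1.0 gives the aggressive variant; conservative Q-Prop sets eta
    #             to 0 on batches where the control variate is estimated to hurt.
    a_bar = np.sum(dq_da * (actions - mu), axis=1)  # Taylor-expanded advantage A_bar(s, a)
    return advantages - eta * a_bar                 # weights grad log pi(a|s) in the update

The analytic term that compensates for the subtraction is obtained by differentiating Q_w(s, mu_theta(s)) with respect to the policy parameters, which is where the off-policy critic enters the otherwise on-policy update.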
Practical application of suspension criteria scenarios: Radiotherapy
In 2007, the European Commission (EC) commissioned a group of experts to undertake the revision of Report RP91, 'Criteria for Acceptability of Radiological (including Radiotherapy) and Nuclear Medicine Installations', written in 1997. The revised draft report was submitted to the EC in 2010, and the EC issued it for public consultation. The EC then commissioned the same group of experts to consider the comments from the public consultation and further improve the revised report. The EC intends to publish the final report under its Radiation Protection series as RP162. This paper presents a selection of practical applications of suspension criteria scenarios in radiotherapy, mostly in brachytherapy, with special emphasis on the critical roles and responsibilities of qualified radiotherapy staff (radiation oncologists, medical physicists and radiotherapy technicians).
Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning
Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixing of off-policy gradient estimates with on-policy samples contributes to improvements in empirical performance. The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on the state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks.
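Concretely, the merging described above can be read as a convex combination of a likelihood-ratio (on-policy) gradient estimate and a critic-based (off-policy) gradient estimate, governed by a single mixing coefficient. The sketch below shows only that mixing step; the coefficient name nu and the two estimator inputs are illustrative placeholders rather than the paper's notation.

import numpy as np

def interpolated_policy_gradient(on_policy_grad, off_policy_grad, nu=0.2):
    # nu = 0 recovers the pure on-policy policy gradient,
    # nu = 1 recovers a pure off-policy, critic-driven update;
    # intermediate values trade critic bias against Monte Carlo variance,
    # which is the trade-off the paper's performance bounds quantify.
    return (1.0 - nu) * np.asarray(on_policy_grad) + nu * np.asarray(off_policy_grad)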
High fidelity progressive reinforcement learning for agile maneuvering UAVs
In this work, we present a high-fidelity, model-based progressive reinforcement learning method for control system design for an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for software-in-the-loop (SIL), hardware-in-the-loop (HIL) and integrated flight testing within a photo-realistic virtual reality (VR) environment. Through progressive learning with the high-fidelity agent and environment models, the guidance and control policies build agile maneuvering on top of fundamental control laws. First, we provide insight into the development of high-fidelity mathematical models using frequency-domain system identification. These models are then used to design reinforcement learning-based adaptive flight control laws that allow the vehicle to be controlled over a wide range of operating conditions, covering changes in conditions such as payload, voltage and damage to actuators and electronic speed controllers (ESCs). We then design the outer flight guidance and control laws. Our current work and progress are summarized in this paper.
Groundwater investigations to support irrigated agriculture at La Grange, Western Australia: 2013–18 results
The Broome Sandstone aquifer is the main aquifer and groundwater resource in the La Grange area, near Broome in the West Kimberley, Western Australia. Land use is dominated by cattle grazing on pastoral stations, dispersed mining and tourism. Irrigated agriculture has developed at a small scale, with about 470 hectares under cultivation in 2014. Groundwater abstraction is licensed under the La Grange groundwater allocation plan (Department of Water 2010) and managed by the Department of Water and Environmental Regulation. The La Grange groundwater allocation area is split into the La Grange North subarea and La Grange South subarea, with groundwater allocation limits of 35 gigalitres per year (GL/y) and 15 GL/y, respectively. The volume of water licensed, committed and requested as of October 2016 was 13.15 GL/y.
The Department of Agriculture and Food, Western Australia (DAFWA), now part of DPIRD, conducted the four-year La Grange project to help determine the level of irrigated agriculture the aquifer can sustain. This report describes the methods, data analyses and outcomes of a project designed to give a better understanding of the hydrogeological processes of the Broome Sandstone aquifer at La Grange, the interactions between all of its users, and its environmental and cultural assets. As part of the project, DPIRD coordinated development of a bore monitoring network and developed a water balance model to run irrigation scenarios.
A Decision Support System to Predict Acute Fish Toxicity
We present a decision support system using a Bayesian network to predict acute fish toxicity from multiple lines of evidence. Fish embryo toxicity testing has been proposed as an alternative to using juvenile or adult fish in acute toxicity testing for hazard assessments of chemicals. The European Chemicals Agency has recommended the development of a so-called weight-of-evidence approach for strengthening the evidence from fish embryo toxicity testing. While weight-of-evidence approaches in the ecotoxicology and ecological risk assessment community have in the past been largely qualitative, we have developed a Bayesian network for using fish embryo toxicity data in a quantitative approach. The system enables users to efficiently predict the potential toxicity of a chemical substance based on multiple types of evidence, including physical and chemical properties, quantitative structure-activity relationships, toxicity to algae and daphnids, and fish gill cytotoxicity. The system is demonstrated on three chemical substances of different levels of toxicity. It is considered a promising step towards a probabilistic weight-of-evidence approach to predict acute fish toxicity from fish embryo toxicity.
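As a toy illustration of combining several lines of evidence quantitatively, the sketch below uses a naive-Bayes-style aggregation: each evidence source contributes a likelihood ratio for the 'toxic' hypothesis, and Bayes' rule turns the prior into a posterior. The evidence names and numbers are invented for illustration, and the structure is deliberately simpler than the published Bayesian network.

import numpy as np

def posterior_toxic(prior_toxic, likelihoods):
    # likelihoods: list of (P(evidence | toxic), P(evidence | non-toxic)) pairs,
    # one per line of evidence (e.g. a QSAR flag, algae toxicity, gill cytotoxicity).
    log_odds = np.log(prior_toxic / (1.0 - prior_toxic))
    for p_e_tox, p_e_nontox in likelihoods:
        log_odds += np.log(p_e_tox / p_e_nontox)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Three moderately informative, entirely illustrative lines of evidence.
print(posterior_toxic(0.2, [(0.8, 0.3), (0.6, 0.4), (0.7, 0.2)]))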