206 research outputs found

    Q-Prop: Sample-efficient policy gradient with an off-policy critic

    Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is its high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample-efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, respectively, on OpenAI Gym's MuJoCo continuous control environments.
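The control-variate idea named in this abstract can be illustrated with a toy sketch. This is not Q-Prop itself: there is no policy, critic, or environment here. It only assumes a scalar function f(x) = exp(x) with X ~ N(0, 1), and uses the first-order Taylor expansion of f around 0 as the control variate, since its expectation is known exactly. The function name and parameters are hypothetical.

```python
import math
import random
import statistics


def estimate_with_control_variate(n_samples=20000, seed=0):
    """Estimate E[exp(X)] for X ~ N(0, 1), with and without a control variate.

    Returns (plain_mean, plain_var, cv_mean, cv_var), where the variances are
    per-sample variances of the terms being averaged.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Plain Monte Carlo: average f(x) = exp(x) directly.
    plain = [math.exp(x) for x in xs]

    # Control variate: first-order Taylor expansion of exp around 0,
    # g(x) = 1 + x, whose expectation under N(0, 1) is exactly 1.
    expected_g = 1.0
    corrected = [math.exp(x) - (1.0 + x) + expected_g for x in xs]

    return (
        statistics.fmean(plain),
        statistics.variance(plain),
        statistics.fmean(corrected),
        statistics.variance(corrected),
    )


if __name__ == "__main__":
    plain_mean, plain_var, cv_mean, cv_var = estimate_with_control_variate()
    print("true value      :", math.exp(0.5))
    print("plain estimate  :", plain_mean, "per-sample variance:", plain_var)
    print("with Taylor CV  :", cv_mean, "per-sample variance:", cv_var)
```

Both estimators target the same expectation, exp(0.5); subtracting the Taylor term and adding back its known mean leaves the estimate unbiased while removing the part of the variance that the linear term explains.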

    PRACTICAL APPLICATION OF SUSPENSION CRITERIA SCENARIOS: RADIOTHERAPY.

    In 2007, the European Commission (EC) commissioned a group of experts to undertake the revision of Report RP91 'Criteria for Acceptability of Radiological (including Radiotherapy) and Nuclear Medicine Installations', written in 1997. The revised draft report was submitted to the EC in 2010, which issued it for public consultation. The EC then commissioned the same group of experts to consider the comments from the public consultation in order to further improve the revised report. The EC intends to publish the final report under its Radiation Report Series as RP162. This paper presents a selection of practical applications of suspension criteria scenarios in radiotherapy, mostly in brachytherapy, with special emphasis on the critical roles and responsibilities of qualified radiotherapy staff (radiation oncologists, medical physicists and radiotherapy technicians).

    Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

    Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixtures of off-policy gradient estimates and on-policy samples contribute to improvements in empirical performance. The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on the state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks.

    High fidelity progressive reinforcement learning for agile maneuvering UAVs

    In this work, we present a high-fidelity, model-based progressive reinforcement learning method for control system design for an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for software-in-the-loop (SIL), hardware-in-the-loop (HIL) and integrated flight testing within a photo-realistic virtual reality (VR) environment. Through progressive learning with the high-fidelity agent and environment models, the guidance and control policies build agile maneuvering on top of fundamental control laws. First, we provide insight into the development of high-fidelity mathematical models using frequency-domain system identification. These models are then used to design reinforcement-learning-based adaptive flight control laws that allow the vehicle to be controlled over a wide range of operating conditions, including changes in payload and voltage and damage to actuators and electronic speed controllers (ESCs). We then design outer-loop flight guidance and control laws. This paper summarizes our current work and progress.

    Groundwater investigations to support irrigated agriculture at La Grange, Western Australia: 2013–18 results

    The Broome Sandstone aquifer is the main aquifer and groundwater resource in the La Grange area, near Broome in the West Kimberley, Western Australia. Land use is dominated by cattle grazing on pastoral stations, dispersed mining and tourism. Irrigated agriculture has developed at a small scale, with about 470 hectares under cultivation in 2014. Groundwater abstraction is licensed under the La Grange groundwater allocation plan (Department of Water 2010) and managed by the Department of Water and Environmental Regulation. The La Grange groundwater allocation area is split into the La Grange North subarea and La Grange South subarea, with groundwater allocation limits of 35 gigalitres per year (GL/y) and 15 GL/y, respectively. The volume of water licensed, committed and requested as of October 2016 was 13.15 GL/y. The Department of Agriculture and Food, Western Australia (DAFWA), now part of DPIRD, conducted the four-year La Grange project to help determine the level of irrigated agriculture the aquifer can sustain. This report describes the methods, data analyses and outcomes of a project designed to give a better understanding of the hydrogeological processes of the Broome Sandstone aquifer at La Grange, the interactions between all of its users, and its environmental and cultural assets. As part of the project, DPIRD coordinated development of a bore monitoring network and developed a water balance model to run irrigation scenarios.

    A Decision Support System to Predict Acute Fish Toxicity

    We present a decision support system using a Bayesian network to predict acute fish toxicity from multiple lines of evidence. Fish embryo toxicity testing has been proposed as an alternative to using juvenile or adult fish in acute toxicity testing for hazard assessments of chemicals. The European Chemicals Agency has recommended the development of a so-called weight-of-evidence approach for strengthening the evidence from fish embryo toxicity testing. While weight-of-evidence approaches in the ecotoxicology and ecological risk assessment community have in the past been largely qualitative, we have developed a Bayesian network for using fish embryo toxicity data in a quantitative approach. The system enables users to efficiently predict the potential toxicity of a chemical substance based on multiple types of evidence, including physical and chemical properties, quantitative structure-activity relationships, toxicity to algae and daphnids, and fish gill cytotoxicity. The system is demonstrated on three chemical substances with different levels of toxicity. It is considered a promising step towards a probabilistic weight-of-evidence approach to predicting acute fish toxicity from fish embryo toxicity.