Safe Reinforcement Learning via Curriculum Induction
In safety-critical applications, autonomous agents may need to learn in an
environment where mistakes can be very costly. In such settings, the agent
needs to behave safely not only after but also while learning. To achieve this,
existing safe reinforcement learning methods make an agent rely on priors that
let it avoid dangerous situations during exploration with high probability, but
both the probabilistic guarantees and the smoothness assumptions inherent in
the priors are not viable in many scenarios of interest such as autonomous
driving. This paper presents an alternative approach inspired by human
teaching, where an agent learns under the supervision of an automatic
instructor that saves the agent from violating constraints during learning. In
this model, we introduce a monitor that needs to know neither how to do well
at the task the agent is learning nor how the environment works.
Instead, it has a library of reset controllers that it activates when the agent
starts behaving dangerously, preventing it from doing damage. Crucially, the
choice of which reset controller to apply in which situation affects the speed
of the agent's learning. By observing the agent's progress, the teacher itself
learns a policy for choosing the reset controllers, i.e., a curriculum, that
optimizes the agent's final policy reward. Our experiments use this framework
in two environments to induce curricula for safe and efficient learning.
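
As an illustrative aside rather than the paper's method, the following minimal sketch shows the teacher-student structure the abstract describes: a Q-learning student explores a toy corridor where states below 0 are unsafe, a monitor fires a reset controller whenever the student would enter an unsafe state, and a bandit teacher learns which controller to fire, rewarded by the student's progress. The corridor, the reset library, and the bandit teacher are all simplifying assumptions.

    # Toy sketch (not the paper's code) of curriculum induction via
    # reset controllers: monitor intervenes, teacher learns which
    # reset controller to apply, using student progress as reward.
    import random

    GOAL, N_ARMS = 10, 3
    RESETS = [1, 4, 7]                 # library of reset controllers (reset targets)

    def run_episode(q, arm, eps=0.2, alpha=0.5, gamma=0.95):
        """One student episode of epsilon-greedy Q-learning; the monitor
        applies reset controller `arm` instead of letting the agent enter
        an unsafe state (< 0). Returns the episode return."""
        s, ret = 5, 0.0
        for _ in range(50):
            a = random.choice((-1, 1)) if random.random() < eps else \
                max((-1, 1), key=lambda b: q.get((s, b), 0.0))
            s2 = s + a
            if s2 < 0:                 # monitor intervention: safe reset
                s2, r = RESETS[arm], -1.0
            elif s2 == GOAL:
                r = 10.0
            else:
                r = -0.1
            best = max(q.get((s2, b), 0.0) for b in (-1, 1))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best - q.get((s, a), 0.0))
            ret += r
            if s2 == GOAL:
                break
            s = s2
        return ret

    # Teacher: epsilon-greedy bandit over the reset library; its reward is
    # the change in the student's episode return (a crude progress signal).
    q, value, count, prev = {}, [0.0] * N_ARMS, [0] * N_ARMS, 0.0
    for _ in range(500):
        arm = random.randrange(N_ARMS) if random.random() < 0.1 else \
              max(range(N_ARMS), key=lambda i: value[i])
        ret = run_episode(q, arm)
        count[arm] += 1
        value[arm] += (ret - prev - value[arm]) / count[arm]  # progress = ret - prev
        prev = ret
    print("preferred reset controller:", RESETS[max(range(N_ARMS), key=lambda i: value[i])])

In the paper the teacher is trained as a full policy over observations of the student's state rather than as the stationary bandit above; the sketch only preserves the interaction pattern.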
Robust optimal control using computationally efficient deep reinforcement learning techniques
We investigate current challenges in the application of reinforcement learning (RL) to
subsurface flow control problems, which are the subject of intensive research in the field
of reservoir management. In typical subsurface flow control problems, the system is
partially observed because data are often available only at well locations. Furthermore,
the model parameters are highly uncertain as a result of the sparsity of available field
data. We therefore begin by presenting an RL framework for solving the stochastic optimal
control problem under predefined model uncertainty and partial observability. Numerical
results are presented using two state-of-the-art model-free RL algorithms, proximal policy
optimization (PPO) and advantage actor-critic (A2C), on two single-phase subsurface flow
test cases representing two distinct flow scenarios.
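
To make the setup concrete, here is a minimal, hypothetical sketch (not the authors' code) of such a framework using stable-baselines3: ToySubsurfaceEnv is an invented stand-in for a reservoir simulator, with a permeability parameter redrawn each episode (model uncertainty) and observations limited to pressures at two well cells (partial observability).

    # Hypothetical sketch: a toy partially observed, uncertain
    # single-phase "reservoir" trained with PPO from stable-baselines3.
    import numpy as np
    import gymnasium as gym
    from stable_baselines3 import PPO

    class ToySubsurfaceEnv(gym.Env):
        """Observations are pressures at two well cells only; the full
        pressure field and the sampled permeability stay hidden."""
        def __init__(self):
            self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)
            self.action_space = gym.spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.k = self.np_random.uniform(0.5, 1.5)   # uncertain permeability
            self.p = np.full(8, 1.0)                    # hidden pressure field
            self.t = 0
            return self.p[[1, 6]].astype(np.float32), {}

        def step(self, action):
            inj = float(action[0])
            # toy diffusion, injection at cell 0, production at cell 7
            self.p[1:-1] += 0.2 * self.k * (self.p[:-2] - 2 * self.p[1:-1] + self.p[2:])
            self.p[0] += inj
            produced = 0.1 * self.k * self.p[-1]
            self.p[-1] -= produced
            self.t += 1
            reward = produced - 0.05 * inj              # production minus injection cost
            return self.p[[1, 6]].astype(np.float32), reward, self.t >= 50, False, {}

    model = PPO("MlpPolicy", ToySubsurfaceEnv(), verbose=0)
    model.learn(total_timesteps=10_000)                 # thousands of simulator steps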
We identify computational intractability as one of the major limitations of the proposed
RL framework: model-free RL algorithms are by definition sample inefficient and require
thousands, if not millions, of samples to learn optimal control policies. For subsurface
control problems, this corresponds to performing a large number of simulations, which is
computationally expensive. Our aim is to build a more generalized framework that helps
alleviate this computational burden. This is achieved by employing multiple levels of
models, where a level refers to the accuracy, or fidelity, of the discretization of the
domain grid of the underlying partial differential equations. We propose two distinct
approaches that can be applied in a general manner.
The first approach involves a more explicit modification of the proposed RL framework:
a multigrid framework that essentially takes advantage of the principles of sequential
transfer learning, sketched below.
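
A minimal sketch of that sequential-transfer idea, reusing the hypothetical ToySubsurfaceEnv from the sketch above and assuming, as a simplification, that the coarse- and fine-grid environments expose identical observation and action spaces so the policy network transfers unchanged:

    # Sketch: train cheaply on a coarse-grid model, then warm-start
    # (transfer) the same policy on the expensive fine-grid model.
    from stable_baselines3 import PPO

    coarse_env = ToySubsurfaceEnv()    # stand-in for a cheap low-fidelity grid
    fine_env = ToySubsurfaceEnv()      # stand-in for a costly high-fidelity grid

    model = PPO("MlpPolicy", coarse_env)
    model.learn(total_timesteps=50_000)   # bulk of the samples on the cheap model
    model.set_env(fine_env)               # carry the learned weights over
    model.learn(total_timesteps=5_000, reset_num_timesteps=False)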
The second approach implicitly modifies the classical reinforcement learning framework
itself to take advantage of information from lower-level models. This is achieved by
modifying the classical RL algorithms so that they use approximate multilevel Monte Carlo
estimates, rather than standard Monte Carlo estimates, of the policy and/or value network
objective functions.
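
The flavor of such an estimator, in a hypothetical sketch: evaluate() stands in for rolling out the current policy on a given fidelity level and returning sample returns, and the coarse level absorbs most of the samples while each level-to-level correction uses only a few. A real multilevel Monte Carlo estimator would couple the paired rollouts, e.g., via common random numbers; they are left independent here for brevity.

    # Sketch of a multilevel Monte Carlo estimate of a policy objective
    # J = E[f_L] = E[f_0] + sum_l E[f_l - f_{l-1}] (telescoping sum).
    import numpy as np

    rng = np.random.default_rng(0)

    def evaluate(level, n):
        """Hypothetical: n sample returns of the current policy on the
        level-`level` simulator (higher level = finer grid, costlier)."""
        bias = 1.0 / (2 ** level)            # toy discretization error
        return rng.normal(loc=2.0 - bias, scale=0.5, size=n)

    def mlmc_objective(num_levels=3, n0=1024):
        # Level 0: many cheap coarse-grid samples.
        est = evaluate(0, n0).mean()
        # Corrections: few samples of the difference between adjacent levels.
        for l in range(1, num_levels):
            n_l = n0 // (4 ** l)             # sample counts shrink with level
            est += (evaluate(l, n_l) - evaluate(l - 1, n_l)).mean()
        return est

    print("MLMC estimate of J:", mlmc_objective())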
Funding: Engineering and Physical Sciences Research Council (EPSRC)