Safe Reinforcement Learning via Curriculum Induction
In safety-critical applications, autonomous agents may need to learn in an
environment where mistakes can be very costly. In such settings, the agent
needs to behave safely not only after but also while learning. To achieve this,
existing safe reinforcement learning methods make an agent rely on priors that
let it avoid dangerous situations during exploration with high probability, but
both the probabilistic guarantees and the smoothness assumptions inherent in
the priors are not viable in many scenarios of interest such as autonomous
driving. This paper presents an alternative approach inspired by human
teaching, where an agent learns under the supervision of an automatic
instructor that saves the agent from violating constraints during learning. In
this model, we introduce a monitor that needs to know neither how to do well
at the task the agent is learning nor how the environment works.
Instead, it has a library of reset controllers that it activates when the agent
starts behaving dangerously, preventing it from doing damage. Crucially, the
choice of which reset controller to apply in which situation affects the speed
of the agent's learning. By observing the agent's progress, the teacher itself
learns a policy for choosing the reset controllers, i.e., a curriculum, that
optimizes the agent's final policy reward. Our experiments use this framework
in two environments to induce curricula for safe and efficient learning.
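
As an illustrative aside rather than the paper's method, the following minimal sketch shows the teacher-student structure the abstract describes: a Q-learning student explores a toy corridor where states below 0 are unsafe, a monitor fires a reset controller whenever the student would enter an unsafe state, and a bandit teacher learns which controller to fire, rewarded by the student's progress. The corridor, the reset library, and the bandit teacher are all simplifying assumptions.

    # Toy sketch (not the paper's code) of curriculum induction via
    # reset controllers: monitor intervenes, teacher learns which
    # reset controller to apply, using student progress as reward.
    import random

    GOAL, N_ARMS = 10, 3
    RESETS = [1, 4, 7]                 # library of reset controllers (reset targets)

    def run_episode(q, arm, eps=0.2, alpha=0.5, gamma=0.95):
        """One student episode of epsilon-greedy Q-learning; the monitor
        applies reset controller `arm` instead of letting the agent enter
        an unsafe state (< 0). Returns the episode return."""
        s, ret = 5, 0.0
        for _ in range(50):
            a = random.choice((-1, 1)) if random.random() < eps else \
                max((-1, 1), key=lambda b: q.get((s, b), 0.0))
            s2 = s + a
            if s2 < 0:                 # monitor intervention: safe reset
                s2, r = RESETS[arm], -1.0
            elif s2 == GOAL:
                r = 10.0
            else:
                r = -0.1
            best = max(q.get((s2, b), 0.0) for b in (-1, 1))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best - q.get((s, a), 0.0))
            ret += r
            if s2 == GOAL:
                break
            s = s2
        return ret

    # Teacher: epsilon-greedy bandit over the reset library; its reward is
    # the change in the student's episode return (a crude progress signal).
    q, value, count, prev = {}, [0.0] * N_ARMS, [0] * N_ARMS, 0.0
    for _ in range(500):
        arm = random.randrange(N_ARMS) if random.random() < 0.1 else \
              max(range(N_ARMS), key=lambda i: value[i])
        ret = run_episode(q, arm)
        count[arm] += 1
        value[arm] += (ret - prev - value[arm]) / count[arm]  # progress = ret - prev
        prev = ret
    print("preferred reset controller:", RESETS[max(range(N_ARMS), key=lambda i: value[i])])

In the paper the teacher is trained as a full policy over observations of the student's state rather than as the stationary bandit above; the sketch only preserves the interaction pattern.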
Robust optimal control using computationally efficient deep reinforcement learning techniques
We investigate current challenges in the application of reinforcement learning (RL) to
subsurface flow control problems, which are the subject of intensive research in the field
of reservoir management. In typical subsurface flow control problems, the system is
partially observed because data are often available only at well locations. Furthermore,
the model parameters are highly uncertain as a result of the sparsity of available field
data. We therefore begin by presenting an RL framework for solving the stochastic optimal
control problem under predefined model uncertainty and partial observability. Numerical
results are presented using two state-of-the-art model-free RL algorithms, proximal policy
optimization (PPO) and advantage actor-critic (A2C), on two single-phase subsurface flow
test cases representing two distinct flow scenarios.
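
To make the setup concrete, here is a minimal, hypothetical sketch (not the authors' code) of such a framework using stable-baselines3: ToySubsurfaceEnv is an invented stand-in for a reservoir simulator, with a permeability parameter redrawn each episode (model uncertainty) and observations limited to pressures at two well cells (partial observability).

    # Hypothetical sketch: a toy partially observed, uncertain
    # single-phase "reservoir" trained with PPO from stable-baselines3.
    import numpy as np
    import gymnasium as gym
    from stable_baselines3 import PPO

    class ToySubsurfaceEnv(gym.Env):
        """Observations are pressures at two well cells only; the full
        pressure field and the sampled permeability stay hidden."""
        def __init__(self):
            self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)
            self.action_space = gym.spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.k = self.np_random.uniform(0.5, 1.5)   # uncertain permeability
            self.p = np.full(8, 1.0)                    # hidden pressure field
            self.t = 0
            return self.p[[1, 6]].astype(np.float32), {}

        def step(self, action):
            inj = float(action[0])
            # toy diffusion, injection at cell 0, production at cell 7
            self.p[1:-1] += 0.2 * self.k * (self.p[:-2] - 2 * self.p[1:-1] + self.p[2:])
            self.p[0] += inj
            produced = 0.1 * self.k * self.p[-1]
            self.p[-1] -= produced
            self.t += 1
            reward = produced - 0.05 * inj              # production minus injection cost
            return self.p[[1, 6]].astype(np.float32), reward, self.t >= 50, False, {}

    model = PPO("MlpPolicy", ToySubsurfaceEnv(), verbose=0)
    model.learn(total_timesteps=10_000)                 # thousands of simulator steps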
We identify computational intractability as one of the major limitations of the proposed
RL framework: model-free RL algorithms are by definition sample inefficient and require
thousands, if not millions, of samples to learn optimal control policies. For subsurface
control problems, this corresponds to performing a large number of simulations, which is
computationally expensive. Our aim is to build a more generalized framework that helps
alleviate this computational burden. This is achieved by employing multiple levels of
models, where a level refers to the accuracy, or fidelity, of the discretization of the
domain grid of the underlying partial differential equations. We propose two distinct
approaches that can be applied in a general manner.
The first approach involves a more explicit modification of the proposed RL framework:
a multigrid framework that essentially takes advantage of the principles of sequential
transfer learning, sketched below.
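
A minimal sketch of that sequential-transfer idea, reusing the hypothetical ToySubsurfaceEnv from the sketch above and assuming, as a simplification, that the coarse- and fine-grid environments expose identical observation and action spaces so the policy network transfers unchanged:

    # Sketch: train cheaply on a coarse-grid model, then warm-start
    # (transfer) the same policy on the expensive fine-grid model.
    from stable_baselines3 import PPO

    coarse_env = ToySubsurfaceEnv()    # stand-in for a cheap low-fidelity grid
    fine_env = ToySubsurfaceEnv()      # stand-in for a costly high-fidelity grid

    model = PPO("MlpPolicy", coarse_env)
    model.learn(total_timesteps=50_000)   # bulk of the samples on the cheap model
    model.set_env(fine_env)               # carry the learned weights over
    model.learn(total_timesteps=5_000, reset_num_timesteps=False)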
The second approach implicitly modifies the classical reinforcement learning framework
itself to take advantage of information from lower-level models. This is achieved by
modifying the classical RL algorithms so that they use approximate multilevel Monte Carlo
estimates, rather than standard Monte Carlo estimates, of the policy and/or value network
objective functions.
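
The flavor of such an estimator, in a hypothetical sketch: evaluate() stands in for rolling out the current policy on a given fidelity level and returning sample returns, and the coarse level absorbs most of the samples while each level-to-level correction uses only a few. A real multilevel Monte Carlo estimator would couple the paired rollouts, e.g., via common random numbers; they are left independent here for brevity.

    # Sketch of a multilevel Monte Carlo estimate of a policy objective
    # J = E[f_L] = E[f_0] + sum_l E[f_l - f_{l-1}] (telescoping sum).
    import numpy as np

    rng = np.random.default_rng(0)

    def evaluate(level, n):
        """Hypothetical: n sample returns of the current policy on the
        level-`level` simulator (higher level = finer grid, costlier)."""
        bias = 1.0 / (2 ** level)            # toy discretization error
        return rng.normal(loc=2.0 - bias, scale=0.5, size=n)

    def mlmc_objective(num_levels=3, n0=1024):
        # Level 0: many cheap coarse-grid samples.
        est = evaluate(0, n0).mean()
        # Corrections: few samples of the difference between adjacent levels.
        for l in range(1, num_levels):
            n_l = n0 // (4 ** l)             # sample counts shrink with level
            est += (evaluate(l, n_l) - evaluate(l - 1, n_l)).mean()
        return est

    print("MLMC estimate of J:", mlmc_objective())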
Funding: Engineering and Physical Sciences Research Council (EPSRC)