End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks
Reinforcement Learning (RL) algorithms have found limited success beyond
simulated applications, and one main reason is the absence of safety guarantees
during the learning process. Real-world systems would realistically fail or
break before an optimal controller can be learned. To address this issue, we
propose a controller architecture that combines (1) a model-free RL-based
controller with (2) model-based controllers utilizing control barrier functions
(CBFs), and (3) online learning of the unknown system dynamics, in order to
ensure safety during learning. Our general framework leverages the success of
RL algorithms to learn high-performance controllers, while the CBF-based
controllers both guarantee safety and guide the learning process by
constraining the set of explorable policies. We use Gaussian Processes (GPs)
to model the system dynamics and its uncertainties.
Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high
probability during the learning process, regardless of the RL algorithm used,
and demonstrates greater policy exploration efficiency. We test our algorithm
on (1) control of an inverted pendulum and (2) autonomous car-following with
wireless vehicle-to-vehicle communication, and show that our algorithm attains
much greater sample efficiency in learning than other state-of-the-art
algorithms and maintains safety during the entire learning process.
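To make the architecture concrete, here is a minimal sketch of the CBF safety-filter step, assuming control-affine dynamics x_dot = f(x) + g(x)u and a known barrier function h. The GP model of the dynamics and its uncertainty bound are omitted, and all names (f, g, h, grad_h, alpha) are illustrative placeholders rather than the authors' implementation.

```python
# Minimal sketch of a CBF safety filter, assuming control-affine dynamics
# x_dot = f(x) + g(x) u and a known barrier function h (safe set: h(x) >= 0).
# The GP dynamics model and its uncertainty bound from the paper are omitted;
# f, g, h, grad_h, and alpha are illustrative placeholders.
import numpy as np

def cbf_safety_filter(x, u_rl, f, g, h, grad_h, alpha=1.0):
    """Return the action closest to u_rl that satisfies the CBF condition
    grad_h(x) @ (f(x) + g(x) u) + alpha * h(x) >= 0.  With a single linear
    inequality in u, the minimum-norm correction is a projection onto the
    constraint half-space, so no QP solver is needed in this toy case."""
    a = grad_h(x) @ g(x)                 # constraint normal in u
    b = grad_h(x) @ f(x) + alpha * h(x)  # constraint offset
    slack = a @ u_rl + b                 # >= 0 means u_rl is already safe
    if slack >= 0 or np.allclose(a, 0.0):
        return u_rl                      # no intervention needed
    return u_rl - (slack / (a @ a)) * a  # project onto {u : a @ u + b >= 0}

# Toy usage: keep a scalar state below 1 (h(x) = 1 - x) under x_dot = u.
f = lambda x: np.zeros(1)
g = lambda x: np.eye(1)
h = lambda x: 1.0 - x[0]
grad_h = lambda x: np.array([-1.0])
print(cbf_safety_filter(np.array([0.9]), np.array([5.0]), f, g, h, grad_h))
# -> [0.1]: the aggressive RL action 5.0 is scaled back at the barrier
```

The key design choice is that the filter intervenes only when the RL action would violate the barrier condition, so exploration is constrained rather than overridden; in the paper's setting the constraint additionally accounts for the GP's high-probability bound on the unmodeled dynamics, which is where the high-probability safety guarantee comes from.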
Data-driven Economic NMPC using Reinforcement Learning
Reinforcement Learning (RL) is a powerful tool to perform data-driven optimal
control without relying on a model of the system. However, RL struggles to
provide hard guarantees on the behavior of the resulting control scheme. In
contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC)
are standard tools for the closed-loop optimal control of complex systems with
constraints and limitations, and benefit from a rich theory to assess their
closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the
quality of the model underlying the control scheme. In this paper, we show that
an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system
even when using a wrong model. This result also holds for real systems with
stochastic dynamics. This entails that ENMPC can be used as a new type of
function approximator within RL. Furthermore, we investigate our results in the
context of ENMPC and formally connect them to the concept of dissipativity,
which is central to ENMPC stability. Finally, we detail how these results
can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply
these tools to both a classical linear MPC setting and a standard nonlinear
example from the ENMPC literature.
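As a rough illustration of using a parameterized MPC as the Q-function approximator inside RL, the sketch below applies a standard semi-gradient Q-learning update to a single parameter of a deliberately wrong one-step model. The scalar system, the one-step horizon, the learning rate, and the finite-difference gradient (standing in for the MPC sensitivities) are all simplifying assumptions for illustration, not the paper's construction.

```python
# A toy sketch of the idea, under heavy simplifying assumptions: the "MPC" is
# a one-step lookahead with a wrong internal model (s' = 0.9*s + a) and a
# single tunable parameter theta weighting the terminal cost; the real system
# is s' = s + a.  A semi-gradient Q-learning update adjusts theta online so
# the wrong-model controller reduces its temporal-difference error.
import numpy as np

GAMMA = 0.95

def q_mpc(theta, s, a):
    """Q-value of the one-step 'MPC': stage cost plus a parameterized
    terminal cost theta * s_model**2, evaluated with the *wrong* model."""
    s_model = 0.9 * s + a
    return s**2 + 0.1 * a**2 + GAMMA * theta * s_model**2

def greedy_action(theta, s):
    # argmin_a q_mpc in closed form: 0.2*a + 2*GAMMA*theta*(0.9*s + a) = 0
    return -(1.8 * GAMMA * theta * s) / (0.2 + 2 * GAMMA * theta)

rng = np.random.default_rng(0)
theta, lr, s = 1.0, 0.05, 1.0
for _ in range(5000):
    a = greedy_action(theta, s) + 0.1 * rng.standard_normal()  # exploration
    s_next = s + a                       # *true* dynamics, not the MPC model
    cost = s**2 + 0.1 * a**2
    target = cost + GAMMA * q_mpc(theta, s_next, greedy_action(theta, s_next))
    delta = q_mpc(theta, s, a) - target  # temporal-difference error
    eps = 1e-6                           # finite differences stand in for the
    dq_dtheta = (q_mpc(theta + eps, s, a) - q_mpc(theta, s, a)) / eps  # MPC sensitivity
    theta -= lr * delta * dq_dtheta      # semi-gradient Q-learning step
    s = s_next
print(theta)  # tuned terminal-cost weight after learning on the true system
```

In the paper's setting the finite-difference derivative would be replaced by the sensitivities of the (E)NMPC solution with respect to its parameters, but the update rule itself is the classic RL machinery the abstract refers to.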