
    Reinforcement Learning Based on Real-Time Iteration NMPC

    Reinforcement Learning (RL) has shown a striking ability to learn optimal policies from data without any prior knowledge of the process. The main drawback of RL is that it is typically very difficult to guarantee stability and safety. Nonlinear Model Predictive Control (NMPC), on the other hand, is an advanced model-based control technique that does guarantee safety and stability, but yields optimality only for the nominal model. It has therefore recently been proposed to use NMPC as a function approximator within RL. While this approach has been shown to deliver good performance, the main obstacle to its applicability is the computational burden of NMPC, which must be solved to full convergence. In practice, computationally efficient algorithms such as the Real-Time Iteration (RTI) scheme are deployed instead, returning an approximate NMPC solution in a very short time. In this paper, we bridge this gap by extending the existing theoretical framework to also cover RL based on RTI NMPC. We demonstrate the effectiveness of this new RL approach on a nontrivial example: a challenging nonlinear system subject to stochastic perturbations, with the objective of optimizing an economic cost.
    Comment: accepted for the IFAC World Congress 2020
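    To make the RTI idea concrete, here is a minimal sketch with an invented scalar problem (not the paper's formulation): a converged solver iterates Newton's method to tolerance at every sampling instant, while the RTI scheme performs a single Newton step warm-started at the previous solution, returning a control in fixed, short time.

```python
import numpy as np

# Hypothetical parametric problem standing in for the NMPC: at each state s
# we must minimize a nonlinear cost J(u; s) over the input u.
J   = lambda u, s: (u - np.sin(s))**2 + 0.1 * u**4   # invented cost
dJ  = lambda u, s: 2.0 * (u - np.sin(s)) + 0.4 * u**3
d2J = lambda u, s: 2.0 + 1.2 * u**2                  # always positive

def newton_converged(s, u0, tol=1e-10):
    """Solve to full convergence (the expensive baseline)."""
    u = u0
    while abs(dJ(u, s)) > tol:
        u -= dJ(u, s) / d2J(u, s)
    return u

def rti_step(s, u_prev):
    """Real-Time Iteration idea: ONE Newton step from the previous solution."""
    return u_prev - dJ(u_prev, s) / d2J(u_prev, s)

u_rti = 0.0
for s in np.linspace(0.0, 2.0, 21):        # slowly varying state
    u_star = newton_converged(s, u_rti)
    u_rti = rti_step(s, u_rti)             # stays close to u_star
    print(f"s={s:4.2f}  converged={u_star:+.4f}  rti={u_rti:+.4f}")
```

    The warm start is what makes a single iteration sufficient: between consecutive sampling instants the solution moves only slightly, so one step per sample tracks the optimum closely.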

    Data-driven Economic NMPC using Reinforcement Learning

    Reinforcement Learning (RL) is a powerful tool for data-driven optimal control that does not rely on a model of the system. However, RL struggles to provide hard guarantees on the behavior of the resulting control scheme. In contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC) are standard tools for the closed-loop optimal control of complex systems with constraints and limitations, and benefit from a rich theory for assessing their closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the quality of the model underlying the control scheme. In this paper, we show that an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system even when using a wrong model. This result also holds for real systems with stochastic dynamics. It entails that (E)NMPC can be used as a new type of function approximator within RL. Furthermore, we investigate our results in the context of ENMPC and formally connect them to the concept of dissipativity, which is central to ENMPC stability theory. Finally, we detail how these results can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply these tools to both a classical linear MPC setting and a standard nonlinear example from the ENMPC literature.
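    A rough sketch of the mechanism in Python (invented scalar dynamics and parameter names, not the paper's code): the action-value function is defined by a short-horizon optimal control problem built on a deliberately wrong model, and ordinary Q-learning adjusts the cost parameters so that the scheme performs well on the true system.

```python
import numpy as np

rng = np.random.default_rng(0)
f_true  = lambda s, a: 0.9 * s + 0.5 * a + 0.05 * rng.standard_normal()
f_model = lambda s, a: 0.8 * s + 0.5 * a          # deliberately wrong model

def q_theta(s, a, th):
    """MPC-like action value: stage cost plus a parameterized quadratic
    cost-to-go evaluated at the (wrong) model's prediction."""
    return th[0] * s**2 + th[1] * a**2 + th[2] * f_model(s, a)**2

def greedy(s, th, grid=np.linspace(-2, 2, 201)):
    # grid search keeps the sketch free of model-specific algebra
    return grid[np.argmin([q_theta(s, a, th) for a in grid])]

th, gamma, alpha, s = np.array([1.0, 1.0, 0.5]), 0.95, 1e-3, 1.0
for _ in range(5000):
    a = greedy(s, th) + 0.1 * rng.standard_normal()   # exploration
    s_next = f_true(s, a)
    stage = s**2 + 0.1 * a**2                         # true task cost
    delta = stage + gamma * q_theta(s_next, greedy(s_next, th), th) \
            - q_theta(s, a, th)
    # semi-gradient Q-learning step; grad_theta Q is the feature vector
    th += alpha * delta * np.array([s**2, a**2, f_model(s, a)**2])
    s = s_next
print("tuned MPC cost parameters:", th)
```

    This mirrors the paper's point: the learned parameters compensate for the model mismatch through the cost function, so the scheme can approach the optimal policy of the real, stochastic system despite its wrong internal model.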

    Stochastic Model Predictive Control via Fixed Structure Policies

    In this work, the model predictive control problem is extended to include not only open-loop control sequences but also state-feedback control laws, by directly optimizing the parameters of a control policy. Additionally, continuous cost functions are developed that allow the control policy to be trained to make discrete decisions, something typically handled by model-free learning algorithms. This general control policy encompasses a wide class of functions, allows the optimization to take place both online and offline, and adds robustness to unmodelled dynamics and external disturbances. General formulations covering nonlinear discrete-time dynamics and abstract cost functions are derived for both deterministic and stochastic problems. Analytical solutions are obtained for linear cases and compared with existing theory, such as the classical linear quadratic regulator. It is shown that, under certain assumptions, there exists a finite horizon over which a constant linear state-feedback control law stabilizes a nonlinear system around the origin. Several control-policy architectures are used to regulate the cart-pole system in deterministic and stochastic settings, and neural-network-based policies are trained to analyze and intercept bodies following stochastic projectile motion.
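    A minimal sketch of the fixed-structure idea, with invented dynamics (not the paper's formulation): instead of optimizing an open-loop input sequence, optimize the gain of a linear state-feedback law against a Monte-Carlo estimate of the closed-loop cost under process noise.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])     # hypothetical double integrator
B = np.array([[0.0], [0.1]])
Q, R, N = np.eye(2), 0.1 * np.eye(1), 30
x0 = np.array([1.0, 0.0])
# common random numbers: fixing the noise samples makes the objective
# deterministic, which helps the derivative-free optimizer
noise = 0.01 * rng.standard_normal((64, N, 2))

def expected_cost(K_flat):
    """Monte-Carlo estimate of the closed-loop cost of u = -K x."""
    K = K_flat.reshape(1, 2)
    total = 0.0
    for w in noise:                        # one rollout per noise sample
        x = x0.copy()
        for k in range(N):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + w[k]
    return total / len(noise)

res = minimize(expected_cost, np.zeros(2), method="Nelder-Mead")
print("optimized state-feedback gain K =", res.x)
```

    For this linear-quadratic case the optimized gain can be checked against the finite-horizon LQR solution, echoing the paper's comparison with classical theory.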

    Robust neurooptimal control for a robot via adaptive dynamic programming

    We aim to optimize the tracking control of a robot so as to improve robustness under unknown nonlinear perturbations. First, an auxiliary system is introduced whose optimal control can be viewed as an approximate optimal control of the robot. Then, neural networks (NNs) are employed to approximate the solution of the Hamilton-Jacobi-Isaacs equation within the framework of adaptive dynamic programming. Next, based on a standard gradient-attenuation algorithm and adaptive critic design, the NNs are trained with the designed update law, relaxing the requirement of an initial stabilizing control. In light of Lyapunov stability theory, all the error signals are proved to be uniformly ultimately bounded. A series of simulation studies demonstrates the effectiveness of the proposed controller.
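    A toy illustration of the critic-training step (scalar invented dynamics; the paper's robot setting is multivariable): the critic weights are driven down the gradient of the squared Hamilton-Jacobi-Isaacs residual, with the control and worst-case disturbance induced by the current critic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical scalar dynamics  x_dot = f(x) + g(x) u + k(x) d
f, g, k = lambda x: -x + 0.25 * x**3, lambda x: 1.0, lambda x: 0.5
Qc, R, gam2 = 1.0, 1.0, 4.0                      # weights and gamma^2

phi  = lambda x: np.array([x**2, x**4, x**6])    # critic features: V = w . phi
dphi = lambda x: np.array([2*x, 4*x**3, 6*x**5])

def hji_residual(w, x):
    """HJI Hamiltonian with u, d replaced by the critic-induced policies."""
    Vx = w @ dphi(x)
    u = -0.5 / R * g(x) * Vx                     # approximate optimal control
    d = 0.5 / gam2 * k(x) * Vx                   # worst-case disturbance
    return Qc * x**2 + R * u**2 - gam2 * d**2 + Vx * (f(x) + g(x)*u + k(x)*d)

w, lr, eps = np.zeros(3), 1e-3, 1e-6
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0)
    e = hji_residual(w, x)
    grad = np.array([(hji_residual(w + eps * np.eye(3)[i], x) - e) / eps
                     for i in range(3)])         # finite-difference gradient
    w -= lr * e * grad                           # descend on 0.5 * e**2
print("critic weights:", w)
```

    The paper's update law additionally removes the need for an initial stabilizing control and comes with a uniform-ultimate-boundedness proof; this sketch shows only the residual-descent core.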

    Robust Adaptive Critic Based Neurocontrollers for Systems with Input Uncertainties

    A two-neural-network approach to solving optimal control problems is described in this study. This approach, called the adaptive critic method, consists of two neural networks: one is called the supervisor or critic, and the other is called the action network or controller. The inputs to both networks are the current states of the system to be controlled. Each network is trained using the output of the other network together with the conditions for optimal control. When their outputs are mutually consistent, the output of the controller network is optimal. This optimality, however, is limited to the underlying model. We therefore develop a Lyapunov-based theory for the robust stability of these controllers under input uncertainty. We illustrate the approach on the longitudinal autopilot of a nonlinear missile model.
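    The mutual-training loop can be sketched in a few lines for a scalar linear-quadratic example (invented numbers, not the missile model): the action network is trained toward the control that zeroes dH/du given the critic's costate, and the critic is trained toward the costate recursion; at the mutually consistent fixed point the gains satisfy the discrete-time LQR/Riccati conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical scalar plant x+ = a x + b u, stage cost 0.5 (Q x^2 + R u^2)
a, b, Q, R = 0.95, 0.1, 1.0, 0.1

wc, wa, lr = 0.0, 0.0, 0.01     # linear critic lam(x)=wc*x, action u(x)=wa*x
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0)
    u = wa * x
    x1 = a * x + b * u
    # action target from the optimality condition dH/du = R u + b lam(x1) = 0
    u_target = -b * (wc * x1) / R
    wa += lr * (u_target - u) * x       # train action on the critic's output
    # critic target from the costate equation lam(x) = Q x + a lam(x1)
    lam_target = Q * x + a * (wc * x1)
    wc += lr * (lam_target - wc * x) * x
print("action gain:", wa, " critic gain:", wc)
```

    When the two networks stop changing, the consistency condition is exactly the model's optimality system, which is why the optimality is limited to the underlying model and why a separate robustness argument is needed for input uncertainty.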

    State-Constrained Agile Missile Control with Adaptive-Critic-Based Neural Networks

    In this study, we develop an adaptive-critic-based controller to steer an agile missile, subject to a constraint on the minimum flight Mach number, from various initial Mach numbers to a given final Mach number in minimum time while completely reversing its flightpath angle. This class of bounded-state-space, free-final-time problems is very difficult to solve owing to discontinuities in the costates at the constraint boundaries. We use the two-neural-network adaptive critic structure to carry out the optimization, obtaining an optimal controller by solving the optimality equations that result from a Hamiltonian formulation. Detailed derivations of the equations and of the conditions on the constraint boundary are provided. For the numerical experiments, we consider vertical-plane scenarios: the flight Mach number and flightpath angle are the states, and the aerodynamic angle of attack is the control. The numerical results bring out some attractive features of the adaptive critic approach and show that the formulation guides the missile to its final conditions from an envelope of initial conditions in this state-constrained optimization problem.
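    For orientation, the optimality system the adaptive critic solves has roughly this minimum-time form (a generic sketch with assumed notation, not the paper's exact equations): with Mach number M and flightpath angle gamma as states, angle of attack alpha as control, dynamics Mdot = f_M, gammadot = f_gamma, and the state constraint M >= M_min:

```latex
H = 1 + \lambda_M f_M + \lambda_\gamma f_\gamma + \mu\,(M_{\min} - M),
\qquad \mu \ge 0, \quad \mu\,(M_{\min} - M) = 0,
\qquad \frac{\partial H}{\partial \alpha} = 0,
\qquad \dot\lambda_M = -\frac{\partial H}{\partial M},
\qquad \dot\lambda_\gamma = -\frac{\partial H}{\partial \gamma},
\qquad H \equiv 0 \ \text{(free final time)}.
```

    On the boundary M = M_min the multiplier becomes active and the costates can jump, which is the discontinuity the abstract refers to.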

    Enhancing the performance of a safe controller via supervised learning for truck lateral control

    Correct-by-construction techniques, such as control barrier functions (CBFs), can be used to guarantee closed-loop safety by acting as a supervisor of an existing or legacy controller. However, supervisory intervention typically compromises the performance of the closed-loop system. On the other hand, machine learning has been used to synthesize controllers that inherit good properties from a training dataset, though safety is typically not guaranteed because the associated neural network is difficult to analyze. In this paper, supervised learning is combined with CBFs to synthesize controllers that enjoy good performance with provable safety. A training set is generated by trajectory optimization that incorporates the CBF constraint, for a representative range of initial conditions of the truck model. A control policy is then obtained via supervised learning, mapping a feature representing the initial conditions to a parameterized desired trajectory. The learning-based controller serves as the performance controller, and a CBF-based supervisory controller guarantees safety. A case study of lane keeping for articulated trucks shows that the controller trained by supervised learning inherits the good performance of the training set and rarely requires intervention by the CBF supervisor.
    Comment: submitted to IEEE Transactions on Control Systems Technology
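    The supervisory layer can be sketched compactly (invented single-state lateral model, not the paper's articulated-truck dynamics): the learned performance controller proposes an input, and the CBF supervisor minimally corrects it so that h_dot >= -alpha * h holds; for a single affine constraint the underlying QP has a closed-form projection.

```python
import numpy as np

def cbf_filter(x, u_perf, h, dh_dx, f, g, alpha=1.0):
    """Minimally modify u_perf so that dh/dt >= -alpha * h(x) along
    x_dot = f(x) + g(x) u. A single affine constraint gives a closed-form QP."""
    Lfh = dh_dx(x) @ f(x)                  # Lie derivative along the drift
    Lgh = dh_dx(x) @ g(x)                  # Lie derivative along the input
    slack = Lfh + Lgh @ u_perf + alpha * h(x)
    if slack >= 0:
        return u_perf                      # learned controller already safe
    return u_perf - slack * Lgh / (Lgh @ Lgh)   # minimal-norm correction

# Hypothetical kinematic lateral-offset model: y_dot = u, keep |y| <= 1.
f = lambda x: np.array([0.0])
g = lambda x: np.array([[1.0]])
h = lambda x: 1.0 - x[0]**2                # safe set is h(x) >= 0
dh_dx = lambda x: np.array([-2.0 * x[0]])

x = np.array([0.8])                        # near the lane boundary
u_learned = np.array([1.5])                # learned command pushes outward
print("supervised input:", cbf_filter(x, u_learned, h, dh_dx, f, g))
```

    Because the training trajectories already respect the CBF constraint, the learned controller seldom triggers the correction branch, matching the "rarely requires intervention" observation in the abstract.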