
    Reinforcement Learning Based on Real-Time Iteration NMPC

    Reinforcement Learning (RL) has shown a striking ability to learn optimal policies from data without any prior knowledge of the process. The main drawback of RL is that it is typically very difficult to guarantee stability and safety. Nonlinear Model Predictive Control (NMPC), on the other hand, is an advanced model-based control technique that does guarantee safety and stability, but yields optimality only for the nominal model. It has therefore recently been proposed to use NMPC as a function approximator within RL. While this approach has been shown to deliver good performance, the main obstacle to its applicability is the computational burden of NMPC, which must be solved to full convergence. In practice, computationally efficient algorithms such as the Real-Time Iteration (RTI) scheme are deployed instead, returning an approximate NMPC solution in a very short time. In this paper, we bridge this gap by extending the existing theoretical framework to also cover RL based on RTI NMPC. We demonstrate the effectiveness of this new RL approach on a nontrivial example: a challenging nonlinear system subject to stochastic perturbations, with the objective of optimizing an economic cost.
    Comment: accepted for the IFAC World Congress 2020
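    To make the RTI idea concrete, here is a minimal sketch with an invented scalar problem (not the paper's formulation): a converged solver iterates Newton's method to tolerance at every sampling instant, while the RTI scheme performs a single Newton step warm-started at the previous solution, returning a control in fixed, short time.

```python
import numpy as np

# Hypothetical parametric problem standing in for the NMPC: at each state s
# we must minimize a nonlinear cost J(u; s) over the input u.
J   = lambda u, s: (u - np.sin(s))**2 + 0.1 * u**4   # invented cost
dJ  = lambda u, s: 2.0 * (u - np.sin(s)) + 0.4 * u**3
d2J = lambda u, s: 2.0 + 1.2 * u**2                  # always positive

def newton_converged(s, u0, tol=1e-10):
    """Solve to full convergence (the expensive baseline)."""
    u = u0
    while abs(dJ(u, s)) > tol:
        u -= dJ(u, s) / d2J(u, s)
    return u

def rti_step(s, u_prev):
    """Real-Time Iteration idea: ONE Newton step from the previous solution."""
    return u_prev - dJ(u_prev, s) / d2J(u_prev, s)

u_rti = 0.0
for s in np.linspace(0.0, 2.0, 21):        # slowly varying state
    u_star = newton_converged(s, u_rti)
    u_rti = rti_step(s, u_rti)             # stays close to u_star
    print(f"s={s:4.2f}  converged={u_star:+.4f}  rti={u_rti:+.4f}")
```

    The warm start is what makes a single iteration sufficient: between consecutive sampling instants the solution moves only slightly, so one step per sample tracks the optimum closely.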

    Data-driven Economic NMPC using Reinforcement Learning

    Reinforcement Learning (RL) is a powerful tool for data-driven optimal control that does not rely on a model of the system. However, RL struggles to provide hard guarantees on the behavior of the resulting control scheme. In contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC) are standard tools for the closed-loop optimal control of complex systems with constraints and limitations, and benefit from a rich theory for assessing their closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the quality of the model underlying the control scheme. In this paper, we show that an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system even when using a wrong model. This result also holds for real systems with stochastic dynamics. It entails that (E)NMPC can be used as a new type of function approximator within RL. Furthermore, we investigate our results in the context of ENMPC and formally connect them to the concept of dissipativity, which is central to ENMPC stability theory. Finally, we detail how these results can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply these tools to both a classical linear MPC setting and a standard nonlinear example from the ENMPC literature.
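    A rough sketch of the mechanism in Python (invented scalar dynamics and parameter names, not the paper's code): the action-value function is defined by a short-horizon optimal control problem built on a deliberately wrong model, and ordinary Q-learning adjusts the cost parameters so that the scheme performs well on the true system.

```python
import numpy as np

rng = np.random.default_rng(0)
f_true  = lambda s, a: 0.9 * s + 0.5 * a + 0.05 * rng.standard_normal()
f_model = lambda s, a: 0.8 * s + 0.5 * a          # deliberately wrong model

def q_theta(s, a, th):
    """MPC-like action value: stage cost plus a parameterized quadratic
    cost-to-go evaluated at the (wrong) model's prediction."""
    return th[0] * s**2 + th[1] * a**2 + th[2] * f_model(s, a)**2

def greedy(s, th, grid=np.linspace(-2, 2, 201)):
    # grid search keeps the sketch free of model-specific algebra
    return grid[np.argmin([q_theta(s, a, th) for a in grid])]

th, gamma, alpha, s = np.array([1.0, 1.0, 0.5]), 0.95, 1e-3, 1.0
for _ in range(5000):
    a = greedy(s, th) + 0.1 * rng.standard_normal()   # exploration
    s_next = f_true(s, a)
    stage = s**2 + 0.1 * a**2                         # true task cost
    delta = stage + gamma * q_theta(s_next, greedy(s_next, th), th) \
            - q_theta(s, a, th)
    # semi-gradient Q-learning step; grad_theta Q is the feature vector
    th += alpha * delta * np.array([s**2, a**2, f_model(s, a)**2])
    s = s_next
print("tuned MPC cost parameters:", th)
```

    This mirrors the paper's point: the learned parameters compensate for the model mismatch through the cost function, so the scheme can approach the optimal policy of the real, stochastic system despite its wrong internal model.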

    Stochastic Model Predictive Control via Fixed Structure Policies

    In this work, the model predictive control problem is extended to include not only open-loop control sequences but also state-feedback control laws, by directly optimizing the parameters of a control policy. Additionally, continuous cost functions are developed that allow the control policy to be trained to make discrete decisions, something typically handled by model-free learning algorithms. This general control policy encompasses a wide class of functions, allows the optimization to take place both online and offline, and adds robustness to unmodelled dynamics and external disturbances. General formulations covering nonlinear discrete-time dynamics and abstract cost functions are derived for both deterministic and stochastic problems. Analytical solutions are obtained for linear cases and compared with existing theory, such as the classical linear quadratic regulator. It is shown that, under certain assumptions, there exists a finite horizon over which a constant linear state-feedback control law stabilizes a nonlinear system around the origin. Several control-policy architectures are used to regulate the cart-pole system in deterministic and stochastic settings, and neural-network-based policies are trained to analyze and intercept bodies following stochastic projectile motion.
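    A minimal sketch of the fixed-structure idea, with invented dynamics (not the paper's formulation): instead of optimizing an open-loop input sequence, optimize the gain of a linear state-feedback law against a Monte-Carlo estimate of the closed-loop cost under process noise.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])     # hypothetical double integrator
B = np.array([[0.0], [0.1]])
Q, R, N = np.eye(2), 0.1 * np.eye(1), 30
x0 = np.array([1.0, 0.0])
# common random numbers: fixing the noise samples makes the objective
# deterministic, which helps the derivative-free optimizer
noise = 0.01 * rng.standard_normal((64, N, 2))

def expected_cost(K_flat):
    """Monte-Carlo estimate of the closed-loop cost of u = -K x."""
    K = K_flat.reshape(1, 2)
    total = 0.0
    for w in noise:                        # one rollout per noise sample
        x = x0.copy()
        for k in range(N):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + w[k]
    return total / len(noise)

res = minimize(expected_cost, np.zeros(2), method="Nelder-Mead")
print("optimized state-feedback gain K =", res.x)
```

    For this linear-quadratic case the optimized gain can be checked against the finite-horizon LQR solution, echoing the paper's comparison with classical theory.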

    Robust neurooptimal control for a robot via adaptive dynamic programming

    We aim to optimize the tracking control of a robot so as to improve robustness under unknown nonlinear perturbations. First, an auxiliary system is introduced whose optimal control can be viewed as an approximate optimal control of the robot. Then, neural networks (NNs) are employed to approximate the solution of the Hamilton-Jacobi-Isaacs equation within the framework of adaptive dynamic programming. Next, based on a standard gradient-attenuation algorithm and adaptive critic design, the NNs are trained with the designed update law, relaxing the requirement of an initial stabilizing control. In light of Lyapunov stability theory, all the error signals are proved to be uniformly ultimately bounded. A series of simulation studies demonstrates the effectiveness of the proposed controller.
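    A toy illustration of the critic-training step (scalar invented dynamics; the paper's robot setting is multivariable): the critic weights are driven down the gradient of the squared Hamilton-Jacobi-Isaacs residual, with the control and worst-case disturbance induced by the current critic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical scalar dynamics  x_dot = f(x) + g(x) u + k(x) d
f, g, k = lambda x: -x + 0.25 * x**3, lambda x: 1.0, lambda x: 0.5
Qc, R, gam2 = 1.0, 1.0, 4.0                      # weights and gamma^2

phi  = lambda x: np.array([x**2, x**4, x**6])    # critic features: V = w . phi
dphi = lambda x: np.array([2*x, 4*x**3, 6*x**5])

def hji_residual(w, x):
    """HJI Hamiltonian with u, d replaced by the critic-induced policies."""
    Vx = w @ dphi(x)
    u = -0.5 / R * g(x) * Vx                     # approximate optimal control
    d = 0.5 / gam2 * k(x) * Vx                   # worst-case disturbance
    return Qc * x**2 + R * u**2 - gam2 * d**2 + Vx * (f(x) + g(x)*u + k(x)*d)

w, lr, eps = np.zeros(3), 1e-3, 1e-6
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0)
    e = hji_residual(w, x)
    grad = np.array([(hji_residual(w + eps * np.eye(3)[i], x) - e) / eps
                     for i in range(3)])         # finite-difference gradient
    w -= lr * e * grad                           # descend on 0.5 * e**2
print("critic weights:", w)
```

    The paper's update law additionally removes the need for an initial stabilizing control and comes with a uniform-ultimate-boundedness proof; this sketch shows only the residual-descent core.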

    Robust Adaptive Critic Based Neurocontrollers for Systems with Input Uncertainties

    A two-neural-network approach to solving optimal control problems is described in this study. This approach, called the adaptive critic method, consists of two neural networks: one is called the supervisor or critic, and the other is called the action network or controller. The inputs to both networks are the current states of the system to be controlled. Each network is trained using the output of the other network together with the conditions for optimal control. When their outputs are mutually consistent, the output of the controller network is optimal. This optimality, however, is limited to the underlying model. We therefore develop a Lyapunov-based theory for the robust stability of these controllers under input uncertainty. We illustrate the approach on the longitudinal autopilot of a nonlinear missile model.
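    The mutual-training loop can be sketched in a few lines for a scalar linear-quadratic example (invented numbers, not the missile model): the action network is trained toward the control that zeroes dH/du given the critic's costate, and the critic is trained toward the costate recursion; at the mutually consistent fixed point the gains satisfy the discrete-time LQR/Riccati conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical scalar plant x+ = a x + b u, stage cost 0.5 (Q x^2 + R u^2)
a, b, Q, R = 0.95, 0.1, 1.0, 0.1

wc, wa, lr = 0.0, 0.0, 0.01     # linear critic lam(x)=wc*x, action u(x)=wa*x
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0)
    u = wa * x
    x1 = a * x + b * u
    # action target from the optimality condition dH/du = R u + b lam(x1) = 0
    u_target = -b * (wc * x1) / R
    wa += lr * (u_target - u) * x       # train action on the critic's output
    # critic target from the costate equation lam(x) = Q x + a lam(x1)
    lam_target = Q * x + a * (wc * x1)
    wc += lr * (lam_target - wc * x) * x
print("action gain:", wa, " critic gain:", wc)
```

    When the two networks stop changing, the consistency condition is exactly the model's optimality system, which is why the optimality is limited to the underlying model and why a separate robustness argument is needed for input uncertainty.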

    State-Constrained Agile Missile Control with Adaptive-Critic-Based Neural Networks

    In this study, we develop an adaptive-critic-based controller to steer an agile missile, subject to a constraint on the minimum flight Mach number, from various initial Mach numbers to a given final Mach number in minimum time while completely reversing its flightpath angle. This class of bounded-state-space, free-final-time problems is very difficult to solve owing to discontinuities in the costates at the constraint boundaries. We use the two-neural-network adaptive critic structure to carry out the optimization, obtaining an optimal controller by solving the optimality equations that result from a Hamiltonian formulation. Detailed derivations of the equations and of the conditions on the constraint boundary are provided. For the numerical experiments, we consider vertical-plane scenarios: the flight Mach number and flightpath angle are the states, and the aerodynamic angle of attack is the control. The numerical results bring out some attractive features of the adaptive critic approach and show that the formulation guides the missile to its final conditions from an envelope of initial conditions in this state-constrained optimization problem.
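    For orientation, the optimality system the adaptive critic solves has roughly this minimum-time form (a generic sketch with assumed notation, not the paper's exact equations): with Mach number M and flightpath angle gamma as states, angle of attack alpha as control, dynamics Mdot = f_M, gammadot = f_gamma, and the state constraint M >= M_min:

```latex
H = 1 + \lambda_M f_M + \lambda_\gamma f_\gamma + \mu\,(M_{\min} - M),
\qquad \mu \ge 0, \quad \mu\,(M_{\min} - M) = 0,
\qquad \frac{\partial H}{\partial \alpha} = 0,
\qquad \dot\lambda_M = -\frac{\partial H}{\partial M},
\qquad \dot\lambda_\gamma = -\frac{\partial H}{\partial \gamma},
\qquad H \equiv 0 \ \text{(free final time)}.
```

    On the boundary M = M_min the multiplier becomes active and the costates can jump, which is the discontinuity the abstract refers to.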

    Enhancing the performance of a safe controller via supervised learning for truck lateral control

    Correct-by-construction techniques, such as control barrier functions (CBFs), can be used to guarantee closed-loop safety by acting as a supervisor of an existing or legacy controller. However, supervisory intervention typically compromises the performance of the closed-loop system. On the other hand, machine learning has been used to synthesize controllers that inherit good properties from a training dataset, though safety is typically not guaranteed because the associated neural network is difficult to analyze. In this paper, supervised learning is combined with CBFs to synthesize controllers that enjoy good performance with provable safety. A training set is generated by trajectory optimization that incorporates the CBF constraint, for a representative range of initial conditions of the truck model. A control policy is then obtained via supervised learning, mapping a feature representing the initial conditions to a parameterized desired trajectory. The learning-based controller serves as the performance controller, and a CBF-based supervisory controller guarantees safety. A case study of lane keeping for articulated trucks shows that the controller trained by supervised learning inherits the good performance of the training set and rarely requires intervention by the CBF supervisor.
    Comment: submitted to IEEE Transactions on Control Systems Technology
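    The supervisory layer can be sketched compactly (invented single-state lateral model, not the paper's articulated-truck dynamics): the learned performance controller proposes an input, and the CBF supervisor minimally corrects it so that h_dot >= -alpha * h holds; for a single affine constraint the underlying QP has a closed-form projection.

```python
import numpy as np

def cbf_filter(x, u_perf, h, dh_dx, f, g, alpha=1.0):
    """Minimally modify u_perf so that dh/dt >= -alpha * h(x) along
    x_dot = f(x) + g(x) u. A single affine constraint gives a closed-form QP."""
    Lfh = dh_dx(x) @ f(x)                  # Lie derivative along the drift
    Lgh = dh_dx(x) @ g(x)                  # Lie derivative along the input
    slack = Lfh + Lgh @ u_perf + alpha * h(x)
    if slack >= 0:
        return u_perf                      # learned controller already safe
    return u_perf - slack * Lgh / (Lgh @ Lgh)   # minimal-norm correction

# Hypothetical kinematic lateral-offset model: y_dot = u, keep |y| <= 1.
f = lambda x: np.array([0.0])
g = lambda x: np.array([[1.0]])
h = lambda x: 1.0 - x[0]**2                # safe set is h(x) >= 0
dh_dx = lambda x: np.array([-2.0 * x[0]])

x = np.array([0.8])                        # near the lane boundary
u_learned = np.array([1.5])                # learned command pushes outward
print("supervised input:", cbf_filter(x, u_learned, h, dh_dx, f, g))
```

    Because the training trajectories already respect the CBF constraint, the learned controller seldom triggers the correction branch, matching the "rarely requires intervention" observation in the abstract.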