20 research outputs found

    Actor-Critic Reinforcement Learning for Control with Stability Guarantee

    Reinforcement Learning (RL) and its integration with deep learning have achieved impressive performance in various robotic control tasks, ranging from motion planning and navigation to end-to-end visual manipulation. However, model-free RL that relies solely on data does not guarantee stability. From a control-theoretic perspective, stability is the most important property of any control system, since it is closely related to the safety, robustness, and reliability of robotic systems. In this paper, we propose an actor-critic RL framework for control that guarantees closed-loop stability by employing the classic Lyapunov method from control theory. First, a data-based stability theorem is proposed for stochastic nonlinear systems modeled as Markov decision processes. We then show that the stability condition can be exploited as the critic in actor-critic RL to learn a controller/policy. Finally, the effectiveness of our approach is evaluated on several well-known 3-dimensional robot control tasks and a synthetic biology gene network tracking task in three different popular physics simulation platforms. As an empirical evaluation of the advantage of stability, we show that the learned policies enable the systems to recover, to a certain extent, to the equilibrium or way-points when perturbed by uncertainties such as parametric variations and external disturbances. Comment: IEEE RA-L + IROS 202
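
    The central ingredient can be illustrated with a minimal, hedged sketch: a critic loss that penalizes violations of a sampled Lyapunov decrease condition of the form E[L(s')] - L(s) <= -alpha * c(s, a). The network L_net, the hinge penalty, and the margin alpha are illustrative choices for this sketch and are not taken from the paper.

```python
import torch

# Illustrative sketch only (not the authors' implementation): a critic loss
# that penalizes violations of the sampled Lyapunov decrease condition
#     E[L(s')] - L(s) <= -alpha * c(s, a),
# where L is a learned candidate Lyapunov function and c is the stage cost.
def lyapunov_critic_loss(L_net, s, s_next, cost, alpha=0.1):
    L_s = L_net(s).squeeze(-1)          # L(s) for a batch of states
    L_next = L_net(s_next).squeeze(-1)  # L(s') for the successor states
    violation = L_next - L_s + alpha * cost
    # Hinge penalty: zero whenever the decrease condition holds on the sample.
    return torch.relu(violation).mean()
```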

    Output feedback stochastic MPC with packet losses

    The paper considers constrained linear systems with stochastic additive disturbances and noisy measurements transmitted over a lossy communication channel. We propose a model predictive control (MPC) law that minimizes a discounted cost subject to a discounted expectation constraint. Sensor data is assumed to be lost with a known probability, and data losses are accounted for by expressing the predicted control policy as an affine function of future observations, which results in a convex optimal control problem. An online constraint-tightening technique ensures recursive feasibility of the online optimization and satisfaction of the expectation constraint without bounds on the distributions of the noise and disturbance inputs. The cost evaluated along trajectories of the closed-loop system is shown to be bounded by the optimal predicted cost. A numerical example is given to illustrate these results.
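
    In generic terms, a discounted-cost, discounted-expectation-constrained problem of this type can be written as below; the quadratic weights Q and R, the constraint function g, and the bound e are illustrative notation, not taken from the paper.

```latex
% Generic discounted MPC problem with a discounted expectation constraint
% (notation illustrative): discount factor \gamma applied to both the cost
% and the constraint, optimized over feedback policies \pi.
\[
\min_{\pi}\;
\mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\bigl(\|x_k\|_Q^{2} + \|u_k\|_R^{2}\bigr)\right]
\quad \text{subject to} \quad
\mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, g(x_k,u_k)\right] \le e,
\qquad \gamma \in (0,1).
\]
```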

    Local turnpike analysis using local dissipativity for discrete time discounted optimal control

    Recent results in the literature have provided connections between the so-called turnpike property, near optimality of closed-loop solutions, and strict dissipativity. Motivated by applications in economics, optimal control problems with discounted stage cost are of great interest. In contrast to non-discounted optimal control problems, it is more likely that several asymptotically stable optimal equilibria coexist. Due to the discounting and the cost of transitioning from a local to the global equilibrium, it may be more favourable to stay in a local equilibrium than to move to the global, cheaper, equilibrium. In the literature, strict dissipativity has been shown to provide criteria for global asymptotic stability of optimal equilibria and turnpike behaviour. In this paper, we propose a local notion of discounted strict dissipativity and a local turnpike property, both depending on the discount factor. Using these concepts, we investigate the local behaviour of (near-)optimal trajectories and develop conditions on the discount factor ensuring convergence to a locally asymptotically stable optimal equilibrium.
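
    For reference, one common statement of discounted strict dissipativity at an equilibrium (x^e, u^e) from the turnpike literature is sketched below, with discount factor beta, stage cost l, storage function lambda, and a class-K function alpha; the notation is illustrative and not fixed by the abstract.

```latex
% Discounted strict dissipativity (one common form from the literature;
% symbols are illustrative): there exist a storage function \lambda and
% \alpha \in \mathcal{K} such that, for all admissible (x, u),
\[
\beta\,\lambda\bigl(f(x,u)\bigr) - \lambda(x)
\;\le\;
\ell(x,u) - \ell(x^{e},u^{e}) - \alpha\bigl(\|x - x^{e}\|\bigr).
\]
% Roughly, a local notion requires this only on a neighbourhood of a
% locally optimal equilibrium.
```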

    Stochastic output feedback MPC with intermittent observations

    This paper considers constrained linear systems with stochastic additive disturbances and noisy measurements transmitted over a lossy communication channel. We propose a model predictive control (MPC) law that minimises a discounted cost subject to a discounted expectation constraint. Sensor data is assumed to be lost with a known probability, and data losses are accounted for by expressing the predicted control policy as an affine function of future observations, which results in a convex optimal control problem. An online constraint-tightening technique ensures recursive feasibility of the online optimisation problem and satisfaction of the expectation constraint without imposing bounds on the distributions of the noise and disturbance inputs. The discounted cost evaluated along trajectories of the closed-loop system is shown to be bounded by the initial optimal predicted cost. We also provide conditions under which the averaged undiscounted closed-loop cost accumulated over an infinite horizon is bounded. Numerical simulations are described to illustrate these results. Comment: 12 pages. arXiv admin note: substantial text overlap with arXiv:2004.0259
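
    A rough sketch of the affine-in-observations parameterization under Bernoulli packet losses is given below; the horizon, dimensions, gains, and loss model details are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): a predicted input sequence
# that is affine in the measurements actually received, with i.i.d. Bernoulli
# packet losses of known probability. All dimensions and gains are made up.
rng = np.random.default_rng(0)
N = 5                                      # prediction horizon
p_loss = 0.2                               # known probability of losing a packet

v = rng.normal(size=N)                     # nominal open-loop inputs
M = np.tril(rng.normal(size=(N, N)), -1)   # strictly causal feedback gains
y = rng.normal(size=N)                     # scalar measurements over the horizon
received = rng.random(N) > p_loss          # packet-arrival indicators

# u_k = v_k + sum_{j < k} M[k, j] * y_j, summing only over packets that arrived
u = v + M @ (y * received)
```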

    Stability guarantees for nonlinear discrete-time systems controlled by approximate value iteration

    Extended version of the paper with the same title and authors in the proceedings of the IEEE Conference on Decision and Control 2019, Nice, France. Value iteration is a method to generate optimal control inputs for generic nonlinear systems and cost functions. Its implementation typically leads to approximation errors, which may have a major impact on the closed-loop system performance; in this case we speak of approximate value iteration (AVI). In this paper, we investigate the stability of systems whose inputs are obtained by AVI. We consider deterministic discrete-time nonlinear plants and a class of general, possibly discounted, costs. We model the closed-loop system as a family of systems parameterized by tunable parameters, which are used for the approximation of the value function at different iterations, the discount factor, and the iteration step at which we stop running the algorithm. It is shown, under natural stabilizability and detectability properties as well as mild conditions on the approximation errors, that the family of closed-loop systems exhibits local practical stability properties. The analysis is based on the construction of a Lyapunov function given by the sum of the approximate value function and the Lyapunov-like function that characterizes the detectability of the system. By strengthening our conditions, asymptotic and exponential stability properties are guaranteed.
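
    A minimal sketch of discounted approximate value iteration on a finite state/input set follows, where a crude projection step stands in for the approximation errors discussed above; all quantities are illustrative and not taken from the paper.

```python
import numpy as np

# Minimal sketch of approximate value iteration (AVI) on a finite state/input
# set; the projection `approx` is where approximation errors enter.
def avi(stage_cost, successor, gamma=0.95, iters=100,
        approx=lambda V: np.round(V, 2)):      # crude quantization as the error source
    # stage_cost[x, u]: cost of applying input u in state x
    # successor[x, u]: index of the next state under input u
    nx, _ = stage_cost.shape
    V = np.zeros(nx)
    for _ in range(iters):
        Q = stage_cost + gamma * V[successor]  # discounted Bellman backup
        V = approx(Q.min(axis=1))              # greedy backup plus approximation
    return V
```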

    Control problems on infinite horizon subject to time-dependent pure state constraints

    In recent decades, control problems with infinite horizons and discount factors have become increasingly central not only in economics but also in applications in artificial intelligence and machine learning. The strong links between reinforcement learning and control theory have led to major efforts towards the development of algorithms for learning to solve constrained control problems. In particular, discounting plays a role in addressing the challenges that come with models that have unbounded disturbances. Although algorithms have been extensively explored, few results take into account time-dependent state constraints, which are imposed in most real-world control applications. For this purpose, we investigate feasibility and sufficient conditions for Lipschitz regularity of the value function for a class of discounted infinite-horizon optimal control problems subject to time-dependent constraints. We focus on problems whose data allow nonautonomous dynamics, and a Lagrangian and state constraints that can be unbounded, with possibly nonsmooth boundaries.
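
    A generic problem of the class described above might be written as follows; the dynamics f, Lagrangian L, discount rate lambda > 0, and time-dependent constraint set A(t) are illustrative notation rather than the paper's own.

```latex
% Generic discounted infinite-horizon problem with time-dependent state
% constraints (notation illustrative):
\[
V(t_0, x_0) \;=\; \inf_{u(\cdot)} \int_{t_0}^{\infty} e^{-\lambda (t - t_0)}\, L\bigl(t, x(t), u(t)\bigr)\,dt,
\]
\[
\text{subject to}\quad \dot{x}(t) = f\bigl(t, x(t), u(t)\bigr),\quad x(t_0) = x_0,\quad x(t) \in A(t)\ \ \forall t \ge t_0.
\]
```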