20 research outputs found

    Actor-Critic Reinforcement Learning for Control with Stability Guarantee

    Reinforcement Learning (RL) and its integration with deep learning have achieved impressive performance in various robotic control tasks, ranging from motion planning and navigation to end-to-end visual manipulation. However, model-free RL that relies solely on data does not guarantee stability. From a control-theoretic perspective, stability is the most important property of any control system, since it is closely related to the safety, robustness, and reliability of robotic systems. In this paper, we propose an actor-critic RL framework for control that guarantees closed-loop stability by employing the classic Lyapunov method from control theory. First, a data-based stability theorem is proposed for stochastic nonlinear systems modeled as Markov decision processes. We then show that the stability condition can be exploited as the critic in actor-critic RL to learn a controller/policy. Finally, the effectiveness of our approach is evaluated on several well-known 3-dimensional robot control tasks and a synthetic biology gene network tracking task in three different popular physics simulation platforms. As an empirical evaluation of the advantage of stability, we show that the learned policies enable the systems to recover, to a certain extent, to the equilibrium or way-points when perturbed by uncertainties such as parametric variations and external disturbances. Comment: IEEE RA-L + IROS 202
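
    The central ingredient can be illustrated with a minimal, hedged sketch: a critic loss that penalizes violations of a sampled Lyapunov decrease condition of the form E[L(s')] - L(s) <= -alpha * c(s, a). The network L_net, the hinge penalty, and the margin alpha are illustrative choices for this sketch and are not taken from the paper.

```python
import torch

# Illustrative sketch only (not the authors' implementation): a critic loss
# that penalizes violations of the sampled Lyapunov decrease condition
#     E[L(s')] - L(s) <= -alpha * c(s, a),
# where L is a learned candidate Lyapunov function and c is the stage cost.
def lyapunov_critic_loss(L_net, s, s_next, cost, alpha=0.1):
    L_s = L_net(s).squeeze(-1)          # L(s) for a batch of states
    L_next = L_net(s_next).squeeze(-1)  # L(s') for the successor states
    violation = L_next - L_s + alpha * cost
    # Hinge penalty: zero whenever the decrease condition holds on the sample.
    return torch.relu(violation).mean()
```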

    Output feedback stochastic MPC with packet losses

    The paper considers constrained linear systems with stochastic additive disturbances and noisy measurements transmitted over a lossy communication channel. We propose a model predictive control (MPC) law that minimizes a discounted cost subject to a discounted expectation constraint. Sensor data is assumed to be lost with a known probability, and data losses are accounted for by expressing the predicted control policy as an affine function of future observations, which results in a convex optimal control problem. An online constraint-tightening technique ensures recursive feasibility of the online optimization and satisfaction of the expectation constraint without bounds on the distributions of the noise and disturbance inputs. The cost evaluated along trajectories of the closed-loop system is shown to be bounded by the optimal predicted cost. A numerical example is given to illustrate these results.
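
    In generic terms, a discounted-cost, discounted-expectation-constrained problem of this type can be written as below; the quadratic weights Q and R, the constraint function g, and the bound e are illustrative notation, not taken from the paper.

```latex
% Generic discounted MPC problem with a discounted expectation constraint
% (notation illustrative): discount factor \gamma applied to both the cost
% and the constraint, optimized over feedback policies \pi.
\[
\min_{\pi}\;
\mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\bigl(\|x_k\|_Q^{2} + \|u_k\|_R^{2}\bigr)\right]
\quad \text{subject to} \quad
\mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, g(x_k,u_k)\right] \le e,
\qquad \gamma \in (0,1).
\]
```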

    Local turnpike analysis using local dissipativity for discrete time discounted optimal control

    Recent results in the literature have provided connections between the so-called turnpike property, near optimality of closed-loop solutions, and strict dissipativity. Motivated by applications in economics, optimal control problems with discounted stage cost are of great interest. In contrast to non-discounted optimal control problems, it is more likely that several asymptotically stable optimal equilibria coexist. Due to the discounting and the cost of transitioning from a local to the global equilibrium, it may be more favourable to stay in a local equilibrium than to move to the global, cheaper, equilibrium. In the literature, strict dissipativity has been shown to provide criteria for global asymptotic stability of optimal equilibria and turnpike behaviour. In this paper, we propose a local notion of discounted strict dissipativity and a local turnpike property, both depending on the discount factor. Using these concepts, we investigate the local behaviour of (near-)optimal trajectories and develop conditions on the discount factor ensuring convergence to a locally asymptotically stable optimal equilibrium.
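
    For reference, one common statement of discounted strict dissipativity at an equilibrium (x^e, u^e) from the turnpike literature is sketched below, with discount factor beta, stage cost l, storage function lambda, and a class-K function alpha; the notation is illustrative and not fixed by the abstract.

```latex
% Discounted strict dissipativity (one common form from the literature;
% symbols are illustrative): there exist a storage function \lambda and
% \alpha \in \mathcal{K} such that, for all admissible (x, u),
\[
\beta\,\lambda\bigl(f(x,u)\bigr) - \lambda(x)
\;\le\;
\ell(x,u) - \ell(x^{e},u^{e}) - \alpha\bigl(\|x - x^{e}\|\bigr).
\]
% Roughly, a local notion requires this only on a neighbourhood of a
% locally optimal equilibrium.
```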

    Stochastic output feedback MPC with intermittent observations

    This paper considers constrained linear systems with stochastic additive disturbances and noisy measurements transmitted over a lossy communication channel. We propose a model predictive control (MPC) law that minimises a discounted cost subject to a discounted expectation constraint. Sensor data is assumed to be lost with a known probability, and data losses are accounted for by expressing the predicted control policy as an affine function of future observations, which results in a convex optimal control problem. An online constraint-tightening technique ensures recursive feasibility of the online optimisation problem and satisfaction of the expectation constraint without imposing bounds on the distributions of the noise and disturbance inputs. The discounted cost evaluated along trajectories of the closed-loop system is shown to be bounded by the initial optimal predicted cost. We also provide conditions under which the averaged undiscounted closed-loop cost accumulated over an infinite horizon is bounded. Numerical simulations are described to illustrate these results. Comment: 12 pages. arXiv admin note: substantial text overlap with arXiv:2004.0259
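
    A rough sketch of the affine-in-observations parameterization under Bernoulli packet losses is given below; the horizon, dimensions, gains, and loss model details are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): a predicted input sequence
# that is affine in the measurements actually received, with i.i.d. Bernoulli
# packet losses of known probability. All dimensions and gains are made up.
rng = np.random.default_rng(0)
N = 5                                      # prediction horizon
p_loss = 0.2                               # known probability of losing a packet

v = rng.normal(size=N)                     # nominal open-loop inputs
M = np.tril(rng.normal(size=(N, N)), -1)   # strictly causal feedback gains
y = rng.normal(size=N)                     # scalar measurements over the horizon
received = rng.random(N) > p_loss          # packet-arrival indicators

# u_k = v_k + sum_{j < k} M[k, j] * y_j, summing only over packets that arrived
u = v + M @ (y * received)
```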

    Stability guarantees for nonlinear discrete-time systems controlled by approximate value iteration

    Extended version of the paper with the same title and authors in the proceedings of the IEEE Conference on Decision and Control 2019, Nice, France. Value iteration is a method to generate optimal control inputs for generic nonlinear systems and cost functions. Its implementation typically leads to approximation errors, which may have a major impact on the closed-loop system performance; in this case we speak of approximate value iteration (AVI). In this paper, we investigate the stability of systems whose inputs are obtained by AVI. We consider deterministic discrete-time nonlinear plants and a class of general, possibly discounted, costs. We model the closed-loop system as a family of systems parameterized by tunable parameters, which are used for the approximation of the value function at different iterations, the discount factor, and the iteration step at which we stop running the algorithm. It is shown, under natural stabilizability and detectability properties as well as mild conditions on the approximation errors, that the family of closed-loop systems exhibits local practical stability properties. The analysis is based on the construction of a Lyapunov function given by the sum of the approximate value function and the Lyapunov-like function that characterizes the detectability of the system. By strengthening our conditions, asymptotic and exponential stability properties are guaranteed.
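
    A minimal sketch of discounted approximate value iteration on a finite state/input set follows, where a crude projection step stands in for the approximation errors discussed above; all quantities are illustrative and not taken from the paper.

```python
import numpy as np

# Minimal sketch of approximate value iteration (AVI) on a finite state/input
# set; the projection `approx` is where approximation errors enter.
def avi(stage_cost, successor, gamma=0.95, iters=100,
        approx=lambda V: np.round(V, 2)):      # crude quantization as the error source
    # stage_cost[x, u]: cost of applying input u in state x
    # successor[x, u]: index of the next state under input u
    nx, _ = stage_cost.shape
    V = np.zeros(nx)
    for _ in range(iters):
        Q = stage_cost + gamma * V[successor]  # discounted Bellman backup
        V = approx(Q.min(axis=1))              # greedy backup plus approximation
    return V
```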

    Control problems on infinite horizon subject to time-dependent pure state constraints

    In recent decades, control problems with infinite horizons and discount factors have become increasingly central not only in economics but also in applications in artificial intelligence and machine learning. The strong links between reinforcement learning and control theory have led to major efforts towards the development of algorithms for learning to solve constrained control problems. In particular, discounting plays a role in addressing the challenges that come with models that have unbounded disturbances. Although algorithms have been extensively explored, few results take into account time-dependent state constraints, which are imposed in most real-world control applications. For this purpose, we investigate feasibility and sufficient conditions for Lipschitz regularity of the value function for a class of discounted infinite-horizon optimal control problems subject to time-dependent constraints. We focus on problems whose data allow nonautonomous dynamics, and a Lagrangian and state constraints that can be unbounded, with possibly nonsmooth boundaries.
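
    A generic problem of the class described above might be written as follows; the dynamics f, Lagrangian L, discount rate lambda > 0, and time-dependent constraint set A(t) are illustrative notation rather than the paper's own.

```latex
% Generic discounted infinite-horizon problem with time-dependent state
% constraints (notation illustrative):
\[
V(t_0, x_0) \;=\; \inf_{u(\cdot)} \int_{t_0}^{\infty} e^{-\lambda (t - t_0)}\, L\bigl(t, x(t), u(t)\bigr)\,dt,
\]
\[
\text{subject to}\quad \dot{x}(t) = f\bigl(t, x(t), u(t)\bigr),\quad x(t_0) = x_0,\quad x(t) \in A(t)\ \ \forall t \ge t_0.
\]
```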