Actor-Critic Reinforcement Learning for Control with Stability Guarantee
Reinforcement Learning (RL) and its integration with deep learning have
achieved impressive performance in various robotic control tasks, ranging from
motion planning and navigation to end-to-end visual manipulation. However,
stability is not guaranteed in model-free RL, which relies solely on data. From a
control-theoretic perspective, stability is the most important property for any
control system, since it is closely related to safety, robustness, and
reliability of robotic systems. In this paper, we propose an actor-critic RL
framework for control that guarantees closed-loop stability by employing the
classical Lyapunov method from control theory. First, a data-based stability
theorem is proposed for stochastic nonlinear systems modeled as Markov
decision processes. We then show that the stability condition can be
exploited as the critic in actor-critic RL to learn a controller/policy.
Finally, the effectiveness of our approach is evaluated on several well-known
three-dimensional robot control tasks and a synthetic biology gene network
tracking task, in three popular physics simulation platforms. As an empirical
evaluation of the advantage of stability, we show that the learned policies
enable the systems to recover, to a certain extent, to the equilibrium or
way-points when perturbed by uncertainties such as parametric variations and
external disturbances. Comment: IEEE RA-L + IROS 202
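A minimal sketch of the Lyapunov-critic idea, assuming a scalar stochastic linear system, a quadratic Lyapunov candidate, and a crude gain search in place of the paper's actor-critic updates (the system, the cost c(x) = x², and the margin alpha are all illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar stochastic system: x' = a*x + b*u + noise.
a, b, sigma = 1.1, 1.0, 0.01

def step(x, u):
    return a * x + b * u + sigma * rng.standard_normal()

# Quadratic Lyapunov critic L_w(x) = w*x^2 and linear policy u = -k*x.
# The data-based stability condition is, roughly,
#     E[L(x')] - L(x) <= -alpha * c(x)
# with c positive definite; here c(x) = x^2 (an assumption of this sketch).
alpha = 0.1

def lyapunov_violation(w, k, states):
    """Average violation of the sampled Lyapunov decrease condition."""
    viol = 0.0
    for x in states:
        x_next = step(x, -k * x)
        viol += max(0.0, w * x_next**2 - w * x**2 + alpha * x**2)
    return viol / len(states)

# Stand-in for the actor update: pick the gain whose sampled transitions
# best satisfy the Lyapunov decrease condition.
xs = rng.uniform(-1.0, 1.0, size=200)
gains = np.linspace(0.0, 2.0, 41)
best_k = min(gains, key=lambda k: lyapunov_violation(1.0, k, xs))
print("selected gain:", best_k)
```

For gains with |a - b*k| near 1 or larger the decrease condition is violated on most samples, while gains near k = a/b make the average violation almost vanish, which is the sampled analogue of certifying closed-loop stability.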
Output feedback stochastic MPC with packet losses
The paper considers constrained linear systems with stochastic additive
disturbances and noisy measurements transmitted over a lossy communication
channel. We propose a model predictive control (MPC) law that minimizes a
discounted cost subject to a discounted expectation constraint. Sensor data is
assumed to be lost with known probability, and data losses are accounted for by
expressing the predicted control policy as an affine function of future
observations, which results in a convex optimal control problem. An online
constraint-tightening technique ensures recursive feasibility of the online
optimization and satisfaction of the expectation constraint without imposing
bounds on the distributions of the noise and disturbance inputs. The cost
evaluated along trajectories of the closed loop system is shown to be bounded
by the optimal predicted cost. A numerical example is given to illustrate
these results.
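A minimal closed-loop sketch of the packet-loss setting, assuming a scalar plant, Bernoulli sensor losses with known probability, and a fixed gain applied only to received observations (the paper instead optimizes an affine-in-observations policy online by convex programming; all numbers here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative scalar plant: x' = a*x + u + w, noisy output y = x + v,
# with each measurement arriving with known probability p.
a, p, gamma = 1.2, 0.8, 0.95
sigma_w, sigma_v = 0.05, 0.02

def rollout(K, g=0.0, x0=1.0, T=100):
    """Discounted cost of a policy affine in the received observations:
    u = g + K*y when the packet arrives, u = g when it is lost."""
    x, cost = x0, 0.0
    for k in range(T):
        received = rng.random() < p          # Bernoulli packet arrival
        y = x + sigma_v * rng.standard_normal()
        u = g + (K * y if received else 0.0)
        cost += gamma**k * (x**2 + 0.1 * u**2)
        x = a * x + u + sigma_w * rng.standard_normal()
    return cost

# Monte Carlo estimate of the discounted closed-loop cost for one gain.
costs = [rollout(K=-1.0) for _ in range(200)]
print("mean discounted cost:", np.mean(costs))
```

Although the open loop is unstable (a = 1.2) and feedback acts on only about 80% of the steps, the loop is mean-square stable here since p(a+K)^2 + (1-p)a^2 = 0.32 < 1, so the discounted cost remains bounded.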
Local turnpike analysis using local dissipativity for discrete time discounted optimal control
Recent results in the literature have provided connections between the
so-called turnpike property, near optimality of closed-loop solutions, and
strict dissipativity. Motivated by applications in economics, optimal control
problems with discounted stage cost are of great interest. In contrast to
non-discounted optimal control problems, it is more likely that several
asymptotically stable optimal equilibria coexist. Due to the discounting and
the cost of transitioning from a local to the global equilibrium, it may be
more favourable to stay at a local equilibrium than to move to the globally
cheaper one. In the literature, strict dissipativity was shown to provide
criteria for global asymptotic stability of optimal equilibria and turnpike
behavior. In this paper, we propose a local notion of discounted strict
dissipativity and a local turnpike property, both depending on the discount
factor. Using these concepts, we investigate the local behaviour of
(near-)optimal trajectories and develop conditions on the discount factor to
ensure convergence to a locally asymptotically stable optimal equilibrium.
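For context, non-local discounted strict dissipativity is commonly stated as the existence of a storage function satisfying a decrease inequality; a typical form (the notation here is generic, not quoted from this paper) is:

```latex
% Discounted strict dissipativity at an equilibrium (x^e, u^e):
% storage function \lambda, discount factor \beta \in (0,1),
% and \alpha \in \mathcal{K}_\infty such that for all admissible (x,u)
\beta\,\lambda\bigl(f(x,u)\bigr) - \lambda(x)
  \;\le\; \ell(x,u) - \ell(x^e,u^e) - \alpha\bigl(\|x - x^e\|\bigr)
```

The local notion proposed in the paper requires such an inequality only on a neighbourhood of a local equilibrium, with the admissible discount factors depending on that neighbourhood.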
Stochastic output feedback MPC with intermittent observations
This paper considers constrained linear systems with stochastic additive
disturbances and noisy measurements transmitted over a lossy communication
channel. We propose a model predictive control (MPC) law that minimises a
discounted cost subject to a discounted expectation constraint. Sensor data is
assumed to be lost with known probability, and data losses are accounted for by
expressing the predicted control policy as an affine function of future
observations, which results in a convex optimal control problem. An online
constraint-tightening technique ensures recursive feasibility of the online
optimisation problem and satisfaction of the expectation constraint without
imposing bounds on the distributions of the noise and disturbance inputs. The
discounted cost evaluated along trajectories of the closed loop system is shown
to be bounded by the initial optimal predicted cost. We also provide conditions
under which the averaged undiscounted closed loop cost accumulated over an
infinite horizon is bounded. Numerical simulations are described to illustrate
these results. Comment: 12 pages. arXiv admin note: substantial text overlap with
arXiv:2004.0259
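The two cost bounds described above can be sketched generically (the symbols here are illustrative, not the paper's exact statement):

```latex
% Discounted closed-loop cost bounded by the initial optimal predicted cost:
\mathbb{E}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\,\ell(x_k,u_k)\right]
  \;\le\; J^{\ast}(x_0), \qquad \gamma \in (0,1),
% and, under additional conditions, a bounded averaged undiscounted cost:
\limsup_{T\to\infty}\; \frac{1}{T}\,
  \mathbb{E}\!\left[\sum_{k=0}^{T-1} \ell(x_k,u_k)\right] \;<\; \infty .
```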
Stability guarantees for nonlinear discrete-time systems controlled by approximate value iteration
Extended version of the paper with the same title and authors in the
proceedings of the IEEE Conference on Decision and Control 2019, Nice,
France. Value iteration is a method to generate optimal control inputs for
generic nonlinear systems and cost functions. Its implementation typically
leads to approximation errors, which may have a major impact on the
closed-loop system performance; in this case we speak of approximate value
iteration (AVI). In this paper, we investigate the stability of systems
whose inputs are obtained by AVI. We consider deterministic discrete-time
nonlinear plants and a class of general, possibly discounted, costs. We
model the closed-loop system as a family of systems parameterized by tunable
parameters, which are used for the approximation of the value function at
different iterations, the discount factor, and the iteration step at which
we stop running the algorithm. It is shown, under natural stabilizability
and detectability properties as well as mild conditions on the approximation
errors, that the family of closed-loop systems exhibits local practical
stability properties. The analysis is based on the construction of a
Lyapunov function given by the sum of the approximate value function and the
Lyapunov-like function that characterizes the detectability of the system.
By strengthening our conditions, asymptotic and exponential stability
properties are guaranteed.
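The AVI setup can be illustrated with a small sketch: value iteration on a gridded scalar plant, an artificial approximation error injected at each iteration, a discount factor, and a finite stopping iteration (the plant, cost, grids, and error level are all assumptions of this sketch, not the paper's example):

```python
import numpy as np

# Illustrative plant x' = x + u with stage cost x^2 + u^2, discount gamma.
gamma, eps, iterations = 0.95, 1e-3, 60
xs = np.linspace(-2.0, 2.0, 201)   # state grid
us = np.linspace(-1.0, 1.0, 41)    # input grid

def interp(V, x):
    """Evaluate the gridded value function by linear interpolation."""
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

# Approximate value iteration: exact Bellman backup on the grid plus an
# artificial bounded approximation error of size eps at every iteration.
err = eps * np.random.default_rng(0).uniform(-1.0, 1.0, xs.size)
V = np.zeros_like(xs)
for _ in range(iterations):
    Q = xs[:, None]**2 + us[None, :]**2 + gamma * interp(V, xs[:, None] + us[None, :])
    V = Q.min(axis=1) + err

def policy(x):
    """Greedy input with respect to the approximate value function."""
    q = x**2 + us**2 + gamma * interp(V, x + us)
    return us[np.argmin(q)]

# Practical stability: the closed loop converges to a small neighborhood of
# the origin whose size shrinks with eps, the grids, and the stopping step.
x = 1.5
for _ in range(30):
    x = x + policy(x)
print("terminal state:", x)
```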
Control problems on infinite horizon subject to time-dependent pure state constraints
In the last decades, control problems with infinite horizons and discount
factors have become increasingly central not only for economics but also for
applications in artificial intelligence and machine learning. The strong links
between reinforcement learning and control theory have led to major efforts
towards the development of algorithms to learn how to solve constrained control
problems. In particular, discounting plays a role in addressing the
challenges posed by models with unbounded disturbances. Although algorithms
have been extensively explored, few results take into account time-dependent
state constraints, which are imposed in most real-world control applications.
For this purpose, here we investigate feasibility and sufficient conditions for
Lipschitz regularity of the value function for a class of discounted infinite
horizon optimal control problems subject to time-dependent constraints. We
focus on problems with data that allow nonautonomous dynamics, and Lagrangian
and state constraints that can be unbounded, with possibly nonsmooth boundaries.
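The class of problems studied can be written generically as follows (the notation is assumed for illustration, not taken from the paper):

```latex
V(t_0,x_0) \;=\; \inf_{u(\cdot)} \int_{t_0}^{\infty}
    e^{-\lambda t}\, L\bigl(t, x(t), u(t)\bigr)\,dt
\quad \text{s.t.} \quad
\dot{x}(t) = f\bigl(t, x(t), u(t)\bigr), \qquad
x(t) \in A(t) \ \ \forall t \ge t_0,
```

with discount rate \lambda > 0 and time-dependent state constraint sets A(t); the paper establishes feasibility and sufficient conditions for Lipschitz regularity of the value function V.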