Reinforcement learning for multi-agent and robust control systems

Abstract

Recent years have witnessed phenomenal accomplishments of reinforcement learning (RL) in many prominent sequential decision-making problems, such as playing the game of Go, playing real-time strategy games, robotic control, and autonomous driving. Motivated by these empirical successes, research toward a theoretical understanding of RL algorithms has also regained great attention in recent years. In this dissertation, our goal is to contribute to these efforts through new approaches and tools, by developing RL algorithms for multi-agent and robust control systems, which find broad applications in the aforementioned examples and other diverse areas. Specifically, we consider two different and fundamental settings that in general fall into the realm of RL for multi-agent and robust control systems: (i) decentralized multi-agent RL (MARL) with networked agents; (ii) H2/H∞-robust control synthesis. We develop new RL algorithms for these settings, supported by theoretical convergence guarantees. In setting (i), a team of collaborative MARL agents is connected via a communication network, without the coordination of any central controller. With only neighbor-to-neighbor communication, we introduce decentralized actor-critic algorithms for each agent, and establish their convergence guarantees when linear function approximation is used. Setting (ii) corresponds to a classical robust control problem, with linear dynamics and robustness concerns in the H∞-norm sense. In contrast to existing solvers, we introduce policy-gradient methods to solve the robust control problem with global convergence guarantees, despite its nonconvexity. More interestingly, we show that two of these methods enjoy an implicit regularization property: the controller iterates automatically preserve a certain level of robust stability by following these policy-search directions. This robustness-on-the-fly property is crucial for learning in safety-critical robust control systems. We then study the model-free regime, where we develop derivative-free policy gradient methods that solve the finite-horizon version of the problem using sampled trajectories from the system, with sample-complexity guarantees. Interestingly, this robust control problem also unifies several other fundamental settings in control theory and game theory, including risk-sensitive linear control, i.e., linear exponential quadratic Gaussian (LEQG) control, and linear quadratic zero-sum dynamic games. The latter can be viewed as a benchmark setting for competitive multi-agent RL. Hence, our results provide policy-search methods for solving these problems in a unified manner. Finally, we provide numerical results to demonstrate the computational efficiency of our policy search algorithms, compared to several existing robust control solvers.
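As a rough illustration of setting (i), the sketch below shows one decentralized actor-critic update with a linear critic and neighbor-to-neighbor consensus averaging. The feature dimensions, the doubly stochastic weight matrix W, and the step sizes are illustrative assumptions, not the dissertation's exact algorithm or notation.

# Sketch: one synchronous decentralized actor-critic step over a network of agents.
# The consensus matrix W, feature dimensions, and step sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

N, d_feat, d_policy = 4, 8, 5            # agents, critic feature dim, actor parameter dim
W = np.full((N, N), 1.0 / N)             # doubly stochastic consensus weights (complete graph here)
critic = rng.normal(size=(N, d_feat))    # each agent's linear critic weights
actor = rng.normal(size=(N, d_policy))   # each agent's local policy parameters

def decentralized_ac_step(phi, phi_next, rewards, scores,
                          gamma=0.95, alpha_c=0.05, alpha_a=0.01):
    """phi, phi_next: global state features, shape (d_feat,); rewards: local rewards, shape (N,);
    scores: each agent's score function grad log pi_i(a_i | s), shape (N, d_policy)."""
    global critic, actor
    # 1) Local temporal-difference error with each agent's privately observed reward.
    td_err = rewards + gamma * (critic @ phi_next) - (critic @ phi)
    critic_local = critic + alpha_c * np.outer(td_err, phi)
    # 2) Consensus: mix critic parameters with network neighbors only (no central controller).
    critic = W @ critic_local
    # 3) Local actor step along the policy-gradient direction estimated with the TD error.
    actor = actor + alpha_a * td_err[:, None] * scores

Similarly, for the model-free regime of setting (ii), a derivative-free policy-gradient step for a linear feedback gain can be sketched with a two-point, smoothed gradient estimate built from sampled trajectory costs. The cost oracle, smoothing radius, and sample counts below are assumptions for illustration, not the dissertation's specific estimator.

# Sketch: a derivative-free (zeroth-order) policy-gradient step for a linear feedback gain K,
# using a two-point gradient estimate built from sampled finite-horizon costs.
import numpy as np

rng = np.random.default_rng(1)

def zeroth_order_pg_step(K, rollout_cost, radius=0.05, num_samples=20, lr=1e-3):
    """rollout_cost(K) is assumed to return the cost of trajectories sampled under gain K."""
    d1, d2 = K.shape
    grad = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.normal(size=(d1, d2))
        U /= np.linalg.norm(U)                    # random perturbation direction on the sphere
        c_plus = rollout_cost(K + radius * U)     # sampled cost under positively perturbed gain
        c_minus = rollout_cost(K - radius * U)    # sampled cost under negatively perturbed gain
        grad += (c_plus - c_minus) / (2.0 * radius) * U
    grad *= (d1 * d2) / num_samples               # dimension scaling for the smoothed estimate
    return K - lr * grad                          # one policy-gradient descent step on K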
