72 research outputs found
Data-Driven Integral Reinforcement Learning for Continuous-Time Non-Zero-Sum Games
This paper develops an integral value iteration (VI) method to efficiently find, online, the Nash equilibrium solution of two-player non-zero-sum (NZS) differential games for linear systems with partially unknown dynamics. To guarantee closed-loop stability at the Nash equilibrium, an explicit upper bound on the discount factor is given. To show the efficacy of the presented online model-free solution, the integral VI method is compared with the model-based offline policy iteration method. Moreover, a detailed theoretical analysis of the integral VI algorithm is provided in three respects: positive definiteness of the updated cost functions, stability of the closed-loop system, and the conditions that guarantee monotone convergence. Finally, simulation results demonstrate the efficacy of the presented algorithms.
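To illustrate the kind of computation involved, the following sketch runs a best-response iteration toward a feedback Nash equilibrium of a two-player LQ nonzero-sum game. This is a discrete-time, model-based analogue for intuition only, not the paper's continuous-time integral VI algorithm; the system matrices are made-up examples and the inner Riccati solver is a plain fixed-point iteration.

```python
import numpy as np

def dare(A, B, Q, R, iters=500):
    """Solve a discrete algebraic Riccati equation by fixed-point iteration."""
    P = Q.copy()
    for _ in range(iters):
        G = R + B.T @ P @ B
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
    return P

def nzs_best_response(A, B1, B2, Q1, Q2, R1, R2, rounds=50):
    """Alternate single-player LQR best responses; at a fixed point the
    gain pair (K1, K2) is a feedback Nash equilibrium."""
    n = A.shape[0]
    K1 = np.zeros((B1.shape[1], n))
    K2 = np.zeros((B2.shape[1], n))
    for _ in range(rounds):
        # Player 1 optimizes against player 2's current policy, and vice versa.
        P1 = dare(A - B2 @ K2, B1, Q1, R1)
        K1 = np.linalg.solve(R1 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
        P2 = dare(A - B1 @ K1, B2, Q2, R2)
        K2 = np.linalg.solve(R2 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))
    return K1, K2

# Illustrative double-integrator-like system (all values assumed).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B1 = np.array([[0.0], [0.1]])
B2 = np.array([[0.1], [0.0]])
Q1 = Q2 = np.eye(2)
R1 = R2 = np.eye(1)
K1, K2 = nzs_best_response(A, B1, B2, Q1, Q2, R1, R2)
Acl = A - B1 @ K1 - B2 @ K2
print(np.max(np.abs(np.linalg.eigvals(Acl))))  # spectral radius < 1: stable
```

Each best response is itself a standard LQR solve; the Nash property at convergence is that neither player can improve by deviating unilaterally.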
Inverse linear-quadratic nonzero-sum differential games
This paper addresses the inverse problem for Linear-Quadratic (LQ) nonzero-sum N-player differential games: given demonstrated trajectories known to be generated by stationary linear feedback Nash equilibrium laws, the goal is to learn the parameters of the unknown cost functions of the game, called the observed game. To this end, a synthesized game is constructed from the demonstrated data; it is required to be equivalent to the observed game in the sense that the trajectories generated by the players' equilibrium feedback laws in the synthesized game coincide with the demonstrated trajectories. We present a model-based algorithm that accomplishes this task using the given trajectories, and then extend it to a model-free setting for the case when the system matrices are unknown. The algorithms combine inverse optimal control and reinforcement learning methods, making extensive use of gradient-descent optimization for the latter. The analysis of the algorithm focuses on proving its convergence and stability. To further characterize the solution set, we show how to generate an infinite number of equivalent games without repeatedly running the complete algorithm. Simulation results validate the effectiveness of the proposed algorithms.
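In the scalar single-player case, the inverse LQ idea can be sketched in closed form: given a demonstrated gain k* and a fixed control weight r, the continuous-time Riccati stationarity conditions pin down a state weight q whose forward LQR reproduces k*. This is a minimal illustrative reduction, not the paper's N-player gradient-descent algorithm; all coefficients below are assumptions.

```python
import math

def inverse_lq_scalar(a, b, r, k_star):
    """Recover a state weight q consistent with a demonstrated gain k*.
    Scalar continuous-time LQR conditions:
        CARE:  2*a*p - b**2 * p**2 / r + q = 0
        gain:  k = b * p / r
    """
    p = r * k_star / b               # invert the gain relation
    q = b**2 * p**2 / r - 2 * a * p  # solve the CARE for q
    return q

def forward_lqr_scalar(a, b, q, r):
    """Forward problem: stabilizing CARE root and the resulting gain."""
    p = r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2
    return b * p / r

# Assumed system and demonstrated gain.
a, b, r, k_star = 1.0, 1.0, 1.0, 3.0
q = inverse_lq_scalar(a, b, r, k_star)
print(q, forward_lqr_scalar(a, b, q, r))  # synthesized game reproduces k*
```

Note the non-uniqueness the abstract mentions: scaling (q, r) jointly leaves the gain unchanged, which is the scalar version of the "infinitely many equivalent games" characterization.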
Multi-H∞ controls for unknown input-interference nonlinear system with reinforcement learning
This article studies multi-H∞ control for input-interference nonlinear systems via an adaptive dynamic programming (ADP) method, which allows multiple inputs, each with its own selfish strategy component, to resist weighted interference. The ADP scheme is used to learn Nash-optimal solutions of the input-interference nonlinear system such that the multiple H∞ performance indices reach the defined Nash equilibrium. First, the input-interference nonlinear system is given and the Nash equilibrium is defined. An adaptive neural network (NN) observer is introduced to identify the input-interference nonlinear dynamics. Then, critic NNs are used to learn the multiple H∞ performance indices. A novel adaptive law updates the critic NN weights by minimizing the residual of the Hamilton-Jacobi-Isaacs (HJI) equation, so the multi-H∞ controls can be computed directly and effectively from input-output data, and an actor structure is avoided. Moreover, closed-loop stability and convergence of the updated parameters are proved. Finally, two numerical examples are simulated to verify the proposed ADP scheme for the input-interference nonlinear system.
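The flavor of an adaptive law that tunes a critic parameter by minimizing an HJI residual can be seen in the scalar linear case, where the HJI equation collapses to a scalar game algebraic Riccati equation and the "critic weight" is its solution p. The toy gradient descent below uses assumed coefficients and is not the article's NN-based scheme.

```python
# Scalar H-infinity game ARE residual (all coefficients assumed):
#   f(p) = q + 2*a*p - (b**2/r - d**2/g**2) * p**2
a, b, d, r, g, q = -1.0, 1.0, 0.5, 1.0, 1.0, 1.0
c = b**2 / r - d**2 / g**2   # must be positive for a solvable problem

def residual(p):
    return q + 2 * a * p - c * p**2

# Gradient descent on residual(p)**2 / 2, mimicking a critic update law
# that drives the HJI residual to zero.
p, lr = 0.0, 0.05
for _ in range(5000):
    grad = residual(p) * (2 * a - 2 * c * p)   # chain rule
    p -= lr * grad
print(p)  # converges to the positive root of the ARE
```

The worst-case disturbance and the control gain then follow from p in the usual way (proportional to d*p/g**2 and b*p/r respectively), which is what the actor-free structure in the article exploits.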
Risk-Minimizing Two-Player Zero-Sum Stochastic Differential Game via Path Integral Control
This paper addresses a continuous-time risk-minimizing two-player zero-sum stochastic differential game (SDG), in which each player aims to minimize its probability of failure. Failure occurs when the state of the game enters predefined undesirable domains, and one player's failure is the other's success. We derive a sufficient condition for this game to have a saddle-point equilibrium and show that it can be solved via a Hamilton-Jacobi-Isaacs (HJI) partial differential equation (PDE) with Dirichlet boundary condition. Under certain assumptions on the system dynamics and cost function, we establish the existence and uniqueness of the saddle point of the game. We provide explicit expressions for the saddle-point policies, which can be numerically evaluated using path integral control; this allows us to solve the game online via Monte Carlo sampling of system trajectories. We implement our control synthesis framework on two classes of risk-minimizing zero-sum SDGs: a disturbance attenuation problem and a pursuit-evasion game. Simulation studies are presented to validate the proposed control synthesis framework.
Comment: 8 pages, 4 figures, CDC 202
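A minimal sketch of the path-integral evaluation step: the control at the initial state is estimated from Monte Carlo rollouts of the passive (uncontrolled) dynamics, weighting each sampled noise sequence by its exponentiated negative cost. This toy one-player, 1D version with a terminal cost is an assumption-laden illustration, not the paper's risk-minimizing game formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.05, 20          # step size and horizon (assumed)
sigma, lam = 1.0, 1.0     # noise scale and temperature (assumed)
x0, n_samples = 1.0, 5000

# Roll out the passive stochastic dynamics dx = sigma * dw.
dw = rng.normal(0.0, np.sqrt(dt), size=(n_samples, T))
x_T = x0 + sigma * dw.sum(axis=1)

S = x_T**2                              # terminal cost only, for simplicity
w = np.exp(-(S - S.min()) / lam)        # importance weights (shifted for stability)
u0 = (w @ dw[:, 0]) / (dt * w.sum())    # path integral estimate of u*(x0, t=0)
print(u0)  # negative: the estimate pushes the state toward the origin
```

The key practical point from the abstract is visible here: the policy is evaluated by sampling alone, with no PDE discretization, which is what makes online solution of the HJI equation feasible.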
Neural Network iLQR: A New Reinforcement Learning Architecture
Research efforts in reinforcement learning, a notable machine learning paradigm, have progressed rapidly. Compared with reinforcement learning methods that require a given system model, architectures based on an unknown model generally offer significantly broader applicability. In this work, a new reinforcement learning architecture is developed that requires no prior knowledge of the system model, termed the neural network iterative linear quadratic regulator (NNiLQR). Depending solely on measurement data, this method yields a completely non-parametric routine for establishing the optimal policy, without system modeling, through iterative refinement of a neural network model of the system. Importantly, this approach significantly outperforms the classical iterative linear quadratic regulator (iLQR) method in terms of the given objective function, owing to its use of further exploration. These merits of the NNiLQR method are demonstrated in two illustrative examples.
Comment: 13 pages, 9 figure
- …