
    Covariance steering in zero-sum linear-quadratic two-player differential games

    We formulate a new class of two-person zero-sum differential games, in a stochastic setting, where a specification on the target terminal state distribution is imposed on the players. We address this added specification by introducing incentives to the game that guide the players to steer the joint distribution accordingly. In the present paper, we only address linear-quadratic games with Gaussian target distributions. The solution is characterized by a system of coupled Riccati equations resembling that in standard linear-quadratic differential games. Indeed, once the incentive function is calculated, our problem reduces to a standard one. The framework developed in this paper extends previous results in covariance control, a fast-growing research area. On the numerical side, the problems herein are reformulated as convex-concave minimax problems for which efficient and reliable algorithms are available. Comment: 10 pages
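A standard workhorse for the convex-concave minimax problems mentioned above is simultaneous gradient descent-ascent. A minimal sketch on a hypothetical toy objective (not the game in the paper), where the minimizing player descends its gradient while the maximizing player ascends:

```python
def gda_saddle(grad_x, grad_y, x0, y0, lr=0.1, iters=500):
    """Simultaneous gradient descent-ascent on a convex-concave f(x, y):
    x takes a descent step while y takes an ascent step, using the
    gradients at the same iterate."""
    x, y = float(x0), float(y0)
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y

# Toy objective f(x, y) = x^2 + 2*x*y - y^2: strongly convex in x,
# strongly concave in y, with its unique saddle point at (0, 0).
x_star, y_star = gda_saddle(lambda x, y: 2 * x + 2 * y,
                            lambda x, y: 2 * x - 2 * y,
                            x0=3.0, y0=-2.0)
```

For strongly-convex-strongly-concave objectives like this one, the coupled update is a contraction for small enough step sizes, so the iterates spiral into the saddle point.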

    Covariance Steering for Discrete-Time Linear-Quadratic Stochastic Dynamic Games

    This paper addresses the problem of steering a discrete-time linear dynamical system from an initial Gaussian distribution to a final distribution in a game-theoretic setting. One of the two players strives to minimize a quadratic payoff while simultaneously trying to meet given mean and covariance constraints at the final time step. The other player maximizes the same payoff but is assumed to be indifferent to the terminal constraint. At first, the unconstrained version of the game is examined, and the necessary conditions for the existence of a saddle point are obtained. We then show that a solution to the one-sided constrained dynamic game is not guaranteed to exist, and subsequently the players' best responses are analyzed. Finally, we propose to numerically solve the problem of steering the distribution under adversarial scenarios using the Jacobi iteration method. The problem of guiding a missile during the endgame is chosen to evaluate the proposed approach. A numerical simulation corresponding to the case where the terminal distribution is not achieved is also included, and we discuss the necessary conditions to meet the terminal constraint.
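The Jacobi iteration referenced above has each player simultaneously play a best response to the opponent's previous iterate. A scalar sketch on a hypothetical quadratic payoff (not the paper's dynamic game), where both best responses are available in closed form:

```python
def jacobi_game(iters=60):
    """Jacobi iteration on the toy zero-sum payoff
    J(u, v) = u^2 - v^2 + u*v + u - v, minimized over u, maximized over v.
    Setting dJ/du = 0 and dJ/dv = 0 gives the closed-form best responses
    u = -(v + 1)/2 and v = (u - 1)/2. Both players update simultaneously
    (Jacobi), rather than sequentially (Gauss-Seidel)."""
    u, v = 0.0, 0.0
    for _ in range(iters):
        u, v = -(v + 1) / 2, (u - 1) / 2
    return u, v

u_star, v_star = jacobi_game()
```

For this payoff the update map is a contraction (spectral radius 1/2), so the iterates converge to the saddle point (u, v) = (-1/5, -3/5); in general, as the abstract notes, convergence of such best-response schemes is not guaranteed.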

    Games of Pursuit-Evasion with Multiple Agents and Subject to Uncertainties

    Over the past decade, there have been sustained efforts to introduce unmanned aerial vehicles (UAVs) into military engagements, disaster management, weather monitoring, and package delivery, among various other applications. With UAVs starting to come out of controlled environments into real-world scenarios, uncertainties that can be either exogenous or endogenous play an important role in the planning and decision-making aspects of deploying UAVs. At the same time, while the demand for UAVs is steadily increasing, major governments are still working on regulations for them. There is an urgency to design surveillance and security systems that can efficiently regulate the traffic and usage of these UAVs, especially in secured airspaces. With this motivation, the thesis primarily focuses on airspace security, providing solutions for safe planning under uncertainties while addressing aspects concerning target acquisition and collision avoidance. In this thesis, we first present our work on solutions developed for airspace security that employ multiple agents to capture multiple targets in an efficient manner. Since multi-pursuer multi-evader problems are known to be intractable, heuristics based on the geometry of the game are employed to obtain task-allocation algorithms that are computationally efficient. This is achieved by first analyzing pursuit-evasion problems involving two pursuers and one evader. Using the insights obtained from this analysis, a dynamic allocation algorithm for the pursuers, which is independent of the evader's strategy, is proposed. The algorithm is further extended to solve multi-pursuer multi-evader problems for any number of pursuers and evaders, assuming both sets of agents to be heterogeneous in terms of speed capabilities. Next, we consider stochastic disturbances, analyzing pursuit-evasion problems under stochastic flow fields using forward reachability analysis and covariance steering.
The problem of steering a Gaussian distribution in adversarial scenarios is first analyzed under the framework of general constrained games. The resulting covariance steering problem is solved numerically using iterative techniques. The proposed approach is applied to the missile endgame guidance problem. Subsequently, using the theory of covariance steering, an approach to solve pursuit-evasion problems under external stochastic flow fields is discussed. Assuming a linear feedback control strategy, a chance-constrained covariance game is constructed around the nominal solution of the players. The proposed approach is tested on realistic linear and nonlinear flow fields. Numerical simulations suggest that the pursuer can effectively steer the game towards capture. Finally, the uncertainties are assumed to be parametric in nature. To this end, we first formalize optimal control under parametric uncertainties while introducing sensitivity-function- and costate-based techniques to address robustness under parametric variations. Utilizing the sensitivity functions, we address the problem of safe path planning in environments containing dynamic obstacles with an uncertain motion model. The sensitivity-function-based approach is then extended to address game-theoretic formulations that resemble a "fog of war" situation.
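The sensitivity functions mentioned above propagate s = dx/dp alongside the state, via the variational equation s' = (df/dx)s + df/dp with s(0) = 0. A minimal sketch on a scalar toy model (not the thesis's dynamics), where the exact answer is known for checking:

```python
def sensitivity_euler(p, T=1.0, n=10000):
    """Forward-Euler propagation of the state and its parametric
    sensitivity for the toy dynamics xdot = -p * x, x(0) = 1.
    The sensitivity s = dx/dp obeys sdot = -p*s - x with s(0) = 0;
    analytically x(T) = exp(-p*T) and s(T) = -T * exp(-p*T)."""
    dt = T / n
    x, s = 1.0, 0.0
    for _ in range(n):
        # tuple assignment: both updates use the values from step k
        x, s = x + dt * (-p * x), s + dt * (-p * s - x)
    return x, s

x_T, s_T = sensitivity_euler(p=0.5)
```

The sensitivity s(T) directly quantifies how much the terminal state shifts per unit change in the uncertain parameter p, which is the quantity a robust planner would bound.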

    Linear Regression Models Applied to Imperfect Information Spacecraft Pursuit-evasion Differential Games

    Within satellite rendezvous and proximity operations lie pursuit-evasion differential games between two spacecraft. The extent of possible outcomes can be mathematically bounded by differential games in which each player employs optimal strategies. A linear regression model is developed from a large data set of optimal control solutions. The model is shown to map pursuer relative starting positions to final capture positions and to estimate capture time. The model is 3.8 times faster than the indirect heuristic method for arbitrary pursuer starting positions on an initial relative orbit about the evader. The linear regression model is shown to be well suited for on-board implementation for autonomous mission planning.
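The regression idea can be sketched with least squares on synthetic stand-in data; in the paper the rows would be pursuer relative starting positions and the targets the capture positions and times from indirect optimal-control solutions, which are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 samples of 3-D relative starting
# positions, mapped to 2 outputs (e.g. a capture coordinate and time).
X = rng.normal(size=(200, 3))
W_true = np.array([[1.0, 0.2], [-0.5, 0.8], [0.3, -0.1]])
Y = X @ W_true + 0.01 * rng.normal(size=(200, 2))  # small noise

# Ordinary least squares with an intercept column.
A = np.hstack([X, np.ones((200, 1))])
W_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)

Y_pred = A @ W_hat  # prediction is one matrix multiply
```

Once fitted, evaluating the model is a single matrix multiply, which is what makes this kind of surrogate attractive for on-board use compared with re-solving the differential game.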

    Improving Automated Driving through Planning with Human Internal States

    This work examines the hypothesis that partially observable Markov decision process (POMDP) planning with human driver internal states can significantly improve both safety and efficiency in autonomous freeway driving. We evaluate this hypothesis in a simulated scenario where an autonomous car must safely perform three lane changes in rapid succession. Approximate POMDP solutions are obtained through the partially observable Monte Carlo planning with observation widening (POMCPOW) algorithm. This approach outperforms over-confident and conservative MDP baselines and matches or outperforms QMDP. Relative to the MDP baselines, POMCPOW typically cuts the rate of unsafe situations in half or increases the success rate by 50%. Comment: Preprint before submission to IEEE Transactions on Intelligent Transportation Systems. arXiv admin note: text overlap with arXiv:1702.0085
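The QMDP baseline mentioned above scores each action by its belief-weighted fully observable Q-value, which ignores the value of future information gathering. A minimal sketch with a hypothetical Q-table (the driver types, actions, and values are illustrative, not from the paper):

```python
import numpy as np

def qmdp_action(belief, Q):
    """QMDP action selection: argmax over actions of
    sum_s belief(s) * Q(s, a), i.e. belief @ Q."""
    return int(np.argmax(belief @ Q))

# Hypothetical Q-table: rows are hidden driver types (aggressive,
# cautious), columns are ego actions (abort, wait, change lane).
Q = np.array([[0.0, 1.0, -5.0],    # aggressive neighbor: lane change risky
              [0.0, 0.5,  3.0]])   # cautious neighbor: lane change pays off
```

With belief mass mostly on the aggressive type this rule waits; with mass mostly on the cautious type it commits to the lane change. Because the rule assumes full observability after one step, it never chooses an action purely to probe the driver's type, which is the gap POMCPOW closes.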

    Optimal Control Methods for Missile Evasion

    Optimal control theory is applied to the study of missile evasion, particularly in the case of a single pursuing missile versus a single evading aircraft. It is proposed to divide the evasion problem into two phases, where the primary considerations are energy and maneuverability, respectively. Traditional evasion tactics are well documented for use in the maneuverability phase. To represent the first phase, dominated by energy management, the optimal control problem may be posed in two ways: as a fixed-final-time problem with the objective of maximizing the final distance between the evader and pursuer, and as a free-final-time problem with the objective of maximizing the final time at which the missile reaches some capture distance from the evader. These two optimal control problems are studied under several different scenarios regarding assumptions about the pursuer. First, a suboptimal control strategy, proportional navigation, is used for the pursuer. Second, it is assumed that the pursuer acts optimally, requiring the solution of a two-sided optimal control problem, otherwise known as a differential game. The resulting trajectory is known as a minimax, and it can be shown that it accounts for uncertainty in the pursuer's control strategy. Finally, a pursuer whose motion and state are uncertain is studied in the context of Receding Horizon Control and Real-Time Optimal Control. The results highlight how updating the optimal control trajectory reduces the uncertainty in the resulting miss distance.
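The proportional-navigation pursuer assumed in the first scenario follows the classical law a = N · Vc · λ̇ (lateral acceleration proportional to closing speed times line-of-sight rate). A planar sketch against a constant-velocity, non-maneuvering target; all positions, speeds, and the navigation constant are hypothetical:

```python
import numpy as np

def pn_closest_approach(N=3.0, dt=0.005, t_max=60.0, speed=300.0):
    """Planar proportional navigation: lateral acceleration
    a = N * Vc * lambda_dot, applied perpendicular to the pursuer's
    velocity at constant speed. Returns the closest approach distance."""
    p, pv = np.array([0.0, 0.0]), np.array([speed, 0.0])      # pursuer
    e, ev = np.array([5000.0, 2000.0]), np.array([-200.0, 0.0])  # target
    miss = np.inf
    for _ in range(int(t_max / dt)):
        r, rv = e - p, ev - pv
        R = np.linalg.norm(r)
        miss = min(miss, R)
        if R < 1.0:
            break
        lam_dot = (r[0] * rv[1] - r[1] * rv[0]) / R**2  # LOS rate
        vc = -(r @ rv) / R                              # closing speed
        a = N * vc * lam_dot
        n_hat = np.array([-pv[1], pv[0]]) / np.linalg.norm(pv)
        pv = pv + dt * a * n_hat
        pv = pv * (speed / np.linalg.norm(pv))          # hold speed fixed
        p, e = p + dt * pv, e + dt * ev
    return miss
```

Against a non-maneuvering target, PN drives the line-of-sight rate toward zero and achieves a small miss distance; the evasion strategies studied in the thesis work precisely by forcing this suboptimal law into sustained line-of-sight rotation.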

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) solves the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history that lies at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made on developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability; it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents (N ≫ 2), which I term "many-agent reinforcement learning" (MARL) problems. (I use "MARL" to denote multi-agent reinforcement learning with a particular focus on the case of many agents; otherwise, it is denoted "Multi-Agent RL" by default.) In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills the research gap that most existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy-evaluation algorithm -- α^α-Rank -- for many-agent systems. The critical advantage of α^α-Rank is that it can compute the solution concept of α-Rank tractably in multi-player general-sum games with no need to store the entire pay-off matrix.
This is in contrast to classic solution concepts such as the Nash equilibrium, which is known to be PPAD-hard to compute even in two-player cases. α^α-Rank allows us, for the first time, to practically conduct large-scale multi-agent evaluations. Thirdly, I introduce a scalable policy-learning algorithm -- mean-field MARL -- for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling the behavioural diversity in meta-games and on developing algorithms that guarantee to enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impact in the real physical world, beyond purely video games.
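The determinantal-point-process idea behind the diversity metric can be sketched as the determinant of a population's Gram matrix: near-duplicate policies make the matrix nearly singular, so the determinant rewards spread-out behaviour. A minimal illustration with hypothetical policy feature vectors (not the thesis's exact metric):

```python
import numpy as np

def dpp_diversity(features):
    """DPP-style diversity score of a policy population: det(F F^T),
    where each row of F is a policy's behavioural feature vector.
    Near-duplicate rows make F F^T nearly singular, so the score
    collapses toward zero."""
    F = np.asarray(features, dtype=float)
    return float(np.linalg.det(F @ F.T))

diverse = [[1.0, 0.0], [0.0, 1.0]]     # orthogonal behaviours
similar = [[1.0, 0.0], [0.99, 0.01]]   # near-duplicates
```

Maximizing such a determinant during training pushes new policies toward behaviours not already spanned by the population, which is the mechanism the diversity-aware solvers exploit.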