464 research outputs found

    Compact Representation of Value Function in Partially Observable Stochastic Games

    Value methods for solving stochastic games with partial observability model the uncertainty about the state of the game as a probability distribution over possible states. The dimension of this belief space is the number of states. For many practical problems, for example in security, there are exponentially many possible states, which limits the scalability of algorithms on real-world problems. To address this curse of dimensionality, we propose an abstraction technique that projects high-dimensional beliefs to characteristic vectors of significantly lower dimension (e.g., marginal probabilities). Our two main contributions are (1) a novel compact representation of the uncertainty in partially observable stochastic games and (2) a novel algorithm over this compact representation that builds on existing state-of-the-art algorithms for solving stochastic games with partial observability. Experimental evaluation confirms that the new algorithm over the compact representation dramatically increases scalability compared to the state of the art.
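    As a rough illustration of the projection idea in this abstract, the sketch below maps a belief over joint states to a vector of marginal probabilities. It assumes, purely for illustration, that each state is a tuple of n binary components, so the belief has 2^n entries while the marginal vector has only n. The function name `marginals` and the dictionary representation are hypothetical and not taken from the paper.

```python
from itertools import product


def marginals(belief, n):
    """Project a belief over n-bit states to n marginal probabilities.

    belief: dict mapping tuples of n bits (0/1) to probabilities.
    Returns a list m where m[i] is the probability that component i is 1.
    (Hypothetical helper; the state encoding is an assumption.)
    """
    m = [0.0] * n
    for state, p in belief.items():
        for i, bit in enumerate(state):
            if bit:
                m[i] += p
    return m


# Uniform belief over all states with 3 binary components.
n = 3
states = list(product((0, 1), repeat=n))
belief = {s: 1.0 / len(states) for s in states}

# The belief has 2**n entries; the projection has only n.
print(len(belief), marginals(belief, n))
```

    The projection is lossy: distinct beliefs can share the same marginals, which is the price paid for the lower dimension.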

    Partially Observable Stochastic Games with Neural Perception Mechanisms

    Stochastic games are a well-established model for multi-agent sequential decision making under uncertainty. In reality, though, agents have only partial observability of their environment, which makes the problem computationally challenging, even in the single-agent setting of partially observable Markov decision processes. Furthermore, in practice, agents increasingly perceive their environment using data-driven approaches such as neural networks trained on continuous data. To tackle this problem, we propose the model of neuro-symbolic partially-observable stochastic games (NS-POSGs), a variant of continuous-space concurrent stochastic games that explicitly incorporates perception mechanisms. We focus on a one-sided setting, comprising a partially-informed agent with discrete, data-driven observations and a fully-informed agent with continuous observations. We present a new point-based method, called one-sided NS-HSVI, for approximating values of one-sided NS-POSGs and implement it based on the popular particle-based beliefs, showing that it has closed forms for computing values of interest. We provide experimental results to demonstrate the practical applicability of our method for neural networks whose preimage is in polyhedral form. Comment: 41 pages, 5 figures

    Simultaneous Search and Monitoring by Unmanned Aerial Vehicles

    Although robot search and monitoring are normally addressed as separate problems, this work starts from the observation that realistic applications require both. It studies the problem of simultaneous search and monitoring (SSM), which combines the two problems synergistically. The single-pursuer SSM of randomly moving or evasive targets is studied first and is then extended to cases with multiple pursuers. The precise mathematical frameworks for this work are POMDP, POSG and Dec-POMDP, all of which are intractable and non-scalable. A different approach is taken in each scenario to reduce computation cost and achieve online, distributed planning without significantly undermining performance. For the single-pursuer SSM of randomly moving targets, a novel policy reconstruction method is combined with a heuristic branching rule to generate a heuristic reactive policy. For the single-pursuer SSM of evasive targets, an assumption is made and justified that simplifies the search evasion game to a dynamic guaranteed search problem. For the multiple-pursuer SSM of randomly moving targets, the partial open-loop feedback control method is applied in a novel way to achieve cooperation implicitly. For the multiple-pursuer SSM of evasive targets, the assumption made in the single-pursuer case likewise simplifies the cooperative search evasion game to a cooperative dynamic guaranteed search problem. In moderate scenarios, the proposed methods outperform baseline methods and can be computationally efficient in practice. The extreme scenarios in which SSM does not work are also studied.

    Hierarchical Multi-Agent Reinforcement Learning for Air Combat Maneuvering

    The application of artificial intelligence to simulate air-to-air combat scenarios is attracting increasing attention. To date, the high-dimensional state and action spaces, the high complexity of situation information (such as imperfect and filtered information, stochasticity, incomplete knowledge about mission targets) and the nonlinear flight dynamics pose significant challenges for accurate air combat decision-making. These challenges are exacerbated when multiple heterogeneous agents are involved. We propose a hierarchical multi-agent reinforcement learning framework for air-to-air combat with multiple heterogeneous agents. In our framework, the decision-making process is divided into two stages of abstraction, where heterogeneous low-level policies control the action of individual units, and a high-level commander policy issues macro commands given the overall mission targets. Low-level policies are trained for accurate unit combat control. Their training is organized in a learning curriculum with increasingly complex training scenarios and league-based self-play. The commander policy is trained on mission targets given pre-trained low-level policies. The empirical validation supports the advantages of our design choices. Comment: 22nd International Conference on Machine Learning and Applications (ICMLA 23)

    Optimal pilot decisions and flight trajectories in air combat

    The thesis concerns the analysis and synthesis of pilot decision-making and the design of optimal flight trajectories. In the synthesis framework, the methodology of influence diagrams is applied for modeling and simulating the maneuvering decision process of the pilot in one-on-one air combat. Influence diagram representations describing the maneuvering decision in a one-sided optimization setting and in a game setting are constructed. The synthesis of team decision-making in multiplayer air combat is tackled by formulating a decision-theoretic information prioritization approach based on a value function and interval analysis. It gives the team-optimal sequence of tactical data to be transmitted between cooperating air units so as to improve the situation awareness of the friendly pilots in the best possible way. In the optimal trajectory planning framework, an approach towards the interactive automated solution of deterministic aircraft trajectory optimization problems is presented. It offers design principles for trajectory optimization software that can be operated automatically by a nonexpert user. In addition, the representation of preferences and uncertainties in trajectory optimization is considered by developing a multistage influence diagram that describes a series of maneuvering decisions in a one-on-one air combat setting. This influence diagram representation as well as the synthesis elaborations provide seminal ways to treat uncertainties in air combat modeling. The work on influence diagrams can also be seen as an extension of the methodology to dynamically evolving decision situations involving possibly multiple actors with conflicting objectives. From the practical point of view, all the synthesis models can be utilized in decision-making systems of air combat simulators. The information prioritization approach can also be implemented in an onboard data link system.

    Approximation of Bound Functions in Algorithms for Solving Stochastic Games

    In this thesis, we focus on the approximation of the bound functions in the Heuristic Search Value Iteration (HSVI) algorithm for One-Sided Partially Observable Stochastic Games (OS-POSG). These are dynamic games with infinite horizon in which only one player has imperfect information while the opponent has full information. The bound functions are convex functions that approximate the value function of the game. The lower bound is represented as an upper envelope of linear functions, while the upper bound is represented as a lower convex envelope of a set of points. We focus only on the approximation of the upper bound, mainly by using the Approximate Convex Hull algorithm. We show that the approximation of the upper bound is problematic and that, for better results, it is necessary to focus on the approximation of the lower bound function as well.
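    The two bound representations named in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a game with only two states, so a belief is determined by a single number p (the probability of the second state), and the lower convex envelope can be evaluated by interpolating between pairs of stored points. In higher dimensions this evaluation is typically done with a linear program instead. The names `lower_bound` and `upper_bound_1d` are hypothetical.

```python
def lower_bound(alphas, b):
    """Upper envelope of linear functions: max over alpha of <alpha, b>."""
    return max(sum(a_i * b_i for a_i, b_i in zip(alpha, b)) for alpha in alphas)


def upper_bound_1d(points, p):
    """Lower convex envelope of points (p_i, v_i), evaluated at p.

    Two-state case only: the envelope at p is the smallest value obtained
    by linear interpolation between any two stored points that bracket p.
    Returns infinity if no pair of points brackets p.
    """
    best = float("inf")
    for p1, v1 in points:
        for p2, v2 in points:
            if p1 <= p <= p2:
                if p1 == p2:
                    val = min(v1, v2)
                else:
                    t = (p - p1) / (p2 - p1)
                    val = (1 - t) * v1 + t * v2
                best = min(best, val)
    return best


# Illustrative data (made up): two linear functions and three points.
alphas = [(1.0, 0.0), (0.0, 1.0)]
points = [(0.0, 1.0), (1.0, 1.0), (0.5, 2.0)]

p = 0.5
b = (1 - p, p)  # belief over the two states
print(lower_bound(alphas, b), upper_bound_1d(points, p))
```

    In this made-up data the point (0.5, 2.0) lies above the segment joining the two endpoints, so it does not affect the envelope; pruning or approximating such points is the kind of operation an approximate convex hull targets.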


    Differential games are a useful tool both for modeling conflict between autonomous systems and for synthesizing robust control solutions. The traditional study of games has assumed decision agents possess complete information about one another’s strategies and numerical weights. This dissertation relaxes this assumption. Instead, uncertainty in the opponent’s strategy is treated as a symptom of the inevitable gap between modeling assumptions and applications. By combining nonlinear estimation approaches with problem domain knowledge, procedures are developed for acting under uncertainty using established methods that are suitable for applications on embedded systems. The dissertation begins by using nonlinear estimation to account for parametric uncertainty in an opponent’s strategy. A solution is proposed for engagements in which both players use this approach simultaneously. This method is demonstrated on a numerical example of an orbital pursuit-evasion game, and the findings motivate additional developments. First, the solutions of the governing Riccati differential equations are approximated, using automatic differentiation to obtain high-degree Taylor series approximations. Second, constrained estimation is introduced to prevent estimator failures in near-singular engagements. Numerical conditions for nonsingularity are approximated using Chebyshev polynomial basis functions, and applied as constraints to a state estimate. Third and finally, multiple model estimation is suggested as a practical solution for time-critical engagements in which the form of the opponent’s strategy is uncertain. Deceptive opponent strategies are identified as a candidate approach to use against an adaptive player, and a procedure for designing such strategies is proposed. The new developments are demonstrated in a missile interception pursuit-evasion game in which the evader selects from a set of candidate strategies with unknown weights.

    Computing Correlated Equilibria in Partially Observable Stochastic Games

    In the real world, we must deal with situations that require cooperation among participating agents while preserving their rationality. Such problems correspond to the game-theoretic concept of correlated equilibrium. Several works address the computation of correlated equilibria in stochastic games. So far, however, there is no algorithm capable of computing correlated equilibria in general partially observable stochastic games. In this work, we propose the first algorithm for approximating correlated equilibria in partially observable stochastic games; it solves these games iteratively by gradually expanding a generated subset of belief states. Even though the algorithm has no optimality guarantees, we show that it is capable of computing reasonable solutions.

    Convex-Concave Zero-sum Markov Stackelberg Games

    Zero-sum Markov Stackelberg games can be used to model myriad problems, in domains ranging from economics to human-robot interaction. In this paper, we develop policy gradient methods that solve these games in continuous state and action settings using noisy gradient estimates computed from observed trajectories of play. When the games are convex-concave, we prove that our algorithms converge to Stackelberg equilibrium in polynomial time. We also show that reach-avoid problems are naturally modeled as convex-concave zero-sum Markov Stackelberg games, and that Stackelberg equilibrium policies are more effective than their Nash counterparts in these problems.