Compact Representation of Value Function in Partially Observable Stochastic Games
Value methods for solving stochastic games with partial observability model
the uncertainty about states of the game as a probability distribution over
possible states. The dimension of this belief space is the number of states.
For many practical problems, for example in security, there are exponentially
many possible states, which prevents algorithms from scaling to
real-world problems. To this end, we propose an abstraction technique that
addresses this issue of the curse of dimensionality by projecting
high-dimensional beliefs to characteristic vectors of significantly lower
dimension (e.g., marginal probabilities). Our two main contributions are (1) a
novel compact representation of the uncertainty in partially observable
stochastic games and (2) a novel algorithm that uses this compact representation
and builds on existing state-of-the-art algorithms for solving stochastic
games with partial observability. Experimental evaluation confirms that the new
algorithm over the compact representation dramatically increases
scalability compared to the state of the art.
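As a rough illustration of the projection idea mentioned in this abstract (mapping a high-dimensional belief to marginal probabilities), here is a minimal Python sketch. It is not the authors' algorithm: the state encoding as tuples of binary variables, the function name `marginals`, and the uniform example belief are all assumptions made for illustration.

```python
from itertools import product


def marginals(belief, n_vars):
    """Project a belief over joint binary states onto per-variable marginals.

    `belief` maps each joint state (a tuple of 0/1 values) to its probability.
    The result has n_vars entries instead of 2**n_vars.
    """
    result = [0.0] * n_vars
    for state, prob in belief.items():
        for i, bit in enumerate(state):
            if bit:
                result[i] += prob
    return result


# Hypothetical example: 3 binary variables, so 2**3 = 8 joint states.
n_vars = 3
states = list(product((0, 1), repeat=n_vars))
belief = {s: 1.0 / len(states) for s in states}  # uniform belief (assumed)

compact = marginals(belief, n_vars)  # 3 numbers instead of 8
```

The projection loses information in general: different joint beliefs can share the same marginals, which is the price paid for the lower dimension.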
Partially Observable Stochastic Games with Neural Perception Mechanisms
Stochastic games are a well-established model for multi-agent sequential
decision making under uncertainty. In reality, though, agents have only partial
observability of their environment, which makes the problem computationally
challenging, even in the single-agent setting of partially observable Markov
decision processes. Furthermore, in practice, agents increasingly perceive
their environment using data-driven approaches such as neural networks trained
on continuous data. To tackle this problem, we propose the model of
neuro-symbolic partially-observable stochastic games (NS-POSGs), a variant of
continuous-space concurrent stochastic games that explicitly incorporates
perception mechanisms. We focus on a one-sided setting, comprising a
partially-informed agent with discrete, data-driven observations and a
fully-informed agent with continuous observations. We present a new point-based
method, called one-sided NS-HSVI, for approximating values of one-sided
NS-POSGs and implement it based on the popular particle-based beliefs, showing
that it has closed forms for computing values of interest. We provide
experimental results to demonstrate the practical applicability of our method
for neural networks whose preimage is in polyhedral form. Comment: 41 pages, 5 figures
Simultaneous Search and Monitoring by Unmanned Aerial Vehicles
Although robot search and monitoring are normally addressed as separate problems, this work argues that realistic applications require both. It studies the problem of simultaneous search and monitoring (SSM), which combines the two in a synergistic way. The single-pursuer SSM of randomly moving or evasive targets is studied first and is then extended to cases with multiple pursuers. The precise mathematical frameworks for this work are POMDP, POSG and Dec-POMDP, all of which are intractable and do not scale. A different approach is taken in each scenario to reduce computation cost and achieve online, distributed planning without significantly degrading performance. For the single-pursuer SSM of randomly moving targets, a novel policy reconstruction method is combined with a heuristic branching rule to generate a heuristic reactive policy. For the single-pursuer SSM of evasive targets, an assumption is made and justified that simplifies the search evasion game to a dynamic guaranteed search problem. For the multiple-pursuer SSM of randomly moving targets, the partial open-loop feedback control method is applied in a novel way to achieve cooperation implicitly. For the multiple-pursuer SSM of evasive targets, the assumption made in the single-pursuer case likewise simplifies the cooperative search evasion game to a cooperative dynamic guaranteed search problem. In moderate scenarios, the proposed methods outperform baseline methods and can be computationally efficient enough for practical use. The extreme scenarios in which SSM does not work are also studied.
Hierarchical Multi-Agent Reinforcement Learning for Air Combat Maneuvering
The application of artificial intelligence to simulate air-to-air combat
scenarios is attracting increasing attention. To date, the high-dimensional
state and action spaces, the high complexity of situation information (such as
imperfect and filtered information, stochasticity, incomplete knowledge about
mission targets) and the nonlinear flight dynamics pose significant challenges
for accurate air combat decision-making. These challenges are exacerbated when
multiple heterogeneous agents are involved. We propose a hierarchical
multi-agent reinforcement learning framework for air-to-air combat with
multiple heterogeneous agents. In our framework, the decision-making process is
divided into two stages of abstraction, where heterogeneous low-level policies
control the action of individual units, and a high-level commander policy
issues macro commands given the overall mission targets. Low-level policies are
trained for accurate unit combat control. Their training is organized in a
learning curriculum with increasingly complex training scenarios and
league-based self-play. The commander policy is trained on mission targets
given pre-trained low-level policies. The empirical validation supports
our design choices. Comment: 22nd International Conference on Machine Learning and Applications
(ICMLA 23)
Optimal pilot decisions and flight trajectories in air combat
The thesis concerns the analysis and synthesis of pilot decision-making and the design of optimal flight trajectories. In the synthesis framework, the methodology of influence diagrams is applied to model and simulate the maneuvering decision process of the pilot in one-on-one air combat. Influence diagram representations are constructed that describe the maneuvering decision in a one-sided optimization setting and in a game setting. The synthesis of team decision-making in multiplayer air combat is tackled by formulating a decision-theoretic information prioritization approach based on a value function and interval analysis. It gives the team-optimal sequence of tactical data to transmit between cooperating air units in order to improve the situation awareness of the friendly pilots as much as possible. In the optimal trajectory planning framework, an approach towards the interactive automated solution of deterministic aircraft trajectory optimization problems is presented. It offers design principles for trajectory optimization software that can be operated automatically by a nonexpert user. In addition, the representation of preferences and uncertainties in trajectory optimization is considered by developing a multistage influence diagram that describes a series of maneuvering decisions in a one-on-one air combat setting. This influence diagram representation, together with the synthesis elaborations, provides seminal ways to treat uncertainties in air combat modeling. The work on influence diagrams can also be seen as an extension of the methodology to dynamically evolving decision situations that may involve multiple actors with conflicting objectives. From a practical point of view, all the synthesis models can be used in the decision-making systems of air combat simulators. The information prioritization approach can also be implemented in an onboard data link system.
Approximation of Bound Functions in Algorithms for Solving Stochastic Games
In this thesis, we focus on the approximation of the bound functions in the Heuristic Search Value Iteration (HSVI) algorithm for One-Sided Partially Observable Stochastic Games (OS-POSG). These are dynamic games with infinite horizon where only one player has imperfect information, and the opponent has full information. The bound functions approximate the value function of the game. The lower bound is represented as an upper envelope of linear functions, while the upper bound is represented as a lower convex envelope of a set of points. We focus only on the approximation of the upper bound, mainly by using the Approximate Convex Hull algorithm. We show that the approximation of the upper bound is problematic and that, for better results, it is necessary to focus on the approximation of the lower bound function as well.
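The two bound representations described in this abstract can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the thesis's implementation: the game is assumed to have two states, so a belief is a single number `p` (the probability of the first state), and the names `lower_bound`, `upper_bound_1d`, `alphas`, and `points` are hypothetical. In higher dimensions the upper bound is usually evaluated with a linear program rather than by enumerating pairs of points.

```python
def lower_bound(alphas, belief):
    """Upper envelope of linear functions: max over alpha vectors of alpha . belief."""
    return max(sum(a * b for a, b in zip(alpha, belief)) for alpha in alphas)


def upper_bound_1d(points, p):
    """Lower convex envelope of (p_i, v_i) points, evaluated at belief p.

    In one dimension the envelope at p is the smallest value obtained by
    interpolating between any two points that bracket p.
    """
    best = float("inf")
    for p_left, v_left in points:
        for p_right, v_right in points:
            if p_left <= p <= p_right:
                if p_right == p_left:
                    candidate = min(v_left, v_right)
                else:
                    w = (p - p_left) / (p_right - p_left)
                    candidate = (1 - w) * v_left + w * v_right
                best = min(best, candidate)
    return best


# Hypothetical data for a two-state game.
alphas = [(0.0, 1.0), (1.0, 0.0)]              # linear functions over beliefs
points = [(0.0, 1.2), (1.0, 1.2), (0.5, 0.8)]  # (belief, value) pairs

p = 0.25
lo = lower_bound(alphas, (p, 1 - p))
hi = upper_bound_1d(points, p)
```

With this data the lower bound stays below the upper bound at every belief, which is the invariant an HSVI-style algorithm maintains while tightening the gap between the two.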
ESTIMATION-BASED SOLUTIONS TO INCOMPLETE INFORMATION PURSUIT-EVASION GAMES
Differential games are a useful tool both for modeling conflict between autonomous systems and for synthesizing robust control solutions. The traditional study of games has assumed decision agents possess complete information about one another’s strategies and numerical weights. This dissertation relaxes this assumption. Instead, uncertainty in the opponent’s strategy is treated as a symptom of the inevitable gap between modeling assumptions and applications. By combining nonlinear estimation approaches with problem domain knowledge, procedures are developed for acting under uncertainty using established methods that are suitable for applications on embedded systems. The dissertation begins by using nonlinear estimation to account for parametric uncertainty in an opponent’s strategy. A solution is proposed for engagements in which both players use this approach simultaneously. This method is demonstrated on a numerical example of an orbital pursuit-evasion game, and the findings motivate additional developments. First, the solutions of the governing Riccati differential equations are approximated, using automatic differentiation to obtain high-degree Taylor series approximations. Second, constrained estimation is introduced to prevent estimator failures in near-singular engagements. Numerical conditions for nonsingularity are approximated using Chebyshev polynomial basis functions, and applied as constraints to a state estimate. Third and finally, multiple model estimation is suggested as a practical solution for time-critical engagements in which the form of the opponent’s strategy is uncertain. Deceptive opponent strategies are identified as a candidate approach to use against an adaptive player, and a procedure for designing such strategies is proposed. The new developments are demonstrated in a missile interception pursuit-evasion game in which the evader selects from a set of candidate strategies with unknown weights.
Computing Correlated Equilibria in Partially Observable Stochastic Games
In the real world, we have to deal with situations that require participating agents to cooperate while remaining rational. Such problems are captured by the game-theoretic concept of correlated equilibrium. Several works address the computation of correlated equilibria in stochastic games. So far, however, there is no algorithm capable of computing correlated equilibria in general partially observable stochastic games. In this work, we propose the first algorithm for approximating correlated equilibria in partially observable stochastic games; it solves these games iteratively by gradually expanding a generated subset of belief states. Even though the algorithm has no optimality guarantees, we show that it is capable of computing reasonable solutions.
Convex-Concave Zero-sum Markov Stackelberg Games
Zero-sum Markov Stackelberg games can be used to model myriad problems, in
domains ranging from economics to human-robot interaction. In this paper, we
develop policy gradient methods that solve these games in continuous state and
action settings using noisy gradient estimates computed from observed
trajectories of play. When the games are convex-concave, we prove that our
algorithms converge to Stackelberg equilibrium in polynomial time. We also show
that reach-avoid problems are naturally modeled as convex-concave zero-sum
Markov Stackelberg games, and that Stackelberg equilibrium policies are more
effective than their Nash counterparts in these problems.
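To give a feel for the convex-concave structure mentioned in this abstract, the sketch below runs nested gradient descent-ascent on a toy static leader-follower problem. It is not the paper's policy gradient method: there are no states, trajectories, or noisy gradient estimates, and the objective `f(x, y) = x^2 + 2xy - y^2`, the step size, and the iteration counts are all assumptions chosen for illustration. The leader picks `x` to minimize `f`, and the follower responds with `y` to maximize it.

```python
def grad_x(x, y):
    """Partial derivative of f(x, y) = x**2 + 2*x*y - y**2 with respect to x."""
    return 2 * x + 2 * y


def grad_y(x, y):
    """Partial derivative of f(x, y) = x**2 + 2*x*y - y**2 with respect to y."""
    return 2 * x - 2 * y


def nested_gda(x0, y0, step=0.1, outer_iters=100, inner_iters=50):
    """Leader descends on x after the follower has (approximately) best-responded."""
    x, y = x0, y0
    for _ in range(outer_iters):
        # Follower: gradient ascent on y toward its best response to the current x.
        for _ in range(inner_iters):
            y += step * grad_y(x, y)
        # Leader: one gradient descent step on x, evaluated at the follower's response.
        x -= step * grad_x(x, y)
    return x, y


x_star, y_star = nested_gda(x0=1.0, y0=-1.0)
```

For this toy objective the follower's best response is `y = x`, so the leader effectively minimizes `2 * x**2` and both variables approach zero. The nested loop reflects the Stackelberg ordering, in which the follower reacts to the leader's choice rather than both players moving simultaneously.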