Dynamic Programming Approximations for Partially Observable Stochastic Games
Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes at a price, namely a high computational cost: solving POSGs optimally becomes intractable after only a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, the DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies.
Zero-Sum Stochastic Games with Partial Information and Average Payoff
We consider a discrete-time partially observable zero-sum stochastic game with
the average payoff criterion. We study the game using an equivalent completely
observable game. We show that the game has a value, and we obtain a
pair of optimal strategies for the two players. Comment: Journal of Optimization Theory and Applications, 201
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
microPhantom: Playing microRTS under uncertainty and chaos
This competition paper presents microPhantom, a bot playing microRTS and
participating in the 2020 microRTS AI competition. microPhantom is based on our
previous bot POAdaptive which won the partially observable track of the 2018
and 2019 microRTS AI competitions. In this paper, we focus on decision-making
under uncertainty, by tackling the Unit Production Problem with a method based
on a combination of Constraint Programming and decision theory. We show that
using our method to decide which units to train significantly improves the win
rate against the second-best microRTS bot from the partially observable track.
We also show that our method is resilient in chaotic environments, with only a
very small loss of efficiency. To allow replicability and to facilitate further
research, the source code of microPhantom is available, as well as the
Constraint Programming toolkit it uses.
Partially Observed Non-linear Risk-sensitive Optimal Stopping Control for Non-linear Discrete-time Systems
In this paper we introduce and solve the partially observed optimal stopping non-linear risk-sensitive stochastic control problem for discrete-time non-linear systems. The presented results are closely related to previous results for the finite-horizon partially observed risk-sensitive stochastic control problem. An information state approach is used, and a new (three-way) separation principle is established that leads to a forward dynamic programming equation and a backward dynamic programming inequality equation (both infinite dimensional). A verification theorem is given that establishes the optimal control and optimal stopping time. The risk-neutral optimal stopping stochastic control problem is also discussed.
Structure in the Value Function of Two-Player Zero-Sum Games of Incomplete Information
Zero-sum stochastic games provide a rich model for competitive decision
making. However, under general forms of state uncertainty as considered in the
Partially Observable Stochastic Game (POSG), such decision making problems are
still not very well understood. This paper makes a contribution to the theory
of zero-sum POSGs by characterizing structure in their value function. In
particular, we introduce a new formulation of the value function for zs-POSGs
as a function of the "plan-time sufficient statistics" (roughly speaking, the
information distribution in the POSG), which has the potential to enable
generalization over such information distributions. We further delineate this
generalization capability by proving a structural result on the shape of the
value function: it exhibits concavity and convexity with respect to appropriately
chosen marginals of the statistic space. This result is a key precursor for
developing solution methods that may be able to exploit such structure.
Finally, we show how these results allow us to reduce a zs-POSG to a
"centralized" model with shared observations, thereby transferring results for
the latter, narrower class, to games with individual (private) observations.