Zero-sum stopping games with asymmetric information
We study a model of two-player, zero-sum, stopping games with asymmetric
information. We assume that the payoff depends on two continuous-time Markov
chains (X, Y), where X is only observed by player 1 and Y only by player 2,
implying that the players have access to stopping times with respect to
different filtrations. We show the existence of a value in mixed stopping times
and provide a variational characterization for the value as a function of the
initial distribution of the Markov chains. We also prove a verification theorem
for optimal stopping rules, which allows one to construct optimal stopping times.
Finally, we use our results to solve two generic examples explicitly.
Traditional Wisdom and Monte Carlo Tree Search Face-to-Face in the Card Game Scopone
We present the design of a competitive artificial intelligence for Scopone, a
popular Italian card game. We compare rule-based players using the most
established strategies (one for beginners and two for advanced players) against
players using Monte Carlo Tree Search (MCTS) and Information Set Monte Carlo
Tree Search (ISMCTS) with different reward functions and simulation strategies.
MCTS requires complete information about the game state and thus implements a
cheating player while ISMCTS can deal with incomplete information and thus
implements a fair player. Our results show that, as expected, the cheating MCTS
outperforms all the other strategies; ISMCTS is stronger than all the
rule-based players implementing well-known and most advanced strategies and it
also turns out to be a challenging opponent for human players.
Comment: Preprint. Accepted for publication in the IEEE Transaction on Game
Synthesis of surveillance strategies via belief abstraction
We provide a novel framework for the synthesis of a controller for a robot with a surveillance objective, that is, the robot is required to maintain knowledge of the location of a moving, possibly adversarial target. We formulate this problem as a one-sided partial-information game in which the winning condition for the agent is specified as a temporal logic formula. The specification formalizes the surveillance requirement given by the user by quantifying and reasoning over the agent's beliefs about a target's location. We also incorporate additional non-surveillance tasks. In order to synthesize a surveillance strategy that meets the specification, we transform the partial-information game into a perfect-information one, using abstraction to mitigate the exponential blow-up typically incurred by such transformations. This transformation enables the use of off-the-shelf tools for reactive synthesis. We evaluate the proposed method on two case studies, demonstrating its applicability to diverse surveillance requirements.
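To illustrate how beliefs about the target's location can be tracked, here is a minimal sketch of a set-valued belief update on a grid or graph. It is not the paper's construction or its abstraction: the function name `update_belief`, the `moves` map, and the `visible` set are all hypothetical, and it assumes the belief is simply the set of cells the target might occupy.

```python
def update_belief(belief, moves, visible, observed=None):
    """Advance a set-valued belief about the target by one step.

    belief   -- set of cells the target may currently occupy (assumed model)
    moves    -- dict mapping each cell to the set of cells reachable in one step
    visible  -- set of cells the agent can see after the step
    observed -- the target's cell if the agent sees it, else None
    """
    # Every cell the target could have moved to from any believed cell.
    successors = set()
    for cell in belief:
        successors |= moves[cell]

    # Seeing the target collapses the belief to a single cell.
    if observed is not None:
        return {observed}

    # Not seeing the target rules out every visible cell.
    return successors - visible
```

Treating each such belief set as a state of a new game is the standard way to turn a partial-information game into a perfect-information one. The number of possible sets grows exponentially with the number of cells, which is the blow-up the abstract refers to.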
Perseus: Randomized Point-based Value Iteration for POMDPs
Partially observable Markov decision processes (POMDPs) form an attractive
and principled framework for agent planning under uncertainty. Point-based
approximate techniques for POMDPs compute a policy based on a finite set of
points collected in advance from the agent's belief space. We present a
randomized point-based value iteration algorithm called Perseus. The algorithm
performs approximate value backup stages, ensuring that in each backup stage
the value of each point in the belief set is improved; the key observation is
that a single backup may improve the value of many belief points. Contrary to
other point-based methods, Perseus backs up only a (randomly selected) subset
of points in the belief set, sufficient for improving the value of each belief
point in the set. We show how the same idea can be extended to dealing with
continuous action spaces. Experimental results show the potential of Perseus in
large-scale POMDP problems.
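The backup stage described in this abstract can be sketched as follows for a discrete POMDP. This is a minimal illustration, not the authors' implementation. The array layouts `T[a, s, s']`, `O[a, s', o]`, `R[a, s]`, the function names, and the tolerance `tol` are all assumptions made for the example. The value function is stored as a matrix `V` of alpha vectors, one per row.

```python
import numpy as np

def initial_value(R, gamma, n_states):
    # A single alpha vector that lower-bounds the value everywhere.
    return np.full((1, n_states), R.min() / (1.0 - gamma))

def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one alpha vector."""
    n_actions, n_obs = T.shape[0], O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(n_actions):
        alpha_a = R[a].astype(float)
        for o in range(n_obs):
            # g[s, k] = sum_{s'} T[a, s, s'] * O[a, s', o] * V[k, s']
            g = (T[a] * O[a][:, o]) @ V.T
            k = np.argmax(b @ g)          # best old vector for this (a, o) at b
            alpha_a = alpha_a + gamma * g[:, k]
        val = b @ alpha_a
        if val > best_val:
            best_alpha, best_val = alpha_a, val
    return best_alpha

def perseus_stage(B, V, T, O, R, gamma, rng, tol=1e-12):
    """One backup stage: improve (or keep) the value of every belief in B."""
    old_vals = np.max(B @ V.T, axis=1)
    todo = list(range(len(B)))            # beliefs not yet improved
    new_V = []
    while todo:
        i = todo[rng.integers(len(todo))]  # sample a non-improved belief
        alpha = backup(B[i], V, T, O, R, gamma)
        if B[i] @ alpha < old_vals[i]:
            # The backup did not help at b_i: keep its best old vector.
            alpha = V[np.argmax(V @ B[i])]
        new_V.append(alpha)
        new_vals = np.max(B @ np.array(new_V).T, axis=1)
        # Drop b_i and every other belief whose value is now at least as good.
        todo = [j for j in todo
                if j != i and new_vals[j] < old_vals[j] - tol]
    return np.array(new_V)
```

A typical use would start from `V = initial_value(R, gamma, n_states)` and call `perseus_stage` repeatedly on a fixed belief set `B`, with `rng = np.random.default_rng()`. Because one new vector often improves many beliefs at once, a stage usually returns far fewer vectors than there are points in `B`.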
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
An Investigation Report on Auction Mechanism Design
Auctions are markets with strict regulations governing the information
available to traders in the market and the possible actions they can take.
Since well designed auctions achieve desirable economic outcomes, they have
been widely used in solving real-world optimization problems, and in
structuring stock or futures exchanges. Auctions also provide a very valuable
testing-ground for economic theory, and they play an important role in
computer-based control systems.
Auction mechanism design aims to manipulate the rules of an auction in order
to achieve specific goals. Economists traditionally use mathematical methods,
mainly game theory, to analyze auctions and design new auction forms. However,
due to the high complexity of auctions, the mathematical models are typically
simplified to obtain results, and this makes it difficult to apply results
derived from such models to market environments in the real world. As a result,
researchers are turning to empirical approaches.
This report aims to survey the theoretical and empirical approaches to
designing auction mechanisms and trading strategies, with more weight on
the empirical ones, and to build a foundation for further research in the field.