    Zero-sum stopping games with asymmetric information

    We study a model of two-player, zero-sum stopping games with asymmetric information. We assume that the payoff depends on two continuous-time Markov chains (X, Y), where X is observed only by player 1 and Y only by player 2, so that the players have access to stopping times with respect to different filtrations. We show the existence of a value in mixed stopping times and provide a variational characterization of the value as a function of the initial distribution of the Markov chains. We also prove a verification theorem for optimal stopping rules, which allows one to construct optimal stopping times. Finally, we use our results to solve two generic examples explicitly.

    Traditional Wisdom and Monte Carlo Tree Search Face-to-Face in the Card Game Scopone

    We present the design of a competitive artificial intelligence for Scopone, a popular Italian card game. We compare rule-based players using the most established strategies (one for beginners and two for advanced players) against players using Monte Carlo Tree Search (MCTS) and Information Set Monte Carlo Tree Search (ISMCTS) with different reward functions and simulation strategies. MCTS requires complete information about the game state and thus implements a cheating player, while ISMCTS can deal with incomplete information and thus implements a fair player. Our results show that, as expected, the cheating MCTS outperforms all the other strategies; ISMCTS is stronger than all the rule-based players implementing well-known and most advanced strategies, and it also turns out to be a challenging opponent for human players. Comment: Preprint. Accepted for publication in the IEEE Transaction on Game

    Synthesis of surveillance strategies via belief abstraction

    We provide a novel framework for the synthesis of a controller for a robot with a surveillance objective, that is, the robot is required to maintain knowledge of the location of a moving, possibly adversarial target. We formulate this problem as a one-sided partial-information game in which the winning condition for the agent is specified as a temporal logic formula. The specification formalizes the surveillance requirement given by the user by quantifying and reasoning over the agent's beliefs about a target's location. We also incorporate additional non-surveillance tasks. In order to synthesize a surveillance strategy that meets the specification, we transform the partial-information game into a perfect-information one, using abstraction to mitigate the exponential blow-up typically incurred by such transformations. This transformation enables the use of off-the-shelf tools for reactive synthesis. We evaluate the proposed method on two case studies, demonstrating its applicability to diverse surveillance requirements.
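The belief-tracking idea behind such a transformation can be illustrated with a minimal sketch. It is not the authors' construction: the corridor layout, the `moves`, `visible`, and `region_of` tables, and both function names are hypothetical choices made only for illustration. A belief is the set of cells where the target might be, and the abstraction replaces that set with the coarser set of regions it touches.

```python
def update_belief(belief, moves, visible, observed=None):
    """One step of set-based belief tracking over target locations.

    belief   : frozenset of cells the target may currently occupy
    moves    : dict mapping a cell to the cells reachable in one step
    visible  : set of cells the agent currently sees
    observed : the target's cell if the agent sees it, else None
    """
    if observed is not None:
        # Seeing the target collapses the belief to a single cell.
        return frozenset([observed])
    successors = set()
    for cell in belief:
        successors.update(moves[cell])
    # The target was not seen, so it cannot be in any visible cell.
    return frozenset(successors - set(visible))


def abstract_belief(belief, region_of):
    """Coarsen a belief to the set of regions it intersects."""
    return frozenset(region_of[cell] for cell in belief)


# Hypothetical corridor of six cells; the target may stay or step to a neighbor.
moves = {c: [n for n in (c - 1, c, c + 1) if 0 <= n <= 5] for c in range(6)}
visible = {0, 1}                      # cells the agent can see
region_of = {c: "west" if c < 3 else "east" for c in range(6)}

b0 = frozenset([2])                   # target last known at cell 2
b1 = update_belief(b0, moves, visible)
b2 = update_belief(b1, moves, visible)
abstract_b2 = abstract_belief(b2, region_of)
```

Tracking sets of regions rather than sets of cells keeps the number of distinct beliefs small, at the price of losing precision about where inside a region the target is.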

    Perseus: Randomized Point-based Value Iteration for POMDPs

    Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to deal with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
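The backup stage described in this abstract can be sketched as follows. This is a minimal pure-Python illustration written from the abstract's description, not the authors' code: the function names, the array layout of `T`, `O`, and `R`, and the toy two-state problem with its numbers are all assumptions. The value function is a list of alpha vectors, and a stage keeps sampling not-yet-improved belief points until every point in the set is at least as good as before.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def value(b, V):
    """Value of belief b under a set V of alpha vectors."""
    return max(dot(b, alpha) for alpha in V)


def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b.

    T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a), R[a][s] = reward.
    """
    n = len(b)
    best = None
    for a in range(len(T)):
        g_a = list(R[a])
        for o in range(len(O[a][0])):
            # Project every alpha vector back through (a, o), keep the best at b.
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                 for s in range(n)]
                for alpha in V
            ]
            g_ao = max(projected, key=lambda g: dot(b, g))
            g_a = [x + gamma * y for x, y in zip(g_a, g_ao)]
        if best is None or dot(b, g_a) > dot(b, best):
            best = g_a
    return best


def perseus_stage(B, V, T, O, R, gamma, rng):
    """One randomized backup stage over the belief set B."""
    old = [value(b, V) for b in B]
    V_new = []
    todo = list(range(len(B)))          # points whose value is not yet restored
    while todo:
        i = rng.choice(todo)
        alpha = backup(B[i], V, T, O, R, gamma)
        if dot(B[i], alpha) < old[i]:
            # The backup did not help at this point: reuse its old best vector.
            alpha = max(V, key=lambda v: dot(B[i], v))
        V_new.append(alpha)
        # One new vector may improve many points, so re-check all of them.
        todo = [j for j in todo if value(B[j], V_new) < old[j]]
    return V_new


# Toy two-state problem (illustrative numbers): action 0 listens,
# actions 1 and 2 commit to a state and reset the problem.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.85, 0.15], [0.15, 0.85]], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]
gamma = 0.95
B = [[i / 10, 1 - i / 10] for i in range(11)]
V0 = [[min(min(r) for r in R) / (1 - gamma)] * 2]   # pessimistic initial vector
rng = random.Random(0)
V = V0
for _ in range(30):
    V = perseus_stage(B, V, T, O, R, gamma, rng)
```

Because each stage adds at most one vector per sampled point and stops as soon as no point is worse off, the resulting vector set is never larger than the belief set and is often much smaller.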

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.

    An Investigation Report on Auction Mechanism Design

    Auctions are markets with strict regulations governing the information available to traders in the market and the possible actions they can take. Since well-designed auctions achieve desirable economic outcomes, they have been widely used in solving real-world optimization problems and in structuring stock or futures exchanges. Auctions also provide a very valuable testing ground for economic theory, and they play an important role in computer-based control systems. Auction mechanism design aims to manipulate the rules of an auction in order to achieve specific goals. Economists traditionally use mathematical methods, mainly game theory, to analyze auctions and design new auction forms. However, due to the high complexity of auctions, the mathematical models are typically simplified to obtain results, and this makes it difficult to apply results derived from such models to market environments in the real world. As a result, researchers are turning to empirical approaches. This report aims to survey the theoretical and empirical approaches to designing auction mechanisms and trading strategies, with more weight on the empirical ones, and to build the foundation for further research in the field.