Compact Representation of Value Function in Partially Observable Stochastic Games
Value-based methods for solving stochastic games with partial observability model
the uncertainty about the states of the game as a probability distribution over
possible states. The dimension of this belief space equals the number of states.
Many practical problems, for example in security, have exponentially many
possible states, which makes such algorithms insufficiently scalable for
real-world use. To this end, we propose an abstraction technique that addresses
this curse of dimensionality by projecting high-dimensional beliefs to
characteristic vectors of significantly lower dimension (e.g., marginal
probabilities). Our two main contributions are (1) a novel compact
representation of uncertainty in partially observable stochastic games and (2) a
novel algorithm that builds on existing state-of-the-art algorithms for solving
stochastic games with partial observability but operates over this compact
representation. Experimental evaluation confirms that the new algorithm over the
compact representation dramatically improves scalability compared to the state
of the art.
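The projection described above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it assumes the joint state factors into binary features (e.g., per-target presence), and `project_to_marginals` is an invented helper name.

```python
import numpy as np

def project_to_marginals(belief, n_factors):
    """Project a belief over joint states (bit-tuples) down to the
    per-factor marginal probabilities, one number per factor."""
    marginals = np.zeros(n_factors)
    for state, prob in belief.items():
        for i, bit in enumerate(state):
            if bit:
                marginals[i] += prob
    return marginals

# 3 binary factors -> 2^3 = 8 possible joint states, but the
# characteristic vector needs only 3 numbers.
belief = {
    (1, 0, 0): 0.5,
    (0, 1, 1): 0.3,
    (1, 1, 0): 0.2,
}
marginals = project_to_marginals(belief, 3)
```

With `n` binary factors the belief lives in a `2^n`-dimensional simplex, while the marginal vector has only `n` entries, which is the kind of exponential-to-linear reduction the abstraction targets.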
Security Games with Information Leakage: Modeling and Computation
Most models of Stackelberg security games assume that the attacker knows only
the defender's mixed strategy and cannot observe (even partially) the
instantiated pure strategy. Such partial observation of the deployed pure
strategy -- an issue we refer to as information leakage -- is a significant
concern in practical applications. While previous research on patrolling games
has considered the attacker's real-time surveillance, our setting, and therefore
our models and techniques, are fundamentally different. More specifically, after
describing the information leakage model, we start with an LP formulation to
compute the defender's optimal strategy in the presence of leakage. Perhaps
surprisingly, we show that a key subproblem in solving this LP (more precisely,
the defender oracle) is NP-hard even for the simplest of security game models.
We then approach the problem from three directions: efficient algorithms for
restricted cases, approximation algorithms, and heuristic sampling algorithms
that improve upon the status quo. Our experiments confirm the necessity of
handling information leakage and the advantage of our algorithms.
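As background for the LP formulation mentioned above, the standard no-leakage maximin coverage LP for a zero-sum security game can be sketched with `scipy`. This is not the paper's leakage-aware LP or its NP-hard defender oracle; the function name and payoff numbers are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_coverage(u_cov, u_unc, resources):
    """Zero-sum coverage LP: choose per-target coverage probabilities
    c_t in [0, 1] (total at most `resources`) maximizing the defender's
    worst-case payoff u, where attacking target t yields the defender
    c_t * u_cov[t] + (1 - c_t) * u_unc[t]."""
    n = len(u_cov)
    cost = np.zeros(n + 1)
    cost[-1] = -1.0                     # variables [c_1..c_n, u]; maximize u
    # u - c_t * (u_cov[t] - u_unc[t]) <= u_unc[t]  for every target t.
    A = np.hstack([-np.diag(np.array(u_cov) - np.array(u_unc)),
                   np.ones((n, 1))])
    # Total coverage bounded by the number of resources.
    A_res = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    res = linprog(cost,
                  A_ub=np.vstack([A, A_res]),
                  b_ub=np.append(np.array(u_unc, dtype=float), resources),
                  bounds=[(0, 1)] * n + [(None, None)])
    return res.x[:n], res.x[-1]

# One resource, three targets with defender losses 5, 3, 2 if uncovered.
cov, value = maximin_coverage(u_cov=[0, 0, 0], u_unc=[-5, -3, -2], resources=1)
```

At the optimum the defender equalizes the payoff across attacked targets, so higher-loss targets receive more coverage; the leakage LP in the paper replaces this simple oracle with a much harder one.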
Optimal interdiction of urban criminals with the aid of real-time information
Most violent crimes happen in urban and suburban areas. With emerging tracking techniques, law enforcement officers can have real-time location information about escaping criminals and dynamically adjust the security resource allocation to interdict them. Unfortunately, existing work on urban network security games largely ignores such information. This paper addresses this omission. First, we show that ignoring the real-time information can cause an arbitrarily large loss of efficiency. To mitigate this loss, we propose a novel NEtwork purSuiT game (NEST) model that captures the interaction between an escaping adversary and a defender with multiple resources and real-time information available. Second, solving NEST is proven to be NP-hard. Third, after transforming the non-convex program of solving NEST into a linear program, we propose our incremental strategy generation algorithm, including: (i) novel pruning techniques in our best response oracle; and (ii) novel techniques for mapping strategies between subgames and adding multiple best response strategies in one iteration to solve extremely large problems. Finally, extensive experiments show the effectiveness of our approach, which scales up to realistic problem sizes with hundreds of nodes on networks, including the real network of Manhattan.
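The incremental strategy generation pattern that NEST builds on can be illustrated as a generic double-oracle loop on a plain zero-sum matrix game: solve a restricted game, add each player's best response, and stop when neither response is new. NEST's pruning and subgame-mapping techniques are not reproduced here; this is a textbook sketch with invented names.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(M):
    """Maximin mixed strategy and value for the row player of matrix M."""
    m, n = M.shape
    shift = M.min() - 1.0
    Mp = M - shift                          # strictly positive payoffs
    # min sum(x)  s.t.  Mp^T x >= 1, x >= 0;  game value of Mp = 1/sum(x).
    res = linprog(np.ones(m), A_ub=-Mp.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m)
    value = 1.0 / res.x.sum()
    return res.x * value, value + shift

def double_oracle(M):
    """Incremental strategy generation: grow restricted strategy sets by
    pure best responses until neither player can improve."""
    rows, cols = [0], [0]
    while True:
        sub = M[np.ix_(rows, cols)]
        x, v = solve_matrix_game(sub)       # row strategy over `rows`
        y, _ = solve_matrix_game(-sub.T)    # column strategy over `cols`
        fx = np.zeros(M.shape[0]); fx[rows] = x
        fy = np.zeros(M.shape[1]); fy[cols] = y
        br_row = int(np.argmax(M @ fy))     # row best response to fy
        br_col = int(np.argmin(fx @ M))     # column best response to fx
        if br_row in rows and br_col in cols:
            return v, fx
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)

# Rock-paper-scissors: value 0, uniform equilibrium strategy.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
```

The appeal of this scheme for huge games such as NEST is that the restricted game often stays tiny relative to the full strategy space, so most strategies are never enumerated at all.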
Multi-Robot Path Planning for Persistent Monitoring in Stochastic and Adversarial Environments
In this thesis, we study multi-robot path planning problems for persistent monitoring tasks. The goal of such persistent monitoring tasks is to deploy a team of cooperating mobile robots in an environment to continually observe locations of interest. Robots patrol the environment to detect events arriving at these locations. The events stay at those locations for a certain amount of time before leaving and can be detected only if one of the robots visits the location of an event while the event is there.
In order to detect all possible events arriving at a vertex, the maximum time spent by the robots between visits to that vertex must be less than the duration of the events arriving at that vertex. We consider the problem of finding the minimum number of robots satisfying these revisit time constraints, also called latency constraints. The decision version of this problem is PSPACE-complete. We provide an O(log p) approximation algorithm for this problem, where p is the ratio of the maximum and minimum latency constraints. We also present heuristic algorithms to solve the problem and show through simulations that a proposed orienteering-based heuristic algorithm gives better solutions than the approximation algorithm. We additionally provide an algorithm for the problem of minimizing the maximum weighted latency given a fixed number of robots.
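The latency constraints above are easy to check for a candidate solution. The sketch below assumes a single robot repeating a closed walk with unit travel time per step; the function names are illustrative, not from the thesis.

```python
def max_revisit_gaps(walk, vertices):
    """Maximum time between consecutive visits to each vertex when the
    closed walk is repeated forever (unit travel time per step)."""
    period = len(walk)
    gaps = {}
    for v in vertices:
        visits = [t for t, u in enumerate(walk) if u == v]
        if not visits:
            gaps[v] = float('inf')      # never visited: unbounded latency
        else:
            # Cyclic gaps, wrapping from the last visit to the first one
            # of the next repetition ('or period' handles a single visit).
            gaps[v] = max(((visits[(i + 1) % len(visits)] - visits[i]) % period)
                          or period for i in range(len(visits)))
    return gaps

def satisfies_latency(walk, latency):
    """Check the latency (maximum revisit time) constraints for one robot."""
    gaps = max_revisit_gaps(walk, latency.keys())
    return all(gaps[v] <= latency[v] for v in latency)

# Walk 0 -> 1 -> 0 -> 2 -> (repeat): vertex 0 seen every 2 steps,
# vertices 1 and 2 every 4 steps.
ok = satisfies_latency([0, 1, 0, 2], {0: 2, 1: 4, 2: 4})
```

Tight latency constraints at one vertex force frequent revisits there, which is exactly the tension that drives the minimum number of robots up and motivates the O(log p) approximation.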
In case the event stay durations are not fixed but are drawn from a known distribution, we consider the problem of maximizing the expected number of detected events. We motivate randomized patrolling paths for such scenarios and use Markov chains to represent those random patrolling paths. We characterize the expected number of detected events as a function of the Markov chains used for patrolling and show that the objective function is submodular for randomly arriving events. We propose an approximation algorithm for the case where the event durations at all vertices are constant. We also propose a centralized and an online distributed algorithm to find the random patrolling policies for the robots. Finally, we consider the case where the events are adversarial and can choose where and when to appear in order to maximize their chances of remaining undetected.
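The detection objective for a Markov-chain patroller can be computed in closed form in a simplified setting. The sketch below assumes a single robot, unit-time steps, a stationary starting distribution, and an event of fixed duration `d` detected if the patroller occupies its vertex at any of those `d` steps; the function name is illustrative.

```python
import numpy as np

def detection_probability(P, v, d):
    """Probability that an event of duration d at vertex v is detected by
    a patroller following Markov chain P, started from its stationary
    distribution."""
    n = P.shape[0]
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    keep = [i for i in range(n) if i != v]
    Q = P[np.ix_(keep, keep)]           # taboo chain that avoids v
    # Miss = start away from v AND avoid v for the remaining d-1 steps.
    miss = pi[keep] @ np.linalg.matrix_power(Q, d - 1) @ np.ones(n - 1)
    return 1.0 - miss

# Uniform random walk on a 3-cycle; stationary distribution is uniform.
P = np.array([[0., .5, .5], [.5, 0., .5], [.5, .5, 0.]])
```

For `d = 1` this reduces to the stationary probability of the vertex (here 1/3), and it increases with `d`, matching the intuition that longer-lived events are easier to catch.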
The last problem we study in this thesis considers events triggered by a learning adversary. The adversary has a limited time to observe the patrolling policy before it decides when and where events should appear. We study the single-robot version of this problem and model it as a multi-stage two-player game. The adversary observes the patroller's actions for a finite amount of time to learn the patroller's strategy and then either chooses a location for the event to appear or reneges, based on its confidence in the learned strategy. We characterize the expected payoffs for the players and propose a search algorithm to find a patrolling policy in such scenarios. We illustrate the trade-off between hard-to-learn and hard-to-attack strategies through simulations.
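The learning adversary's effect can be illustrated with a toy Monte Carlo simulation. This omits the multi-stage structure and the renege option from the thesis; the function name, the four-location setting, and the 0.4/0.3/0.2/0.1 patrol distribution are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def attack_success(patrol, k, trials=5000):
    """Adversary watches k visits, estimates the patrol distribution from
    the observed counts, then attacks the location it believes is covered
    least; the attack succeeds if the patroller is elsewhere at that time."""
    n = len(patrol)
    wins = 0
    for _ in range(trials):
        obs = rng.choice(n, size=k, p=patrol)
        counts = np.bincount(obs, minlength=n)
        target = int(np.argmin(counts))          # least-observed location
        wins += rng.random() > patrol[target]    # patroller absent -> success
    return wins / trials

uniform = np.ones(4) / 4
skewed = np.array([.4, .3, .2, .1])
# Against the uniform patrol, observation is useless: every location is
# covered 25% of the time. Against the skewed patrol, more observations
# let the adversary find the weak spot and push success toward 90%.
```

This is the trade-off the thesis simulations explore: a patrol tailored to coverage priorities is more exploitable once learned, while a uniform (maximum-entropy) patrol yields the adversary nothing from observation.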
- …