Compact Representation of Value Function in Partially Observable Stochastic Games
Value-based methods for solving stochastic games with partial observability model
the uncertainty about the states of the game as a probability distribution over
possible states. The dimension of this belief space equals the number of states.
Many practical problems, for example in security, have exponentially many
possible states, which makes such algorithms insufficiently scalable for
real-world use. To this end, we propose an abstraction technique that addresses
this curse of dimensionality by projecting high-dimensional beliefs to
characteristic vectors of significantly lower dimension (e.g., marginal
probabilities). Our two main contributions are (1) a novel compact
representation of uncertainty in partially observable stochastic games and (2) a
novel algorithm that builds on existing state-of-the-art algorithms for solving
stochastic games with partial observability but operates over this compact
representation. Experimental evaluation confirms that the new algorithm over the
compact representation dramatically improves scalability compared to the state
of the art.
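The projection described above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it assumes the joint state factors into binary features (e.g., per-target presence), and `project_to_marginals` is an invented helper name.

```python
import numpy as np

def project_to_marginals(belief, n_factors):
    """Project a belief over joint states (bit-tuples) down to the
    per-factor marginal probabilities, one number per factor."""
    marginals = np.zeros(n_factors)
    for state, prob in belief.items():
        for i, bit in enumerate(state):
            if bit:
                marginals[i] += prob
    return marginals

# 3 binary factors -> 2^3 = 8 possible joint states, but the
# characteristic vector needs only 3 numbers.
belief = {
    (1, 0, 0): 0.5,
    (0, 1, 1): 0.3,
    (1, 1, 0): 0.2,
}
marginals = project_to_marginals(belief, 3)
```

With `n` binary factors the belief lives in a `2^n`-dimensional simplex, while the marginal vector has only `n` entries, which is the kind of exponential-to-linear reduction the abstraction targets.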
Security Games with Information Leakage: Modeling and Computation
Most models of Stackelberg security games assume that the attacker knows only
the defender's mixed strategy and cannot observe (even partially) the
instantiated pure strategy. Such partial observation of the deployed pure
strategy -- an issue we refer to as information leakage -- is a significant
concern in practical applications. While previous research on patrolling games
has considered the attacker's real-time surveillance, our setting, and therefore
our models and techniques, are fundamentally different. More specifically, after
describing the information leakage model, we start with an LP formulation to
compute the defender's optimal strategy in the presence of leakage. Perhaps
surprisingly, we show that a key subproblem in solving this LP (more precisely,
the defender oracle) is NP-hard even for the simplest of security game models.
We then approach the problem from three directions: efficient algorithms for
restricted cases, approximation algorithms, and heuristic sampling algorithms
that improve upon the status quo. Our experiments confirm the necessity of
handling information leakage and the advantage of our algorithms.
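As background for the LP formulation mentioned above, the standard no-leakage maximin coverage LP for a zero-sum security game can be sketched with `scipy`. This is not the paper's leakage-aware LP or its NP-hard defender oracle; the function name and payoff numbers are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_coverage(u_cov, u_unc, resources):
    """Zero-sum coverage LP: choose per-target coverage probabilities
    c_t in [0, 1] (total at most `resources`) maximizing the defender's
    worst-case payoff u, where attacking target t yields the defender
    c_t * u_cov[t] + (1 - c_t) * u_unc[t]."""
    n = len(u_cov)
    cost = np.zeros(n + 1)
    cost[-1] = -1.0                     # variables [c_1..c_n, u]; maximize u
    # u - c_t * (u_cov[t] - u_unc[t]) <= u_unc[t]  for every target t.
    A = np.hstack([-np.diag(np.array(u_cov) - np.array(u_unc)),
                   np.ones((n, 1))])
    # Total coverage bounded by the number of resources.
    A_res = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    res = linprog(cost,
                  A_ub=np.vstack([A, A_res]),
                  b_ub=np.append(np.array(u_unc, dtype=float), resources),
                  bounds=[(0, 1)] * n + [(None, None)])
    return res.x[:n], res.x[-1]

# One resource, three targets with defender losses 5, 3, 2 if uncovered.
cov, value = maximin_coverage(u_cov=[0, 0, 0], u_unc=[-5, -3, -2], resources=1)
```

At the optimum the defender equalizes the payoff across attacked targets, so higher-loss targets receive more coverage; the leakage LP in the paper replaces this simple oracle with a much harder one.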
Optimal interdiction of urban criminals with the aid of real-time information
Most violent crimes happen in urban and suburban areas. With emerging tracking techniques, law enforcement officers can have real-time location information about escaping criminals and dynamically adjust the security resource allocation to interdict them. Unfortunately, existing work on urban network security games largely ignores such information. This paper addresses this omission. First, we show that ignoring the real-time information can cause an arbitrarily large loss of efficiency. To mitigate this loss, we propose a novel NEtwork purSuiT game (NEST) model that captures the interaction between an escaping adversary and a defender with multiple resources and real-time information available. Second, solving NEST is proven to be NP-hard. Third, after transforming the non-convex program of solving NEST into a linear program, we propose our incremental strategy generation algorithm, including: (i) novel pruning techniques in our best response oracle; and (ii) novel techniques for mapping strategies between subgames and adding multiple best response strategies in one iteration to solve extremely large problems. Finally, extensive experiments show the effectiveness of our approach, which scales up to realistic problem sizes with hundreds of nodes on networks, including the real network of Manhattan.
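The incremental strategy generation pattern that NEST builds on can be illustrated as a generic double-oracle loop on a plain zero-sum matrix game: solve a restricted game, add each player's best response, and stop when neither response is new. NEST's pruning and subgame-mapping techniques are not reproduced here; this is a textbook sketch with invented names.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(M):
    """Maximin mixed strategy and value for the row player of matrix M."""
    m, n = M.shape
    shift = M.min() - 1.0
    Mp = M - shift                          # strictly positive payoffs
    # min sum(x)  s.t.  Mp^T x >= 1, x >= 0;  game value of Mp = 1/sum(x).
    res = linprog(np.ones(m), A_ub=-Mp.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m)
    value = 1.0 / res.x.sum()
    return res.x * value, value + shift

def double_oracle(M):
    """Incremental strategy generation: grow restricted strategy sets by
    pure best responses until neither player can improve."""
    rows, cols = [0], [0]
    while True:
        sub = M[np.ix_(rows, cols)]
        x, v = solve_matrix_game(sub)       # row strategy over `rows`
        y, _ = solve_matrix_game(-sub.T)    # column strategy over `cols`
        fx = np.zeros(M.shape[0]); fx[rows] = x
        fy = np.zeros(M.shape[1]); fy[cols] = y
        br_row = int(np.argmax(M @ fy))     # row best response to fy
        br_col = int(np.argmin(fx @ M))     # column best response to fx
        if br_row in rows and br_col in cols:
            return v, fx
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)

# Rock-paper-scissors: value 0, uniform equilibrium strategy.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
```

The appeal of this scheme for huge games such as NEST is that the restricted game often stays tiny relative to the full strategy space, so most strategies are never enumerated at all.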
Multi-Robot Path Planning for Persistent Monitoring in Stochastic and Adversarial Environments
In this thesis, we study multi-robot path planning problems for persistent monitoring tasks. The goal of such persistent monitoring tasks is to deploy a team of cooperating mobile robots in an environment to continually observe locations of interest. Robots patrol the environment to detect events arriving at these locations. The events stay at those locations for a certain amount of time before leaving and can be detected only if one of the robots visits the location of an event while the event is there.
In order to detect all possible events arriving at a vertex, the maximum time spent by the robots between visits to that vertex must be less than the duration of the events arriving at that vertex. We consider the problem of finding the minimum number of robots satisfying these revisit time constraints, also called latency constraints. The decision version of this problem is PSPACE-complete. We provide an O(log p) approximation algorithm for this problem, where p is the ratio of the maximum and minimum latency constraints. We also present heuristic algorithms to solve the problem and show through simulations that a proposed orienteering-based heuristic algorithm gives better solutions than the approximation algorithm. We additionally provide an algorithm for the problem of minimizing the maximum weighted latency given a fixed number of robots.
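The latency constraints above are easy to check for a candidate solution. The sketch below assumes a single robot repeating a closed walk with unit travel time per step; the function names are illustrative, not from the thesis.

```python
def max_revisit_gaps(walk, vertices):
    """Maximum time between consecutive visits to each vertex when the
    closed walk is repeated forever (unit travel time per step)."""
    period = len(walk)
    gaps = {}
    for v in vertices:
        visits = [t for t, u in enumerate(walk) if u == v]
        if not visits:
            gaps[v] = float('inf')      # never visited: unbounded latency
        else:
            # Cyclic gaps, wrapping from the last visit to the first one
            # of the next repetition ('or period' handles a single visit).
            gaps[v] = max(((visits[(i + 1) % len(visits)] - visits[i]) % period)
                          or period for i in range(len(visits)))
    return gaps

def satisfies_latency(walk, latency):
    """Check the latency (maximum revisit time) constraints for one robot."""
    gaps = max_revisit_gaps(walk, latency.keys())
    return all(gaps[v] <= latency[v] for v in latency)

# Walk 0 -> 1 -> 0 -> 2 -> (repeat): vertex 0 seen every 2 steps,
# vertices 1 and 2 every 4 steps.
ok = satisfies_latency([0, 1, 0, 2], {0: 2, 1: 4, 2: 4})
```

Tight latency constraints at one vertex force frequent revisits there, which is exactly the tension that drives the minimum number of robots up and motivates the O(log p) approximation.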
In case the event stay durations are not fixed but are drawn from a known distribution, we consider the problem of maximizing the expected number of detected events. We motivate randomized patrolling paths for such scenarios and use Markov chains to represent those random patrolling paths. We characterize the expected number of detected events as a function of the Markov chains used for patrolling and show that the objective function is submodular for randomly arriving events. We propose an approximation algorithm for the case where the event durations at all vertices are constant. We also propose a centralized and an online distributed algorithm to find the random patrolling policies for the robots. Finally, we consider the case where the events are adversarial and can choose where and when to appear in order to maximize their chances of remaining undetected.
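The detection objective for a Markov-chain patroller can be computed in closed form in a simplified setting. The sketch below assumes a single robot, unit-time steps, a stationary starting distribution, and an event of fixed duration `d` detected if the patroller occupies its vertex at any of those `d` steps; the function name is illustrative.

```python
import numpy as np

def detection_probability(P, v, d):
    """Probability that an event of duration d at vertex v is detected by
    a patroller following Markov chain P, started from its stationary
    distribution."""
    n = P.shape[0]
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    keep = [i for i in range(n) if i != v]
    Q = P[np.ix_(keep, keep)]           # taboo chain that avoids v
    # Miss = start away from v AND avoid v for the remaining d-1 steps.
    miss = pi[keep] @ np.linalg.matrix_power(Q, d - 1) @ np.ones(n - 1)
    return 1.0 - miss

# Uniform random walk on a 3-cycle; stationary distribution is uniform.
P = np.array([[0., .5, .5], [.5, 0., .5], [.5, .5, 0.]])
```

For `d = 1` this reduces to the stationary probability of the vertex (here 1/3), and it increases with `d`, matching the intuition that longer-lived events are easier to catch.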
The last problem we study in this thesis considers events triggered by a learning adversary. The adversary has a limited time to observe the patrolling policy before it decides when and where events should appear. We study the single-robot version of this problem and model it as a multi-stage two-player game. The adversary observes the patroller's actions for a finite amount of time to learn the patroller's strategy and then either chooses a location for the event to appear or reneges, based on its confidence in the learned strategy. We characterize the expected payoffs for the players and propose a search algorithm to find a patrolling policy in such scenarios. We illustrate the trade-off between hard-to-learn and hard-to-attack strategies through simulations.
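The learning adversary's effect can be illustrated with a toy Monte Carlo simulation. This omits the multi-stage structure and the renege option from the thesis; the function name, the four-location setting, and the 0.4/0.3/0.2/0.1 patrol distribution are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def attack_success(patrol, k, trials=5000):
    """Adversary watches k visits, estimates the patrol distribution from
    the observed counts, then attacks the location it believes is covered
    least; the attack succeeds if the patroller is elsewhere at that time."""
    n = len(patrol)
    wins = 0
    for _ in range(trials):
        obs = rng.choice(n, size=k, p=patrol)
        counts = np.bincount(obs, minlength=n)
        target = int(np.argmin(counts))          # least-observed location
        wins += rng.random() > patrol[target]    # patroller absent -> success
    return wins / trials

uniform = np.ones(4) / 4
skewed = np.array([.4, .3, .2, .1])
# Against the uniform patrol, observation is useless: every location is
# covered 25% of the time. Against the skewed patrol, more observations
# let the adversary find the weak spot and push success toward 90%.
```

This is the trade-off the thesis simulations explore: a patrol tailored to coverage priorities is more exploitable once learned, while a uniform (maximum-entropy) patrol yields the adversary nothing from observation.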
- …