A multi-armed bandit approach to following a Markov Chain

Abstract

Across the defense, homeland security, and law enforcement communities, leaders must balance making quick decisions against making well-informed ones regarding time-dependent entities of interest. For example, consider a law enforcement organization (the searcher) with a sizable list of potential terrorists (targets) but far fewer observational assets (sensors). The searcher's goal is to follow the target, but resource constraints make continuous coverage impossible, so observation attempts are intermittent. We model target behaviour as a discrete-time Markov chain whose state space is the target's set of possible locations, activities, or attributes. In this setting, following the target means that at each time step the searcher correctly identifies, and allocates the sensor to, the state with the highest probability of containing the target. In other words, in each time period the searcher decides where to send the sensor in an attempt to observe the target; the attempt results in a hit or a miss, from which the searcher learns about the target's true transition behaviour. We develop a multi-armed bandit approach for efficiently following the target, in which each state plays the role of an arm. Our search policy outperforms existing approaches by a factor of five to ten.

http://archive.org/details/amultiarmedbandi1094555572
Outstanding Thesis
Captain, United States Marine Corps
Approved for public release; distribution is unlimited
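The abstract does not specify the bandit policy itself, so the following is only a minimal illustrative sketch of the framing it describes, not the thesis's actual algorithm. It assumes Thompson sampling with Dirichlet priors on each row of the unknown transition matrix, a perfect sensor (a hit whenever the sensor is sent to the target's true state), and transition counts that are updated only when two consecutive hits reveal an actual transition; all names and parameters below are hypothetical.

```python
"""Illustrative sketch only: a Thompson-sampling bandit for following a
Markov-chain target, under the assumptions stated in the lead-in above."""
import numpy as np

rng = np.random.default_rng(0)

n_states = 5                                  # target's possible locations/activities
# Hidden ground-truth transition matrix that the searcher must learn.
P_true = rng.dirichlet(np.ones(n_states), size=n_states)

# Searcher's Dirichlet pseudo-counts for each row of the transition matrix.
alpha = np.ones((n_states, n_states))

belief = np.full(n_states, 1.0 / n_states)    # belief over the target's current state
target = rng.integers(n_states)               # true (hidden) target state
last_hit_state, t_last_hit = None, -2         # bookkeeping for consecutive hits
hits = 0

for t in range(1000):
    # The target moves according to its true (unknown to the searcher) chain.
    target = rng.choice(n_states, p=P_true[target])

    # Thompson sampling: draw a plausible transition matrix from the posterior.
    P_sample = np.array([rng.dirichlet(alpha[s]) for s in range(n_states)])

    # Propagate the belief one step with the sampled dynamics and send the
    # single sensor to the most probable state (the chosen "arm").
    belief = belief @ P_sample
    look = int(np.argmax(belief))

    if look == target:                        # hit
        hits += 1
        if last_hit_state is not None and t_last_hit == t - 1:
            # Two consecutive hits reveal one true transition; update counts.
            alpha[last_hit_state, look] += 1.0
        last_hit_state, t_last_hit = look, t
        belief = np.zeros(n_states)
        belief[look] = 1.0                    # belief collapses onto the hit state
    else:                                     # miss
        belief[look] = 0.0                    # rule out the searched state
        belief /= belief.sum()

print(f"hit rate over 1000 steps: {hits / 1000:.2f}")
```

In this sketch each state is an arm and the hit/miss feedback plays the role of the bandit reward; richer update rules (for example, learning from misses as well) would be needed to match the performance claims in the abstract.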
