
    Learning to detect an oddball target with observations from an exponential family

    The problem of detecting an odd arm from a set of K arms of a multi-armed bandit, with fixed confidence, is studied in a sequential decision-making scenario. Each arm's signal follows a distribution from a vector exponential family. All arms have the same parameters except the odd arm. The actual parameters of the odd and non-odd arms are unknown to the decision maker. Further, the decision maker incurs a cost for switching from one arm to another. This is a sequential decision-making problem where the decision maker gets only a limited view of the true state of nature at each stage, but can control his view by choosing the arm to observe at each stage. Of interest are policies that satisfy a given constraint on the probability of false detection. An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. Thanks to the vector exponential family assumption, the signal processing in this policy at each stage turns out to be very simple, in that the associated conjugate prior enables easy updates of the posterior distribution of the model parameters. The policy, with a suitable threshold, is shown to satisfy the given constraint on the probability of false detection. Further, the proposed policy is asymptotically optimal in terms of the total cost among all policies that satisfy the constraint on the probability of false detection.
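
    A minimal sketch of a generalised-likelihood-ratio (GLR) stopping check for the odd-arm problem, assuming Gaussian arms with known unit variance (one member of the exponential family) and unknown means. Under the hypothesis that arm i is odd, the unknown means are replaced by their maximum-likelihood estimates: arm i gets its own mean and the remaining arms share a pooled mean. The function names and the plain threshold comparison are illustrative assumptions; the policy studied in the abstract uses a modified GLR statistic and also accounts for switching costs, neither of which is modelled here.

```python
from typing import List, Optional


def max_loglik_if_odd(samples: List[List[float]], odd: int) -> float:
    """Maximised unit-variance Gaussian log-likelihood (up to a constant)
    under the hypothesis that arm `odd` is the odd arm.

    The odd arm gets its own mean; all other arms share one common mean.
    """
    odd_obs = samples[odd]
    # Pool every observation from the non-odd arms into one group.
    rest = [x for k, obs in enumerate(samples) if k != odd for x in obs]
    ll = 0.0
    for group in (odd_obs, rest):
        if group:
            mu = sum(group) / len(group)  # MLE of the group mean
            ll -= 0.5 * sum((x - mu) ** 2 for x in group)
    return ll


def glr_decision(samples: List[List[float]], threshold: float) -> Optional[int]:
    """Return the index of the declared odd arm, or None to keep sampling.

    Stops when the most likely hypothesis beats every alternative by more
    than `threshold` in maximised log-likelihood (a simplified GLR rule).
    """
    lls = [max_loglik_if_odd(samples, i) for i in range(len(samples))]
    best = max(range(len(lls)), key=lls.__getitem__)
    margin = min(lls[best] - lls[j] for j in range(len(lls)) if j != best)
    return best if margin > threshold else None
```

    For example, with three arms where arm 1 is clearly shifted, `glr_decision([[0.1, -0.2, 0.0], [3.1, 2.9, 3.0], [-0.1, 0.2, 0.0]], 5.3)` declares arm 1, while a much larger threshold returns `None`, meaning sampling would continue.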

    Sequential Multi-hypothesis Testing in Multi-armed Bandit Problems: An Approach for Asymptotic Optimality

    We consider a multi-hypothesis testing problem involving a K-armed bandit. Each arm's signal follows a distribution from a vector exponential family. The actual parameters of the arms are unknown to the decision maker. The decision maker incurs a delay cost for delay until a decision and a switching cost whenever he switches from one arm to another. His goal is to minimise the overall cost until a decision is reached on the true hypothesis. Of interest are policies that satisfy a given constraint on the probability of false detection. This is a sequential decision-making problem where the decision maker gets only a limited view of the true state of nature at each stage, but can control his view by choosing the arm to observe at each stage. An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. Due to the vector exponential family assumption, the signal processing at each stage is simple; the associated conjugate prior distribution on the unknown model parameters enables easy updates of the posterior distribution. The proposed policy, with a suitable threshold for stopping, is shown to satisfy the given constraint on the probability of false detection. Under a continuous selection assumption, the policy is also shown to be asymptotically optimal in terms of the total cost among all policies that satisfy the constraint on the probability of false detection.
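
    To illustrate why conjugate priors make the per-stage update cheap, here is a one-dimensional sketch assuming Poisson-distributed arm signals with a Gamma prior on the rate (an example family chosen here, not one named in the abstract). Updating the posterior only requires adding the observed counts to the shape and the number of observations to the rate; the class name `GammaPosterior` is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GammaPosterior:
    """Gamma(shape, rate) belief about the rate of a Poisson-distributed arm."""
    shape: float  # alpha
    rate: float   # beta

    def update(self, counts: Iterable[int]) -> "GammaPosterior":
        # Conjugacy: each Poisson observation x adds x to the shape and 1 to
        # the rate, so no numerical integration is needed at any stage.
        counts = list(counts)
        return GammaPosterior(self.shape + sum(counts), self.rate + len(counts))

    def mean(self) -> float:
        """Posterior mean of the Poisson rate."""
        return self.shape / self.rate
```

    Starting from a Gamma(1, 1) prior and observing counts 2, 3 and 4 gives a Gamma(10, 4) posterior with mean 2.5. The same pattern, adding sufficient statistics to the prior's hyperparameters, carries over to other exponential-family models.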

    Active Anomaly Detection in Heterogeneous Processes

    An active inference problem of detecting anomalies among heterogeneous processes is considered. At each time, a subset of processes can be probed. The objective is to design a sequential probing strategy that dynamically determines which processes to observe at each time and when to terminate the search so that the expected detection time is minimized under a constraint on the probability of misclassifying any process. This problem falls into the general setting of sequential design of experiments pioneered by Chernoff in 1959, in which a randomized strategy, referred to as the Chernoff test, was proposed and shown to be asymptotically optimal as the error probability approaches zero. For the problem considered in this paper, a low-complexity deterministic test is shown to enjoy the same asymptotic optimality while offering significantly better performance in the finite regime and faster convergence to the optimal rate function, especially when the number of processes is large. The computational complexity of the proposed test is also of a significantly lower order. Comment: This work has been accepted for publication in IEEE Transactions on Information Theory.
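
    The sketch below shows the general shape of an active search driven by per-process log-likelihood ratios (LLRs), assuming Bernoulli observations with known normal and anomalous success probabilities `p0` and `p1`, and one probe per time step rather than a subset. The "probe the least-certain undecided process" rule is a simple deterministic heuristic chosen for illustration; it is not the test proposed in the paper, and the names `active_search` and `sample` are hypothetical.

```python
import math
from typing import Callable, List


def active_search(sample: Callable[[int], int], num_procs: int,
                  p0: float, p1: float, threshold: float,
                  max_steps: int = 10_000) -> List[bool]:
    """Probe one process per step; return True for processes declared anomalous.

    `sample(k)` returns a 0/1 observation from process k. A process is
    decided once its cumulative LLR leaves (-threshold, threshold).
    """
    llr = [0.0] * num_procs                     # cumulative LLR per process
    step_one = math.log(p1 / p0)                # LLR increment for an observed 1
    step_zero = math.log((1 - p1) / (1 - p0))   # LLR increment for an observed 0
    for _ in range(max_steps):
        undecided = [k for k in range(num_procs) if abs(llr[k]) < threshold]
        if not undecided:
            break
        # Deterministic heuristic: probe the least-certain undecided process.
        k = min(undecided, key=lambda j: abs(llr[j]))
        llr[k] += step_one if sample(k) == 1 else step_zero
    return [s >= threshold for s in llr]
```

    A threshold such as log(1/eps) for a target error level eps is one possible choice; it is an assumption here, not the threshold analysed in the paper.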

    Quickest Change Detection with Controlled Sensing

    In the problem of quickest change detection, a change occurs at some unknown time in the distribution of a sequence of random vectors that are monitored in real time, and the goal is to detect this change as quickly as possible subject to a certain false alarm constraint. In this work we consider this problem in the presence of parametric uncertainty in the post-change regime and controlled sensing. That is, the post-change distribution contains an unknown parameter, and the distribution of each observation, before and after the change, is affected by a control action. In this context, in addition to a stopping rule that determines the time at which it is declared that the change has occurred, one also needs to determine a sequential control policy, which chooses the control action at each time based on the already collected observations. We formulate this problem mathematically using Lorden's minimax criterion, assuming that there are finitely many possible actions and post-change parameter values. We then propose a specific procedure for this problem that employs an adaptive CuSum statistic in which (i) the estimate of the parameter is based on a fixed number of the most recent observations, and (ii) each action is selected to maximize the Kullback-Leibler divergence of the next observation based on the current parameter estimate, apart from a small number of exploration times. We show that this procedure, which we call the Windowed Chernoff-CuSum (WCC), is first-order asymptotically optimal under Lorden's minimax criterion, for every possible value of the unknown post-change parameter, as the mean time to false alarm goes to infinity. We also provide simulation results to illustrate the performance of the WCC procedure.
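
    A simplified sketch of the windowed adaptive CuSum idea, assuming pre-change observations are N(0, 1) and post-change observations are N(theta, 1) with theta from a known finite set. At each time the parameter is re-estimated from the most recent `window` observations (excluding the current one), and the CuSum statistic is updated with the log-likelihood ratio under that estimate. The control actions, the KL-maximising action selection and the exploration times of the WCC procedure are omitted, and the function name `windowed_cusum` is hypothetical.

```python
from typing import Optional, Sequence


def windowed_cusum(xs: Sequence[float], thetas: Sequence[float],
                   window: int, h: float) -> Optional[int]:
    """Return the index at which the alarm is first raised, or None.

    Pre-change model: N(0, 1). Post-change model: N(theta, 1), theta in `thetas`.
    """
    w = 0.0
    for n, x in enumerate(xs):
        past = xs[max(0, n - window):n]  # recent window, excluding x itself
        if past:
            # Finite-set MLE of theta for unit-variance Gaussians:
            # maximise theta * sum(past) - len(past) * theta^2 / 2.
            s = sum(past)
            theta = max(thetas, key=lambda t: t * s - len(past) * t * t / 2)
        else:
            theta = thetas[0]  # arbitrary choice before any data (assumption)
        # CuSum recursion with the LLR of N(theta, 1) against N(0, 1).
        w = max(0.0, w + theta * x - theta * theta / 2)
        if w >= h:
            return n
    return None
```

    With ten zeros followed by ten observations equal to 2.0, thetas `[0.5, 2.0]`, a window of 3 and threshold 5.0, the alarm is raised at index 13, three steps after the change; on an all-zero sequence no alarm is raised.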

    Training a Single Bandit Arm

    The stochastic multi-armed bandit problem captures the fundamental exploration vs. exploitation tradeoff inherent in online decision-making in uncertain settings. However, in several applications, the traditional objective of maximizing the expected sum of rewards obtained can be inappropriate. Motivated by the problem of optimizing job assignments to groom novice workers with unknown trainability in labor platforms, we consider a new objective in the classical setup. Instead of maximizing the expected total reward from T pulls, we consider the vector of cumulative rewards earned from each of the K arms at the end of T pulls, and aim to maximize the expected value of the highest cumulative reward. This corresponds to the objective of grooming a single, highly skilled worker using a limited supply of training jobs. For this new objective, we show that any policy must incur a regret of $\Omega(K^{1/3}T^{2/3})$ in the worst case. We design an explore-then-commit policy featuring exploration based on finely tuned confidence bounds on the mean reward and an adaptive stopping criterion, which adapts to the problem difficulty and guarantees a regret of $O(K^{1/3}T^{2/3}\sqrt{\log K})$ in the worst case. Our numerical experiments demonstrate that this policy improves upon several natural candidate policies for this setting. Comment: 23 pages, 1 figure, 1 table.
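
    A minimal explore-then-commit sketch for the highest-cumulative-reward objective, assuming Bernoulli arms and a fixed exploration length `m` per arm in place of the paper's confidence-bound-based adaptive stopping. The key difference from the usual sum-of-rewards objective is that only the best arm's total counts at the end, so exploration pulls on arms that are not chosen add nothing to the final objective. The function name is hypothetical.

```python
import random
from typing import List


def explore_then_commit(means: List[float], horizon: int, m: int,
                        seed: int = 0) -> float:
    """Return the highest per-arm cumulative reward after `horizon` pulls.

    Explore: pull each arm `m` times. Commit: spend every remaining pull on
    the arm with the best exploration total, so its rewards keep accumulating.
    """
    rng = random.Random(seed)
    k = len(means)
    if k * m > horizon:
        raise ValueError("exploration budget exceeds the horizon")
    totals = [0.0] * k
    for arm in range(k):
        for _ in range(m):
            totals[arm] += 1.0 if rng.random() < means[arm] else 0.0
    # Every arm was pulled m times, so totals rank the empirical means.
    best = max(range(k), key=lambda a: totals[a])
    for _ in range(horizon - k * m):
        totals[best] += 1.0 if rng.random() < means[best] else 0.0
    return max(totals)
```

    With deterministic arms `[0.0, 1.0, 0.0]`, horizon 30 and `m = 2`, the middle arm collects 2 rewards during exploration and 24 more after committing, for a highest cumulative reward of 26.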

    Optimal Best Arm Identification with Fixed Confidence

    We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
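
    As an illustration of a Chernoff-type stopping rule, here is a sketch for Gaussian arms with known variance (a one-parameter family), where the generalised likelihood ratio between the empirical best arm a and a challenger b has the closed form N_a N_b / (N_a + N_b) * (mu_a - mu_b)^2 / (2 sigma^2). The threshold log((log t + 1)/delta) is an assumed heuristic, the function names are hypothetical, and the sampling rule that tracks the optimal proportions is not implemented here.

```python
import math
from typing import List


def chernoff_statistic(counts: List[int], means: List[float],
                       sigma: float = 1.0) -> float:
    """GLR statistic for 'the empirical best arm is the best arm' (Gaussian arms)."""
    a = max(range(len(means)), key=means.__getitem__)
    z = math.inf
    for b in range(len(means)):
        if b == a:
            continue
        na, nb = counts[a], counts[b]
        # Closed-form Gaussian GLR between arm a and challenger b.
        z = min(z, na * nb / (na + nb) * (means[a] - means[b]) ** 2 / (2 * sigma ** 2))
    return z


def should_stop(counts: List[int], means: List[float], delta: float) -> bool:
    """Stop once the statistic exceeds an assumed threshold; counts must be positive."""
    t = sum(counts)
    beta = math.log((math.log(t) + 1) / delta)  # heuristic threshold (assumption)
    return chernoff_statistic(counts, means) > beta
```

    With 10 draws of each of two arms whose empirical means differ by 1, the statistic is 2.5, below the assumed threshold at delta = 0.1; with 100 draws each it rises to 25 and the rule stops.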