
    Active Search with a Cost for Switching Actions

    Active Sequential Hypothesis Testing (ASHT) is an extension of the classical sequential hypothesis testing problem with controls. Chernoff (Ann. Math. Statist., 1959) proposed a policy called Procedure A and showed its asymptotic optimality as the cost of sampling is driven to zero. In this paper we study a further extension in which we introduce costs for switching actions. We show that a modification of Chernoff's Procedure A, one that we call Sluggish Procedure A, is asymptotically optimal even with switching costs: the growth rate of the total cost, as the probability of false detection is driven to zero and as the switching parameter of Sluggish Procedure A is driven to zero, is the same as that without switching costs.

    Comment: 8 pages. Presented at the 2015 Information Theory and Applications Workshop.
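
    As a rough illustration of the idea, the sketch below (ours, not the authors' code) implements a Chernoff-style max-min-KL sampling rule for Bernoulli observations and adds the sluggish modification: the recommended action is adopted only with a small probability eta, so switches become rare. The Bernoulli setting, the stopping threshold, and all names here are illustrative assumptions.

        import numpy as np

        def kl_bern(p, q):
            # KL divergence between Bernoulli(p) and Bernoulli(q).
            p, q = np.clip([p, q], 1e-12, 1 - 1e-12)
            return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

        def sluggish_procedure_a(means, true_h, eta=0.05, threshold=10.0, seed=0):
            # means[h, a]: Bernoulli mean of action a under hypothesis h.
            rng = np.random.default_rng(seed)
            H, A = means.shape
            loglik = np.zeros(H)
            prev_a, t, switches = int(rng.integers(A)), 0, 0
            while True:
                h_hat = int(np.argmax(loglik))
                rivals = [h for h in range(H) if h != h_hat]
                # Stop when the GLR gap to the nearest rival is large enough.
                if loglik[h_hat] - max(loglik[h] for h in rivals) > threshold:
                    return h_hat, t, switches
                # Procedure A: action that best separates h_hat from its rivals.
                a_star = max(range(A), key=lambda a: min(
                    kl_bern(means[h_hat, a], means[h, a]) for h in rivals))
                # Sluggishness: adopt the recommendation only with probability eta.
                a = a_star if rng.random() < eta else prev_a
                switches += int(a != prev_a)
                prev_a = a
                x = rng.random() < means[true_h, a]    # sample the chosen action
                loglik += np.log(np.where(x, means[:, a], 1 - means[:, a]))
                t += 1

        # Example: two actions, three hypotheses over Bernoulli means.
        means = np.array([[0.2, 0.8], [0.8, 0.2], [0.5, 0.5]])
        print(sluggish_procedure_a(means, true_h=0))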

    Learning to detect an oddball target with observations from an exponential family

    The problem of detecting an odd arm from a set of K arms of a multi-armed bandit, with fixed confidence, is studied in a sequential decision-making scenario. Each arm's signal follows a distribution from a vector exponential family. All arms have the same parameters except the odd arm, and the actual parameters of the odd and non-odd arms are unknown to the decision maker. Further, the decision maker incurs a cost for switching from one arm to another. This is a sequential decision-making problem in which the decision maker gets only a limited view of the true state of nature at each stage, but can control that view by choosing which arm to observe at each stage. Of interest are policies that satisfy a given constraint on the probability of false detection. An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. Thanks to the vector exponential family assumption, the signal processing in this policy at each stage turns out to be very simple: the associated conjugate prior enables easy updates of the posterior distribution of the model parameters. The policy, with a suitable threshold, is shown to satisfy the given constraint on the probability of false detection. Further, the proposed policy is asymptotically optimal in terms of the total cost among all policies that satisfy the constraint on the probability of false detection.
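
    The "easy updates" claim is the familiar conjugacy property. Below is a minimal sketch for the scalar Gaussian special case (our illustration only; the paper works with general vector exponential families and a GLR-based sampling rule, for which the round-robin rule here is just a placeholder).

        import numpy as np

        class GaussianPosterior:
            # Normal prior + Gaussian likelihood with known noise variance:
            # the posterior stays Normal, so each update is O(1).
            def __init__(self, mu0=0.0, tau0=1.0, noise_var=1.0):
                self.mu, self.tau = mu0, tau0          # posterior mean, precision
                self.obs_tau = 1.0 / noise_var
            def update(self, x):
                new_tau = self.tau + self.obs_tau
                self.mu = (self.tau * self.mu + self.obs_tau * x) / new_tau
                self.tau = new_tau

        rng = np.random.default_rng(0)
        K, odd = 5, 2
        true_means = np.where(np.arange(K) == odd, 1.0, 0.0)   # odd arm differs
        posteriors = [GaussianPosterior() for _ in range(K)]
        for t in range(2000):
            a = t % K                                  # placeholder sampling rule
            posteriors[a].update(rng.normal(true_means[a], 1.0))
        print(int(np.argmax([p.mu for p in posteriors])))      # recovers the odd arm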

    The power-series algorithm applied to cyclic polling systems

    Keywords: polling systems; queueing theory; operations research

    Batched bandit problems

    Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy and show that a very small number of batches gives close to minimax-optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

    Comment: Published at http://dx.doi.org/10.1214/15-AOS1381 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
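
    The simplest member of the batched family is a two-batch explore-then-commit policy. The sketch below is ours (the paper analyses general M-batch policies with carefully chosen batch sizes); it also shows why few batches means few switches: within each batch the pulls can be grouped by arm.

        import numpy as np

        def batched_etc(pull, K, T, n_explore):
            # Batch 1: pull every arm n_explore times, contiguously, so the
            # policy switches arms only K times in total.
            means = np.array([np.mean([pull(a) for _ in range(n_explore)])
                              for a in range(K)])
            best = int(np.argmax(means))
            # Batch 2: commit to the empirically best arm for the remainder.
            tail = sum(pull(best) for _ in range(T - K * n_explore))
            return best, means.sum() * n_explore + tail

        rng = np.random.default_rng(1)
        arm_means = [0.3, 0.5, 0.7]
        pull = lambda a: float(rng.random() < arm_means[a])
        print(batched_etc(pull, K=3, T=10_000, n_explore=100))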

    Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

    In this paper, we propose a constant-word (RAM model) algorithm for regret minimisation for both finite and infinite stochastic multi-armed bandit (MAB) instances. Most existing regret-minimisation algorithms need to remember the statistics of all the arms they encounter, which becomes a problem when the number of available words of memory is limited. Designing an efficient regret-minimisation algorithm that uses a constant number of words has long been of interest to the community. Some early attempts consider the number of arms to be infinite and require the reward distributions of the arms to belong to some particular family. Recently, for finitely many-armed bandits, an explore-then-commit-based algorithm (Liau et al., 2018) escapes such assumptions; however, owing to its underlying PAC-based elimination, their method incurs a high regret. We present a conceptually simple and efficient algorithm that needs to remember the statistics of at most $M$ arms and, for any $K$-armed finite bandit instance, enjoys an $O(KM + K^{1.5}\sqrt{T\log(T/MK)}/M)$ upper bound on regret. We extend it to achieve sub-linear quantile-regret (Roy Chaudhuri and Kalyanakrishnan, 2018) and empirically verify the efficiency of our algorithm via experiments.
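
    To make the memory constraint concrete, here is a generic phase-based sketch (our illustration of the bounded-memory idea, not the algorithm from the paper): hold statistics for at most M arms at a time, run UCB within that buffer, and carry only the empirical champion into the next phase. Smaller M saves memory at the cost of extra regret, loosely consistent with the 1/M factor in the bound above.

        import numpy as np

        def bounded_memory_bandit(pull, K, M, rounds_per_phase, seed=0):
            # Statistics for at most M arms are ever held: [arm, pulls, mean].
            rng = np.random.default_rng(seed)
            unseen = list(rng.permutation(K))
            champion = None
            while unseen:
                buffer = [champion] if champion is not None else []
                while len(buffer) < M and unseen:
                    buffer.append([int(unseen.pop()), 0, 0.0])
                for t in range(1, rounds_per_phase + 1):    # UCB within buffer
                    ucb = [m + np.sqrt(2 * np.log(t) / n) if n else np.inf
                           for _, n, m in buffer]
                    i = int(np.argmax(ucb))
                    r = pull(buffer[i][0])
                    buffer[i][1] += 1
                    buffer[i][2] += (r - buffer[i][2]) / buffer[i][1]
                champion = max(buffer, key=lambda b: b[2])  # forget the rest
            return champion[0]   # then play this arm for the remaining horizon

        rng = np.random.default_rng(2)
        arm_means = np.linspace(0.1, 0.9, 20)
        pull = lambda a: float(rng.random() < arm_means[a])
        print(bounded_memory_bandit(pull, K=20, M=4, rounds_per_phase=500))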