Search CORE

26 research outputs found

Finding a most biased coin with fewest flips

Author: Chandrasekaran Karthekeyan
Karp Richard
Publication date: 07/09/2013

We study the problem of learning a most biased coin among a set of coins by tossing the coins adaptively. The goal is to minimize the number of tosses until we identify a coin i* whose posterior probability of being most biased is at least 1 - δ for a given δ. Under a particular probabilistic model, we give an optimal algorithm, i.e., an algorithm that minimizes the expected number of future tosses. The problem is closely related to finding the best arm in the multi-armed bandit problem using adaptive strategies. Our algorithm employs an optimal adaptive strategy: one that performs the best possible action at each step after observing the outcomes of all previous coin tosses. Consequently, our algorithm is also optimal for any starting history of outcomes. To our knowledge, this is the first algorithm that employs an optimal adaptive strategy under a Bayesian setting for this problem. Our proof of optimality employs tools from the field of Markov games.

arXiv.org e-Print Archive

CiteSeerX
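The stopping condition in this abstract (stop once some coin's posterior probability of being most biased is at least 1 - δ) can be sketched in Python. This is a minimal illustration only, assuming independent Beta(1, 1) priors on each coin's heads probability and a Monte Carlo estimate of the posterior probabilities. That prior and the names prob_most_biased and confident_coin are hypothetical; they are not the paper's probabilistic model, and the sketch does not implement the paper's optimal rule for choosing which coin to toss next.

```python
import random


def prob_most_biased(heads, tails, samples=20000, seed=0):
    """Monte Carlo estimate of P(coin i has the largest bias | observed tosses).

    Hypothetical model: independent Beta(1, 1) priors, so coin i's posterior
    on its heads probability is Beta(1 + heads[i], 1 + tails[i]).
    """
    rng = random.Random(seed)
    n = len(heads)
    wins = [0] * n
    for _ in range(samples):
        # Draw one plausible bias per coin from its posterior.
        draws = [rng.betavariate(1 + h, 1 + t) for h, t in zip(heads, tails)]
        # Credit the coin that is most biased in this draw.
        wins[max(range(n), key=draws.__getitem__)] += 1
    return [w / samples for w in wins]


def confident_coin(heads, tails, delta):
    """Return the index of a coin whose estimated posterior probability of
    being most biased is at least 1 - delta, or None if none qualifies yet."""
    probs = prob_most_biased(heads, tails)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= 1 - delta else None
```

An adaptive procedure in this setting would keep tossing coins and calling confident_coin after each outcome until it returns an index; which coin to toss at each step is exactly the decision the paper's optimal strategy addresses.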

Hierarchical Knowledge-Gradient for Sequential Sampling

Author: Frazier Peter I.
Mes Martijn R.K.
Powell Warren B.
Publication venue: Beta Research School for Operations Management and Logistics, University of Twente
Publication date: 01/01/2009

We consider the problem of selecting the best of a finite but very large set of alternatives. Each alternative may be characterized by a multi-dimensional vector and has independent normal rewards. This problem arises in various settings such as (i) ranking and selection, (ii) simulation optimization, where the unknown mean of each alternative is estimated with stochastic simulation output, and (iii) approximate dynamic programming, where we need to estimate values based on Monte Carlo simulation. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. Because the number of alternatives is large, we propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement, thus greatly reducing the measurement effort required. We demonstrate how this hierarchical knowledge-gradient policy can be applied to efficiently maximize a continuous function and prove that this policy finds a globally optimal alternative in the limit.

CiteSeerX

University of Twente Research Information
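The knowledge-gradient (KG) computation for the basic, non-hierarchical case of independent normal beliefs can be sketched in Python. This sketch assumes a known measurement-noise variance and independent alternatives; it omits the hierarchical aggregation that this entry proposes, and the function names are hypothetical. For alternative x with posterior mean mu[x] and variance sigma2[x], one more measurement moves mu[x] with standard deviation sigma_tilde = sigma2[x] / sqrt(sigma2[x] + noise2), and the KG value is sigma_tilde * f(zeta), where f(z) = z Φ(z) + φ(z) and zeta = -|mu[x] - max over x' ≠ x of mu[x']| / sigma_tilde.

```python
import math


def _normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(mu, sigma2, noise2):
    """Return the KG value of measuring each alternative once.

    mu, sigma2: posterior means and variances (independent normal beliefs,
    at least two alternatives, all variances > 0).
    noise2: known variance of a single measurement (an assumption here).
    """
    kg = []
    for x in range(len(mu)):
        # Std. dev. of the change in mu[x] caused by one more measurement of x.
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + noise2)
        best_other = max(m for i, m in enumerate(mu) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        kg.append(sigma_tilde * (zeta * _normal_cdf(zeta) + _normal_pdf(zeta)))
    return kg


def update(mu, sigma2, noise2, x, y):
    """Bayesian normal update of alternative x after observing reward y."""
    precision = 1.0 / sigma2[x] + 1.0 / noise2
    mu, sigma2 = list(mu), list(sigma2)  # copy; inputs are left unchanged
    mu[x] = (mu[x] / sigma2[x] + y / noise2) / precision
    sigma2[x] = 1.0 / precision
    return mu, sigma2
```

A policy in this style measures the alternative with the largest KG value, observes a reward y, calls update, and repeats. Because each KG value looks only one measurement ahead, the policy is myopic in the sense the abstract describes.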

Single- and Multi-Objective Ranking and Selection Procedures in Simulation: A Historical Review

Publication venue: Stellenbosch University

Crossref

Efficient Simulation Budget Allocation for Ranking the Top m

Author: Hui Xiao
Loo Hay Lee
Publication venue: Hindawi Limited
Publication date: 01/01/2014

Crossref