We study the problem of learning a most biased coin among a set of coins by
tossing the coins adaptively. The goal is to minimize the number of tosses
until we identify a coin i* whose posterior probability of being most biased is
at least 1-delta for a given delta. Under a particular probabilistic model, we
give an optimal algorithm, i.e., an algorithm that minimizes the expected
number of future tosses. The problem is closely related to finding the best arm
in the multi-armed bandit problem using adaptive strategies. Our algorithm
employs an optimal adaptive strategy -- a strategy that performs the best
possible action at each step after observing the outcomes of all previous coin
tosses. Consequently, our algorithm is also optimal for any starting history of
outcomes. To our knowledge, this is the first algorithm that employs an optimal
adaptive strategy under a Bayesian setting for this problem. Our proof of
optimality employs tools from the field of Markov games.
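
As a rough illustration of the stopping criterion described above, the following sketch computes a posterior from toss counts and checks it against the 1-delta threshold. The text does not specify the probabilistic model, so everything here is an assumption made for illustration: each coin is taken to be independently "heavy" (heads probability `p_hi`) with prior probability `alpha`, and "light" (heads probability `p_lo`) otherwise. The posterior probability of being heavy stands in for the posterior probability of being most biased. The names `posterior_heavy`, `confident_coin`, `alpha`, `p_hi`, and `p_lo` are hypothetical. The sketch covers only the stopping check, not the rule for choosing which coin to toss next.

```python
def posterior_heavy(heads, tails, alpha=0.5, p_hi=0.6, p_lo=0.4):
    """Posterior probability that a coin is 'heavy' given its toss counts.

    Assumed two-point prior (not taken from the text): the coin has heads
    probability p_hi with prior probability alpha, and p_lo otherwise.
    """
    # Prior-weighted likelihood of the observed counts under each hypothesis.
    weight_hi = alpha * p_hi**heads * (1 - p_hi)**tails
    weight_lo = (1 - alpha) * p_lo**heads * (1 - p_lo)**tails
    # Bayes' rule: normalise over the two hypotheses.
    return weight_hi / (weight_hi + weight_lo)


def confident_coin(counts, delta, **model):
    """Return the index of the first coin whose posterior is >= 1 - delta.

    counts is a list of (heads, tails) pairs, one per coin.
    Returns None if no coin meets the threshold, i.e. tossing should continue.
    """
    for i, (heads, tails) in enumerate(counts):
        if posterior_heavy(heads, tails, **model) >= 1 - delta:
            return i
    return None
```

For example, under these assumed defaults, `confident_coin([(3, 3), (10, 2), (5, 4)], delta=0.05)` selects the second coin (index 1), while `confident_coin([(3, 3), (9, 2)], delta=0.05)` returns `None` because no posterior has reached 0.95 yet.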