Stochastic Bandit Based on Empirical Moments
In the multi-armed bandit problem, a gambler chooses an arm of a slot machine
to pull, trading off exploration against exploitation. We study
the stochastic bandit problem where each arm has a reward distribution
supported in a known bounded interval, e.g. [0,1]. For this model, policies
which take into account the empirical variances (i.e. second moments) of the
arms are known to perform effectively. In this paper, we generalize this idea
and we propose a policy which exploits the first d empirical moments for
arbitrary d fixed in advance. The asymptotic upper bound of the regret of the
policy approaches the theoretical bound by Burnetas and Katehakis as d
increases. By choosing an appropriate d, the proposed policy realizes a tradeoff
between the computational complexity and the expected regret.
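As an illustration of the variance-aware policies the abstract refers to, here is a minimal UCB-V-style index in Python. This is a sketch only: the exploration constants and function names are illustrative and are not the paper's proposed policy, which uses higher moments as well.

```python
import math

def ucbv_index(rewards, t):
    """UCB-V-style index: empirical mean plus a bonus that shrinks when the
    empirical variance (second central moment) is small.
    Rewards are assumed to lie in [0, 1]."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((x - mean) ** 2 for x in rewards) / n
    # Variance-dependent confidence bonus (constants are illustrative).
    return mean + math.sqrt(2 * var * math.log(t) / n) + 3 * math.log(t) / n

def choose_arm(history, t):
    """Pull each arm once, then pick the arm with the largest index."""
    for arm, rewards in enumerate(history):
        if not rewards:
            return arm
    return max(range(len(history)), key=lambda a: ucbv_index(history[a], t))
```

The variance term is what lets such policies pull low-variance suboptimal arms less often than plain UCB1 would.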
UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits
In this work, we address the open problem of finding low-complexity
near-optimal multi-armed bandit algorithms for sequential decision making
problems. Existing bandit algorithms are either sub-optimal and computationally
simple (e.g., UCB1) or optimal and computationally complex (e.g., kl-UCB). We
propose a boosting approach to Upper Confidence Bound based algorithms for
stochastic bandits, that we call UCBoost. Specifically, we propose two types of
UCBoost algorithms. We show that UCBoost(D), which aggregates a finite set D of
candidate confidence bounds, enjoys O(|D|) complexity for each
arm per round as well as a regret guarantee comparable to that of the
kl-UCB algorithm. We propose an approximation-based UCBoost algorithm,
UCBoost(ε), that enjoys a regret guarantee ε-close to that of
kl-UCB as well as O(log(1/ε)) complexity for each arm per round.
Hence, our algorithms provide practitioners a practical way to trade optimality
for computational complexity. Finally, we present numerical results which show
that UCBoost(ε) can achieve the same regret performance as the
standard kl-UCB while incurring only a small fraction of the computational cost of kl-UCB.
Comment: Accepted by IJCAI 2018
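For context on the per-round cost being traded away, the kl-UCB index for rewards in [0, 1] is typically computed by a bisection search over the binary KL divergence. The sketch below is illustrative (the tolerance is set by a fixed iteration budget, not the implementation used in the paper's experiments):

```python
import math

def kl_bernoulli(p, q):
    """Binary KL divergence d(p, q), with the 0 * log 0 = 0 convention."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, n, t, iters=50):
    """Largest q in [mean, 1] with n * d(mean, q) <= log(t), via bisection.
    Each call costs O(iters) KL evaluations per arm per round, the expense
    that simpler algorithms such as UCB1 avoid."""
    budget = math.log(t) / n
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid  # mid is still feasible; move the lower end up
        else:
            hi = mid
    return lo
```

Since d(p, q) is increasing in q for q >= p, the bisection converges to the boundary of the confidence region; replacing this search with closed-form surrogate bounds is the kind of saving UCBoost targets.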
Stochastic Bandit Based on Empirical Moments
Junya Honda
In the multi-armed bandit problem, a gambler chooses an arm of a slot machine to pull, trading off exploration against exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in [0, 1]. For this model, there exists a policy which achieves the theoretical bound asymptotically. However, the optimal policy requires solving a convex optimization problem which involves the empirical distribution of each arm. In this paper, we propose a policy which exploits the first d empirical moments for arbitrary d fixed in advance. We show that the performance of the policy approaches the theoretical bound as d increases. This policy can be implemented by solving polynomial equations, and we derive the explicit solution for d smaller than 5. By choosing an appropriate d, the proposed policy realizes a tradeoff between the computational complexity and the expected regret.
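The basic statistics the proposed policy is built on, the first d empirical moments of each arm, are straightforward to maintain online. A minimal sketch follows; the class and method names are illustrative and this is not the paper's implementation:

```python
class MomentTracker:
    """Maintains the first d empirical (raw) moments of an arm's rewards,
    i.e. m_k = (1/n) * sum_i x_i ** k for k = 1..d, updated incrementally."""

    def __init__(self, d):
        self.d = d
        self.n = 0
        self.sums = [0.0] * d  # running sums of x, x^2, ..., x^d

    def update(self, x):
        """Record one observed reward x."""
        self.n += 1
        p = 1.0
        for k in range(self.d):
            p *= x
            self.sums[k] += p

    def moments(self):
        """Return [m_1, ..., m_d]; requires at least one observation."""
        return [s / self.n for s in self.sums]
```

With d = 2, the variance m_2 - m_1 ** 2 is recovered, matching the variance-aware policies mentioned above; larger d feeds the polynomial equations the paper solves explicitly for d smaller than 5.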
MATHEMATICAL ENGINEERING TECHNICAL REPORTS: Stochastic Bandit Based on Empirical Moments
Junya Honda and Akimichi Takemura