4 research outputs found

    Stochastic Bandit Based on Empirical Moments

    Full text link
    In the multiarmed bandit problem, a gambler chooses an arm of a slot machine to pull, considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a known bounded interval, e.g. [0,1]. For this model, policies that take into account the empirical variances (i.e., second moments) of the arms are known to perform effectively. In this paper, we generalize this idea and propose a policy that exploits the first d empirical moments for an arbitrary d fixed in advance. The asymptotic upper bound on the regret of the policy approaches the theoretical bound of Burnetas and Katehakis as d increases. By choosing an appropriate d, the proposed policy realizes a tradeoff between computational complexity and expected regret.
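
    As a rough illustration of the bookkeeping such a moment-based policy needs, the sketch below maintains the first d empirical moments of each arm as running power sums. This is a minimal sketch in Python: the class name and interface are assumptions of ours, and the index that the paper actually computes from these moments is not reproduced here.

    import numpy as np

    class MomentTracker:
        """Per-arm bookkeeping of the first d empirical moments of rewards in [0, 1]."""

        def __init__(self, n_arms, d):
            self.d = d
            self.counts = np.zeros(n_arms, dtype=int)
            # power_sums[i, k] accumulates the sum of X^(k+1) over pulls of arm i
            self.power_sums = np.zeros((n_arms, d))

        def update(self, arm, reward):
            # add reward^1, ..., reward^d to the running sums for this arm
            self.counts[arm] += 1
            self.power_sums[arm] += reward ** np.arange(1, self.d + 1)

        def moments(self, arm):
            # empirical moments m_k = (1/n) * sum X^k for k = 1..d
            n = max(self.counts[arm], 1)
            return self.power_sums[arm] / n

    For d = 2 this reduces to tracking the empirical mean and second moment (hence the variance), the statistics used by the variance-aware policies the abstract refers to.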

    UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits

    Full text link
    In this work, we address the open problem of finding low-complexity, near-optimal multi-armed bandit algorithms for sequential decision making problems. Existing bandit algorithms are either sub-optimal and computationally simple (e.g., UCB1) or optimal and computationally complex (e.g., kl-UCB). We propose a boosting approach to Upper Confidence Bound based algorithms for stochastic bandits, which we call UCBoost. Specifically, we propose two types of UCBoost algorithms. We show that UCBoost(D) enjoys O(1) complexity for each arm per round as well as a regret guarantee that is 1/e-close to that of the kl-UCB algorithm. We also propose an approximation-based UCBoost algorithm, UCBoost(ε), that enjoys a regret guarantee ε-close to that of kl-UCB as well as O(log(1/ε)) complexity for each arm per round. Hence, our algorithms provide practitioners a practical way to trade off optimality against computational complexity. Finally, we present numerical results which show that UCBoost(ε) can achieve the same regret performance as the standard kl-UCB while incurring only 1% of the computational cost of kl-UCB. Comment: Accepted by IJCAI 201
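
    The complexity/optimality tradeoff the abstract describes can be seen at its two extremes: the kl-UCB index requires a numerical search per arm per round, while a closed-form relaxation (here via Pinsker's inequality) is cheap but looser. The sketch below is only an illustration of those two endpoints, not the UCBoost algorithm itself; the function names are assumptions of ours.

    import math

    def kl_bernoulli(p, q, eps=1e-12):
        # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for numerical stability
        p = min(max(p, eps), 1 - eps)
        q = min(max(q, eps), 1 - eps)
        return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

    def klucb_index(p_hat, n, t, iters=30):
        # kl-UCB index: largest q in [p_hat, 1] with n * kl(p_hat, q) <= log t,
        # found by bisection -- the "optimal but costly" end of the tradeoff
        target = math.log(t) / n
        lo, hi = p_hat, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            if kl_bernoulli(p_hat, mid) <= target:
                lo = mid
            else:
                hi = mid
        return lo

    def pinsker_index(p_hat, n, t):
        # Closed-form relaxation via Pinsker's inequality kl(p, q) >= 2 (p - q)^2:
        # a UCB1-style index that is cheap per round but looser than the kl-UCB index
        return min(1.0, p_hat + math.sqrt(math.log(t) / (2.0 * n)))

    Because the Pinsker bound lower-bounds the KL divergence, pinsker_index is always at least klucb_index: it remains a valid confidence bound, just a weaker one, which is the kind of looseness UCBoost is designed to reduce at small extra cost.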

    Stochastic Bandit Based on Empirical Moments (Junya Honda)

    No full text
    In the multiarmed bandit problem, a gambler chooses an arm of a slot machine to pull, considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in [0, 1]. For this model, there exists a policy which achieves the theoretical bound asymptotically. However, the optimal policy requires solving a convex optimization problem involving the empirical distribution of each arm. In this paper, we propose a policy which exploits the first d empirical moments for an arbitrary d fixed in advance. We show that the performance of the policy approaches the theoretical bound as d increases. This policy can be implemented by solving polynomial equations, and we derive the explicit solution for d smaller than 5. By choosing an appropriate d, the proposed policy realizes a tradeoff between computational complexity and expected regret.
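
    For reference, the "theoretical bound" mentioned here is usually stated as follows (standard form of the Burnetas-Katehakis lower bound for reward distributions on [0,1]; the notation below is ours and minor technical conditions are omitted):

    \liminf_{n\to\infty} \frac{\mathbb{E}[T_i(n)]}{\log n} \;\ge\; \frac{1}{K_{\inf}(F_i,\mu^*)},
    \qquad
    K_{\inf}(F,\mu) \;=\; \inf\left\{ D(F \,\|\, G) \;:\; G \in \mathcal{P}([0,1]),\ \mathbb{E}_G[X] > \mu \right\},

    where T_i(n) is the number of pulls of suboptimal arm i up to round n, F_i its reward distribution, and μ* the mean of the best arm. The convex optimization referred to in the abstract is the computation of K_inf with F replaced by the empirical distribution of the arm; the moment-based policy replaces this by a problem involving only the first d empirical moments.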

    MATHEMATICAL ENGINEERING TECHNICAL REPORTS: Stochastic Bandit Based on Empirical Moments

    No full text
    Stochastic Bandit Based on Empirical Moments. Junya Honda and Akimichi Takemura.