1,665 research outputs found

    Creation of Narrow-Energy-Gap Luminescent Materials Based on Ring-Fused Azobenzene-Boron Complexes

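    Kyoto University; new-system, course-based doctorate; Doctor of Philosophy (Engineering); Kō No. 25246; Engineering Doctorate No. 5205; Department of Polymer Chemistry, Graduate School of Engineering, Kyoto University. Examiners: Prof. Kazuo Tanaka (chief), Prof. Hideo Ohkita, Prof. Makoto Ouchi. Conferred under Article 4, Paragraph 1 of the Degree Regulations. DGA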

    Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

    We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s\in\{1, \ldots, d\}$ follows an unknown distribution with mean $\mu_s$. In each time step, a player pulls a single arm and observes its reward. The player's goal is to identify the optimal action $\boldsymbol{\pi}^{*} = \operatorname*{arg\,max}_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$ from a finite-sized real-valued action set $\mathcal{A}\subset \mathbb{R}^{d}$ with as few arm pulls as possible. Previous methods in the R-CPE-MAB assume that the size of the action set $\mathcal{A}$ is polynomial in $d$. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in $d$. We also introduce a novel problem-dependent sample complexity lower bound of the R-CPE-MAB problem, and show that the GenTS-Explore algorithm achieves the optimal sample complexity up to a problem-dependent constant factor.

    Part2 : Chapter 13 - Malaysia

    Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

    We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound on the error probability of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce another algorithm, named the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm, for the case where the size of the action class is polynomial, and show that it is optimal in the sense that it matches a lower bound. Finally, we experimentally compare the algorithms with previous methods and show that our algorithm performs better.