Creation of Narrow-Energy-Gap Luminescent Materials Based on Fused Azobenzene-Boron Complexes
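Kyoto University, new-system course doctorate, Doctor of Philosophy (Engineering), Kō No. 25246, Kōhaku (Engineering Doctorate) No. 5205. Department of Polymer Chemistry, Graduate School of Engineering, Kyoto University. Examination committee: Professor Kazuo Tanaka (chief examiner), Professor Hideo Ohkita, Professor Makoto Ouchi. Conferred under Article 4, Paragraph 1 of the Degree Regulations. DGA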
Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit
We study the real-valued combinatorial pure exploration of the multi-armed
bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given a set of stochastic
arms, and the reward of each arm follows an unknown distribution with an
unknown mean. In each time step, the player pulls a single arm
and observes its reward. The player's goal is to identify the optimal
action from a finite-sized real-valued action set with as few
arm pulls as possible. Previous methods for R-CPE-MAB assume that the size
of the action set is polynomial in the number of arms. We introduce an algorithm
named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm,
which is the first algorithm that can work even when the size of the action set
is exponentially large in the number of arms. We also introduce a novel problem-dependent
sample complexity lower bound for the R-CPE-MAB problem, and show that the
GenTS-Explore algorithm achieves the optimal sample complexity up to a
problem-dependent constant factor.
Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit
We study the real-valued combinatorial pure exploration of the multi-armed
bandit in the fixed-budget setting. We first introduce the Combinatorial
Successive Assign (CSA) algorithm, which is the first algorithm that can
identify the best action even when the size of the action class is
exponentially large with respect to the number of arms. We show that the upper
bound of the probability of error of the CSA algorithm matches a lower bound up
to a logarithmic factor in the exponent. Then, we introduce another algorithm
named the Minimax Combinatorial Successive Accepts and Rejects
(Minimax-CombSAR) algorithm for the case where the size of the action class is
polynomial, and show that it is optimal, matching a lower bound. Finally,
we experimentally compare the algorithms with previous methods and show that
our algorithm performs better.