
1,665 research outputs found

Creation of Narrow-Energy-Gap Luminescent Materials Based on Fused Azobenzene-Boron Complexes

Author: Nakamura Masashi
Publication venue: Kyoto University
Publication date: 25/03/2024

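Kyoto University, new-system course doctorate. Doctor of Philosophy (Engineering), Kō No. 25246, Engineering Doctorate No. 5205. Department of Polymer Chemistry, Graduate School of Engineering, Kyoto University. Examiners: Professor Kazuo Tanaka (chief examiner), Professor Hideo Ohkita, Professor Makoto Ouchi. Conferred under Article 4, Paragraph 1 of the Degree Regulations. DGA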

Kyoto University Research Information Repository

Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Authors: Nakamura Shintaro, Sugiyama Masashi
Publication date: 20/08/2023

We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s \in \{1, \ldots, d\}$ follows an unknown distribution with mean $\mu_s$. In each time step, the player pulls a single arm and observes its reward. The player's goal is to identify the optimal action $\boldsymbol{\pi}^{*} = \operatorname*{arg\,max}_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$ from a finite-sized real-valued action set $\mathcal{A} \subset \mathbb{R}^{d}$ with as few arm pulls as possible. Previous methods in the R-CPE-MAB assume that the size of the action set $\mathcal{A}$ is polynomial in $d$. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in $d$. We also introduce a novel problem-dependent sample complexity lower bound of the R-CPE-MAB problem, and show that the GenTS-Explore algorithm achieves the optimal sample complexity up to a problem-dependent constant factor.
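The problem setup in this abstract can be illustrated with a small simulation. The sketch below is not GenTS-Explore; it is a naive baseline that pulls every arm equally often, estimates each mean, and then picks the action maximizing the estimated inner product. The function names, the unit-variance Gaussian rewards, and the toy means and action set are all assumptions made for illustration.

```python
import random


def best_action(mu, actions):
    """Return the action pi in `actions` maximizing the inner product mu . pi."""
    return max(actions, key=lambda pi: sum(m * p for m, p in zip(mu, pi)))


def pull(arm, true_means, rng):
    """Simulate one pull of `arm`; unit-variance Gaussian noise is an assumption."""
    return rng.gauss(true_means[arm], 1.0)


def uniform_explore(true_means, actions, pulls_per_arm, seed=0):
    """Naive baseline (not GenTS-Explore): pull each arm equally, then take the argmax."""
    rng = random.Random(seed)
    estimates = []
    for s in range(len(true_means)):
        rewards = [pull(s, true_means, rng) for _ in range(pulls_per_arm)]
        estimates.append(sum(rewards) / pulls_per_arm)
    return best_action(estimates, actions)


# Hypothetical instance with d = 3 arms and a real-valued action set of size 3.
true_means = [1.0, 0.2, -0.5]
actions = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 1.5, 2.0)]
chosen = uniform_explore(true_means, actions, pulls_per_arm=2000)
print(chosen)
```

Enumerating the action set as done in `best_action` is only feasible when the set is small; the abstract's point is that this enumeration becomes impractical once the action set grows exponentially in $d$.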

arXiv.org e-Print Archive

Part 2: Chapter 13 - Malaysia

Authors: Masashi Nakamura, So Umezaki
Publication venue: Institute of Developing Economies (IDE-JETRO)
Publication date: 01/01/2007

Academic Research Repository at the Institute of Developing Economies

Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Authors: Nakamura Shintaro, Sugiyama Masashi
Publication date: 15/11/2023

We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce another algorithm named the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for the case where the size of the action class is polynomial, and show that it is optimal in the sense that it matches a lower bound. Finally, we experimentally compare the algorithms with previous methods and show that our algorithm performs better.
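The fixed-budget criterion in this abstract, the probability of returning a wrong action after a fixed number of pulls, can be estimated empirically. The sketch below is neither CSA nor Minimax-CombSAR; it splits the budget evenly across arms and counts how often the resulting choice is wrong. The function names, Gaussian rewards, and the toy instance are illustrative assumptions, and the budget is assumed to be at least the number of arms.

```python
import random


def inner(mu, pi):
    """Inner product of a mean vector and an action vector."""
    return sum(m * p for m, p in zip(mu, pi))


def run_once(true_means, actions, budget, rng):
    """Spend `budget` pulls evenly across arms (assumes budget >= number of arms)."""
    d = len(true_means)
    pulls_per_arm = budget // d
    estimates = [
        sum(rng.gauss(true_means[s], 1.0) for _ in range(pulls_per_arm)) / pulls_per_arm
        for s in range(d)
    ]
    return max(actions, key=lambda pi: inner(estimates, pi))


def error_rate(true_means, actions, budget, trials=200, seed=0):
    """Monte Carlo estimate of the probability of returning a non-optimal action."""
    rng = random.Random(seed)
    best = max(actions, key=lambda pi: inner(true_means, pi))
    errors = sum(run_once(true_means, actions, budget, rng) != best for _ in range(trials))
    return errors / trials


# Hypothetical instance: 3 arms, 3 real-valued actions.
true_means = [1.0, 0.2, -0.5]
actions = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 1.5, 2.0)]
err_small = error_rate(true_means, actions, budget=3)
err_large = error_rate(true_means, actions, budget=3000)
print(err_small, err_large)
```

Comparing `err_small` with `err_large` gives a rough picture of how the error probability shrinks as the budget grows, which is the quantity the abstract's upper and lower bounds describe.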

arXiv.org e-Print Archive