
    QoS-Aware Multi-Armed Bandits

    Motivated by runtime verification of QoS requirements in self-adaptive and self-organizing systems that can reconfigure their structure and behavior in response to runtime data, we propose a QoS-aware variant of Thompson sampling for multi-armed bandits. It applies in settings where the goal is to efficiently ensure, with high confidence, that an arm satisfies a QoS requirement, rather than to find the optimal arm while minimizing regret. Preliminary experimental results encourage further research in the field of QoS-aware decision making.
    Comment: Accepted at IEEE Workshop on Quality Assurance for Self-adaptive Self-organising Systems, FAS* 201
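    A decision rule of the kind the abstract describes can be sketched in Python as follows. This is a minimal, hypothetical Bernoulli-bandit sketch, not the paper's algorithm: the name `qos_thompson_sampling`, the QoS `threshold`, the `confidence` level, and the Monte Carlo posterior check are all assumptions for illustration.

    ```python
    import random

    def qos_thompson_sampling(arms, threshold=0.8, confidence=0.95, max_rounds=10000):
        """QoS-aware Thompson sampling sketch (illustrative, not the paper's
        exact rule). Each arm is a callable returning a Bernoulli reward
        (1 = QoS satisfied). Stops once some arm's posterior probability of
        exceeding `threshold` is at least `confidence`."""
        # Beta(1, 1) priors over each arm's QoS-satisfaction probability.
        alpha = [1.0] * len(arms)
        beta = [1.0] * len(arms)
        for _ in range(max_rounds):
            # Thompson step: sample a success probability for each arm,
            # then pull the arm with the highest sampled value.
            samples = [random.betavariate(alpha[i], beta[i]) for i in range(len(arms))]
            i = max(range(len(arms)), key=lambda k: samples[k])
            reward = arms[i]()
            alpha[i] += reward
            beta[i] += 1 - reward
            # Monte Carlo estimate of P(p_i > threshold) under the posterior.
            hits = sum(random.betavariate(alpha[i], beta[i]) > threshold
                       for _ in range(500))
            if hits / 500 >= confidence:
                return i  # arm believed to satisfy the QoS requirement
        return None  # no arm certified within the round budget
    ```

    Note that, unlike regret-minimizing Thompson sampling, the loop terminates as soon as any arm is certified, which matches the abstract's emphasis on efficient high-confidence QoS satisfaction rather than optimality.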

    Decentralized Exploration in Multi-Armed Bandits

    We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment. The objective is to ensure privacy in the best arm identification problem among asynchronous, collaborative, and thrifty players. In the context of a digital service, we advocate that this decentralized approach strikes a good balance between the interests of users and those of service providers: the providers optimize their services while protecting the privacy of the users and saving resources. We define the privacy level as the amount of information an adversary could infer by intercepting the messages concerning a single user. We provide a generic algorithm, Decentralized Elimination, which uses any best arm identification algorithm as a subroutine. We prove that this algorithm ensures privacy, with a low communication cost, and that, compared to the lower bound of the best arm identification problem, its sample complexity suffers a penalty depending on the inverse of the probability of the most frequent players. Then, thanks to the genericity of the approach, we extend the proposed algorithm to non-stationary bandits. Finally, experiments illustrate and complete the analysis.
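    Since Decentralized Elimination accepts any best arm identification algorithm as a subroutine, successive elimination is a natural candidate. The sketch below is a simplified, single-player version of such a subroutine; the function name, the `delta` and `horizon` parameters, and the Hoeffding-style confidence radius are illustrative assumptions, not details taken from the paper.

    ```python
    import math

    def successive_elimination(pull, n_arms, delta=0.05, horizon=20000):
        """Successive elimination: the kind of best arm identification
        subroutine Decentralized Elimination could plug in (simplified,
        single-player sketch). `pull(i)` returns a reward in [0, 1]."""
        active = set(range(n_arms))
        means = [0.0] * n_arms
        counts = [0] * n_arms
        t = 0
        while len(active) > 1 and t < horizon:
            # Pull every surviving arm once per round (round-robin).
            for i in list(active):
                r = pull(i)
                counts[i] += 1
                means[i] += (r - means[i]) / counts[i]
                t += 1
            # Hoeffding confidence radius with a union bound over rounds.
            rad = lambda i: math.sqrt(
                math.log(4 * n_arms * counts[i] ** 2 / delta) / (2 * counts[i]))
            best = max(active, key=lambda i: means[i])
            # Drop arms whose upper confidence bound falls below the
            # empirically best arm's lower confidence bound.
            active = {i for i in active
                      if means[i] + rad(i) >= means[best] - rad(best)}
        return max(active, key=lambda i: means[i])
    ```

    In the decentralized setting the abstract describes, each player would run such a subroutine asynchronously and exchange only elimination decisions, which is what keeps the communication cost low.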

    Skyline Identification in Multi-Armed Bandits

    We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of $n$ arms $A[1],\dots,A[n]$, each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the "skyline" of the set $A$, consisting of all arms $A[i]$ such that $A[i]$ has larger expected reward than all lower-numbered arms $A[1],\dots,A[i-1]$. We define a natural notion of an $\varepsilon$-approximate skyline and prove matching upper and lower bounds for identifying an $\varepsilon$-skyline. Specifically, we show that in order to identify an $\varepsilon$-skyline from among $n$ arms with probability $1-\delta$, $\Theta\left(\frac{n}{\varepsilon^2} \cdot \min\left\{\log\left(\frac{1}{\varepsilon\delta}\right), \log\left(\frac{n}{\delta}\right)\right\}\right)$ samples are necessary and sufficient. When $\varepsilon \gg 1/n$, our results improve over the naive algorithm, which draws enough samples to approximate the expected reward of every arm; the algorithm of (Auer et al., AISTATS'16) for Pareto-optimal arm identification is likewise superseded. Our results show that the sample complexity of the skyline problem lies strictly in between that of best arm identification (Even-Dar et al., COLT'02) and that of approximating the expected reward of every arm.
    Comment: 18 pages, 2 Figures; an ALT'18/ISIT'18 submissio
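    Given (sufficiently accurate) estimates of each arm's expected reward, extracting the skyline itself is a single running-maximum pass over the ordered arms. The sketch below illustrates the definition above; the `eps` slack is a simplification of the paper's $\varepsilon$-skyline notion, whose precise definition is not given in the abstract.

    ```python
    def skyline(means, eps=0.0):
        """Return the indices forming the skyline of an ordered list of
        (estimated) mean rewards: each included arm beats every
        lower-numbered arm, up to an `eps` slack (illustrative
        approximation of the eps-skyline)."""
        out, running_max = [], float("-inf")
        for i, m in enumerate(means):
            # Compare against the best arm seen so far (strictly lower index).
            if m > running_max - eps:
                out.append(i)
            running_max = max(running_max, m)
        return out
    ```

    For example, `skyline([0.2, 0.5, 0.3, 0.7])` keeps arms 0, 1, and 3: arm 2 is dominated by the lower-numbered arm 1. The hard part, which the paper's sample-complexity bound addresses, is how few pulls suffice to make the estimates reliable enough for this pass.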