QoS-Aware Multi-Armed Bandits
Motivated by runtime verification of QoS requirements in self-adaptive and
self-organizing systems that are able to reconfigure their structure and
behavior in response to runtime data, we propose a QoS-aware variant of
Thompson sampling for multi-armed bandits. It is applicable in settings where
QoS satisfaction of an arm has to be ensured with high confidence efficiently,
rather than finding the optimal arm while minimizing regret. Preliminary
experimental results encourage further research in the field of QoS-aware
decision making.
Comment: Accepted at IEEE Workshop on Quality Assurance for Self-adaptive
Self-organising Systems, FAS* 201
Decentralized Exploration in Multi-Armed Bandits
We consider the decentralized exploration problem: a set of players
collaborate to identify the best arm by asynchronously interacting with the
same stochastic environment. The objective is to ensure privacy in the best arm
identification problem between asynchronous, collaborative, and thrifty
players. In the context of a digital service, we advocate that this
decentralized approach allows a good balance between the interests of users and
those of service providers: the providers optimize their services, while
protecting the privacy of the users and saving resources. We define the privacy
level as the amount of information an adversary could infer by intercepting the
messages concerning a single user. We provide a generic algorithm Decentralized
Elimination, which uses any best arm identification algorithm as a subroutine.
We prove that this algorithm ensures privacy, with a low communication cost,
and that in comparison to the lower bound of the best arm identification
problem, its sample complexity suffers from a penalty depending on the inverse
of the probability of the most frequent players. Then, thanks to the genericity
of the approach, we extend the proposed algorithm to non-stationary
bandits. Finally, experiments illustrate and complete the analysis.
Skyline Identification in Multi-Armed Bandits
We introduce a variant of the classical PAC multi-armed bandit problem. There
is an ordered set of arms, each with some stochastic
reward drawn from some unknown bounded distribution. The goal is to identify
the skyline of the set, consisting of all arms that have
larger expected reward than all lower-numbered arms. We
define a natural notion of an ε-approximate skyline and prove
matching upper and lower bounds for identifying an ε-skyline.
Specifically, we show that in order to identify an ε-skyline from
among arms with probability , samples are necessary and sufficient. When , our results improve over the naive algorithm, which draws enough samples
to approximate the expected reward of every arm; the algorithm of (Auer et al.,
AISTATS'16) for Pareto-optimal arm identification is likewise superseded. Our
results show that the sample complexity of the skyline problem lies strictly in
between that of best arm identification (Even-Dar et al., COLT'02) and that of
approximating the expected reward of every arm.
Comment: 18 pages, 2 Figures; an ALT'18/ISIT'18 submissio
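As a rough illustration of the skyline definition above, the following Python sketch computes the skyline of a list of known expected rewards and then mimics the naive approach of estimating every arm's mean by sampling. The function names, the Bernoulli reward model, and the 0-based indexing are assumptions made for this example; it is not the algorithm from the paper and makes no claim about its sample complexity.

```python
import random


def skyline(means):
    """Indices of arms whose value strictly exceeds that of every
    lower-numbered arm (0-based; the first arm always qualifies)."""
    result = []
    best_so_far = float("-inf")
    for index, value in enumerate(means):
        if value > best_so_far:
            result.append(index)
            best_so_far = value
    return result


def naive_skyline(success_probs, samples_per_arm, rng):
    """Naive baseline: estimate every arm's mean from the same number of
    samples, then take the skyline of the estimates.

    Assumption for this sketch: each arm is a Bernoulli arm whose success
    probability is given in `success_probs`.
    """
    estimates = []
    for p in success_probs:
        successes = sum(1 for _ in range(samples_per_arm) if rng.random() < p)
        estimates.append(successes / samples_per_arm)
    return skyline(estimates), estimates


if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.4, 0.9, 0.9]
    print("true skyline:", skyline(true_means))
    estimated, _ = naive_skyline(true_means, 2000, random.Random(0))
    print("estimated skyline:", estimated)
```

With few samples per arm, the estimated skyline can differ from the true one when neighbouring means are close, which is the situation an approximate notion of skyline is meant to tolerate.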