
    QoS-Aware Multi-Armed Bandits

    Motivated by runtime verification of QoS requirements in self-adaptive and self-organizing systems that can reconfigure their structure and behavior in response to runtime data, we propose a QoS-aware variant of Thompson sampling for multi-armed bandits. It applies in settings where the goal is to efficiently ensure, with high confidence, that an arm satisfies a QoS requirement, rather than to find the optimal arm while minimizing regret. Preliminary experimental results encourage further research in the field of QoS-aware decision making.
    Comment: Accepted at IEEE Workshop on Quality Assurance for Self-adaptive Self-organising Systems, FAS* 201
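    A decision rule of the kind the abstract describes can be sketched in Python as follows. This is a minimal, hypothetical Bernoulli-bandit sketch, not the paper's algorithm: the name `qos_thompson_sampling`, the QoS `threshold`, the `confidence` level, and the Monte Carlo posterior check are all assumptions for illustration.

    ```python
    import random

    def qos_thompson_sampling(arms, threshold=0.8, confidence=0.95, max_rounds=10000):
        """QoS-aware Thompson sampling sketch (illustrative, not the paper's
        exact rule). Each arm is a callable returning a Bernoulli reward
        (1 = QoS satisfied). Stops once some arm's posterior probability of
        exceeding `threshold` is at least `confidence`."""
        # Beta(1, 1) priors over each arm's QoS-satisfaction probability.
        alpha = [1.0] * len(arms)
        beta = [1.0] * len(arms)
        for _ in range(max_rounds):
            # Thompson step: sample a success probability for each arm,
            # then pull the arm with the highest sampled value.
            samples = [random.betavariate(alpha[i], beta[i]) for i in range(len(arms))]
            i = max(range(len(arms)), key=lambda k: samples[k])
            reward = arms[i]()
            alpha[i] += reward
            beta[i] += 1 - reward
            # Monte Carlo estimate of P(p_i > threshold) under the posterior.
            hits = sum(random.betavariate(alpha[i], beta[i]) > threshold
                       for _ in range(500))
            if hits / 500 >= confidence:
                return i  # arm believed to satisfy the QoS requirement
        return None  # no arm certified within the round budget
    ```

    Note that, unlike regret-minimizing Thompson sampling, the loop terminates as soon as any arm is certified, which matches the abstract's emphasis on efficient high-confidence QoS satisfaction rather than optimality.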

    Decentralized Exploration in Multi-Armed Bandits

    We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment. The objective is to ensure privacy in the best arm identification problem among asynchronous, collaborative, and thrifty players. In the context of a digital service, we advocate that this decentralized approach strikes a good balance between the interests of users and those of service providers: the providers optimize their services while protecting the privacy of the users and saving resources. We define the privacy level as the amount of information an adversary could infer by intercepting the messages concerning a single user. We provide a generic algorithm, Decentralized Elimination, which uses any best arm identification algorithm as a subroutine. We prove that this algorithm ensures privacy, with a low communication cost, and that, compared to the lower bound of the best arm identification problem, its sample complexity suffers a penalty depending on the inverse of the probability of the most frequent players. Then, thanks to the genericity of the approach, we extend the proposed algorithm to non-stationary bandits. Finally, experiments illustrate and complete the analysis.
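    Since Decentralized Elimination accepts any best arm identification algorithm as a subroutine, successive elimination is a natural candidate. The sketch below is a simplified, single-player version of such a subroutine; the function name, the `delta` and `horizon` parameters, and the Hoeffding-style confidence radius are illustrative assumptions, not details taken from the paper.

    ```python
    import math

    def successive_elimination(pull, n_arms, delta=0.05, horizon=20000):
        """Successive elimination: the kind of best arm identification
        subroutine Decentralized Elimination could plug in (simplified,
        single-player sketch). `pull(i)` returns a reward in [0, 1]."""
        active = set(range(n_arms))
        means = [0.0] * n_arms
        counts = [0] * n_arms
        t = 0
        while len(active) > 1 and t < horizon:
            # Pull every surviving arm once per round (round-robin).
            for i in list(active):
                r = pull(i)
                counts[i] += 1
                means[i] += (r - means[i]) / counts[i]
                t += 1
            # Hoeffding confidence radius with a union bound over rounds.
            rad = lambda i: math.sqrt(
                math.log(4 * n_arms * counts[i] ** 2 / delta) / (2 * counts[i]))
            best = max(active, key=lambda i: means[i])
            # Drop arms whose upper confidence bound falls below the
            # empirically best arm's lower confidence bound.
            active = {i for i in active
                      if means[i] + rad(i) >= means[best] - rad(best)}
        return max(active, key=lambda i: means[i])
    ```

    In the decentralized setting the abstract describes, each player would run such a subroutine asynchronously and exchange only elimination decisions, which is what keeps the communication cost low.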

    Skyline Identification in Multi-Armed Bandits

    We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of $n$ arms $A[1],\dots,A[n]$, each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the "skyline" of the set $A$, consisting of all arms $A[i]$ such that $A[i]$ has larger expected reward than all lower-numbered arms $A[1],\dots,A[i-1]$. We define a natural notion of an $\varepsilon$-approximate skyline and prove matching upper and lower bounds for identifying an $\varepsilon$-skyline. Specifically, we show that in order to identify an $\varepsilon$-skyline from among $n$ arms with probability $1-\delta$, $\Theta\left(\frac{n}{\varepsilon^2} \cdot \min\left\{\log\left(\frac{1}{\varepsilon\delta}\right), \log\left(\frac{n}{\delta}\right)\right\}\right)$ samples are necessary and sufficient. When $\varepsilon \gg 1/n$, our results improve over the naive algorithm, which draws enough samples to approximate the expected reward of every arm; the algorithm of (Auer et al., AISTATS'16) for Pareto-optimal arm identification is likewise superseded. Our results show that the sample complexity of the skyline problem lies strictly in between that of best arm identification (Even-Dar et al., COLT'02) and that of approximating the expected reward of every arm.
    Comment: 18 pages, 2 Figures; an ALT'18/ISIT'18 submissio
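    Given (sufficiently accurate) estimates of each arm's expected reward, extracting the skyline itself is a single running-maximum pass over the ordered arms. The sketch below illustrates the definition above; the `eps` slack is a simplification of the paper's $\varepsilon$-skyline notion, whose precise definition is not given in the abstract.

    ```python
    def skyline(means, eps=0.0):
        """Return the indices forming the skyline of an ordered list of
        (estimated) mean rewards: each included arm beats every
        lower-numbered arm, up to an `eps` slack (illustrative
        approximation of the eps-skyline)."""
        out, running_max = [], float("-inf")
        for i, m in enumerate(means):
            # Compare against the best arm seen so far (strictly lower index).
            if m > running_max - eps:
                out.append(i)
            running_max = max(running_max, m)
        return out
    ```

    For example, `skyline([0.2, 0.5, 0.3, 0.7])` keeps arms 0, 1, and 3: arm 2 is dominated by the lower-numbered arm 1. The hard part, which the paper's sample-complexity bound addresses, is how few pulls suffice to make the estimates reliable enough for this pass.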