
Global Multi-armed Bandits with Hölder Continuity

Author: Cem Tekin
Mihaela Van Der Schaar
Onur Atan
Publication date: 05/03/2020

Standard Multi-Armed Bandit (MAB) problems assume that the arms are independent. However, in many application scenarios, playing an arm also reveals information about the remaining arms. In such applications, this informativeness can and should be exploited to converge faster to the optimal solution. In this paper, we formalize a new class of multi-armed bandit methods, the Global Multi-armed Bandit (GMAB), in which the arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. We propose a greedy policy for the GMAB that always selects the arm with the highest estimated expected reward, and prove that it achieves bounded parameter-dependent regret. Hence, this policy selects suboptimal arms only finitely many times, and after a finite number of initial time steps, the optimal arm is selected in all of the remaining time steps with probability one. We also study how the informativeness of the arms about each other's rewards affects the speed of learning. Specifically, we prove that the parameter-free (worst-case) regret is sublinear in time and decreases with the informativeness of the arms. We also prove a Bayesian risk bound for the GMAB that is sublinear in time and reduces to the well-known Bayesian risk bound for linearly parameterized bandits when the arms are fully informative. GMABs have applications ranging from drug dosage control to dynamic pricing.
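The greedy idea can be illustrated with a toy model. The sketch below is not the authors' implementation; it assumes that each arm's expected reward is a hypothetical affine, invertible function mu_k(theta) = a_k + b_k * theta of a scalar global parameter theta in [0, 1], that rewards carry Gaussian noise, and that theta is estimated by inverting each played arm's sample mean and averaging those estimates weighted by play counts. The function `greedy_gmab` and its parameters are hypothetical names.

```python
import random
from typing import List, Tuple


def greedy_gmab(coeffs: List[Tuple[float, float]], theta_true: float, horizon: int,
                noise: float = 0.05, theta_range: Tuple[float, float] = (0.0, 1.0),
                seed: int = 0):
    """Greedy policy for a toy GMAB whose arms share one global parameter theta."""
    rng = random.Random(seed)
    lo, hi = theta_range
    n_arms = len(coeffs)

    def mu(k: int, theta: float) -> float:
        # Expected reward of arm k under theta (hypothetical affine model).
        a, b = coeffs[k]
        return a + b * theta

    def mu_inv(k: int, reward: float) -> float:
        # Invert arm k's reward function: mean reward -> estimate of theta (needs b != 0).
        a, b = coeffs[k]
        return (reward - a) / b

    counts = [0] * n_arms
    sums = [0.0] * n_arms
    choices = []
    for _ in range(horizon):
        played = [j for j in range(n_arms) if counts[j] > 0]
        if not played:
            k = 0  # no observations yet: start with arm 0
        else:
            # Every played arm yields an estimate of the shared theta; combine them
            # weighted by play counts (an assumed estimator), then clip to the range.
            total = sum(counts[j] for j in played)
            theta_hat = sum(counts[j] * mu_inv(j, sums[j] / counts[j]) for j in played) / total
            theta_hat = min(max(theta_hat, lo), hi)
            # Greedy step: pick the arm with the highest estimated expected reward.
            k = max(range(n_arms), key=lambda j: mu(j, theta_hat))
        reward = mu(k, theta_true) + rng.gauss(0.0, noise)
        counts[k] += 1
        sums[k] += reward
        choices.append(k)
    return choices, counts


choices, counts = greedy_gmab([(0.0, 1.0), (1.0, -1.0), (0.3, 0.5)], theta_true=0.8, horizon=200)
print("plays per arm:", counts)
```

Because every arm's observations feed the shared estimate of theta, a misleading early draw is corrected by samples from whichever arm is played next, so in this toy setting the greedy policy needs no explicit exploration bonus.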

CiteSeerX

Jamming Bandits - A Novel Learning Method for Optimal Jamming

Author: Amuru S.
Buehrer R.M.
Tekin C.
Van Der Schaar M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

Can an intelligent jammer learn and adapt to unknown environments in an electronic warfare-type scenario? In this paper, we answer this question affirmatively by developing a cognitive jammer that adaptively and optimally disrupts the communication between a victim transmitter-receiver pair. We formalize the problem using a multi-armed bandit framework in which the jammer can choose various physical-layer parameters, such as the signaling scheme, power level, and on-off/pulsing duration, in an attempt to obtain power-efficient jamming strategies. We first present online learning algorithms that maximize the jamming efficacy against static transmitter-receiver pairs and prove that these algorithms converge to the optimal jamming strategy (in terms of the error rate inflicted at the victim and the energy used). More importantly, we prove that the rate of convergence to the optimal jamming strategy is sublinear, i.e., the learning is fast compared with existing reinforcement learning algorithms, which is particularly important in dynamically changing wireless environments. We also characterize the performance of the proposed bandit-based learning algorithm against multiple static and adaptive transmitter-receiver pairs. © 2015 IEEE
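The bandit formulation can be sketched with a standard algorithm. The code below runs UCB1 over a small, hypothetical grid of jamming configurations (signaling scheme, power level, duty cycle) with made-up Bernoulli success probabilities. UCB1 is swapped in as a familiar stand-in; the paper's own algorithms and its reward definition, which combines the inflicted error rate and the energy used, may differ. `ARMS`, `SUCCESS_PROB` and `ucb1` are illustrative names.

```python
import itertools
import math
import random

# Hypothetical discretized action space: (signaling scheme, power level, on-off duty cycle).
ARMS = list(itertools.product(["BPSK", "QPSK"], [0.5, 1.0], [0.25, 1.0]))
# Hypothetical per-arm probability that a jamming attempt "succeeds"; illustrative numbers only.
SUCCESS_PROB = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.50, 0.80]


def ucb1(horizon: int, success_prob, seed: int = 0):
    """UCB1 with Bernoulli rewards; returns play counts and empirical means per arm."""
    rng = random.Random(seed)
    n = len(success_prob)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            k = t - 1  # initialize: try every configuration once
        else:
            # Optimism: empirical mean plus an exploration bonus that shrinks with plays.
            k = max(range(n), key=lambda j: means[j] + math.sqrt(2.0 * math.log(t) / counts[j]))
        reward = 1.0 if rng.random() < success_prob[k] else 0.0  # Bernoulli feedback
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]  # incremental average
    return counts, means


counts, means = ucb1(3000, SUCCESS_PROB)
best = max(range(len(ARMS)), key=counts.__getitem__)
print("most-played configuration:", ARMS[best])
```

Suboptimal configurations are still tried, but only about logarithmically often in the horizon, which is the kind of behavior that yields sublinear regret.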

Bilkent University Institutional Repository

Online Influence Maximization: Concept and Algorithm

Author: Guo Jianxiong
Publication date: 30/11/2023

In this survey, we offer an extensive overview of the Online Influence Maximization (IM) problem, covering both theoretical aspects and practical applications. For completeness, and because the online algorithms use an offline oracle as a subroutine, we first give a clear definition of the Offline IM problem and summarize the commonly used Offline IM algorithms, which include traditional approximation or heuristic algorithms and ML-based algorithms. Then, we give a standard definition of the Online IM problem and a basic Combinatorial Multi-Armed Bandit (CMAB) framework, CMAB-T. We summarize three types of feedback in the CMAB model and discuss in detail how to study the Online IM problem based on the CMAB-T model, which paves the way for solving the Online IM problem with online learning methods. Furthermore, we cover almost all Online IM algorithms proposed to date, focusing on the characteristics and theoretical guarantees of online algorithms for different feedback types, and explain in detail how they work and how their regret bounds are obtained. In addition, we collect many innovative ideas on problem definitions and algorithm designs, as well as pioneering works on variants of the Online IM problem and their corresponding algorithms. Finally, we summarize current challenges and outline prospective research directions from four distinct perspectives.
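To make the oracle-plus-bandit structure concrete, here is a minimal CUCB-style loop for influence maximization under the independent cascade model, assuming edge-level (semi-bandit) feedback: every edge leaving an activated node reveals whether it fired. It is a sketch, not an algorithm taken from the survey; the toy graph, the Monte Carlo greedy oracle, the exploration constant 1.5, and all function names are illustrative assumptions.

```python
import math
import random


def simulate_ic(graph, probs, seeds, rng):
    """One independent cascade: returns activated nodes and observed (edge, fired) pairs."""
    active, frontier, observed = set(seeds), list(seeds), []
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                fired = rng.random() < probs[(u, v)]  # each edge is tried once
                observed.append(((u, v), fired))
                if fired and v not in active:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active, observed


def greedy_oracle(graph, nodes, probs, k, rng, n_sims=200):
    """Offline oracle (assumed): greedy seeds by Monte Carlo estimates of expected spread."""
    seeds = []
    for _ in range(k):
        def spread(s):
            return sum(len(simulate_ic(graph, probs, s, rng)[0]) for _ in range(n_sims)) / n_sims
        seeds.append(max((v for v in nodes if v not in seeds), key=lambda v: spread(seeds + [v])))
    return seeds


def cucb_im(graph, nodes, true_probs, k, rounds, seed=0):
    rng = random.Random(seed)
    counts = {e: 0 for e in true_probs}
    means = {e: 0.0 for e in true_probs}
    chosen = []
    for t in range(1, rounds + 1):
        # Upper confidence bound per edge; unobserved edges are treated as certain to fire.
        ucb = {e: 1.0 if counts[e] == 0 else
               min(1.0, means[e] + math.sqrt(1.5 * math.log(t) / counts[e]))
               for e in true_probs}
        seeds = greedy_oracle(graph, nodes, ucb, k, rng)  # oracle sees optimistic probabilities
        _, observed = simulate_ic(graph, true_probs, seeds, rng)  # real cascade
        for e, fired in observed:  # edge-level feedback updates the empirical means
            counts[e] += 1
            means[e] += (float(fired) - means[e]) / counts[e]
        chosen.append(seeds)
    return chosen, means


GRAPH = {0: [1, 2, 3, 4], 5: [6]}  # toy graph; node 7 is isolated
NODES = list(range(8))
TRUE_PROBS = {(0, 1): 0.9, (0, 2): 0.9, (0, 3): 0.9, (0, 4): 0.9, (5, 6): 0.1}
chosen, means = cucb_im(GRAPH, NODES, TRUE_PROBS, k=1, rounds=50)
print("last seed set:", chosen[-1])
```

The offline oracle is reused unchanged and is simply fed upper confidence bounds instead of the unknown true probabilities; the optimistic treatment of unobserved edges is what drives exploration.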

arXiv.org e-Print Archive

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

Author: Bogunovic Ilija
Cevher Volkan
Scarlett Jonathan
Publication date: 31/05/2017

In this paper, we consider the problem of sequentially optimizing a black-box function $f$ based on noisy samples and bandit feedback. We assume that $f$ is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points. For the isotropic squared-exponential kernel in $d$ dimensions, we find that an average simple regret of $\epsilon$ requires $T = \Omega\big(\frac{1}{\epsilon^2} (\log\frac{1}{\epsilon})^{d/2}\big)$, and the average cumulative regret is at least $\Omega\big( \sqrt{T(\log T)^{d/2}} \big)$, thus matching existing upper bounds up to the replacement of $d/2$ by $2d+O(1)$ in both cases. For the Matérn-$\nu$ kernel, we give analogous bounds of the form $\Omega\big( (\frac{1}{\epsilon})^{2+d/\nu}\big)$ and $\Omega\big( T^{\frac{\nu + d}{2\nu + d}} \big)$, and discuss the resulting gaps to the existing upper bounds.

Comment: Appearing in COLT 2017. This version corrects a few minor mistakes in Table I, which summarizes the new and existing regret bounds.
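To compare how the stated lower bounds grow, the helpers below evaluate each expression with all constant factors dropped, so only growth rates and ratios between calls are meaningful, not absolute values. The function names are hypothetical.

```python
import math


# Lower-bound expressions for GP bandit optimization, with all constant factors dropped.

def se_simple_regret_T(eps: float, d: int) -> float:
    """Squared-exponential kernel: rounds T needed for average simple regret eps."""
    return (1.0 / eps ** 2) * math.log(1.0 / eps) ** (d / 2)


def se_cumulative_regret(T: float, d: int) -> float:
    """Squared-exponential kernel: cumulative regret after T rounds, sqrt(T (log T)^(d/2))."""
    return math.sqrt(T * math.log(T) ** (d / 2))


def matern_simple_regret_T(eps: float, d: int, nu: float) -> float:
    """Matérn-nu kernel: rounds T needed for average simple regret eps, (1/eps)^(2 + d/nu)."""
    return (1.0 / eps) ** (2 + d / nu)


def matern_cumulative_regret(T: float, d: int, nu: float) -> float:
    """Matérn-nu kernel: cumulative regret after T rounds, T^((nu + d) / (2 nu + d))."""
    return T ** ((nu + d) / (2 * nu + d))


for nu in (0.5, 1.5, 2.5, 100.0):
    exponent = (nu + 1) / (2 * nu + 1)  # d = 1
    print(f"nu={nu}: cumulative-regret exponent {exponent:.3f}, "
          f"bound at T=1e6 ~ {matern_cumulative_regret(1e6, 1, nu):.0f}")
```

For the Matérn-$\nu$ cumulative bound, the exponent $(\nu + d)/(2\nu + d)$ lies strictly between $1/2$ and $1$ and approaches $1/2$ as $\nu$ grows, consistent with the squared-exponential case, where the bound is $\sqrt{T}$ up to logarithmic factors.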

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne