    Cooperative Online Learning: Keeping your Neighbors Updated

    We study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. The loss function is then revealed to these agents and also to their neighbors in the network. Our results characterize how much knowing the network structure affects the regret as a function of the model of agent activations. When activations are stochastic, the optimal regret (up to constant factors) is shown to be of order $\sqrt{\alpha T}$, where $T$ is the horizon and $\alpha$ is the independence number of the network. We prove that the upper bound is achieved even when agents have no information about the network structure. When activations are adversarial the situation changes dramatically: if agents ignore the network structure, an $\Omega(T)$ lower bound on the regret can be proven, showing that learning is impossible. However, when agents can choose to ignore some of their neighbors based on the knowledge of the network structure, we prove an $O(\sqrt{\overline{\chi}\, T})$ sublinear regret bound, where $\overline{\chi} \ge \alpha$ is the clique-covering number of the network.
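    To make the feedback model concrete, here is a minimal sketch (not the paper's algorithm) in which every agent runs its own exponential-weights (Hedge) learner and, whenever an agent is activated, the revealed loss vector also updates its neighbors; the network, activation probability, and learning rate below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the cooperative feedback model: each agent keeps a
# Hedge/exponential-weights learner over K actions; when an agent is
# activated, the full loss vector is revealed to it and to its neighbors,
# and all of them update. Network, activation rate, and eta are illustrative.
rng = np.random.default_rng(0)
K, T = 5, 1000
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}       # toy undirected network
agents = list(neighbors)
eta = np.sqrt(np.log(K) / T)                         # illustrative learning rate
weights = {a: np.ones(K) for a in agents}

for t in range(T):
    losses = rng.random(K)                           # adversary's loss vector in [0, 1]
    active = [a for a in agents if rng.random() < 0.5]   # stochastic activations
    for a in active:
        p = weights[a] / weights[a].sum()
        action = rng.choice(K, p=p)                  # agent a predicts and pays losses[action]
        for b in [a] + neighbors[a]:                 # feedback shared with the neighborhood
            weights[b] *= np.exp(-eta * losses)
```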

    Dueling Bandits with Adversarial Sleeping

    We introduce the problem of sleeping dueling bandits with stochastic preferences and adversarial availabilities (DB-SPAA). In almost all dueling bandit applications, the decision space often changes over time; e.g., retail store management, online shopping, restaurant recommendation, search engine optimization, etc. Surprisingly, this `sleeping aspect' of dueling bandits has never been studied in the literature. Like dueling bandits, the goal is to compete with the best arm by sequentially querying the preference feedback of item pairs. The non-triviality, however, arises from the non-stationary item spaces, which allow arbitrary subsets of items to become unavailable in every round. The goal is to find an optimal `no-regret' policy that can identify the best available item at each round, as opposed to the standard `fixed best-arm regret objective' of dueling bandits. We first derive an instance-specific lower bound for DB-SPAA of $\Omega\bigl(\sum_{i=1}^{K-1}\sum_{j=i+1}^{K} \frac{\log T}{\Delta(i,j)}\bigr)$, where $K$ is the number of items and $\Delta(i,j)$ is the gap between items $i$ and $j$. This indicates that the sleeping problem with preference feedback is inherently more difficult than classical multi-armed bandits (MAB). We then propose two algorithms with near-optimal regret guarantees. Our results are corroborated empirically.
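    As a toy illustration of the DB-SPAA interaction protocol (not the algorithms proposed in the paper), the sketch below duels the current empirical leader among the available items against its least-compared available rival; the utility-based preference model and the availability distribution are illustrative assumptions, since availabilities are adversarial in the paper.

```python
import numpy as np

# Toy sketch of the sleeping dueling bandit protocol: each round an arbitrary
# subset of items is available, the learner duels two available items, and
# observes a stochastic preference. Illustrative heuristic, not DB-SPAA's algorithms.
rng = np.random.default_rng(1)
K, T = 6, 2000
util = rng.random(K)                    # hidden utilities inducing P(i beats j)
wins = np.zeros((K, K))                 # wins[i, j]: number of times i beat j

for t in range(T):
    avail = np.flatnonzero(rng.random(K) < 0.7)   # random availabilities (toy stand-in)
    if len(avail) < 2:
        continue
    total = wins + wins.T + 1e-9
    score = (wins / total).mean(axis=1)           # empirical win rates
    leader = avail[np.argmax(score[avail])]       # empirical leader among available items
    rivals = avail[avail != leader]
    challenger = rivals[np.argmin(total[leader, rivals])]   # least-compared rival
    p_leader_wins = 1 / (1 + np.exp(util[challenger] - util[leader]))
    if rng.random() < p_leader_wins:
        wins[leader, challenger] += 1
    else:
        wins[challenger, leader] += 1
```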

    Jamming-Resistant Learning in Wireless Networks

    We consider capacity maximization in wireless networks under adversarial interference conditions. There are $n$ links, each consisting of a sender and a receiver, which repeatedly try to perform a successful transmission. In each time step, the success of attempted transmissions depends on interference conditions, which are captured by an interference model (e.g. the SINR model). Additionally, an adversarial jammer can render a $(1-\delta)$-fraction of time steps unsuccessful. For this scenario, we analyze a framework for distributed learning algorithms to maximize the number of successful transmissions. Our main result is an algorithm based on no-regret learning converging to an $O(1/\delta)$-approximation. It even provides a constant-factor approximation when the jammer blocks exactly a $(1-\delta)$-fraction of time steps. In addition, we consider a stochastic jammer, for which we obtain a constant-factor approximation after a polynomial number of time steps. We also consider more general settings, in which links arrive and depart dynamically, and where each sender tries to reach multiple receivers. Our algorithms perform favorably in simulations.
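    The following sketch only illustrates the no-regret learning flavor of such a framework: each link runs an exponential-weights rule over the two actions "stay idle" and "transmit", with a toy threshold interference rule standing in for the SINR model and a jammer that blocks a $(1-\delta)$-fraction of steps at random; all constants are illustrative assumptions.

```python
import math
import random

# Illustrative no-regret sketch for the per-link transmission decision.
# Actions per link: 0 = stay idle, 1 = transmit. The interference rule
# (success only if at most 2 links transmit) is a toy stand-in for SINR,
# and the jammer blocks steps independently at random. Not the paper's algorithm.
n, T, delta, eta = 8, 5000, 0.3, 0.05
weights = [[1.0, 1.0] for _ in range(n)]            # per link: [idle, transmit]

for t in range(T):
    jammed = random.random() > delta                # (1 - delta)-fraction of steps blocked
    acts = []
    for w in weights:
        p_tx = w[1] / (w[0] + w[1])
        acts.append(1 if random.random() < p_tx else 0)
    busy = sum(acts)
    for i, w in enumerate(weights):
        reward = 0.0
        if acts[i] == 1:
            success = (not jammed) and busy <= 2    # toy interference condition
            reward = 1.0 if success else -0.1       # small penalty for a failed attempt
        w[acts[i]] *= math.exp(eta * reward)
```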

    One Arrow, Two Kills: A Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits

    We address the problem of \emph{`Internal Regret'} in \emph{Sleeping Bandits} in the fully adversarial setup, and draw connections between different existing notions of sleeping regret in the multi-armed bandit (MAB) literature, analyzing their implications: Our first contribution is to propose the new notion of \emph{Internal Regret} for sleeping MAB. We then propose an algorithm that yields sublinear regret in that measure, even for a completely adversarial sequence of losses and availabilities. We further show that a low sleeping internal regret always implies a low external regret, as well as a low policy regret for i.i.d. sequences of losses. The main contribution of this work lies precisely in unifying the different existing notions of regret in sleeping bandits and understanding how one implies another. Finally, we also extend our results to the setting of \emph{Dueling Bandits} (DB)--a preference-feedback variant of MAB--and propose a reduction-to-MAB idea to design a low-regret algorithm for sleeping dueling bandits with stochastic preferences and adversarial availabilities. The efficacy of our algorithms is justified through empirical evaluations.
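    To make the regret notions concrete, here is a small helper based on simplified working definitions (my assumptions, not taken verbatim from the paper): sleeping internal (swap) regret as the largest loss saved by replacing every play of one arm with another arm on rounds where the latter was available, and sleeping external regret as the gap to the best single arm counted over the rounds on which that arm was available.

```python
import numpy as np

def sleeping_regrets(plays, losses, avail):
    """Compute toy sleeping internal and external regrets from a play log.

    plays:  (T,)   index of the arm played at each round
    losses: (T, K) full loss vectors
    avail:  (T, K) boolean availability masks
    Definitions are simplified working assumptions, for illustration only.
    """
    plays = np.asarray(plays)
    losses = np.asarray(losses, float)
    avail = np.asarray(avail, bool)
    T, K = losses.shape
    incurred = losses[np.arange(T), plays]
    # internal (swap) regret: best pairwise substitution i -> j on rounds
    # where i was played and j was available
    internal = 0.0
    for i in range(K):
        for j in range(K):
            mask = (plays == i) & avail[:, j]
            internal = max(internal, (losses[mask, i] - losses[mask, j]).sum())
    # external regret: gap to the best single arm on its availability rounds
    external = max((incurred[avail[:, j]] - losses[avail[:, j], j]).sum()
                   for j in range(K))
    return internal, external
```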