48 research outputs found

Cooperative Online Learning: Keeping your Neighbors Updated

Authors: Cesa-Bianchi Nicolò; Cesari Tommaso R.; Monteleoni Claire
Publication date: 01/01/2020

We study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. The loss function is then revealed to these agents and also to their neighbors in the network. Our results characterize how much knowing the network structure affects the regret as a function of the model of agent activations. When activations are stochastic, the optimal regret (up to constant factors) is shown to be of order $\sqrt{\alpha T}$, where $T$ is the horizon and $\alpha$ is the independence number of the network. We prove that the upper bound is achieved even when agents have no information about the network structure. When activations are adversarial the situation changes dramatically: if agents ignore the network structure, an $\Omega(T)$ lower bound on the regret can be proven, showing that learning is impossible. However, when agents can choose to ignore some of their neighbors based on the knowledge of the network structure, we prove an $O(\sqrt{\overline{\chi} T})$ sublinear regret bound, where $\overline{\chi} \ge \alpha$ is the clique-covering number of the network.
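The two graph parameters in this abstract can be made concrete on a small example. The following is a minimal brute-force sketch, not taken from the paper: the function names and the 5-cycle test graph are illustrative assumptions, and the exhaustive search is only practical for a handful of vertices.

```python
from itertools import combinations, product

def independence_number(n, edges):
    """Size of the largest set of pairwise non-adjacent vertices (brute force)."""
    adj = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in adj for p in combinations(subset, 2)):
                return size
    return 0

def clique_cover_number(n, edges):
    """Smallest number of cliques that partition the vertex set (brute force)."""
    adj = {frozenset(e) for e in edges}
    for k in range(1, n + 1):
        for labels in product(range(k), repeat=n):
            # Two vertices may share a label only if they are adjacent.
            if all(labels[u] != labels[v] or frozenset((u, v)) in adj
                   for u, v in combinations(range(n), 2)):
                return k
    return n

# Hypothetical example network: a cycle on 5 agents.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
alpha = independence_number(5, cycle5)
chi_bar = clique_cover_number(5, cycle5)
print(alpha, chi_bar)
```

On this example the clique-covering number is strictly larger than the independence number, which is consistent with the inequality $\overline{\chi} \ge \alpha$ stated in the abstract.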

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Decentralized Cooperative Stochastic Bandits

Authors: Kanade Varun; Martínez-Rubio David; Rebeschini Patrick
Publication date: 01/01/2019

We study a decentralized cooperative stochastic multi-armed bandit problem with $K$ arms on a network of $N$ agents. In our model, the reward distribution of each arm is the same for each agent and rewards are drawn independently across agents and time steps. In each round, each agent chooses an arm to play and subsequently sends a message to her neighbors. The goal is to minimize the overall regret of the entire network. We design a fully decentralized algorithm that uses an accelerated consensus procedure to compute (delayed) estimates of the average of rewards obtained by all the agents for each arm, and then uses an upper confidence bound (UCB) algorithm that accounts for the delay and error of the estimates. We analyze the regret of our algorithm and also provide a lower bound. The regret is bounded by the optimal centralized regret plus a natural and simple term depending on the spectral gap of the communication matrix. Our algorithm is simpler to analyze than those proposed in prior work and it achieves better regret bounds, while requiring less information about the underlying network. It also performs better empirically.
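The two ingredients named in this abstract, consensus averaging over a network and a UCB index, can be illustrated separately. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it uses plain (non-accelerated) consensus, the textbook UCB1 bonus with no delay correction, and a hypothetical 3-agent weight matrix `W`.

```python
import math

def consensus_step(values, weights):
    """One communication round: each agent replaces its value with a
    weighted average of the values held by itself and its neighbors."""
    n = len(values)
    return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]

def ucb_index(mean_estimate, pulls, t):
    """Textbook UCB1 index: estimated mean plus an exploration bonus."""
    return mean_estimate + math.sqrt(2.0 * math.log(t) / pulls)

# Hypothetical path network 0 - 1 - 2 with a symmetric, doubly stochastic
# weight matrix (rows and columns each sum to 1).
W = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]

# Each agent starts from its own local reward average for one arm.
estimates = [1.0, 0.0, 0.5]
for _ in range(30):
    estimates = consensus_step(estimates, W)

# After enough rounds every agent holds (approximately) the network average.
print(estimates)
print(ucb_index(estimates[0], pulls=10, t=100))
```

How quickly the estimates agree depends on the second-largest eigenvalue magnitude of `W`, which is the kind of spectral-gap dependence the abstract refers to.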

arXiv.org e-Print Archive

Oxford University Research Archive

Adaptive Channel Recommendation For Opportunistic Spectrum Access

Authors: Chen Xu; Huang Jianwei; Li Husheng
Publication date: 13/07/2011

We propose a dynamic spectrum access scheme where secondary users recommend "good" channels to each other and access them accordingly. We formulate the problem as an average-reward Markov decision process. We show the existence of an optimal stationary spectrum access policy, and explore its structural properties in two asymptotic cases. Since the action space of the Markov decision process is continuous, it is difficult to find the optimal policy by simply discretizing the action space and using policy iteration, value iteration, or Q-learning methods. Instead, we propose a new algorithm based on the Model Reference Adaptive Search method, and prove its convergence to the optimal policy. Numerical results show that the proposed algorithms achieve up to 18% and 100% performance improvement over the static channel recommendation scheme in homogeneous and heterogeneous channel environments, respectively, and are more robust to channel dynamics.

arXiv.org e-Print Archive

CiteSeerX

Delay and Cooperation in Nonstochastic Bandits

Authors: Cesa-Bianchi Nicolò; Gentile Claudio; Mansour Yishay; Minora Alberto
Publication date: 01/01/2016

We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm, and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\bigl(d+1 + \tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the connected communication graph $G$. We then show that for any connected graph, for $d=\sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds which are arbitrarily close to the full information minimax regret $\sqrt{T\ln K}$ when $G$ is dense. When $G$ has sparse components, we show that a variant of Exp3-Coop, allowing agents to choose their parameters according to their centrality in $G$, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay. Comment: 30 pages
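To get a feel for the regret expression in this abstract, it can be evaluated numerically. This is a rough sketch under stated assumptions: constants are dropped (the abstract gives only the order of the bound), the function names are made up for illustration, and the values of `K`, `N`, `T`, `d` and `alpha_d` are hypothetical rather than derived from any particular graph.

```python
import math

def coop_bound(d, K, N, alpha_d, T):
    """Order of the cooperative per-agent regret bound, constants dropped:
    sqrt((d + 1 + (K / N) * alpha_d) * T * ln K)."""
    return math.sqrt((d + 1 + (K / N) * alpha_d) * T * math.log(K))

def noncooperative_bound(K, T):
    """Order of the minimax regret sqrt(K * T) for a single bandit agent."""
    return math.sqrt(K * T)

# Hypothetical setting: 100 actions, 50 agents, 10,000 rounds, messages
# kept for up to d = 2 hops, and an assumed independence number of 5 for
# the second power of the communication graph.
K, N, T = 100, 50, 10_000
cooperative = coop_bound(d=2, K=K, N=N, alpha_d=5, T=T)
alone = noncooperative_bound(K, T)
print(round(cooperative, 1), round(alone, 1))
```

With these made-up numbers the cooperative expression is smaller than $\sqrt{KT}$; the size of the gap depends on how small $\alpha_{\le d}$ is relative to $N$, which is determined by the actual graph.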

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Archivio istituzionale della ricerca - Università dell'Insubria

The Relationship between Age of Post-Graduate Adult Learning Students and Learning Style Preferences: A Case of Africa International University, Kenya

Author: Ngala Francisca Wavinya
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 03/04/2017

This paper sought to examine the relationship between age and learning preferences of post-graduate students at Africa International University (AIU). The study employed a descriptive survey design which used a cross-sectional approach to data collection. The population of the study consisted of all 397 post-graduate students at Africa International University at the time of data collection. The sample was made up of 199 participants from the post-graduate Diploma, Master's and Doctoral programmes. A questionnaire was the instrument used to collect information from the participants on their age demographics and their preferences. The Statistical Package for Social Sciences (SPSS) was used to analyze the data. A modified version of the Grasha-Riechmann Student Learning Style Scales (GRSLSS) was the learning style inventory used to measure the learning preferences. The findings revealed that age was not significantly related to the ways post-graduate students at Africa International University preferred to learn.

Keywords: Learning style preferences, Age, Post-graduate, Adult learning

International Institute for Science, Technology and Education (IISTE): E-Journals