Cooperative Online Learning: Keeping your Neighbors Updated
We study an asynchronous online learning setting with a network of agents. At
each time step, some of the agents are activated, requested to make a
prediction, and pay the corresponding loss. The loss function is then revealed
to these agents and also to their neighbors in the network. Our results
characterize how much knowing the network structure affects the regret as a
function of the model of agent activations. When activations are stochastic,
the optimal regret (up to constant factors) is shown to be of order
$\sqrt{\alpha T}$, where $T$ is the horizon and $\alpha$ is the independence
number of the network. We prove that the upper bound is achieved even when
agents have no information about the network structure. When activations are
adversarial, the situation changes dramatically: if agents ignore the network
structure, an $\Omega(T)$ lower bound on the regret can be proven, showing that
learning is impossible. However, when agents can choose to ignore some of their
neighbors based on the knowledge of the network structure, we prove an
$O(\sqrt{\overline{\chi} T})$ sublinear regret bound, where $\overline{\chi} \ge \alpha$
is the clique-covering number of the network.
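To make the graph quantity in this abstract concrete, here is a minimal brute-force sketch that computes the independence number of a small network. It is only an illustration: the function name `independence_number` and the two 5-node example graphs are assumptions introduced here, not part of the paper, and the exhaustive search is only practical for very small graphs.

```python
from itertools import combinations


def independence_number(n_nodes, edges):
    """Size of the largest set of pairwise non-adjacent nodes (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    # Try subset sizes from largest to smallest and stop at the first
    # subset that contains no edge.
    for size in range(n_nodes, 0, -1):
        for subset in combinations(range(n_nodes), size):
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                return size
    return 0


# Hypothetical 5-node examples: a cycle and a complete graph.
cycle = [(i, (i + 1) % 5) for i in range(5)]
clique = list(combinations(range(5), 2))
```

On these examples the cycle has independence number 2 and the complete graph has independence number 1, so a denser network gives a smaller value of the quantity that the abstract ties to the regret.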
Decentralized Cooperative Stochastic Bandits
We study a decentralized cooperative stochastic multi-armed bandit problem
with $K$ arms on a network of $N$ agents. In our model, the reward distribution
of each arm is the same for each agent and rewards are drawn independently
across agents and time steps. In each round, each agent chooses an arm to play
and subsequently sends a message to her neighbors. The goal is to minimize the
overall regret of the entire network. We design a fully decentralized algorithm
that uses an accelerated consensus procedure to compute (delayed) estimates of
the average of rewards obtained by all the agents for each arm, and then uses
an upper confidence bound (UCB) algorithm that accounts for the delay and error
of the estimates. We analyze the regret of our algorithm and also provide a
lower bound. The regret is bounded by the optimal centralized regret plus a
natural and simple term depending on the spectral gap of the communication
matrix. Our algorithm is simpler to analyze than those proposed in prior work
and it achieves better regret bounds, while requiring less information about
the underlying network. It also performs better empirically.
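The abstract builds on the upper confidence bound (UCB) idea. As a rough point of reference, the sketch below implements the standard single-agent UCB1 rule only; it does not include the consensus step, the delay handling, or any network communication described in the paper. The function name `ucb1`, the Bernoulli arm means, and the horizon are assumptions chosen for illustration.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run plain single-agent UCB1; return per-arm pull counts and mean estimates."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise the estimates
        else:
            # Empirical mean plus a confidence radius that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


rng = random.Random(0)
arm_means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means
counts, means = ucb1(lambda a: 1.0 if rng.random() < arm_means[a] else 0.0, 3, 2000)
```

In the decentralized setting of the abstract, each agent would replace its local `means` and `counts` with network-wide estimates obtained through communication; that part is deliberately left out here.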
Adaptive Channel Recommendation For Opportunistic Spectrum Access
We propose a dynamic spectrum access scheme where secondary users recommend
"good" channels to each other and access accordingly. We formulate the problem
as an average reward based Markov decision process. We show the existence of
the optimal stationary spectrum access policy, and explore its structure
structural properties in two asymptotic cases. Since the action space of the
Markov decision process is continuous, it is difficult to find the optimal
policy by simply discretizing the action space and using policy iteration,
value iteration, or Q-learning methods. Instead, we propose a new algorithm based on
the Model Reference Adaptive Search method, and prove its convergence to the
optimal policy. Numerical results show that the proposed algorithms achieve up
to 18% and 100% performance improvement over the static channel recommendation
scheme in homogeneous and heterogeneous channel environments, respectively, and
are more robust to channel dynamics.
Delay and Cooperation in Nonstochastic Bandits
We study networks of communicating learning agents that cooperate to solve a
common nonstochastic bandit problem. Agents use an underlying communication
network to get messages about actions selected by other agents, and drop
messages that took more than $d$ hops to arrive, where $d$ is a delay
parameter. We introduce Exp3-Coop, a cooperative version of the Exp3
algorithm, and prove that with $K$ actions and $N$ agents the average
per-agent regret after $T$ rounds is at most of order
$\sqrt{\bigl(d+1+\tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}$, where $\alpha_{\le d}$ is the
independence number of the $d$-th power of the connected communication graph
$G$. We then show that for any connected graph, for $d=\sqrt{K}$ the regret
bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$
for noncooperating agents. More informed choices of $d$ lead to bounds which
are arbitrarily close to the full information minimax regret $\sqrt{T\ln K}$
when $G$ is dense. When $G$ has sparse components, we show that a variant of
Exp3-Coop, allowing agents to choose their parameters according to
their centrality in $G$, strictly improves the regret. Finally, as a by-product
of our analysis, we provide the first characterization of the minimax regret
for bandit learning with delay. Comment: 30 pages
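For readers unfamiliar with the base algorithm named in this abstract, the following is a minimal sketch of standard single-agent Exp3 with losses in [0, 1]. It is not the cooperative variant: there is no message passing, no delay parameter, and no communication graph. The names `exp3` and `_distribution`, the learning rate `eta`, and the loss callback signature are assumptions made for this illustration.

```python
import math
import random


def _distribution(cum_est, eta):
    """Exponential-weights distribution over actions from cumulative loss estimates."""
    low = min(cum_est)  # shift by the minimum for numerical stability
    weights = [math.exp(-eta * (c - low)) for c in cum_est]
    total = sum(weights)
    return [w / total for w in weights]


def exp3(loss, n_actions, horizon, eta, rng):
    """Single-agent Exp3 on losses in [0, 1]; returns the final action distribution."""
    cum_est = [0.0] * n_actions  # cumulative importance-weighted loss estimates
    for t in range(horizon):
        probs = _distribution(cum_est, eta)
        action = rng.choices(range(n_actions), weights=probs)[0]
        # Bandit feedback: only the played action's loss is observed, so it is
        # divided by its probability to keep the estimate unbiased.
        cum_est[action] += loss(t, action) / probs[action]
    return _distribution(cum_est, eta)


# Hypothetical loss sequence: action 0 always has loss 0, the others loss 1.
final_probs = exp3(lambda t, a: 0.0 if a == 0 else 1.0, 3, 1000, 0.05, random.Random(0))
```

A cooperative version in the spirit of the abstract would let each agent also update its estimates with losses reported by agents within a bounded number of hops, which reduces the variance of the estimates; that extension is not attempted here.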
The Relationship between Age of Post-Graduate Adult Learning Students and Learning Style Preferences: A Case of Africa International University, Kenya
This paper sought to examine the relationship between age and learning preferences of post-graduate students at Africa International University (AIU). The study employed a descriptive survey design which used a cross-sectional approach to data collection. The population of the study consisted of all 397 post-graduate students at Africa International University at the time of data collection. The sample was made up of 199 participants from the post-graduate Diploma, Master's level and Doctoral programmes. A questionnaire guide was the instrument used to collect information from the participants on their age demographics and their preferences. The Statistical Package for the Social Sciences (SPSS) was used to analyze the data. A modified version of the Grasha-Riechmann Student Learning Style Scales (GRSLSS) was the learning style inventory used to measure the learning preferences. The findings revealed that age was not significantly related to the ways post-graduate students at Africa International University preferred to learn.
Keywords: Learning style preferences, Age, Post-graduate, Adult learning