Decentralized Cooperative Stochastic Bandits
We study a decentralized cooperative stochastic multi-armed bandit problem
with arms on a network of agents. In our model, the reward distribution
of each arm is the same for each agent and rewards are drawn independently
across agents and time steps. In each round, each agent chooses an arm to play
and subsequently sends a message to her neighbors. The goal is to minimize the
overall regret of the entire network. We design a fully decentralized algorithm
that uses an accelerated consensus procedure to compute (delayed) estimates of
the average of rewards obtained by all the agents for each arm, and then uses
an upper confidence bound (UCB) algorithm that accounts for the delay and error
of the estimates. We analyze the regret of our algorithm and also provide a
lower bound. The regret is bounded by the optimal centralized regret plus a
natural and simple term depending on the spectral gap of the communication
matrix. Our algorithm is simpler to analyze than those proposed in prior work
and it achieves better regret bounds, while requiring less information about
the underlying network. It also performs better empirically.
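
The consensus step described in this abstract can be illustrated with a minimal sketch. It uses plain (non-accelerated) gossip averaging rather than the accelerated procedure the abstract refers to, and the ring topology, the 1/3 mixing weights, and all names (`ring_gossip_matrix`, `run_consensus`, `local_means`) are assumptions made for illustration only.

```python
def ring_gossip_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    each agent averages its own value with its two neighbors."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def gossip_round(W, x):
    """One communication round: every agent mixes its neighbors' values."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]


def run_consensus(W, x, rounds):
    """Repeat the mixing step; estimates drift toward the network average."""
    for _ in range(rounds):
        x = gossip_round(W, x)
    return x


# Hypothetical local empirical mean rewards of one arm, one per agent.
local_means = [0.2, 0.9, 0.4, 0.7, 0.1, 0.5]
W = ring_gossip_matrix(len(local_means))
target = sum(local_means) / len(local_means)
estimates = run_consensus(W, local_means, rounds=50)
max_error = max(abs(e - target) for e in estimates)
print(max_error)
```

How fast `max_error` shrinks depends on the spectral gap of `W`, which is the quantity the regret bound above is stated in terms of. In a bandit algorithm of this kind, each agent would feed such delayed estimates into its UCB index; that part is not shown here.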
Delay and Cooperation in Nonstochastic Bandits
We study networks of communicating learning agents that cooperate to solve a
common nonstochastic bandit problem. Agents use an underlying communication
network to get messages about actions selected by other agents, and drop
messages that took more than $d$ hops to arrive, where $d$ is a delay
parameter. We introduce Exp3-Coop, a cooperative version of the
Exp3 algorithm and prove that with $K$ actions and $N$ agents the average
per-agent regret after $T$ rounds is at most of order
$\sqrt{\bigl(d+1 + \tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}$, where $\alpha_{\le d}$ is the
independence number of the $d$-th power of the connected communication graph
$G$. We then show that for any connected graph, for $d=\sqrt{K}$ the regret
bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$
for noncooperating agents. More informed choices of $d$ lead to bounds which
are arbitrarily close to the full information minimax regret $\sqrt{T\ln K}$
when $G$ is dense. When $G$ has sparse components, we show that a variant of
Exp3-Coop, allowing agents to choose their parameters according to
their centrality in $G$, strictly improves the regret. Finally, as a by-product
of our analysis, we provide the first characterization of the minimax regret
for bandit learning with delay.
Comment: 30 pages
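
The graph quantity that drives this regret bound, the independence number of a power of the communication graph, can be computed by brute force on small examples. The sketch below is an illustration under assumptions: the hop limit is called `d` here, the 7-node path graph is a made-up example, and the exhaustive search is only practical for very small graphs.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_power(adj, d):
    """d-th power of a graph: connect nodes that are at most d hops apart."""
    power = {}
    for u in adj:
        dist = hop_distances(adj, u)
        power[u] = {v for v, hops in dist.items() if 0 < hops <= d}
    return power


def independence_number(adj):
    """Size of the largest set of pairwise non-adjacent nodes (exhaustive)."""
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return size
    return 0


# Hypothetical communication graph: a path on 7 agents, 0 - 1 - ... - 6.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}
for d in (1, 2, 3, 6):
    print(d, independence_number(graph_power(path, d)))
```

On this path the independence number falls as `d` grows, since a larger hop limit lets each agent hear from more of the network. That is the trade-off between delay and cooperation the abstract describes.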
Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
We analyse the learning performance of Distributed Gradient Descent in the
context of multi-agent decentralised non-parametric regression with the square
loss function when i.i.d. samples are assigned to agents. We show that if
agents hold sufficiently many samples with respect to the network size, then
Distributed Gradient Descent achieves optimal statistical rates with a number
of iterations that scales, up to a threshold, with the inverse of the spectral
gap of the gossip matrix divided by the number of samples owned by each agent
raised to a problem-dependent power. The presence of the threshold comes from
statistics. It encodes the existence of a "big data" regime where the number of
required iterations does not depend on the network topology. In this regime,
Distributed Gradient Descent achieves optimal statistical rates with the same
order of iterations as gradient descent run with all the samples in the
network. Provided the communication delay is sufficiently small, the
distributed protocol yields a linear speed-up in runtime compared to the
single-machine protocol. This is in contrast to decentralised optimisation
algorithms that do not exploit statistics and only yield a linear speed-up in
graphs where the spectral gap is bounded away from zero. Our results exploit
the statistical concentration of quantities held by agents and shed new light
on the interplay between statistics and communication in decentralised methods.
Bounds are given in the standard non-parametric setting with source/capacity
assumptions.
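
The basic Distributed Gradient Descent iteration (mix with neighbors through the gossip matrix, then take a local gradient step) can be sketched on a toy problem. This is not the non-parametric setting of the abstract: it assumes a scalar linear model, noise-free data, a 4-agent ring, and hypothetical names (`local_gradient`, `distributed_gradient_descent`) and step size chosen for illustration.

```python
def ring_gossip_matrix(n):
    """Doubly stochastic gossip matrix for a ring (assumed topology)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def local_gradient(w, samples):
    """Gradient of the local loss, the mean of 0.5 * (w * x - y) ** 2."""
    return sum((w * x - y) * x for x, y in samples) / len(samples)


def distributed_gradient_descent(W, data, step, iterations):
    """Each agent averages neighbors' parameters, then descends locally."""
    n = len(data)
    w = [0.0] * n
    for _ in range(iterations):
        mixed = [sum(W[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = [mixed[i] - step * local_gradient(w[i], data[i]) for i in range(n)]
    return w


# Hypothetical data: agent i holds three samples of y = 2 * x, no noise.
true_slope = 2.0
data = [[(x, true_slope * x) for x in (0.5 + 0.1 * i, 1.0, 1.5)]
        for i in range(4)]
W = ring_gossip_matrix(len(data))
weights = distributed_gradient_descent(W, data, step=0.1, iterations=300)
print(weights)
```

Because the data are noise-free, every agent's local minimizer coincides and all agents settle on the same slope. With noisy samples and a constant step size, the agents would generally retain a small disagreement.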
Bibliographic Review on Distributed Kalman Filtering
In recent years, a compelling need has arisen to understand the effects of distributed information structures on estimation and filtering. This paper provides a bibliographical review of distributed Kalman filtering (DKF).
The paper classifies the different approaches and methods involved in DKF. The applications of DKF are discussed and explained separately, and the different approaches are briefly compared. Current research directions are also addressed, with emphasis on practical applications of the techniques. An exhaustive list of publications linked directly or indirectly to DKF in the open literature is compiled to provide an overall picture of the different developing aspects of this area.
Iterative learning control for multi-agent systems with impulsive consensus tracking
In this paper, we adopt D-type and PD-type learning laws with the initial state of iteration to solve the uniform tracking problem for multi-agent systems subject to impulsive input. For the multi-agent system with impulse, we show that the proposed learning laws drive all agents to a given asymptotic consensus as the iteration number increases, provided the virtual leader has a path to every follower agent. Finally, an example verifies the effectiveness of the approach by tracking a continuous or piecewise continuous desired trajectory.
Asynchronous Distributed ADMM for Large-Scale Optimization- Part II: Linear Convergence Analysis and Numerical Performance
The alternating direction method of multipliers (ADMM) has been recognized as
a versatile approach for solving modern large-scale machine learning and signal
processing problems efficiently. When the data size and/or the problem
dimension is large, a distributed version of ADMM can be used, which is capable
of distributing the computation load and the data set to a network of computing
nodes. Unfortunately, a direct synchronous implementation of such an algorithm
does not scale well with the problem size, as the algorithm speed is limited by
the slowest computing nodes. To address this issue, in a companion paper, we
have proposed an asynchronous distributed ADMM (AD-ADMM) and studied its
worst-case convergence conditions. In this paper, we further the study by
characterizing the conditions under which the AD-ADMM achieves linear
convergence. Our conditions as well as the resulting linear rates reveal the
impact that various algorithm parameters, network delay and network size have
on the algorithm performance. To demonstrate the superior time efficiency of
the proposed AD-ADMM, we test the AD-ADMM on a high-performance computer
cluster by solving a large-scale logistic regression problem.
Comment: submitted for publication, 28 pages
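
For orientation, the synchronous baseline that the asynchronous variant improves on can be sketched as global-consensus ADMM. The sketch below is not AD-ADMM. It assumes scalar quadratic local costs `0.5 * (x - a_i) ** 2`, a master that waits for every node in each round, and hypothetical names (`consensus_admm`, `targets`, `rho`).

```python
def consensus_admm(targets, rho, iterations):
    """Synchronous global-consensus ADMM for the sum of 0.5 * (x - a_i)**2.

    Each node i keeps a local copy x_i and a scaled dual variable u_i;
    the master keeps the consensus variable z.
    """
    n = len(targets)
    z = 0.0
    u = [0.0] * n
    for _ in range(iterations):
        # Local step: closed-form minimizer of
        # 0.5 * (x - a_i)**2 + (rho / 2) * (x - z + u_i)**2.
        x = [(a + rho * (z - u_i)) / (1.0 + rho) for a, u_i in zip(targets, u)]
        # Master step: average over ALL nodes, so it waits for the slowest.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each node's disagreement with the consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z, x


# Hypothetical local targets; the consensus optimum is their mean.
targets = [1.0, 3.0, 4.0, 8.0]
z, x = consensus_admm(targets, rho=1.0, iterations=60)
print(z, x)
```

The master step is where a synchronous implementation stalls on the slowest node. An asynchronous scheme would instead update `z` from whichever nodes have reported, which is the setting whose linear convergence the abstract analyzes.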