Distributed Stochastic Optimization over Time-Varying Noisy Network
This paper is concerned with a distributed stochastic multi-agent optimization
problem over a class of time-varying networks with slowly decreasing
communication noise effects. The problem is considered in the composite
optimization setting, which is more general in noisy network optimization. It is
noteworthy that existing methods for noisy network optimization are Euclidean
projection based. We present two related classes of non-Euclidean
methods and investigate their convergence behavior. One is a distributed
stochastic composite mirror descent type method (DSCMD-N), which provides a more
general algorithmic framework than former works in this literature. As a
counterpart, we also consider a composite dual averaging type method (DSCDA-N)
for noisy network optimization. Some main error bounds for DSCMD-N and DSCDA-N
are obtained. The trade-off among stepsizes, noise decreasing rates, and
convergence rates of the algorithms is analyzed in detail. To the best of our
knowledge, this is the first work to analyze and derive convergence rates of
optimization algorithms in noisy network optimization. We show that an optimal
rate of $O(1/\sqrt{T})$ in nonsmooth convex optimization can be obtained for
the proposed methods under appropriate communication noise conditions. Moreover,
convergence rates in different orders are comprehensively derived in both the
expectation convergence and high-probability convergence senses. Comment: 27 page
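To make the contrast between Euclidean projection and non-Euclidean (mirror) updates concrete, here is a minimal single-agent sketch of one mirror descent step on the probability simplex with the negative-entropy mirror map. It is not DSCMD-N itself: the function name `entropic_mirror_step`, the linear loss, and the $1/\sqrt{t}$ stepsize schedule are illustrative assumptions, and the network averaging and communication noise from the abstract are left out.

```python
import math


def entropic_mirror_step(x, grad, step):
    """One mirror descent step on the simplex with the negative-entropy mirror map.

    The update is multiplicative: each coordinate is scaled by exp(-step * grad_i)
    and the result is renormalized, so no Euclidean projection is needed.
    """
    weights = [xi * math.exp(-step * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy problem: minimize the linear loss <c, x> over the simplex.
c = [1.0, 0.5, 0.1]
x = [1.0 / 3.0] * 3
for t in range(1, 201):
    # Decreasing stepsize, an assumed schedule chosen only for illustration.
    x = entropic_mirror_step(x, c, step=1.0 / math.sqrt(t))
```

On this toy loss the iterate stays on the simplex at every step and its mass moves toward the coordinate with the smallest cost.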
Cooperative Online Learning: Keeping your Neighbors Updated
We study an asynchronous online learning setting with a network of agents. At
each time step, some of the agents are activated, requested to make a
prediction, and pay the corresponding loss. The loss function is then revealed
to these agents and also to their neighbors in the network. Our results
characterize how much knowing the network structure affects the regret as a
function of the model of agent activations. When activations are stochastic,
the optimal regret (up to constant factors) is shown to be of order
$\sqrt{\alpha T}$, where $T$ is the horizon and $\alpha$ is the independence
number of the network. We prove that the upper bound is achieved even when
agents have no information about the network structure. When activations are
adversarial, the situation changes dramatically: if agents ignore the network
structure, an $\Omega(T)$ lower bound on the regret can be proven, showing that
learning is impossible. However, when agents can choose to ignore some of their
neighbors based on the knowledge of the network structure, we prove an
$O(\sqrt{\overline{\chi} T})$ sublinear regret bound, where $\overline{\chi}$ is the clique-covering number of the network.
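The two graph quantities named in this abstract, the independence number and the clique-covering number, can be computed by brute force on small graphs. The sketch below is only an illustration of the definitions; the function names and the 5-cycle example are assumptions, and the exhaustive search is practical only for a handful of nodes.

```python
from itertools import combinations, product


def independence_number(n, edges):
    """Size of the largest set of pairwise non-adjacent nodes (exhaustive search)."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                return k
    return 0


def clique_cover_number(n, edges):
    """Smallest number of cliques that partition the node set (exhaustive search)."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(1, n + 1):
        for labels in product(range(k), repeat=n):
            # Nodes that share a label must be adjacent, so each label class is a clique.
            if all(labels[u] != labels[v] or frozenset((u, v)) in edge_set
                   for u, v in combinations(range(n), 2)):
                return k
    return n


# Hypothetical example network: a cycle on five agents.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
alpha = independence_number(5, cycle5)
chi_bar = clique_cover_number(5, cycle5)
```

For the 5-cycle the independence number is 2 and the clique-covering number is 3, which shows that the two quantities can differ.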
D$^2$: Decentralized Training over Decentralized Data
When training a machine learning model using multiple workers, each of which
collects data from its own data sources, it would be most useful if the
data collected by different workers could be unique and different.
Ironically, recent analysis of decentralized parallel stochastic gradient
descent (D-PSGD) relies on the assumption that the data hosted on different
workers are not too different. In this paper, we ask the question:
Can we design a decentralized parallel stochastic gradient descent algorithm
that is less sensitive to the data variance across workers? In this paper, we
present D$^2$, a novel decentralized parallel stochastic gradient descent
algorithm designed for large data variance among workers (imprecisely,
"decentralized" data). The core of D$^2$ is a variance reduction extension of
the standard D-PSGD algorithm, which improves the convergence rate from
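$O\left(\frac{\sigma}{\sqrt{nT}} + \frac{(n\zeta^2)^{1/3}}{T^{2/3}}\right)$ to $O\left(\frac{\sigma}{\sqrt{nT}}\right)$, where $\zeta^2$
denotes the variance among data on different workers. As a result, D$^2$ is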
robust to data variance among workers. We empirically evaluated D$^2$ on image
classification tasks where each worker has access to only the data of a limited
set of labels, and find that D$^2$ significantly outperforms D-PSGD.
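The effect of data variance across workers can be seen on a toy problem. The sketch below compares a D-PSGD-style update with the variance-reduced update as it is commonly stated for the algorithm above; the abstract does not spell out either recursion, so both update rules, the ring mixing matrix `W`, the quadratic local losses, and all function names are assumptions made for illustration. Gradients are exact rather than stochastic so that the run is deterministic.

```python
def mix(W, X):
    """Average each worker's value with its neighbors using mixing matrix W."""
    n = len(X)
    return [sum(W[i][j] * X[j] for j in range(n)) for i in range(n)]


def local_grads(X, b):
    """Gradient of worker i's local loss 0.5 * (x - b[i])**2 at its own iterate."""
    return [x - bi for x, bi in zip(X, b)]


def d_psgd(W, b, gamma, steps):
    """D-PSGD-style update: X <- W X - gamma * G(X), constant stepsize."""
    X = [0.0] * len(b)
    for _ in range(steps):
        G = local_grads(X, b)
        X = [wx - gamma * g for wx, g in zip(mix(W, X), G)]
    return X


def d2(W, b, gamma, steps):
    """Assumed variance-reduced update:
    X_{t+1} = W (2 X_t - X_{t-1} - gamma * G_t + gamma * G_{t-1})."""
    X_prev = [0.0] * len(b)
    G_prev = local_grads(X_prev, b)
    # The first step is a plain gradient step followed by mixing.
    X = mix(W, [x - gamma * g for x, g in zip(X_prev, G_prev)])
    for _ in range(steps - 1):
        G = local_grads(X, b)
        half = [2 * x - xp - gamma * g + gamma * gp
                for x, xp, g, gp in zip(X, X_prev, G, G_prev)]
        X_prev, G_prev = X, G
        X = mix(W, half)
    return X


# Ring of four workers; each keeps half its weight and gives a quarter to each neighbor.
W = [[0.5, 0.25, 0.0, 0.25],
     [0.25, 0.5, 0.25, 0.0],
     [0.0, 0.25, 0.5, 0.25],
     [0.25, 0.0, 0.25, 0.5]]
b = [-3.0, -1.0, 1.0, 3.0]  # very different local optima; the global optimum is 0
x_dpsgd = d_psgd(W, b, gamma=0.1, steps=500)
x_d2 = d2(W, b, gamma=0.1, steps=500)
```

With a constant stepsize, the D-PSGD-style iterates settle at points pulled toward each worker's local optimum, while the variance-reduced iterates all reach the global optimum in this toy setting. This illustrates the claim qualitatively and says nothing about the stochastic rates quoted in the abstract.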
Distributed Learning with Infinitely Many Hypotheses
We consider a distributed learning setup where a network of agents
sequentially access realizations of a set of random variables with unknown
distributions. The network objective is to find a parametrized distribution
that best describes their joint observations in the sense of the
Kullback-Leibler divergence. Apart from recent efforts in the literature, we
analyze the case of countably many hypotheses and the case of a continuum of
hypotheses. We provide non-asymptotic bounds for the concentration rate of the
agents' beliefs around the correct hypothesis in terms of the number of agents,
the network parameters, and the learning abilities of the agents. Additionally,
we provide a novel motivation for a general set of distributed Non-Bayesian
update rules as instances of the distributed stochastic mirror descent
algorithm. Comment: Submitted to CDC201
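As a rough illustration of a distributed non-Bayesian update, the sketch below uses one commonly used log-linear rule: each agent takes a weighted geometric average of its neighbors' beliefs and multiplies by the likelihood of its newest observation. The abstract does not state the rule, so this form, the finite hypothesis set (the paper treats countable and continuum cases), the Bernoulli model, and the deterministic observation pattern are all assumptions.

```python
import math


def update_beliefs(beliefs, A, likelihoods):
    """One log-linear belief update for every agent.

    beliefs[i][k]     : agent i's current belief in hypothesis k
    A[i][j]           : weight agent i puts on agent j (rows sum to 1)
    likelihoods[i][k] : likelihood of agent i's new observation under hypothesis k
    """
    n, m = len(beliefs), len(beliefs[0])
    updated = []
    for i in range(n):
        log_post = [sum(A[i][j] * math.log(beliefs[j][k]) for j in range(n))
                    + math.log(likelihoods[i][k]) for k in range(m)]
        top = max(log_post)  # subtract the max before exponentiating for stability
        weights = [math.exp(v - top) for v in log_post]
        total = sum(weights)
        updated.append([w / total for w in weights])
    return updated


# Hypothetical setup: three agents, three Bernoulli hypotheses, true parameter 0.8.
thetas = [0.2, 0.5, 0.8]
A = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
pattern = [1, 1, 1, 1, 0]  # deterministic stand-in for draws with success rate 0.8
beliefs = [[1.0 / 3.0] * 3 for _ in range(3)]
for t in range(100):
    likelihoods = []
    for i in range(3):
        s = pattern[(t + i) % len(pattern)]  # each agent sees a shifted copy
        likelihoods.append([th if s == 1 else 1.0 - th for th in thetas])
    beliefs = update_beliefs(beliefs, A, likelihoods)
```

After 100 rounds every agent's belief concentrates on the hypothesis 0.8, the one closest in Kullback-Leibler divergence to the observed frequencies.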