Search CORE

967 research outputs found

Distributed Stochastic Optimization over Time-Varying Noisy Network

Author: Ho Daniel W. C.
Yu Zhan
Yuan Deming
Publication venue
Publication date: 08/05/2020
Field of study

This paper is concerned with distributed stochastic multi-agent optimization problem over a class of time-varying network with slowly decreasing communication noise effects. This paper considers the problem in composite optimization setting which is more general in noisy network optimization. It is noteworthy that existing methods for noisy network optimization are Euclidean projection based. We present two related different classes of non-Euclidean methods and investigate their convergence behavior. One is distributed stochastic composite mirror descent type method (DSCMD-N) which provides a more general algorithm framework than former works in this literature. As a counterpart, we also consider a composite dual averaging type method (DSCDA-N) for noisy network optimization. Some main error bounds for DSCMD-N and DSCDA-N are obtained. The trade-off among stepsizes, noise decreasing rates, convergence rates of algorithm is analyzed in detail. To the best of our knowledge, this is the first work to analyze and derive convergence rates of optimization algorithm in noisy network optimization. We show that an optimal rate of

O(1/\sqrt{T})

in nonsmooth convex optimization can be obtained for proposed methods under appropriate communication noise condition. Moveover, convergence rates in different orders are comprehensively derived in both expectation convergence and high probability convergence sense.Comment: 27 page

arXiv.org e-Print Archive

Cooperative Online Learning: Keeping your Neighbors Updated

Author: Cesa-Bianchi Nicolò
Cesari Tommaso R.
Monteleoni Claire
Publication venue
Publication date: 01/01/2020
Field of study

We study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. The loss function is then revealed to these agents and also to their neighbors in the network. Our results characterize how much knowing the network structure affects the regret as a function of the model of agent activations. When activations are stochastic, the optimal regret (up to constant factors) is shown to be of order

\sqrt{\alpha T}

, where

T

is the horizon and

\alpha

is the independence number of the network. We prove that the upper bound is achieved even when agents have no information about the network structure. When activations are adversarial the situation changes dramatically: if agents ignore the network structure, a

\Omega(T)

lower bound on the regret can be proven, showing that learning is impossible. However, when agents can choose to ignore some of their neighbors based on the knowledge of the network structure, we prove a

O(\sqrt{\overline{\chi} T})

sublinear regret bound, where

\overline{\chi} \ge \alpha

is the clique-covering number of the network

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

D $^2$ : Decentralized Training over Decentralized Data

Author: Lian Xiangru
Liu Ji
Tang Hanlin
Yan Ming
Zhang Ce
Publication venue
Publication date: 01/01/2018
Field of study

While training a machine learning model using multiple workers, each of which collects data from their own data sources, it would be most useful when the data collected from different workers can be {\em unique} and {\em different}. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are {\em not too different}. In this paper, we ask the question: {\em Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers?} In this paper, we present D

^2

, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance \xr{among workers} (imprecisely, "decentralized" data). The core of D

^2

is a variance blackuction extension of the standard D-PSGD algorithm, which improves the convergence rate from

O\left({\sigma \over \sqrt{nT}} + {(n\zeta^2)^{\frac{1}{3}} \over T^{2/3}}\right)

O\left({\sigma \over \sqrt{nT}}\right)

where

\zeta^{2}

denotes the variance among data on different workers. As a result, D

^2

is robust to data variance among workers. We empirically evaluated D

^2

on image classification tasks where each worker has access to only the data of a limited set of labels, and find that D

^2

significantly outperforms D-PSGD

arXiv.org e-Print Archive

Repository for Publications and Research Data

Distributed Learning with Infinitely Many Hypotheses

Author: Nedić Angelia
Olshevsky Alex
Uribe César
Publication venue
Publication date: 06/05/2016
Field of study

We consider a distributed learning setup where a network of agents sequentially access realizations of a set of random variables with unknown distributions. The network objective is to find a parametrized distribution that best describes their joint observations in the sense of the Kullback-Leibler divergence. Apart from recent efforts in the literature, we analyze the case of countably many hypotheses and the case of a continuum of hypotheses. We provide non-asymptotic bounds for the concentration rate of the agents' beliefs around the correct hypothesis in terms of the number of agents, the network parameters, and the learning abilities of the agents. Additionally, we provide a novel motivation for a general set of distributed Non-Bayesian update rules as instances of the distributed stochastic mirror descent algorithm.Comment: Submitted to CDC201

arXiv.org e-Print Archive

Crossref