Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks
We study diffusion- and consensus-based optimization of a sum of unknown
convex objective functions over distributed networks. The only access to these
functions is through stochastic gradient oracles, each of which is only
available at a different node, and a limited number of gradient oracle calls is
allowed at each node. In this framework, we introduce a convex optimization
algorithm based on stochastic gradient descent (SGD) updates. In particular,
we use a carefully designed time-dependent weighted averaging of the SGD
iterates, which yields an explicit convergence rate after T gradient updates
for each node on a network of N nodes. We then show that after T gradient
oracle calls, the average SGD iterate achieves a mean square deviation (MSD)
whose rate of convergence is optimal, as it matches the performance lower
bound up to constant terms. Similar to the SGD
algorithm, the computational complexity of the proposed algorithm also scales
linearly with the dimensionality of the data. Furthermore, the communication
load of the proposed method is the same as the communication load of the SGD
algorithm. Thus, the proposed algorithm is highly efficient in terms of
complexity and communication load. We illustrate the merits of the algorithm
against state-of-the-art methods on benchmark real-life data sets and
widely studied network topologies.
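As a rough illustration of the kind of method described above, the sketch below shows consensus-based distributed SGD with a time-dependent weighted average of the iterates. The mixing matrix W, the 1/(mu*t) step size for mu-strongly convex objectives, and averaging weights growing linearly in t are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical sketch of consensus-based distributed SGD with
# time-dependent weighted averaging; W, the step size schedule, and
# the t-proportional averaging weights are assumptions for illustration.
import numpy as np

def distributed_weighted_sgd(grad_oracles, W, dim, T, mu=1.0):
    """grad_oracles[i](x) returns a stochastic (sub)gradient of node i's objective at x."""
    N = len(grad_oracles)
    x = np.zeros((N, dim))       # current iterate at each node (one row per node)
    x_bar = np.zeros((N, dim))   # time-weighted average of the iterates
    weight_sum = 0.0
    for t in range(1, T + 1):
        x = W @ x                             # diffusion/consensus: mix with neighbors
        step = 1.0 / (mu * t)                 # step size for mu-strongly convex objectives
        for i in range(N):
            x[i] = x[i] - step * grad_oracles[i](x[i])
        w_t = float(t)                        # time-dependent weight (assumed to grow with t)
        weight_sum += w_t
        x_bar += (w_t / weight_sum) * (x - x_bar)
    return x_bar
```

In such a sketch, W would be a doubly stochastic matrix matched to the network topology (for example, built from Metropolis weights), so that the mixing step only uses communication between neighboring nodes.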
Decentralized Learning with Separable Data: Generalization and Fast Algorithms
Decentralized learning offers privacy and communication efficiency when data
are naturally distributed among agents communicating over an underlying graph.
Motivated by overparameterized learning settings, in which models are trained
to zero training loss, we study algorithmic and generalization properties of
decentralized learning with gradient descent on separable data. Specifically,
for decentralized gradient descent (DGD) and a variety of loss functions that
asymptote to zero at infinity (including exponential and logistic losses), we
derive novel finite-time generalization bounds. This complements a long line of
recent work that studies the generalization performance and the implicit bias
of gradient descent over separable data, but has thus far been limited to
centralized learning scenarios. Notably, our generalization bounds match
their centralized counterparts in order. Key to this, and of independent
interest, is establishing novel bounds on the training loss and the
rate-of-consensus of DGD for a class of self-bounded losses. Finally, on the
algorithmic front, we design improved gradient-based routines for decentralized
learning with separable data and empirically demonstrate orders-of-magnitude
speed-ups in both training and generalization performance.
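For concreteness, here is a minimal, hypothetical sketch of decentralized gradient descent (DGD) on locally held separable data with the logistic loss. The mixing matrix W, the fixed step size, and the data layout are assumptions made for illustration and are not the specific algorithms or bounds studied in the paper.

```python
# Toy sketch of DGD on separable data with logistic loss; the mixing
# matrix W, step size, and data partitioning are illustrative assumptions.
import numpy as np

def dgd_logistic(A_local, y_local, W, T, lr=0.1):
    """A_local[i]: (m_i, d) feature matrix at agent i; y_local[i]: labels in {-1, +1}."""
    N = len(A_local)
    d = A_local[0].shape[1]
    X = np.zeros((N, d))                          # one model vector per agent
    for _ in range(T):
        X = W @ X                                 # consensus step: average with neighbors
        for i in range(N):
            margins = y_local[i] * (A_local[i] @ X[i])
            # gradient of the mean logistic loss log(1 + exp(-margin))
            coeff = -y_local[i] / (1.0 + np.exp(margins))
            X[i] = X[i] - lr * (A_local[i].T @ coeff) / len(y_local[i])
    return X
```

On separable data with losses that asymptote to zero, the local models in a sketch like this keep growing in norm while the training loss decreases, which is the regime in which implicit-bias and rate-of-consensus questions arise.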
Gossip Algorithms for Distributed Signal Processing
Gossip algorithms are attractive for in-network processing in sensor networks
because they do not require any specialized routing, there is no bottleneck or
single point of failure, and they are robust to unreliable wireless network
conditions. Recently, there has been a surge of activity in the computer
science, control, signal processing, and information theory communities,
developing faster and more robust gossip algorithms and deriving theoretical
performance guarantees. This article presents an overview of recent work in the
area. We describe convergence rate results, which are related to the number of
transmitted messages and thus the amount of energy consumed in the network for
gossiping. We discuss issues related to gossiping over wireless links,
including the effects of quantization and noise, and we illustrate the use of
gossip algorithms for canonical signal processing tasks including distributed
estimation, source localization, and compression.
Comment: Submitted to Proceedings of the IEEE, 29 pages
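As a toy illustration of the canonical building block surveyed above, the sketch below implements randomized pairwise gossip for distributed averaging; the edge-activation model, uniform edge selection, and parameters are illustrative assumptions rather than any specific protocol from the article.

```python
# Minimal sketch of randomized pairwise gossip averaging: at each tick a
# random edge (i, j) is activated and both endpoints adopt their pairwise mean.
import random

def gossip_average(values, edges, num_rounds, seed=0):
    """values: per-node measurements; edges: list of (i, j) node pairs that may gossip."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(num_rounds):
        i, j = rng.choice(edges)      # activate a random edge
        mean = 0.5 * (x[i] + x[j])    # pairwise averaging step
        x[i] = mean
        x[j] = mean
    return x                          # every entry approaches the network-wide average
```

The number of rounds needed for all entries to get close to the global average is what the convergence-rate results discussed in the article quantify, since each round corresponds to transmitted messages and hence energy spent in the network.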