Fast Composite Optimization and Statistical Recovery in Federated Learning
As a prevalent distributed learning paradigm, Federated Learning (FL) trains
a global model on a massive amount of devices with infrequent communication.
This paper investigates a class of composite optimization and statistical
recovery problems in the FL setting, whose loss function consists of a
data-dependent smooth loss and a non-smooth regularizer. Examples include
sparse linear regression using Lasso, low-rank matrix recovery using nuclear
norm regularization, etc. In the existing literature, federated composite
optimization algorithms are designed only from an optimization perspective
without any statistical guarantees. In addition, they do not consider commonly
used (restricted) strong convexity in statistical recovery problems. We advance
the frontiers of this problem from both optimization and statistical
perspectives. On the optimization front, we propose a new algorithm named
\textit{Fast Federated Dual Averaging} for strongly convex and smooth loss and
establish state-of-the-art iteration and communication complexity in the
composite setting. In particular, we prove that it enjoys a fast rate, linear
speedup, and reduced communication rounds. On the statistical front, for
restricted strongly convex and smooth loss, we design another algorithm, namely
\textit{Multi-stage Federated Dual Averaging}, and prove a high-probability
complexity bound with linear speedup up to the optimal statistical precision.
Experiments on both synthetic and real data demonstrate that our methods
perform better than other baselines. To the best of our knowledge, this is the
first work providing fast optimization algorithms and statistical recovery
guarantees for composite problems in FL.Comment: This is a revised version to fix the imprecise statements about
linear speedup from the ICML proceedings. We use another averaging scheme for
the returned solutions in Theorem 2.1 and 3.1 to guarantee linear speedup
when the number of iterations is larg
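The composite objective the abstract describes, a smooth data-dependent loss plus a non-smooth regularizer, can be sketched in its simplest centralized form for the Lasso. This is plain proximal gradient with soft-thresholding, not the paper's Fast Federated Dual Averaging; the function names and step-size choice are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1 (closed form for the Lasso regularizer).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_gradient_lasso(A, b, lam, step, iters=500):
    # Minimize the composite objective (1/2n)||Ax - b||^2 + lam * ||x||_1:
    # a smooth data-dependent loss plus a non-smooth regularizer.
    n = A.shape[0]
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b) / n                      # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)   # prox step on the regularizer
    return x
```

The same smooth/non-smooth split is what a federated dual-averaging scheme exploits: only the smooth part is differentiated locally, while the regularizer is handled through its proximal operator.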
Distributed Averaging via Lifted Markov Chains
Motivated by applications of distributed linear estimation, distributed
control and distributed optimization, we consider the question of designing
linear iterative algorithms for computing the average of numbers in a network.
Specifically, our interest is in designing such an algorithm with the fastest
rate of convergence given the topological constraints of the network. As the
main result of this paper, we design an algorithm with the fastest possible
rate of convergence using a non-reversible Markov chain on the given network
graph. We construct such a Markov chain by transforming the standard Markov
chain, which is obtained using the Metropolis-Hastings method. We call this
novel transformation pseudo-lifting. We apply our method to graphs with
geometry, or graphs with doubling dimension. Specifically, the convergence time
of our algorithm (equivalently, the mixing time of our Markov chain) is
proportional to the diameter of the network graph and hence optimal. As a
byproduct, our result provides the fastest mixing Markov chain given the
network topological constraints, and should naturally find applications
in the context of distributed optimization, estimation, and control.
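A minimal sketch of the linear iterative averaging this abstract starts from, using standard Metropolis-Hastings weights. This is the reversible baseline chain that the paper's pseudo-lifting transforms, not the pseudo-lifted construction itself; the function names are ours.

```python
import numpy as np

def metropolis_weights(adj):
    # Metropolis-Hastings weights on an undirected graph: symmetric and
    # doubly stochastic, so the iteration x <- W x converges to the average.
    n = len(adj)
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()   # self-loop absorbs the remaining mass
    return W

def average_consensus(values, adj, iters=2000):
    # Linear iterative algorithm: repeatedly apply W to the node values.
    W = metropolis_weights(adj)
    x = np.array(values, dtype=float)
    for _ in range(iters):
        x = W @ x
    return x
```

The convergence time of this reversible chain scales poorly on geometric graphs (e.g. quadratically in the diameter on a path), which is exactly the gap the non-reversible pseudo-lifted chain closes.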
Gossip Algorithms for Distributed Signal Processing
Gossip algorithms are attractive for in-network processing in sensor networks
because they do not require any specialized routing, there is no bottleneck or
single point of failure, and they are robust to unreliable wireless network
conditions. Recently, there has been a surge of activity in the computer
science, control, signal processing, and information theory communities,
developing faster and more robust gossip algorithms and deriving theoretical
performance guarantees. This article presents an overview of recent work in the
area. We describe convergence rate results, which are related to the number of
transmitted messages and thus the amount of energy consumed in the network for
gossiping. We discuss issues related to gossiping over wireless links,
including the effects of quantization and noise, and we illustrate the use of
gossip algorithms for canonical signal processing tasks including distributed
estimation, source localization, and compression.

Comment: Submitted to Proceedings of the IEEE, 29 pages.
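The basic primitive surveyed above, randomized pairwise gossip, can be sketched in a few lines: at each step a random edge activates and its two endpoints average their values. The function and parameter names are illustrative assumptions, not from the article.

```python
import random

def gossip_average(values, edges, rounds=5000, seed=0):
    # Randomized pairwise gossip: a random edge (i, j) activates and both
    # endpoints replace their values with the pairwise average. The global
    # sum (hence the mean) is preserved at every step, and on any connected
    # graph all values converge to the network-wide average.
    rng = random.Random(seed)
    x = list(map(float, values))
    for _ in range(rounds):
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x
```

Because each update touches only one link and needs no routing tables, the number of transmitted messages (here, `rounds`) directly tracks the energy cost discussed in the article.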