Bidirectional compression in heterogeneous settings for distributed or federated learning with partial participation: tight convergence guarantees
We introduce a framework - Artemis - to tackle the problem of learning in a
distributed or federated setting with communication constraints and partial
device participation. Several workers (randomly sampled) perform the
optimization process using a central server to aggregate their computations.
To alleviate the communication cost, Artemis compresses the information sent
in both directions (from the workers to the server and conversely) and
combines this with a memory mechanism. It improves on existing algorithms that
only consider unidirectional compression (to the server), use very strong
assumptions on the compression operator, or do not take partial device
participation into account. We provide fast rates of convergence (linear up to
a threshold) under weak assumptions on the stochastic gradients (the noise's
variance is bounded only at the optimal point) in a non-i.i.d. setting,
highlight the impact of memory for unidirectional and bidirectional
compression, and analyze Polyak-Ruppert averaging. We use convergence in
distribution to obtain a lower bound on the asymptotic variance that
highlights the practical limits of compression. Finally, we provide
experimental results to demonstrate the validity of our analysis.
Comment: 56 pages, 4 theorems, 1 algorithm, source code on GitHub
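The abstract describes compressing messages in both directions and pairing the uplink compression with a memory term. Below is a minimal, self-contained sketch of what one round of such a scheme could look like on top of plain distributed SGD; the rand-k compressor, the memory step size alpha, and all names (artemis_like_step, h_up, ...) are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch of one bidirectional-compression round with uplink memory,
# in the spirit of the framework described above (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)

def rand_k(v, k):
    """Unbiased rand-k sparsification: keep k random coordinates, rescaled."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx] * (v.size / k)
    return out

def artemis_like_step(w, grads, h_up, lr=0.1, alpha=0.5, k=2):
    """One round: sampled workers send compressed (gradient - memory),
    the server aggregates and broadcasts a compressed update."""
    deltas = [rand_k(g - h, k) for g, h in zip(grads, h_up)]   # uplink messages
    g_hats = [h + d for h, d in zip(h_up, deltas)]             # server-side reconstruction
    h_up = [h + alpha * d for h, d in zip(h_up, deltas)]       # memory update
    g_bar = np.mean(g_hats, axis=0)
    update = rand_k(g_bar, k)                                  # downlink compression
    return w - lr * update, h_up

# Toy usage: n workers share a quadratic objective with minimizer t.
d, n = 10, 4
t = np.linspace(0.0, 1.0, d)
w = np.zeros(d)
h_up = [np.zeros(d) for _ in range(n)]
for _ in range(300):
    grads = [(w - t) + 0.01 * rng.normal(size=d) for _ in range(n)]
    w, h_up = artemis_like_step(w, grads, h_up)
```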
Metric based up-scaling
We consider divergence form elliptic operators in dimension $n \ge 2$ with
$L^\infty$ coefficients. Although solutions of these operators are only
Hölder continuous, we show that they are differentiable ($C^{1,\alpha}$)
with respect to harmonic coordinates. It follows that numerical homogenization
can be extended to situations where the medium has no ergodicity at small
scales and is characterized by a continuum of scales, by transferring a new
metric in addition to traditional averaged (homogenized) quantities from
subgrid scales into computational scales, and error bounds can be given. This
numerical homogenization method can also be used as a compression tool for
differential operators.
Comment: Final version. Accepted for publication in Communications on Pure and
Applied Mathematics. Presented at CIMMS (March 2005), Socams 2005 (April),
Oberwolfach, MPI Leipzig (May 2005), CIRM (July 2005). Higher resolution
figures are available at http://www.acm.caltech.edu/~owhadi
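The statement that solutions become differentiable "with respect to harmonic coordinates" is easiest to see written out. Below is a hedged sketch in standard notation; the symbols $a$, $F$, $u$, $v$, $\Omega$ are assumed for illustration, not quoted from the paper.

```latex
% Sketch of the harmonic-coordinate change of variables (notation assumed, not
% quoted from the paper). Let a(x) be the rough coefficient matrix on a domain
% \Omega and let F = (F_1, \dots, F_n) be the a-harmonic coordinates, i.e.
\[
  -\operatorname{div}\bigl(a(x)\,\nabla F_i(x)\bigr) = 0 \ \text{in } \Omega,
  \qquad F_i(x) = x_i \ \text{on } \partial\Omega, \qquad i = 1, \dots, n.
\]
% A solution u of -\operatorname{div}(a \nabla u) = g is then factored as
\[
  u = v \circ F,
\]
% and the regularity result recalled in the abstract says that v is C^{1,\alpha}:
% u is differentiable in the coordinates (and metric) carried by F, even though
% u itself is only Hölder continuous in the original coordinates.
```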
EControl: Fast Distributed Optimization with Compression and Error Control
Modern distributed training relies heavily on communication compression to
reduce the communication overhead. In this work, we study algorithms employing
a popular class of contractive compressors. However, a naive implementation
often leads to unstable convergence or even exponential divergence due to the
compression bias. Error Compensation
(EC) is an extremely popular mechanism to mitigate the aforementioned issues
during the training of models enhanced by contractive compression operators.
While EC is effective in the data-homogeneous regime, its practicality and
theoretical foundations in the data-heterogeneous regime are far less well
understood. Existing convergence analyses typically rely
on strong assumptions such as bounded gradients, bounded data heterogeneity, or
large batch accesses, which are often infeasible in modern machine learning
applications. We resolve the majority of current issues by proposing EControl,
a novel mechanism that can regulate error compensation by controlling the
strength of the feedback signal. We prove fast convergence for EControl in
standard strongly convex, general convex, and nonconvex settings without any
additional assumptions on the problem or data heterogeneity. We conduct
extensive numerical evaluations to illustrate the efficacy of our method and
support our theoretical findings.
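The abstract describes regulating error compensation by controlling the strength of the feedback signal. Below is a hedged, self-contained sketch of what such a knob could look like on top of standard error feedback with a top-k compressor; the update rule, the parameter name eta, and the step sizes are illustrative assumptions, not the EControl paper's actual algorithm.

```python
# Hedged sketch (assumptions, not the paper's exact updates) of
# error-compensated compressed SGD with a feedback-strength knob eta.
# eta = 1 recovers classic error feedback; eta < 1 damps the error signal
# that is fed back into the next compressed message.
import numpy as np

def top_k(v, k):
    """Contractive top-k compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ec_step(w, grad_fn, e, lr=0.05, eta=0.5, k=2):
    """One error-compensated step with feedback strength eta (illustrative)."""
    g = grad_fn(w)
    msg = top_k(g + eta * e, k)      # compress gradient plus damped error memory
    e = e + g - msg                  # accumulate what the compressor dropped
    w = w - lr * msg                 # apply only the transmitted part
    return w, e

# Toy usage on a quadratic: the gradient of 0.5 * ||w - t||^2 is (w - t).
d = 10
t = np.linspace(0.0, 1.0, d)
w, e = np.zeros(d), np.zeros(d)
for _ in range(500):
    w, e = ec_step(w, lambda x: x - t, e)
print(np.linalg.norm(w - t))         # should be small after enough steps
```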