57,263 research outputs found

Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities

Authors: Beznosikov Aleksandr; Gasnikov Alexander
Publication date: 15/02/2023

Variational inequalities are a broad and flexible class of problems that includes minimization, saddle point, and fixed point problems as special cases. Therefore, variational inequalities are used in a variety of applications ranging from equilibrium search to adversarial learning. The increasing size of data and models demands parallel and distributed computing for real-world machine learning problems, most of which can be represented as variational inequalities. Meanwhile, most distributed approaches have a significant bottleneck: the cost of communications. The three main techniques to reduce both the total number of communication rounds and the cost of one such round are the use of similarity of local functions, compression of transmitted information, and local updates. In this paper, we combine all these approaches. Such a triple synergy did not exist before for variational inequalities and saddle problems, nor even for minimization problems. The methods presented in this paper have the best theoretical guarantees of communication complexity and are significantly ahead of other methods for distributed variational inequalities. The theoretical results are confirmed by adversarial learning experiments on synthetic and real datasets.

Comment: 19 pages, 2 algorithms, 1 table
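For readers unfamiliar with the term, the standard textbook statement of a variational inequality and the three special cases named in the abstract can be sketched as follows. The symbols F, Z, z, f, g and T are notation assumed here for illustration; the abstract itself does not fix any notation.

```latex
% Variational inequality over a convex set Z with operator F (assumed notation):
\text{find } z^* \in Z \text{ such that } \langle F(z^*),\, z - z^* \rangle \ge 0 \quad \text{for all } z \in Z.

% Special cases mentioned in the abstract:
% minimization  \min_{z \in Z} f(z):
F(z) = \nabla f(z)
% saddle point  \min_x \max_y g(x, y), with z = (x, y):
F(z) = \bigl( \nabla_x g(x, y),\; -\nabla_y g(x, y) \bigr)
% fixed point  T(z) = z, with Z = \mathbb{R}^d:
F(z) = z - T(z)
```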

arXiv.org e-Print Archive

Nonlinear Information Bottleneck

Authors: Kolchinsky Artemy; Tracey Brendan D.; Wolpert David H.
Publication date: 30/11/2019

Information bottleneck (IB) is a technique for extracting information in one random variable X that is relevant for predicting another random variable Y. IB works by encoding X in a compressed "bottleneck" random variable M from which Y can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete X and Y with small state spaces, and continuous X and Y with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous X and Y, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed "variational IB" method on several real-world datasets.
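The "difficult optimization problem" referred to above is commonly written as a trade-off between prediction and compression. The form below is the usual textbook IB Lagrangian, not a formula quoted from this abstract; the trade-off parameter \beta and the encoder notation p(m | x) are assumptions introduced here.

```latex
% Information bottleneck objective (standard form, assumed notation):
% choose the stochastic encoder p(m | x) to keep information about Y
% while compressing X, under the Markov chain Y - X - M.
\max_{p(m \mid x)} \; I(Y; M) \;-\; \beta \, I(X; M), \qquad \beta \ge 0
```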

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

SASG: Sparsification with Adaptive Stochastic Gradients for Communication-efficient Distributed Learning

Authors: Deng Xiaoge; Li Dongsheng; Sun Tao
Publication date: 07/12/2021

Stochastic optimization algorithms implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the communication overhead for exchanging information such as stochastic gradients between different workers. Sparse communication with memory and the adaptive aggregation methodology are two successful frameworks among the various techniques proposed to address this issue. In this paper, we exploit the advantages of Sparse communication and Adaptive aggregated Stochastic Gradients to design a communication-efficient distributed algorithm named SASG. Specifically, we first determine the workers that need to communicate based on the adaptive aggregation rule and then sparsify the transmitted information. Therefore, our algorithm reduces both the number of communication rounds and the number of communication bits in the distributed system. We define an auxiliary sequence and give convergence results of the algorithm with the help of Lyapunov function analysis. Experiments on training deep neural networks show that our algorithm can significantly reduce the number of communication rounds and bits compared to previous methods, with little or no impact on training and testing accuracy.

Comment: 12 pages, 5 figures
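The "sparse communication with memory" idea mentioned above can be illustrated with a minimal sketch of generic top-k sparsification with a residual memory. This is not the SASG algorithm itself, and it omits the adaptive aggregation rule entirely; the names `top_k_sparsify` and `SparsifyingWorker` are hypothetical and chosen only for this example.

```python
def top_k_sparsify(vector, k):
    """Keep the k largest-magnitude entries of `vector` and zero the rest."""
    if k <= 0:
        return [0.0] * len(vector)
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


class SparsifyingWorker:
    """Hypothetical worker that sends only k gradient entries per round.

    Entries that are not sent are stored in `memory` and added back to the
    next gradient, so dropped information is delayed rather than lost.
    """

    def __init__(self, dim, k):
        self.k = k
        self.memory = [0.0] * dim

    def compress(self, gradient):
        # Add the residual left over from earlier rounds.
        corrected = [g + m for g, m in zip(gradient, self.memory)]
        sparse = top_k_sparsify(corrected, self.k)
        # Remember whatever was not transmitted this round.
        self.memory = [c - s for c, s in zip(corrected, sparse)]
        return sparse


worker = SparsifyingWorker(dim=4, k=1)
first = worker.compress([0.5, -2.0, 0.25, 1.0])
second = worker.compress([0.5, 0.0, 0.25, 0.5])
print(first, second, worker.memory)
```

In the second round the last coordinate is transmitted because its residual from the first round (1.0) is added to the new gradient entry (0.5), which shows how the memory lets small entries accumulate until they are sent.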

arXiv.org e-Print Archive

DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression

Authors: Diao Enmao; Ding Jie; Tarokh Vahid
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/12/2019

We propose a new architecture for distributed image compression from a group of distributed data sources. The work is motivated by practical needs of data-driven codec design, low power consumption, robustness, and data privacy. The proposed architecture, which we refer to as Distributed Recurrent Autoencoder for Scalable Image Compression (DRASIC), is able to train distributed encoders and one joint decoder on correlated data sources. Its compression capability is much better than that of training codecs separately. Meanwhile, the performance of our distributed system with 10 distributed sources is within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single codec trained with all data sources. We experiment with distributed sources with different correlations and show that our data-driven methodology matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC) well. To the best of our knowledge, this is the first data-driven DSC framework for general distributed code design with deep learning.
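To make the "within 2 dB PSNR" figure concrete, here is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE). It assumes images are given as flat sequences of pixel values with peak value 255; the function name `psnr` and its parameters are illustrative and do not come from the DRASIC paper.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the two images.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


# A reconstruction that is off by 25.5 at every pixel (10% of the peak value).
print(psnr([0.0, 0.0], [25.5, 25.5]))
```

Because PSNR is logarithmic in the mean squared error, a 2 dB gap corresponds to an MSE ratio of 10^0.2, roughly 1.58.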

arXiv.org e-Print Archive

Crossref