Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities
Variational inequalities are a broad and flexible class of problems that
includes minimization, saddle point, and fixed point problems as special cases.
Therefore, variational inequalities are used in a variety of applications
ranging from equilibrium search to adversarial learning. The ever-growing size of
data and models demands parallel and distributed
computing for real-world machine learning problems, most of which can be
represented as variational inequalities. Meanwhile, most distributed approaches
have a significant bottleneck: the cost of communication. The three main
techniques to reduce both the total number of communication rounds and the cost
of one such round are the use of similarity of local functions, compression of
transmitted information and local updates. In this paper, we combine all these
approaches. Such a triple synergy did not exist before for variational
inequalities and saddle point problems, nor even for minimization problems. The
methods presented in this paper have the best theoretical guarantees of
communication complexity and are significantly ahead of other methods for
distributed variational inequalities. The theoretical results are confirmed by
adversarial learning experiments on synthetic and real datasets.
Comment: 19 pages, 2 algorithms, 1 table
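To make the three ingredients concrete, here is a minimal Python/NumPy sketch of one communication round that combines communication-free local steps with top-k compression on a toy monotone operator. It is an illustration under simplifying assumptions, not the algorithm from the paper; the names (top_k, local_operator, communication_round) and the toy problem are hypothetical, and similarity of local functions is only mimicked by giving the workers nearly identical operators.

```python
# Illustrative sketch only (not the paper's method): one communication round
# combining local steps and top-k compression for a toy variational inequality.
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude entries of v (a simple compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def local_operator(A_i, b_i, z):
    """Toy monotone operator F_i(z) = A_i z + b_i held by worker i."""
    return A_i @ z + b_i

def communication_round(z, workers, step=0.1, k=2, local_steps=3):
    """Each worker runs local steps on its own operator, then sends a
    top-k-compressed operator value; the server averages the messages."""
    msgs = []
    for A_i, b_i in workers:
        z_i = z.copy()
        for _ in range(local_steps):          # communication-free local updates
            z_i -= step * local_operator(A_i, b_i, z_i)
        msgs.append(top_k(local_operator(A_i, b_i, z_i), k))
    return z - step * np.mean(msgs, axis=0)   # server-side update

# toy setting: 3 workers whose operators are similar (identity plus small noise)
rng = np.random.default_rng(0)
workers = [(np.eye(4) + 0.1 * rng.standard_normal((4, 4)), rng.standard_normal(4))
           for _ in range(3)]
z = np.zeros(4)
for _ in range(100):
    z = communication_round(z, workers)
```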
Nonlinear Information Bottleneck
Information bottleneck (IB) is a technique for extracting information in one
random variable X that is relevant for predicting another random variable
Y. IB works by encoding X in a compressed "bottleneck" random variable M
from which Y can be accurately decoded. However, finding the optimal
bottleneck variable involves a difficult optimization problem, which until
recently has been considered for only two limited cases: discrete X and Y
with small state spaces, and continuous X and Y with a Gaussian joint
distribution (in which case optimal encoding and decoding maps are linear). We
propose a method for performing IB on arbitrarily-distributed discrete and/or
continuous X and Y, while allowing for nonlinear encoding and decoding
maps. Our approach relies on a novel non-parametric upper bound for mutual
information. We describe how to implement our method using neural networks. We
then show that it achieves better performance than the recently-proposed
"variational IB" method on several real-world datasets
SASG: Sparsification with Adaptive Stochastic Gradients for Communication-efficient Distributed Learning
Stochastic optimization algorithms implemented on distributed computing
architectures are increasingly used to tackle large-scale machine learning
applications. A key bottleneck in such distributed systems is the communication
overhead for exchanging information such as stochastic gradients between
different workers. Sparse communication with memory and the adaptive
aggregation methodology are two successful frameworks among the various
techniques proposed to address this issue. In this paper, we creatively exploit
the advantages of Sparse communication and Adaptive aggregated Stochastic
Gradients to design a communication-efficient distributed algorithm named SASG.
Specifically, we first determine the workers that need to communicate based on
the adaptive aggregation rule and then sparsify the transmitted information.
Therefore, our algorithm reduces both the overhead of communication rounds and
the number of communication bits in the distributed system. We define an
auxiliary sequence and give convergence results of the algorithm with the help
of Lyapunov function analysis. Experiments on training deep neural networks
show that our algorithm can significantly reduce the number of communication
rounds and bits compared to the previous methods, with little or no impact on
training and testing accuracy.
Comment: 12 pages, 5 figures
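The sketch below illustrates, under simplifying assumptions, the two mechanisms the abstract combines: an adaptive rule that lets a worker skip a communication round when its gradient has changed little, and top-k sparsification with a local memory for the entries that were not transmitted. It is not the authors' implementation; the Worker class, maybe_send, and the threshold rule are hypothetical simplifications.

```python
# Hypothetical sketch of adaptive skipping plus sparsification with memory.
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

class Worker:
    def __init__(self, dim, k, threshold):
        self.memory = np.zeros(dim)      # entries not yet sent (error feedback)
        self.last_sent = np.zeros(dim)   # gradient at the last communicated round
        self.k, self.threshold = k, threshold

    def maybe_send(self, grad):
        # adaptive aggregation rule: skip this round if the gradient changed little,
        # letting the server reuse this worker's previous (stale) message
        if np.linalg.norm(grad - self.last_sent) < self.threshold:
            return None
        msg = top_k(grad + self.memory, self.k)   # sparsify gradient plus memory
        self.memory = grad + self.memory - msg    # remember what was left out
        self.last_sent = grad
        return msg

# toy round: the server averages fresh sparse messages and reuses stale ones
dim = 10
workers = [Worker(dim, k=3, threshold=0.5) for _ in range(4)]
stale = [np.zeros(dim) for _ in workers]
grads = [np.random.randn(dim) for _ in workers]
for i, (w, g) in enumerate(zip(workers, grads)):
    m = w.maybe_send(g)
    if m is not None:
        stale[i] = m
update = np.mean(stale, axis=0)
```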
DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression
We propose a new architecture for distributed image compression from a group
of distributed data sources. The work is motivated by practical needs of
data-driven codec design, low power consumption, robustness, and data privacy.
The proposed architecture, which we refer to as Distributed Recurrent
Autoencoder for Scalable Image Compression (DRASIC), is able to train
distributed encoders and one joint decoder on correlated data sources. Its
compression capability is much better than that of codecs trained
separately. Meanwhile, the performance of our distributed system with 10
distributed sources is only within 2 dB peak signal-to-noise ratio (PSNR) of
the performance of a single codec trained with all data sources. We experiment
with distributed sources of different correlations and show how well our data-driven
methodology matches the Slepian-Wolf Theorem in Distributed Source Coding
(DSC). To the best of our knowledge, this is the first data-driven DSC
framework for general distributed code design with deep learning.
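A minimal sketch of the distributed-encoder / joint-decoder idea follows, assuming small convolutional encoders and a single shared decoder trained on a reconstruction loss. The paper's recurrent, scalable architecture and codec details are omitted; every layer choice here is a hypothetical placeholder.

```python
# Hypothetical sketch: per-source encoders with one shared decoder.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, code_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, code_dim, 3, stride=2, padding=1))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, code_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(code_dim, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1))
    def forward(self, z):
        return self.net(z)

num_sources = 3
encoders = [Encoder() for _ in range(num_sources)]   # one encoder per data source
decoder = Decoder()                                   # single joint decoder
params = [p for e in encoders for p in e.parameters()] + list(decoder.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# one training step on a toy batch of 28x28 images per source
batches = [torch.rand(8, 1, 28, 28) for _ in range(num_sources)]
loss = sum(nn.functional.mse_loss(decoder(enc(x)), x)
           for enc, x in zip(encoders, batches)) / num_sources
opt.zero_grad()
loss.backward()
opt.step()
```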