54,378 research outputs found
Natural Compression for Distributed Deep Learning
Modern deep learning models are often trained in parallel over a collection
of distributed machines to reduce training time. In such settings,
communication of model updates among machines becomes a significant performance
bottleneck and various lossy update compression techniques have been proposed
to alleviate this problem. In this work, we introduce a new, simple yet
theoretically and practically effective compression technique: {\em natural
compression (NC)}. Our technique is applied individually to all entries of the
to-be-compressed update vector and works by randomized rounding to the nearest
(negative or positive) power of two, which can be computed in a "natural" way
by ignoring the mantissa. We show that compared to no compression, NC increases
the second moment of the compressed vector by not more than the tiny factor
\nicefrac{9}{8}, which means that the effect of NC on the convergence speed
of popular training algorithms, such as distributed SGD, is negligible.
However, the communications savings enabled by NC are substantial, leading to
{\em - improvement in overall theoretical running time}. For
applications requiring more aggressive compression, we generalize NC to {\em
natural dithering}, which we prove is {\em exponentially better} than the
common random dithering technique. Our compression operators can be used on
their own or in combination with existing operators for a more aggressive
combined effect, and offer new state-of-the-art both in theory and practice.Comment: 8 pages, 20 pages of Appendix, 6 Tables, 14 Figure
DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression
We propose a new architecture for distributed image compression from a group
of distributed data sources. The work is motivated by practical needs of
data-driven codec design, low power consumption, robustness, and data privacy.
The proposed architecture, which we refer to as Distributed Recurrent
Autoencoder for Scalable Image Compression (DRASIC), is able to train
distributed encoders and one joint decoder on correlated data sources. Its
compression capability is much better than the method of training codecs
separately. Meanwhile, the performance of our distributed system with 10
distributed sources is only within 2 dB peak signal-to-noise ratio (PSNR) of
the performance of a single codec trained with all data sources. We experiment
distributed sources with different correlations and show how our data-driven
methodology well matches the Slepian-Wolf Theorem in Distributed Source Coding
(DSC). To the best of our knowledge, this is the first data-driven DSC
framework for general distributed code design with deep learning
- …