57,263 research outputs found

    Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities

    Full text link
    Variational inequalities are a broad and flexible class of problems that includes minimization, saddle point, fixed point problems as special cases. Therefore, variational inequalities are used in a variety of applications ranging from equilibrium search to adversarial learning. Today's realities with the increasing size of data and models demand parallel and distributed computing for real-world machine learning problems, most of which can be represented as variational inequalities. Meanwhile, most distributed approaches has a significant bottleneck - the cost of communications. The three main techniques to reduce both the total number of communication rounds and the cost of one such round are the use of similarity of local functions, compression of transmitted information and local updates. In this paper, we combine all these approaches. Such a triple synergy did not exist before for variational inequalities and saddle problems, nor even for minimization problems. The methods presented in this paper have the best theoretical guarantees of communication complexity and are significantly ahead of other methods for distributed variational inequalities. The theoretical results are confirmed by adversarial learning experiments on synthetic and real datasets.Comment: 19 pages, 2 algorithms, 1 tabl

    Nonlinear Information Bottleneck

    Full text link
    Information bottleneck (IB) is a technique for extracting information in one random variable XX that is relevant for predicting another random variable YY. IB works by encoding XX in a compressed "bottleneck" random variable MM from which YY can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete XX and YY with small state spaces, and continuous XX and YY with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous XX and YY, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed "variational IB" method on several real-world datasets

    SASG: Sparsification with Adaptive Stochastic Gradients for Communication-efficient Distributed Learning

    Full text link
    Stochastic optimization algorithms implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the communication overhead for exchanging information such as stochastic gradients between different workers. Sparse communication with memory and the adaptive aggregation methodology are two successful frameworks among the various techniques proposed to address this issue. In this paper, we creatively exploit the advantages of Sparse communication and Adaptive aggregated Stochastic Gradients to design a communication-efficient distributed algorithm named SASG. Specifically, we first determine the workers that need to communicate based on the adaptive aggregation rule and then sparse this transmitted information. Therefore, our algorithm reduces both the overhead of communication rounds and the number of communication bits in the distributed system. We define an auxiliary sequence and give convergence results of the algorithm with the help of Lyapunov function analysis. Experiments on training deep neural networks show that our algorithm can significantly reduce the number of communication rounds and bits compared to the previous methods, with little or no impact on training and testing accuracy.Comment: 12 pages, 5 figure

    DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression

    Full text link
    We propose a new architecture for distributed image compression from a group of distributed data sources. The work is motivated by practical needs of data-driven codec design, low power consumption, robustness, and data privacy. The proposed architecture, which we refer to as Distributed Recurrent Autoencoder for Scalable Image Compression (DRASIC), is able to train distributed encoders and one joint decoder on correlated data sources. Its compression capability is much better than the method of training codecs separately. Meanwhile, the performance of our distributed system with 10 distributed sources is only within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single codec trained with all data sources. We experiment distributed sources with different correlations and show how our data-driven methodology well matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC). To the best of our knowledge, this is the first data-driven DSC framework for general distributed code design with deep learning
    • …
    corecore