CGX: Adaptive System Support for Communication-Efficient Deep Learning
The ability to scale out training workloads has been one of the key
performance enablers of deep learning. The main scaling approach is
data-parallel GPU-based training, which has been boosted by hardware and
software support for highly efficient point-to-point communication, and in
particular via hardware bandwidth overprovisioning. Overprovisioning comes at a
cost: "cloud-grade" servers with such support are an order of magnitude more
expensive than their popular "consumer-grade" counterparts, even though
individual server-grade and consumer-grade GPUs can have similar
computational envelopes.
In this paper, we show that the costly hardware overprovisioning approach can
be supplanted via algorithmic and system design, and propose a framework called
CGX, which provides efficient software support for compressed communication in
ML applications, for both multi-GPU single-node training, as well as
larger-scale multi-node training. CGX is based on two technical advances:
\emph{At the system level}, it relies on a re-developed communication stack for
ML frameworks, which provides flexible, highly-efficient support for compressed
communication. \emph{At the application level}, it provides \emph{seamless,
parameter-free} integration with popular frameworks, so that end-users do not
have to modify training recipes or make significant changes to their training
code. This is
complemented by a \emph{layer-wise adaptive compression} technique which
dynamically balances compression gains with accuracy preservation. CGX
integrates with popular ML frameworks, providing up to 3X speedups for
multi-GPU nodes based on commodity hardware, and order-of-magnitude
improvements in the multi-node setting, with negligible impact on accuracy.
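As a rough illustration of the layer-wise adaptive compression idea, the sketch below quantizes each layer's gradient to a per-layer bit-width before it would be exchanged between GPUs. The uniform quantizer, the layer names, and the bit-width assignments are all illustrative assumptions, not CGX's actual algorithm or API:

```python
# Hypothetical sketch: per-layer gradient quantization before all-reduce.
# Bit-widths are hand-picked here; CGX chooses them adaptively.
import numpy as np

def quantize(grad: np.ndarray, bits: int):
    """Uniform quantization of a tensor into 2**bits levels."""
    levels = 2 ** bits - 1
    lo, hi = float(grad.min()), float(grad.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((grad - lo) / scale).astype(np.uint32)  # integer codes
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale + lo

# Illustrative per-layer bit-widths: sensitive layers keep more bits.
layer_bits = {"embedding": 8, "block1.conv": 4, "classifier": 8}

rng = np.random.default_rng(0)
for name, bits in layer_bits.items():
    g = rng.standard_normal(1024).astype(np.float32)  # stand-in gradient
    codes, lo, scale = quantize(g, bits)              # what gets transmitted
    g_hat = dequantize(codes, lo, scale)              # what peers reconstruct
    print(f"{name}: {bits}-bit, mean abs error {np.abs(g - g_hat).mean():.5f}")
```

At 4 bits per value this cuts gradient traffic by roughly 8x versus float32, which is the kind of saving that substitutes for hardware bandwidth overprovisioning.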
Error Feedback Can Accurately Compress Preconditioners
Leveraging second-order information at the scale of deep networks is one of
the main lines of approach for improving the performance of current optimizers
for deep learning. Yet, existing approaches for accurate full-matrix
preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate
Curvature (M-FAC), suffer from massive storage costs when applied even to
medium-scale models, as they must store a sliding window of gradients, whose
memory requirements are multiplicative in the model dimension. In this paper,
we address this issue via an efficient and simple-to-implement error-feedback
technique that can be applied to compress preconditioners by up to two orders
of magnitude in practice, without loss of convergence. Specifically, our
approach compresses the gradient information via sparsification or low-rank
compression \emph{before} it is fed into the preconditioner, feeding the
compression error back into future iterations. Extensive experiments on deep
neural networks for vision show that this approach can compress full-matrix
preconditioners by up to two orders of magnitude without impact on accuracy,
effectively removing the memory overhead of full-matrix preconditioning for
implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC). Our
code is available at https://github.com/IST-DASLab/EFCP.
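The error-feedback loop described above can be sketched in a few lines, assuming top-k sparsification as the compressor; the dimensions, the 1% density, and the way the sparse gradient is handed to the preconditioner are illustrative assumptions, not the EFCP repository's actual interface:

```python
# Minimal sketch of error feedback around a compressed preconditioner input.
import numpy as np

def topk_compress(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

d, k = 10_000, 100            # model dimension; keep 1% of coordinates
error = np.zeros(d)           # error-feedback accumulator
rng = np.random.default_rng(0)

for step in range(5):
    grad = rng.standard_normal(d)        # stand-in for a real gradient
    corrected = grad + error             # add back past compression error
    compressed = topk_compress(corrected, k)
    error = corrected - compressed       # carry the residual forward
    # `compressed` (sparse) is what the preconditioner's gradient window
    # would store, shrinking its memory by roughly d/k = 100x.
```

Because the residual is re-injected at every step, no gradient information is permanently discarded, which is why convergence can be preserved despite the aggressive compression.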
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Transformers have achieved great success in machine learning applications.
Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root
Mean Square Normalization (RMSNorm), play a critical role in accelerating and
stabilizing the training of Transformers. While LayerNorm recenters and
rescales input vectors, RMSNorm only rescales the vectors by their RMS value.
Despite being more computationally efficient, RMSNorm may compromise the
representation ability of Transformers. There is currently no consensus
regarding the preferred normalization technique, as some models employ
LayerNorm while others utilize RMSNorm, especially in recent large language
models. It is challenging to convert a Transformer trained with one
normalization type to the other. While the disagreement between the two
normalization types persists, we propose a solution to unify two mainstream
Transformer architectures, Pre-LN and Pre-RMSNorm Transformers. By removing the inherent
redundant mean information in the main branch of Pre-LN Transformers, we can
reduce LayerNorm to RMSNorm, achieving higher efficiency. We further propose
the Compressed RMSNorm (CRMSNorm) and Pre-CRMSNorm Transformer based on a
lossless compression of the zero-mean vectors. We formally establish the
equivalence of Pre-LN, Pre-RMSNorm, and Pre-CRMSNorm Transformer variants in
both training and inference. This equivalence implies that Pre-LN Transformers
can be substituted with Pre-(C)RMSNorm counterparts at almost no cost, offering
the same arithmetic functionality along with a free efficiency gain.
Experiments demonstrate that we can reduce the training and inference time of
Pre-LN Transformers by up to 10%.
Comment: 15 pages, 5 tables, code available at https://github.com/ZixuanJiang/pre-rmsnorm-transforme
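The reduction from LayerNorm to RMSNorm rests on a simple identity: on zero-mean inputs, the two normalizations coincide. A minimal NumPy check of this identity (not the paper's code; the shared epsilon placement is an assumption):

```python
# Check: LayerNorm equals RMSNorm on vectors whose mean has been removed.
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-6):
    rms = np.sqrt((x ** 2).mean(-1, keepdims=True) + eps)
    return x / rms

x = np.random.default_rng(0).standard_normal((4, 64))
x0 = x - x.mean(-1, keepdims=True)   # recenter once, upstream of the norm
print(np.allclose(layer_norm(x0), rms_norm(x0), atol=1e-5))  # True
```

The paper's contribution is to show that the main branch of a Pre-LN Transformer can be kept zero-mean throughout, so the recentering step and its cost can be removed entirely rather than recomputed at every normalization.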
PA-iMFL: Communication-Efficient Privacy Amplification Method against Data Reconstruction Attack in Improved Multi-Layer Federated Learning
Recently, big data has seen explosive growth in the Internet of Things (IoT).
Multi-layer Federated Learning (MFL) based on a cloud-edge-end architecture can
promote model training efficiency and model accuracy while preserving IoT data
privacy. This paper considers an improved MFL (iMFL), where edge-layer devices
own private data and can join the training process. iMFL can improve edge
resource utilization and also relax the strict requirements on end devices,
but it suffers from the
issues of Data Reconstruction Attack (DRA) and unacceptable communication
overhead. This paper aims to address these issues with iMFL. We propose a
Privacy Amplification scheme on iMFL (PA-iMFL). Differing from standard MFL, we
design privacy operations in end and edge devices after local training,
comprising three sequential components: local differential privacy with the
Laplace mechanism, privacy-amplification subsampling, and gradient sign reset.
Benefiting from these privacy operations, PA-iMFL reduces communication
overhead and preserves privacy. Extensive results demonstrate that against
State-Of-The-Art (SOTA) DRAs, PA-iMFL can effectively mitigate private data
leakage and reach the same level of protection capability as the SOTA defense
model. Moreover, by adopting privacy operations in edge devices, PA-iMFL
improves communication efficiency by up to 2.8 times over the SOTA compression
method without compromising model accuracy.
Comment: 12 pages, 11 figures
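For concreteness, here is a rough sketch of the three sequential privacy operations in the order described: Laplace-mechanism local differential privacy, privacy-amplification subsampling, and gradient sign reset. The parameter values and the exact reset rule are assumptions for illustration, not PA-iMFL's actual formulation:

```python
# Illustrative pipeline of the three privacy operations applied after
# local training; all constants here are made up.
import numpy as np

rng = np.random.default_rng(0)

def laplace_ldp(grad, sensitivity=1.0, epsilon=1.0):
    """Local differential privacy via the Laplace mechanism."""
    return grad + rng.laplace(0.0, sensitivity / epsilon, grad.shape)

def subsample(grad, rate=0.1):
    """Privacy-amplification subsampling: transmit a random fraction."""
    mask = rng.random(grad.shape) < rate
    return grad * mask, mask

def sign_reset(grad):
    """Keep only the sign of each surviving coordinate (1 bit each)."""
    return np.sign(grad)

g = rng.standard_normal(1000)           # a local gradient
g = laplace_ldp(g)                      # step 1: LDP noise
g, mask = subsample(g)                  # step 2: subsampling
g = sign_reset(g)                       # step 3: values in {-1, 0, +1}
print(f"coordinates transmitted: {int(mask.sum())} of {g.size}")
```

Note how the communication saving falls out of the same operations that provide privacy: subsampling drops ~90% of coordinates and the sign reset shrinks each survivor to a single bit.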
On the Inability of Markov Models to Capture Criticality in Human Mobility
We examine the non-Markovian nature of human mobility by exposing the
inability of Markov models to capture criticality in human mobility. In
particular, the assumed Markovian nature of mobility was used to establish a
theoretical upper bound on the predictability of human mobility (expressed as a
minimum error probability limit), based on temporally correlated entropy. Since
its inception, this bound has been widely used and empirically validated using
Markov chains. We show that recurrent neural architectures can achieve
significantly higher predictability, surpassing this widely used upper bound.
In order to explain this anomaly, we shed light on several underlying
assumptions in previous research works that have resulted in this bias. By
evaluating the mobility predictability on real-world datasets, we show that
human mobility exhibits scale-invariant long-range correlations, bearing
similarity to a power-law decay. This is in contrast to the initial assumption
that human mobility follows an exponential decay. This assumption of
exponential decay coupled with Lempel-Ziv compression in computing Fano's
inequality has led to an inaccurate estimation of the predictability upper
bound. We show that this approach inflates the entropy, consequently lowering
the upper bound on human mobility predictability. We finally highlight that
this approach tends to overlook long-range correlations in human mobility,
which explains why recurrent neural architectures designed to handle
long-range structural correlations surpass the previously computed upper bound
on mobility predictability.
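For concreteness, the pipeline under critique works as follows: estimate the entropy rate of a location sequence with a Lempel-Ziv-style estimator, then invert Fano's inequality to get the predictability upper bound Pi_max. The sketch below follows the common formulation (single-character location symbols, bisection to invert Fano) and is a simplified illustration, not the original study's code:

```python
# Lempel-Ziv entropy estimate + Fano's inequality, the standard recipe
# for the mobility predictability upper bound discussed above.
import math

def lz_entropy(seq: str) -> float:
    """Estimate the entropy rate as n*log2(n) / sum(lambda_i), where
    lambda_i is the shortest substring at i not seen before position i."""
    n = len(seq)
    total = 0
    for i in range(n):
        k = 1
        while i + k <= n and seq[i:i + k] in seq[:i]:
            k += 1
        total += k
    return n * math.log2(n) / total

def fano_upper_bound(S: float, N: int, tol=1e-6) -> float:
    """Solve S = H(p) + (1-p)*log2(N-1) for the max predictability p."""
    def f(p):
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(N - 1) - S
    lo, hi = 1.0 / N, 1.0 - tol
    while hi - lo > tol:                 # f decreases in p, so bisect
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

S = lz_entropy("abcabdacbdeabcde" * 8)   # toy trace over N = 5 locations
print(f"entropy ~ {S:.2f} bits, Fano bound ~ {fano_upper_bound(S, N=5):.2f}")
```

The argument above is that the LZ estimator assumes correlations decay fast enough (effectively exponentially); on power-law-correlated mobility traces it over-estimates S, which in turn deflates the Fano bound.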
Lossless compression with latent variable models
We develop a simple and elegant method for lossless compression using latent variable models, which we call `bits back with asymmetric numeral systems' (BB-ANS). The method involves interleaving encode and decode steps, and achieves an optimal rate when compressing batches of data. We demonstrate it first on the MNIST test set, showing that state-of-the-art lossless compression is possible using a small variational autoencoder (VAE) model. We then make use of a novel empirical insight, that fully convolutional generative models, trained on small images, are able to generalize to images of arbitrary size, and extend BB-ANS to hierarchical latent variable models, enabling state-of-the-art lossless compression of full-size colour images from the ImageNet dataset. We describe `Craystack', a modular software framework which we have developed for rapid prototyping of compression using deep generative models.
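The interleaving of encode and decode steps at the heart of BB-ANS is easiest to see in code. Below is a toy but runnable round-trip built on a minimal rANS coder with an unbounded integer state; the two-value latent model, its frequency tables, and the push/pop interface are all made-up illustrations (Craystack's real API is different):

```python
# Toy BB-ANS round-trip with a minimal rANS coder (exact-inverse push/pop).
M = 16  # total frequency mass shared by all distributions below

def push(state, sym, freqs):                 # rANS encode one symbol
    f, c = freqs[sym], sum(freqs[:sym])
    return (state // f) * M + c + state % f

def pop(state, freqs):                       # rANS decode one symbol
    slot = state % M
    c = 0
    for sym, f in enumerate(freqs):
        if c + f > slot:
            return state // M * f + slot - c, sym
        c += f

prior      = [8, 8]                          # P(z) for z in {0, 1}
likelihood = [[8, 4, 2, 2], [2, 2, 4, 8]]    # P(x | z) for x in {0..3}
posterior  = [[12, 4], [10, 6], [6, 10], [4, 12]]  # Q(z | x), made up

def bb_ans_encode(state, x):
    state, z = pop(state, posterior[x])      # "bits back": decode z
    state = push(state, x, likelihood[z])    # encode data given latent
    return push(state, z, prior)             # encode latent under prior

def bb_ans_decode(state):
    state, z = pop(state, prior)             # decode latent
    state, x = pop(state, likelihood[z])     # decode data
    return push(state, z, posterior[x]), x   # give the borrowed bits back

state = 1 << 32                              # initial bits to borrow from
data = [0, 3, 1, 2]
for x in data:
    state = bb_ans_encode(state, x)
for x in reversed(data):                     # the stack is last-in, first-out
    state, x_dec = bb_ans_decode(state)
    assert x_dec == x
assert state == 1 << 32                      # exact inverses: state restored
```

Because every pop is the exact inverse of the corresponding push, the bits "borrowed" to decode the latent are repaid in full during decoding, which is how the scheme achieves a rate close to the negative ELBO rather than paying for the latent twice.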
Cognition-Based Networks: A New Perspective on Network Optimization Using Learning and Distributed Intelligence
IEEE Access, vol. 3, 2015, pp. 1512-1530, article no. 7217798 (open access).
Zorzi, M., Zanella, A., Testolin, A., De Filippo De Grazia, M., Zorzi, M.
Department of Information Engineering, University of Padua, Italy; Department of General Psychology, University of Padua, Italy; IRCCS San Camillo Foundation, Venice-Lido, Italy.
In response to the new challenges in the design and operation of communication networks, and taking inspiration from how living beings deal with complexity and scalability, in this paper we introduce an innovative system concept called COgnition-BAsed NETworkS (COBANETS). The proposed approach develops around the systematic application of advanced machine learning techniques and, in particular, unsupervised deep learning and probabilistic generative models for system-wide learning, modeling, optimization, and data representation. Moreover, in COBANETS, we propose to combine this learning architecture with the emerging network virtualization paradigms, which make it possible to actuate automatic optimization and reconfiguration strategies at the system level, thus fully unleashing the potential of the learning approach. Compared with the past and current research efforts in this area, the technical approach outlined in this paper is deeply interdisciplinary and more comprehensive, calling for the synergic combination of expertise of computer scientists, communications and networking engineers, and cognitive scientists, with the ultimate aim of breaking new ground through a profound rethinking of how the modern understanding of cognition can be used in the management and optimization of telecommunication networks.