Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition
Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term
Memory (LSTM) and Gated Recurrent Unit (GRU) networks, have achieved
promising performance in sequential data modeling. The hidden layers in RNNs
can be regarded as memory units that store information from sequential
contexts. However, when dealing with high-dimensional input data,
such as video and text, the input-to-hidden linear transformation in RNNs
incurs high memory usage and computational cost, which makes the training
of RNNs unscalable and difficult. To address this challenge, we propose a novel
compact LSTM model, named TR-LSTM, which uses low-rank tensor ring
decomposition (TRD) to reformulate the input-to-hidden transformation. Compared
with other tensor decomposition methods, TR-LSTM is more stable. In addition,
TR-LSTM supports end-to-end training and provides a fundamental
building block for RNNs that handle large input data. Experiments on real-world
action recognition datasets have demonstrated the promising performance of the
proposed TR-LSTM compared with the tensor train LSTM and other state-of-the-art
competitors.Comment: 9 page
AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Highly distributed training of Deep Neural Networks (DNNs) on future compute
platforms (offering 100s of TeraOps/s of computational capacity) is expected to
be severely communication-constrained. To overcome this limitation, new
gradient compression techniques are needed that are computationally friendly,
applicable to a wide variety of layers seen in Deep Neural Networks and
adaptable to variations in network architectures as well as their
hyper-parameters. In this paper we introduce a novel technique: the Adaptive
Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized
selection of gradient residues and automatically tunes the compression rate
depending on local activity. We show excellent results on a wide spectrum of
state-of-the-art Deep Learning models in multiple domains (vision, speech,
language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers
(SGD with momentum, Adam) and network parameters (number of learners,
minibatch size, etc.). Exploiting both sparsity and quantization, we demonstrate
end-to-end compression rates of ~200X for fully-connected and recurrent layers,
and ~40X for convolutional layers, without any noticeable degradation in model
accuracies.
Comment: IBM Research AI, 9 pages, 7 figures, accepted at AAAI18
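As a rough illustration of the localized-selection idea, here is a simplified NumPy sketch; the bin size, the self-adjustment heuristic, and all names are assumptions based on the abstract's description, not the authors' implementation.

```python
import numpy as np

def adacomp_select(grad, residue, bin_size=256):
    """Simplified AdaComp-style localized gradient selection (illustrative).

    Accumulates gradients into a residue, then within each local bin keeps
    only entries whose self-adjusted residue reaches that bin's maximum.
    Returns the sparse update to transmit and the carried-over residue.
    """
    g = residue + grad                 # accumulated residue
    h = g + grad                       # self-adjusted value for selection
    n = g.size
    pad = (-n) % bin_size              # pad so g splits evenly into bins
    g_bins = np.pad(np.abs(g), (0, pad)).reshape(-1, bin_size)
    h_bins = np.pad(np.abs(h), (0, pad)).reshape(-1, bin_size)
    local_max = g_bins.max(axis=1, keepdims=True)
    mask = (h_bins >= local_max).reshape(-1)[:n]
    update = np.where(mask, g, 0.0)    # transmitted (sparse) update
    residue = np.where(mask, 0.0, g)   # unsent mass carries over
    return update, residue

# Toy usage: one layer's flattened gradient over a few steps.
rng = np.random.default_rng(1)
residue = np.zeros(1024)
for step in range(3):
    grad = rng.standard_normal(1024) * 0.01
    update, residue = adacomp_select(grad, residue)
    print(f"step {step}: sent {np.count_nonzero(update)} of {update.size}")
```

Carrying the unselected residue forward is what lets aggressive sparsification avoid accuracy loss: no gradient mass is discarded, only delayed.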
Multi-LSTM Acceleration and CNN Fault Tolerance
This thesis addresses two problems in the field of Machine Learning: the acceleration of multiple Long Short-Term Memory (LSTM) models on FPGAs, and the fault tolerance of compressed Convolutional Neural Networks (CNNs).

LSTMs are an effective solution for capturing long-term dependencies in sequential data, such as sentences in Natural Language Processing applications, video frames in Scene Labeling tasks, or temporal series in Time Series Forecasting. To further boost their efficacy, especially in the presence of long sequences, multiple LSTM models are used in a Hierarchical and Stacked fashion. However, because of their memory-bound nature, efficiently mapping multiple LSTMs onto a computing device becomes even more challenging. The first part of this thesis addresses the problem of mapping multiple LSTM models to an FPGA device by introducing a framework that adapts their memory requirements to the target architecture. For a similar accuracy loss, the proposed framework maps multiple LSTMs with a performance improvement of 3x to 5x over state-of-the-art approaches.

The second part of this thesis investigates the fault tolerance of CNNs, another effective deep learning architecture. CNNs are a dominant solution in image classification tasks but suffer from a high performance cost due to their computational structure: because of their large parameter space, fetching their data from main memory typically becomes a performance bottleneck. Various compression techniques have been developed to tackle this problem, such as weight pruning, weight clustering and weight quantization. However, reducing the memory footprint of an application can make its data more sensitive to faults. In this thesis work, we conduct an analysis to verify the conditions for applying OddECC, a mechanism that supports ECCs of variable strength and size for different memory regions. Our experiments reveal that compressed CNNs, whose memory footprint is reduced by up to 86.3x with the aforementioned compression schemes, exhibit accuracy drops of up to 13.56% in the presence of random single-bit faults.
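To illustrate the kind of experiment described, here is a hedged NumPy sketch that injects random single-bit faults into 8-bit quantized weights; the quantization scheme, fault count, and all names are illustrative assumptions, and measuring the resulting accuracy drop would additionally require a trained CNN.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of float weights to int8."""
    scale = np.abs(w).max() / 127.0
    return np.round(w / scale).astype(np.int8), scale

def inject_bit_flips(q, n_faults, rng):
    """Flip one random bit in each of n_faults randomly chosen weights."""
    q = q.copy()
    idx = rng.choice(q.size, size=n_faults, replace=False)
    bits = rng.integers(0, 8, size=n_faults)     # bit position to flip
    flat = q.reshape(-1)                         # view into the copy
    flat[idx] = (flat[idx].view(np.uint8)
                 ^ (1 << bits).astype(np.uint8)).view(np.int8)
    return q

rng = np.random.default_rng(2)
w = rng.standard_normal((64, 64)).astype(np.float32)  # stand-in weights
q, scale = quantize_int8(w)
q_faulty = inject_bit_flips(q, n_faults=16, rng=rng)

# Perturbation in the dequantized domain; a flip in a high-order bit of a
# quantized weight can move it far more than a float mantissa flip would.
err = np.abs(q_faulty.astype(np.float32) - q.astype(np.float32)) * scale
print(f"max weight perturbation after 16 bit flips: {err.max():.3f}")
```

This mechanics-level view also suggests why compressed networks are more fragile: with fewer bits per weight and no redundancy, each flipped bit carries a larger share of the representation.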
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Deep learning models have become state of the art for natural language
processing (NLP) tasks; however, deploying these models in production systems
poses significant memory constraints. Existing compression methods are either
lossy or introduce significant latency. We propose a compression method that
leverages low-rank matrix factorization during training to compress the word
embedding layer which represents the size bottleneck for most NLP models. Our
models are trained, compressed and then further re-trained on the downstream
task to recover accuracy while maintaining the reduced size. Empirically, we
show that the proposed method can achieve 90% compression with minimal impact
on accuracy for sentence classification tasks, and outperforms alternative
methods like fixed-point quantization or offline word embedding compression. We
also analyze the inference time and storage space for our method through FLOP
calculations, showing that we can compress DNN models by a configurable ratio
and recover the lost accuracy without introducing additional latency compared to
fixed point quantization. Finally, we introduce a novel learning rate schedule,
the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate
to outperform other popular adaptive learning rate algorithms on a sentence
classification benchmark.
Comment: Accepted at the Thirty-Third AAAI Conference on Artificial Intelligence
(AAAI 2019)
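As a concrete illustration of the core idea, here is a minimal NumPy sketch that compresses a trained embedding matrix with truncated SVD; the vocabulary size, dimension, and rank are illustrative assumptions, and the paper's training-time procedure and CALR schedule are not reproduced here.

```python
import numpy as np

# Illustrative low-rank factorization of an embedding layer.
vocab, dim, rank = 10000, 300, 30
rng = np.random.default_rng(3)
E = rng.standard_normal((vocab, dim)).astype(np.float32)  # stand-in for
                                                          # trained embeddings

# Truncated SVD: E ~= A @ B with A (vocab, rank) and B (rank, dim).
U, S, Vt = np.linalg.svd(E, full_matrices=False)
A = U[:, :rank] * S[:rank]        # (vocab, rank)
B = Vt[:rank]                     # (rank, dim)

orig = E.size
compressed = A.size + B.size
print(f"params: {orig} -> {compressed} ({orig / compressed:.1f}x smaller)")

# Embedding lookup now goes through the factorized pair; because both
# factors stay trainable, the downstream task can fine-tune them to
# recover accuracy at the reduced size.
ids = np.array([5, 42, 901])
vecs = A[ids] @ B                 # same shape as E[ids]
```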