Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Deep learning models have become state of the art for natural language
processing (NLP) tasks; however, deploying these models in production systems
imposes significant memory constraints. Existing compression methods are either
lossy or introduce significant latency. We propose a compression method that
leverages low-rank matrix factorization during training to compress the word
embedding layer, which represents the size bottleneck for most NLP models. Our
models are trained, compressed and then further re-trained on the downstream
task to recover accuracy while maintaining the reduced size. Empirically, we
show that the proposed method can achieve 90% compression with minimal impact
on accuracy for sentence classification tasks, and outperforms alternative
methods like fixed-point quantization or offline word embedding compression. We
also analyze the inference time and storage space for our method through FLOP
calculations, showing that we can compress DNN models by a configurable ratio
and recover the lost accuracy without introducing additional latency compared to
fixed point quantization. Finally, we introduce a novel learning rate schedule,
the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate
to outperform other popular adaptive learning rate algorithms on a sentence
classification benchmark.
Comment: Accepted at the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019).
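As a rough illustration of the idea described above, here is a minimal sketch of
an embedding layer factorized into two low-rank matrices during training; the
class name, vocabulary size, embedding dimension and rank are illustrative
assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): a word embedding layer factorized
# into two low-rank matrices, so V x d parameters become V x r + r x d, r << d.
import torch
import torch.nn as nn

class LowRankEmbedding(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, rank: int):
        super().__init__()
        self.codes = nn.Embedding(vocab_size, rank)            # V x r lookup
        self.project = nn.Linear(rank, embed_dim, bias=False)  # r x d projection

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Behaves like a V x d embedding, but never stores the full matrix.
        return self.project(self.codes(token_ids))

# Illustrative numbers: 50k vocabulary, 300-dim embeddings, rank 30 gives
# roughly 90% fewer embedding parameters than the dense layer.
emb = LowRankEmbedding(vocab_size=50_000, embed_dim=300, rank=30)
print(emb(torch.tensor([[1, 42, 7]])).shape)  # torch.Size([1, 3, 300])
```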
Learning K-way D-dimensional Discrete Codes for Compact Embedding Representations
Conventional embedding methods directly associate each symbol with a
continuous embedding vector, which is equivalent to applying a linear
transformation based on a "one-hot" encoding of the discrete symbols. Despite
its simplicity, such an approach yields a number of parameters that grows
linearly with the vocabulary size and can lead to overfitting. In this work, we
propose a much more compact K-way D-dimensional discrete encoding scheme to
replace the "one-hot" encoding. In the proposed "KD encoding", each symbol is
represented by a D-dimensional code with a cardinality of K, and the final
symbol embedding vector is generated by composing the code embedding vectors.
To end-to-end learn semantically meaningful codes, we derive a relaxed discrete
optimization approach based on stochastic gradient descent, which can be
generally applied to any differentiable computational graph with an embedding
layer. In our experiments with various applications from natural language
processing to graph convolutional networks, the total size of the embedding
layer can be reduced by up to 98% while achieving similar or better performance.
Comment: ICML 2018. arXiv admin note: text overlap with arXiv:1711.0306
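A minimal sketch of the composition step is shown below, assuming an additive
composition and already-given codes; in the paper both the codes and the
composition are learned end-to-end, so the names and sizes here are purely
illustrative.

```python
# Minimal sketch (assumed details, not the authors' code): composing a symbol
# embedding from a D-dimensional code whose components each take one of K values.
import torch
import torch.nn as nn

K, D, embed_dim = 32, 8, 128   # code cardinality, code length, embedding size

# One small embedding table per code position: D tables of shape K x embed_dim,
# i.e. D*K*embed_dim parameters instead of vocab_size*embed_dim.
code_books = nn.ModuleList([nn.Embedding(K, embed_dim) for _ in range(D)])

def compose(codes: torch.Tensor) -> torch.Tensor:
    """codes: (batch, D) integers in [0, K). Returns (batch, embed_dim)."""
    # Simple additive composition; the paper also studies learned compositions.
    return sum(code_books[d](codes[:, d]) for d in range(D))

codes_for_words = torch.randint(0, K, (4, D))  # codes would be learned, not random
print(compose(codes_for_words).shape)          # torch.Size([4, 128])
```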
Tensorized Embedding Layers for Efficient Model Compression
The embedding layers transforming input words into real vectors are the key
components of deep neural networks used in natural language processing.
However, when the vocabulary is large, the corresponding weight matrices can be
enormous, which precludes their deployment in a limited resource setting. We
introduce a novel way of parametrizing embedding layers based on the Tensor
Train (TT) decomposition, which allows compressing the model significantly with
a negligible drop, or even a slight gain, in performance. We evaluate
our method on a wide range of benchmarks in natural language processing and
analyze the trade-off between performance and compression ratios for a wide
range of architectures, from MLPs to LSTMs and Transformers.
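A toy two-core tensor-train parametrization of an embedding matrix might look
like the sketch below; the way the vocabulary and embedding dimensions are
factorized, and the TT rank, are illustrative assumptions rather than the
paper's configuration.

```python
# Minimal sketch (illustrative shapes): a 50,000 x 300 embedding matrix stored
# as two TT cores, materializing rows on demand instead of keeping the matrix.
import numpy as np

v1, v2 = 200, 250      # vocabulary 50,000 = 200 * 250
e1, e2 = 15, 20        # embedding dim 300 = 15 * 20
rank = 4

core1 = np.random.randn(v1, e1, rank) * 0.01
core2 = np.random.randn(rank, v2, e2) * 0.01

def tt_row(token_id: int) -> np.ndarray:
    """Materialize one row of the implicit 50,000 x 300 embedding matrix."""
    i1, i2 = divmod(token_id, v2)                             # factor the row index
    row = np.einsum('ar,rb->ab', core1[i1], core2[:, i2, :])  # (e1, e2)
    return row.reshape(e1 * e2)                               # 300-dim vector

print(tt_row(1234).shape)  # (300,)
# Parameters: v1*e1*rank + rank*v2*e2 = 32,000 versus 15,000,000 for the dense matrix.
```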
Compression of Recurrent Neural Networks for Efficient Language Modeling
Recurrent neural networks have proved to be an effective method for
statistical language modeling. However, in practice their memory and run-time
complexity are usually too large for real-time offline mobile
applications. In this paper we consider several compression techniques for
recurrent neural networks, including Long Short-Term Memory models. We pay
particular attention to the high-dimensional output problem caused by the very
large vocabulary size. We focus on effective compression methods in the context
of their exploitation on devices: pruning, quantization, and matrix
decomposition approaches (low-rank factorization and tensor train
decomposition, in particular). For each model we investigate the trade-off
between its size, suitability for fast inference and perplexity. We propose a
general pipeline for applying the most suitable methods to compress recurrent
neural networks for language modeling. Our experimental study on the Penn
Treebank (PTB) dataset shows that the best results in terms of speed and the
compression-perplexity trade-off are obtained by matrix decomposition techniques.
Comment: 25 pages, 3 tables, 4 figures
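As a rough illustration of two of the compression primitives surveyed above,
the sketch below applies magnitude pruning and uniform 8-bit (fixed-point)
quantization to a single weight matrix; the sparsity level and bit width are
arbitrary choices for the example, not settings from the paper.

```python
# Minimal sketch (illustrative): pruning and quantizing one weight matrix.
import numpy as np

W = np.random.randn(1024, 1024).astype(np.float32)  # e.g. an LSTM gate matrix

# 1) Magnitude pruning: zero out the 90% of weights with smallest magnitude.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0).astype(np.float32)

# 2) Uniform 8-bit quantization: store int8 codes plus one scale per matrix.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_q.astype(np.float32) * scale           # used at inference time

print(f"sparsity: {np.mean(W_pruned == 0):.2f}, "
      f"max quantization error: {np.abs(W - W_dequant).max():.4f}")
```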
Graph Embedding Techniques, Applications, and Performance: A Survey
Graphs, such as social networks, word co-occurrence networks, and
communication networks, occur naturally in various real-world applications.
Analyzing them yields insight into the structure of society, language, and
different patterns of communication. Many approaches have been proposed to
perform the analysis. Recently, methods which use the representation of graph
nodes in vector space have gained traction from the research community. In this
survey, we provide a comprehensive and structured analysis of various graph
embedding techniques proposed in the literature. We first introduce the
embedding task and its challenges such as scalability, choice of
dimensionality, and features to be preserved, and their possible solutions. We
then present three categories of approaches based on factorization methods,
random walks, and deep learning, with examples of representative algorithms in
each category and analysis of their performance on various tasks. We evaluate
these state-of-the-art methods on a few common datasets and compare their
performance against one another. Our analysis concludes by suggesting some
potential applications and future directions. We finally present the
open-source Python library we developed, named GEM (Graph Embedding Methods,
available at https://github.com/palash1992/GEM), which provides all presented
algorithms within a unified interface to foster and facilitate research on the
topic.
Comment: Submitted to Knowledge Based Systems for review.
LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
BERT is a cutting-edge language representation model pre-trained on a large
corpus, which achieves superior performance on various natural language
understanding tasks. However, a major obstacle to applying BERT in online
services is that it is memory-intensive and leads to unsatisfactory latency for
user requests, which makes model compression necessary. Existing solutions
leverage the knowledge distillation framework to learn a smaller model that
imitates the behaviors of BERT. However, knowledge distillation is itself
expensive to train, as it requires sufficient training data to imitate the
teacher model. In this paper, we address this issue by proposing a
hybrid solution named LadaBERT (Lightweight adaptation of BERT through hybrid
model compression), which combines the advantages of different model
compression methods, including weight pruning, matrix factorization and
knowledge distillation. LadaBERT achieves state-of-the-art accuracy on various
public datasets while the training overheads can be reduced by an order of
magnitude.
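A minimal sketch of how the three ingredients might be combined on a single
linear layer is given below; the rank, sparsity level, distillation temperature
and helper names are assumptions for illustration, not the LadaBERT
implementation.

```python
# Minimal sketch (assumed setup): SVD-based low-rank factorization plus
# magnitude pruning of one weight matrix, and a distillation loss term.
import torch
import torch.nn.functional as F

def factorize_and_prune(weight: torch.Tensor, rank: int, sparsity: float):
    """Return low-rank factors (U, V) with the smallest entries zeroed out."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    U = U[:, :rank] * S[:rank]        # (out, rank), scaled by singular values
    V = Vh[:rank, :]                  # (rank, in)
    for factor in (U, V):
        thresh = factor.abs().flatten().quantile(sparsity)
        factor[factor.abs() < thresh] = 0.0   # magnitude pruning
    return U, V

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student predictions."""
    t = temperature
    return F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                    F.softmax(teacher_logits / t, dim=-1),
                    reduction="batchmean") * (t * t)

U, V = factorize_and_prune(torch.randn(768, 768), rank=128, sparsity=0.5)
print(U.shape, V.shape)  # torch.Size([768, 128]) torch.Size([128, 768])
```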
Spectral Network Embedding: A Fast and Scalable Method via Sparsity
Network embedding aims to learn low-dimensional representations of nodes in a
network, while the network structure and inherent properties are preserved. It
has attracted tremendous attention recently due to significant progress in
downstream network learning tasks, such as node classification, link
prediction, and visualization. However, most existing network embedding methods
suffer from expensive computation due to the large scale of networks. In
this paper, we propose a faster network embedding
method, called Progle, which elegantly utilizes the sparsity of online
networks and spectral analysis. In Progle, we first construct a sparse
proximity matrix and train the network embedding efficiently via sparse matrix
decomposition. Then we introduce a network propagation pattern via spectral
analysis to incorporate local and global structure information into the
embedding. Moreover, this model can be generalized to quickly integrate network
information into other insufficiently trained embeddings. Benefiting
from sparse spectral network embedding, our experiments on four different
datasets show that Progle outperforms or is comparable to state-of-the-art
unsupervised baselines---DeepWalk, LINE, node2vec, GraRep, and
HOPE---in terms of accuracy, while being faster than the fastest
word2vec-based method. Finally, we validate the scalability of Progle both on
real large-scale networks and on synthetic networks at multiple scales.
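A minimal sketch of spectral embedding from a sparse proximity matrix is shown
below, using a plain adjacency matrix and a truncated sparse SVD; Progle's
actual proximity matrix and its spectral propagation step are more involved, so
this is only meant to convey the flavor of the approach.

```python
# Minimal sketch (illustrative, not the Progle algorithm): node embeddings from
# a truncated SVD of a sparse proximity matrix, here simply the adjacency matrix.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [0, 2]])
n_nodes, dim = 4, 2

adj = sp.coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
                    shape=(n_nodes, n_nodes))
adj = (adj + adj.T).tocsr()                  # symmetric sparse proximity matrix

u, s, _ = svds(adj.astype(float), k=dim)     # sparse truncated SVD
embeddings = u * np.sqrt(s)                  # scale components by singular values
print(embeddings.shape)                      # (4, 2)
```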
Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition
Word-embeddings are vital components of Natural Language Processing (NLP)
models and have been extensively explored. However, they consume a lot of
memory, which poses a challenge for edge deployment. Embedding matrices,
typically, contain most of the parameters for language models and about a third
for machine translation systems. In this paper, we propose Distilled Embedding,
an (input/output) embedding compression method based on low-rank matrix
decomposition and knowledge distillation. First, we initialize the weights of
our decomposed matrices by learning to reconstruct the full pre-trained
word-embedding and then fine-tune end-to-end, employing knowledge distillation
on the factorized embedding. We conduct extensive experiments with various
compression rates on machine translation and language modeling, using different
data-sets with a shared word-embedding matrix for both embedding and vocabulary
projection matrices. We show that the proposed technique is simple to
replicate, with one fixed parameter controlling the compression ratio, and that
it achieves higher BLEU scores on translation and lower perplexity on language
modeling than complex, difficult-to-tune state-of-the-art methods.
Comment: Accepted at Findings of EMNLP 2020.
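A minimal sketch of the initialization step described above is shown below,
assuming a truncated SVD is used to fit the two factors to a pretrained
embedding matrix before end-to-end fine-tuning with distillation; shapes and
rank are illustrative, not the paper's settings.

```python
# Minimal sketch (assumed details): initialize a factorized embedding so that
# the product of its factors reconstructs the pretrained embedding matrix.
import torch
import torch.nn as nn

pretrained = torch.randn(30_000, 512)   # stand-in for a pretrained embedding
rank = 64

U, S, Vh = torch.linalg.svd(pretrained, full_matrices=False)
code_init = U[:, :rank] * S[:rank]      # (vocab, rank)
proj_init = Vh[:rank, :]                # (rank, dim)

codes = nn.Embedding.from_pretrained(code_init, freeze=False)
project = nn.Linear(rank, 512, bias=False)
with torch.no_grad():
    project.weight.copy_(proj_init.T)   # nn.Linear stores weight as (out, in)

# Both factors are then fine-tuned end-to-end on the downstream task, with a
# distillation term keeping outputs close to the uncompressed model's.
print(project(codes(torch.arange(5))).shape)  # torch.Size([5, 512])
```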
DeepFont: Identify Your Font from An Image
As font is one of the core design concepts, automatic font identification and
similar font suggestion from an image or photo have been on the wish list of
many designers. We study the Visual Font Recognition (VFR) problem and advance
the state-of-the-art remarkably by developing the DeepFont system. First of
all, we build up the first available large-scale VFR dataset, named AdobeVFR,
consisting of both labeled synthetic data and partially labeled real-world
data. Next, to combat the domain mismatch between available training and
testing data, we introduce a Convolutional Neural Network (CNN) decomposition
approach, using a domain adaptation technique based on a Stacked Convolutional
Auto-Encoder (SCAE) that exploits a large corpus of unlabeled real-world text
images combined with synthetic data preprocessed in a specific way. Moreover,
we study a novel learning-based model compression approach, in order to reduce
the DeepFont model size without sacrificing its performance. The DeepFont
system achieves a top-5 accuracy of over 80% on our collected
dataset, and also produces a good font similarity measure for font selection
and suggestion. We also achieve around 6 times compression of the model without
any visible loss of recognition accuracy.
Comment: To appear in ACM Multimedia as a full paper.
Weight Squeezing: Reparameterization for Extreme Compression and Fast Inference
In this work, we present a novel approach for simultaneous knowledge transfer
and model compression called Weight Squeezing. With this method, we perform
knowledge transfer from a pre-trained teacher model by learning the mapping
from its weights to smaller student model weights, without significant loss of
model accuracy.
We applied Weight Squeezing to a pre-trained text classification model and
compared our method to various other knowledge transfer and model compression
methods on several downstream text classification tasks from the GLUE
benchmark. We observed that our approach produces better results than other
methods for training student models, without any loss in inference speed. We
also compared Weight Squeezing with a low-rank factorization approach and
observed that our method is significantly faster at inference while being
competitive in terms of accuracy.
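A minimal sketch of the weight-mapping idea is given below, assuming the
mapping is a pair of learned projections applied to a frozen teacher weight
matrix; the exact parametrization used in the paper may differ.

```python
# Minimal sketch (assumed mapping form): learn projections that map a teacher
# weight matrix to a smaller student weight matrix.
import torch
import torch.nn as nn

teacher_W = torch.randn(768, 768)       # frozen weight from the teacher layer

class WeightSqueezer(nn.Module):
    def __init__(self, teacher_dim=768, student_dim=256):
        super().__init__()
        self.left = nn.Parameter(torch.randn(student_dim, teacher_dim) * 0.02)
        self.right = nn.Parameter(torch.randn(teacher_dim, student_dim) * 0.02)

    def forward(self) -> torch.Tensor:
        # The student weight is produced from the teacher weight, so gradients
        # from the student's task loss update the mapping, not a free matrix.
        return self.left @ teacher_W @ self.right   # (student_dim, student_dim)

student_W = WeightSqueezer()()
print(student_W.shape)  # torch.Size([256, 256])
```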