Recurrent Binary Embedding for GPU-Enabled Exhaustive Retrieval from Billion-Scale Semantic Vectors
Rapid advances in GPU hardware and multiple areas of Deep Learning open up a
new opportunity for billion-scale information retrieval with exhaustive search.
Building on top of the powerful concept of semantic learning, this paper
proposes a Recurrent Binary Embedding (RBE) model that learns compact
representations for real-time retrieval. The model has the unique ability to
refine a base binary vector by progressively adding binary residual vectors to
meet the desired accuracy. The refined vector enables efficient implementation
of exhaustive similarity computation with bit-wise operations, followed by a
near-lossless k-NN selection algorithm, also proposed in this paper. The
proposed algorithms are integrated into an end-to-end multi-GPU system that
retrieves thousands of top items from over a billion candidates in real-time.
The RBE model and the retrieval system were evaluated with data from a major
paid search engine. Measured against the state-of-the-art model for binary
representation and the full-precision model for semantic embedding, RBE
significantly outperformed the former and closed over 80% of the AUC gap
between the two. Experiments against our production retrieval system also
demonstrated superior performance. While the primary focus of this paper is to
build RBE based on a particular class of semantic models, generalizing to other
types is straightforward, as exemplified by two different models at the end of
the paper.
Comment: 15 pages, 5 figures, 6 tables
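A note on the bitwise trick the abstract alludes to: for d-dimensional
{-1, +1} vectors, the dot product equals d - 2 * Hamming(x, y), so similarity
between refined vectors reduces to XOR-and-popcount over the base and residual
components. Below is a minimal Python sketch under assumed residual weights
(1, 0.5, 0.25); the paper's learned weighting and multi-GPU kernels are not
reproduced.

```python
import numpy as np

def pack_bits(signs: np.ndarray) -> int:
    """Pack a {-1, +1} vector into one Python int, 1 bit per dimension."""
    return int.from_bytes(np.packbits(signs > 0).tobytes(), "big")

def signed_dot(x_bits: int, y_bits: int, dim: int) -> int:
    """<x, y> for {-1, +1} vectors via XOR + popcount:
    dot = dim - 2 * Hamming(x, y). Requires Python 3.10+ for bit_count()."""
    return dim - 2 * (x_bits ^ y_bits).bit_count()

def rbe_similarity(query, doc, dim, weights=(1.0, 0.5, 0.25)):
    """Similarity of two refined vectors u = sum_j w_j * b_j, expanded into
    pairwise bitwise dot products. The weights are illustrative assumptions,
    not the scales RBE learns."""
    return sum(wi * wj * signed_dot(qi, dj, dim)
               for wi, qi in zip(weights, query)
               for wj, dj in zip(weights, doc))

# Usage: a base binary vector plus two binary residuals per side.
rng = np.random.default_rng(0)
dim = 128
q = [pack_bits(rng.choice([-1, 1], size=dim)) for _ in range(3)]
d = [pack_bits(rng.choice([-1, 1], size=dim)) for _ in range(3)]
print(rbe_similarity(q, d, dim))
```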
Can a Fruit Fly Learn Word Embeddings?
The mushroom body of the fruit fly brain is one of the best studied systems
in neuroscience. At its core it consists of a population of Kenyon cells, which
receive inputs from multiple sensory modalities. These cells are inhibited by
the anterior paired lateral neuron, thus creating a sparse high dimensional
representation of the inputs. In this work we study a mathematical
formalization of this network motif and apply it to learning the correlational
structure between words and their context in a corpus of unstructured text, a
common natural language processing (NLP) task. We show that this network can
learn semantic representations of words and can generate both static and
context-dependent word embeddings. Unlike conventional methods (e.g., BERT,
GloVe) that use dense representations for word embedding, our algorithm encodes
semantic meaning of words and their context in the form of sparse binary hash
codes. The quality of the learned representations is evaluated on word
similarity analysis, word-sense disambiguation, and document classification. It
is shown that not only can the fruit fly network motif achieve performance
comparable to existing methods in NLP, but, additionally, it uses only a
fraction of the computational resources (shorter training time and smaller
memory footprint).
Comment: Accepted for publication at ICLR 2021
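For intuition, the core motif (expansion to a high-dimensional Kenyon-cell
layer, then winner-take-all inhibition) can be sketched in a few lines. This
is a minimal FlyHash-style illustration with a fixed random projection; the
paper learns the projection weights from words and their contexts.

```python
import numpy as np

def fly_hash(x: np.ndarray, proj: np.ndarray, k: int) -> np.ndarray:
    """Project the input to a high-dimensional 'Kenyon cell' layer, then keep
    only the top-k activations (a stand-in for APL inhibition), yielding a
    sparse binary code."""
    activations = proj @ x
    code = np.zeros(proj.shape[0], dtype=np.uint8)
    code[np.argsort(activations)[-k:]] = 1
    return code

# Usage with a fixed sparse random projection. The paper learns these
# weights from words and their contexts; that training loop is omitted.
rng = np.random.default_rng(0)
d_in, n_kenyon, k = 300, 2000, 64
proj = (rng.random((n_kenyon, d_in)) < 0.1).astype(np.float32)
word_vec = rng.standard_normal(d_in).astype(np.float32)
code = fly_hash(word_vec, proj, k)
print(code.sum())  # exactly k active bits
```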
Hamming Sentence Embeddings for Information Retrieval
In retrieval applications, binary hashes are known to offer significant
improvements in terms of both memory and speed. We investigate the compression
of sentence embeddings using a neural encoder-decoder architecture, which is
trained by minimizing reconstruction error. Instead of employing the original
real-valued embeddings, we use latent representations in Hamming space produced
by the encoder for similarity calculations.
In quantitative experiments on several benchmarks for semantic similarity
tasks, we show that our compressed Hamming embeddings yield performance
comparable to uncompressed embeddings (Sent2Vec, InferSent, GloVe-BoW) at
compression ratios of up to 256:1. We further demonstrate that our model
strongly decorrelates input features, and that the compressor generalizes well
when pre-trained on Wikipedia sentences. We publish the source code and all
experimental results on GitHub.
Comment: 4 Pages, 9 Figures, 1 Table
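A minimal sketch of the encoder-decoder setup described above, assuming a
single linear encoder/decoder pair and a straight-through estimator for the
binarization step; the paper's exact architecture and training details may
differ.

```python
import torch
import torch.nn as nn

class HammingAutoencoder(nn.Module):
    """Minimal sketch: compress a real-valued sentence embedding to a binary
    code, train on reconstruction error, and use the code for similarity in
    Hamming space. Sizes and the straight-through binarization are
    assumptions; the paper's architecture may differ."""

    def __init__(self, in_dim=700, code_bits=128):
        super().__init__()
        self.encoder = nn.Linear(in_dim, code_bits)
        self.decoder = nn.Linear(code_bits, in_dim)

    def binarize(self, h):
        # Straight-through estimator: hard {0, 1} forward pass,
        # sigmoid gradient on the backward pass.
        soft = torch.sigmoid(h)
        return (h > 0).float() + soft - soft.detach()

    def forward(self, x):
        code = self.binarize(self.encoder(x))
        return self.decoder(code), code

model = HammingAutoencoder()
x = torch.randn(32, 700)                 # e.g. Sent2Vec vectors (assumed dim)
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruction objective
loss.backward()
```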
Learning Compressed Sentence Representations for On-Device Text Processing
Vector representations of sentences, trained on massive text corpora, are
widely used as generic sentence embeddings across a variety of NLP problems.
The learned representations are generally assumed to be continuous and
real-valued, giving rise to a large memory footprint and slow retrieval speed,
which hinders their applicability to low-resource (memory and computation)
platforms, such as mobile devices. In this paper, we propose four different
strategies to transform continuous and generic sentence embeddings into a
binarized form, while preserving their rich semantic information. The
introduced methods are evaluated across a wide range of downstream tasks, where
the binarized sentence embeddings are demonstrated to degrade performance by
only about 2% relative to their continuous counterparts, while reducing the
storage requirement by over 98%. Moreover, with the learned binary
representations, the semantic relatedness of two sentences can be evaluated
by simply calculating their Hamming distance, which is more computationally
efficient than the inner-product operation between continuous embeddings.
Detailed analysis and a case study further validate the effectiveness of the
proposed methods.
Comment: To appear at ACL 2019
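The Hamming-distance evaluation described above amounts to an XOR followed by
a popcount over packed bits. A sketch, using hard thresholding at zero as a
stand-in for the paper's four binarization strategies:

```python
import numpy as np

def binarize(v: np.ndarray) -> np.ndarray:
    """Hard thresholding at zero, packed 8 dims per byte. One simple stand-in
    for the paper's four binarization strategies."""
    return np.packbits(v > 0)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between packed codes: XOR, then count set bits."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
u, v = rng.standard_normal(2048), rng.standard_normal(2048)
cu, cv = binarize(u), binarize(v)
print(u.nbytes, cu.nbytes)   # 16384 -> 256 bytes stored
print(hamming(cu, cv))       # integer ops replace the float inner product
```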
Extreme Model Compression for On-device Natural Language Understanding
In this paper, we propose and experiment with techniques for extreme
compression of neural natural language understanding (NLU) models, making them
suitable for execution on resource-constrained devices. We propose a
task-aware, end-to-end compression approach that performs word-embedding
compression jointly with NLU task learning. We show our results on a
large-scale, commercial NLU system trained on a varied set of intents with huge
vocabulary sizes. Our approach outperforms a range of baselines and achieves a
compression rate of 97.4% with less than 3.7% degradation in predictive
performance. Our analysis indicates that the signal from the downstream task is
important for effective compression with minimal degradation in performance.
Comment: Long paper at COLING 2020
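The abstract does not spell out the compression scheme, but the task-aware
idea (letting gradients from the NLU objective shape a compressed embedding
table) can be illustrated with a simple low-rank factorization trained
end-to-end. This is an illustrative stand-in, not the paper's method.

```python
import torch
import torch.nn as nn

class FactorizedEmbeddingNLU(nn.Module):
    """Illustrative stand-in: a low-rank factorized embedding table trained
    end-to-end with an intent-classification loss, so the task signal shapes
    the compressed embeddings. Sizes are invented; the paper's actual
    compression scheme is not reproduced here."""

    def __init__(self, vocab=100_000, rank=16, dim=256, n_intents=50):
        super().__init__()
        self.low_rank = nn.Embedding(vocab, rank)        # vocab x 16
        self.project = nn.Linear(rank, dim, bias=False)  # shared 16 -> 256
        self.classifier = nn.Linear(dim, n_intents)

    def forward(self, token_ids):
        emb = self.project(self.low_rank(token_ids))     # (batch, seq, dim)
        return self.classifier(emb.mean(dim=1))          # mean-pool utterance

model = FactorizedEmbeddingNLU()
logits = model(torch.randint(0, 100_000, (8, 12)))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 50, (8,)))
loss.backward()  # the downstream-task signal flows into the embedding table
```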
Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval
The semantic matching capabilities of neural information retrieval can
ameliorate the synonymy and polysemy problems of symbolic approaches. However,
the dense representations of neural models are inefficient for first-stage
retrieval, and are therefore better suited to re-ranking. Sparse
representations, whether in symbolic or latent form, are more efficient with
an inverted index. Combining the merits of sparse and dense representations,
we propose an ultra-high dimensional (UHD)
representation scheme equipped with directly controllable sparsity. UHD's large
capacity and minimal noise and interference among the dimensions allow for
binarized representations, which are highly efficient for storage and search.
Also proposed is a bucketing method, where the embeddings from multiple layers
of BERT are selected/merged to represent diverse linguistic aspects. We test
our models on MS MARCO and TREC CAR, showing that they outperform other
sparse models.
Comment: To appear at EMNLP 2021
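A rough sketch of the pipeline described above: expand a dense encoder output
to a very high dimensional space, keep the top-k activations for directly
controllable sparsity, binarize, and index the surviving dimensions in an
inverted index. The random expansion and all sizes below are assumptions, not
the paper's learned expansion over BERT layers.

```python
import numpy as np

def uhd_code(dense: np.ndarray, expand: np.ndarray, k: int) -> np.ndarray:
    """Expand a dense vector to an ultra-high dimensional space, keep the
    top-k activations (directly controllable sparsity), and binarize."""
    h = expand @ dense
    code = np.zeros(h.shape, dtype=np.uint8)
    code[np.argsort(h)[-k:]] = 1
    return code

def inverted_index(codes):
    """Active dimension -> doc ids; the structure that makes these sparse
    binary codes cheap to store and search."""
    index = {}
    for doc_id, code in enumerate(codes):
        for dim in np.flatnonzero(code):
            index.setdefault(int(dim), []).append(doc_id)
    return index

rng = np.random.default_rng(0)
expand = rng.standard_normal((20_000, 768)).astype(np.float32)  # 768 -> 20k
docs = [uhd_code(rng.standard_normal(768).astype(np.float32), expand, k=80)
        for _ in range(3)]
print(len(inverted_index(docs)))  # at most 3 * 80 posting lists
```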
Supervised Understanding of Word Embeddings
Pre-trained word embeddings are widely used for transfer learning in natural
language processing. The embeddings are continuous and distributed
representations of the words that preserve their similarities in compact
Euclidean spaces. However, the dimensions of these spaces do not provide any
clear interpretation. In this study, we have obtained supervised projections
in the form of linear keyword-level classifiers on word embeddings. We have
shown that the method creates interpretable projections of original embedding
dimensions. Activations of the trained classifier nodes correspond to a subset
of the words in the vocabulary. Thus, they behave similarly to the dictionary
features while having the merit of continuous-valued output. Additionally,
such dictionaries can be grown iteratively over multiple rounds by adding
expert labels on top-scoring words to an initial collection of keywords. Also, the
same classifiers can be applied to aligned word embeddings in other languages
to obtain corresponding dictionaries. In our experiments, we have shown that
initializing higher-order networks with these classifier weights gives more
accurate models for downstream NLP tasks. We further demonstrate the
usefulness of supervised dimensions in revealing the polysemous nature of a
keyword of interest by projecting its embedding using classifiers learned in
different subspaces.
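The keyword-level classifier idea is easy to illustrate: fit a linear
classifier on word vectors from a small seed dictionary, read its decision
scores as an interpretable projection, and harvest top-scoring words for the
next labeling round. A toy sketch with made-up data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: random 'embeddings' and a hypothetical seed dictionary for one
# concept. Real inputs would be pre-trained vectors and expert keywords.
rng = np.random.default_rng(0)
vocab = [f"word{i}" for i in range(1000)]
emb = rng.standard_normal((1000, 50)).astype(np.float32)
seed_keywords = {"word1", "word2", "word3"}

y = np.array([w in seed_keywords for w in vocab], dtype=int)
clf = LogisticRegression(max_iter=1000).fit(emb, y)

# The fitted classifier is a supervised projection of the embedding space;
# its top-scoring words are candidates for the next round of expert labels.
scores = clf.decision_function(emb)
print([vocab[i] for i in np.argsort(scores)[-10:]])
```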
Sparse associative memory based on contextual code learning for disambiguating word senses
In recent literature, contextual pre-trained Language Models (LMs) have
demonstrated their potential to generalize knowledge to several Natural
Language Processing (NLP) tasks, including supervised Word Sense
Disambiguation (WSD), a challenging problem in the field of Natural Language
Understanding (NLU). However, word representations from these models remain
very dense, costly in terms of memory footprint, and minimally interpretable.
In
order to address such issues, we propose a new supervised biologically inspired
technique for transferring large pre-trained language model representations
into a compressed representation for the case of WSD. The resulting
representation increases the overall interpretability of the framework and
decreases the memory footprint, while enhancing performance.
Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
Embedding representations power machine intelligence in many applications,
including recommendation systems, but they are space intensive -- potentially
occupying hundreds of gigabytes in large-scale settings. To help manage this
outsized memory consumption, we explore mixed dimension embeddings, an
embedding layer architecture in which a particular embedding vector's dimension
scales with its query frequency. Through theoretical analysis and systematic
experiments, we demonstrate that using mixed dimensions can drastically reduce
the memory usage, while maintaining and even improving the ML performance.
Empirically, we show that the proposed mixed dimension layers improve
accuracy by 0.1% using half as many parameters, or maintain accuracy with 16x
fewer parameters, on the click-through-rate prediction task of the Criteo
Kaggle dataset.
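A minimal sketch of a mixed dimension embedding layer, assuming ids are
sorted by query frequency: frequent ids get a wide table, rare ids a narrow
one, and per-block linear projections map all blocks to a shared output
dimension. Block sizes and dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

class MixedDimEmbedding(nn.Module):
    """Frequent ids (low indices) get a wide table, rare ids a narrow one;
    per-block projections map every block to a shared output dimension.
    Block sizes and dims below are invented for illustration."""

    def __init__(self, block_sizes=(1_000, 99_000), block_dims=(64, 8),
                 out_dim=64):
        super().__init__()
        self.offsets, total = [], 0
        for n in block_sizes:
            self.offsets.append(total)
            total += n
        self.tables = nn.ModuleList(
            nn.Embedding(n, d) for n, d in zip(block_sizes, block_dims))
        self.projs = nn.ModuleList(
            nn.Linear(d, out_dim, bias=False) for d in block_dims)
        self.out_dim = out_dim

    def forward(self, ids):
        out = torch.zeros(*ids.shape, self.out_dim)
        for off, table, proj in zip(self.offsets, self.tables, self.projs):
            mask = (ids >= off) & (ids < off + table.num_embeddings)
            if mask.any():
                out[mask] = proj(table(ids[mask] - off))
        return out

# ~0.86M parameters (1k x 64 + 99k x 8 + projections) vs 6.4M for a
# uniform 100k x 64 table.
layer = MixedDimEmbedding()
ids = torch.tensor([[5, 50_000, 123]])  # ids assumed sorted by frequency
print(layer(ids).shape)                 # torch.Size([1, 3, 64])
```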
General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference
The state of the art on many NLP tasks is currently achieved by large
pre-trained language models, which require a considerable amount of
computation. We explore a setting where many different predictions are made on
a single piece of text. In that case, some of the computational cost during
inference can be amortized over the different tasks using a shared text
encoder. We compare approaches for training such an encoder and show that
encoders pre-trained over multiple tasks generalize well to unseen tasks. We
also compare ways of extracting fixed- and limited-size representations from
this encoder, including different ways of pooling features extracted from
multiple layers or positions. Our best approach compares favorably to knowledge
distillation, achieving higher accuracy and lower computational cost once the
system is handling around 7 tasks. Further, we show that through binary
quantization we can reduce the size of the extracted representations by a
factor of 16, making it feasible to store them for later use. The resulting
method offers a compelling solution for using large-scale pre-trained models
at a fraction of the computational cost when multiple tasks are performed on
the same text.
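Binary quantization of stored features can be as simple as keeping signs and
packing bits, as in the sketch below. Note that float32 to 1 bit is a 32x
reduction, so the paper's 16x figure presumably comes from a different
precision or scheme; this sign quantizer is an assumption, not the paper's
method.

```python
import numpy as np

def quantize(features: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension and pack 8 dims per byte."""
    return np.packbits(features > 0, axis=-1)

def dequantize(packed: np.ndarray, dim: int) -> np.ndarray:
    """Recover {-1, +1} features for a downstream task head."""
    bits = np.unpackbits(packed, axis=-1)[..., :dim]
    return bits.astype(np.float32) * 2 - 1

feats = np.random.default_rng(0).standard_normal((4, 768)).astype(np.float32)
stored = quantize(feats)
print(feats.nbytes, stored.nbytes)  # 12288 -> 384 bytes (32x for float32)
restored = dequantize(stored, 768)
print(restored.shape)               # (4, 768)
```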