
    Recurrent Binary Embedding for GPU-Enabled Exhaustive Retrieval from Billion-Scale Semantic Vectors

    Rapid advances in GPU hardware and multiple areas of Deep Learning open up a new opportunity for billion-scale information retrieval with exhaustive search. Building on top of the powerful concept of semantic learning, this paper proposes a Recurrent Binary Embedding (RBE) model that learns compact representations for real-time retrieval. The model has the unique ability to refine a base binary vector by progressively adding binary residual vectors to meet the desired accuracy. The refined vector enables efficient implementation of exhaustive similarity computation with bit-wise operations, followed by a near-lossless k-NN selection algorithm, also proposed in this paper. The proposed algorithms are integrated into an end-to-end multi-GPU system that retrieves thousands of top items from over a billion candidates in real time. The RBE model and the retrieval system were evaluated with data from a major paid search engine. When measured against the state-of-the-art model for binary representation and the full-precision model for semantic embedding, RBE significantly outperformed the former and closed over 80% of the AUC gap between the two. Experiments against our production retrieval system also demonstrated superior performance. While the primary focus of this paper is to build RBE on a particular class of semantic models, generalizing to other types is straightforward, as exemplified by two different models at the end of the paper. Comment: 15 pages, 5 figures, 6 tables
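
    As a rough illustration of the bit-wise similarity computation described above (not the paper's RBE model or its multi-GPU k-NN kernel), the following NumPy sketch packs binary vectors into bytes and scores a query against a candidate set with XOR plus popcount; all sizes are made up.

```python
import numpy as np

def pack(bits: np.ndarray) -> np.ndarray:
    """Pack a {0,1} vector into bytes so 8 binary dimensions fit in one byte."""
    return np.packbits(bits.astype(np.uint8))

def hamming_similarity(query_packed: np.ndarray, db_packed: np.ndarray) -> np.ndarray:
    """Similarity = number of matching bits, via XOR followed by popcount.
    db_packed is an (N, B) array of byte-packed candidate vectors."""
    xor = np.bitwise_xor(db_packed, query_packed)       # differing bits
    diff = np.unpackbits(xor, axis=1).sum(axis=1)       # popcount per candidate
    return db_packed.shape[1] * 8 - diff                # matching bits

# Hypothetical brute-force (exhaustive) scoring of one query against a packed database.
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(100_000, 128))            # 100k candidates, 128-bit codes
query = rng.integers(0, 2, size=128)
db_packed = np.packbits(db.astype(np.uint8), axis=1)
scores = hamming_similarity(pack(query), db_packed)
top_k = np.argsort(-scores)[:10]                        # naive k-NN selection
```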

    Can a Fruit Fly Learn Word Embeddings?

    The mushroom body of the fruit fly brain is one of the best-studied systems in neuroscience. At its core it consists of a population of Kenyon cells, which receive inputs from multiple sensory modalities. These cells are inhibited by the anterior paired lateral neuron, thus creating a sparse, high-dimensional representation of the inputs. In this work we study a mathematical formalization of this network motif and apply it to learning the correlational structure between words and their context in a corpus of unstructured text, a common natural language processing (NLP) task. We show that this network can learn semantic representations of words and can generate both static and context-dependent word embeddings. Unlike conventional methods (e.g., BERT, GloVe) that use dense representations for word embedding, our algorithm encodes the semantic meaning of words and their context in the form of sparse binary hash codes. The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. It is shown that not only can the fruit fly network motif achieve performance comparable to existing methods in NLP, but, additionally, it uses only a fraction of the computational resources (shorter training time and smaller memory footprint). Comment: Accepted for publication at ICLR 2021
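
    The sketch below illustrates the general fly-inspired hashing motif the abstract describes: a projection onto a large "Kenyon cell" layer followed by winner-take-all inhibition that leaves a sparse binary code. It uses a random projection and invented layer sizes rather than the paper's learned weights.

```python
import numpy as np

def fly_hash(x: np.ndarray, proj: np.ndarray, k: int) -> np.ndarray:
    """Project the input onto a high-dimensional 'Kenyon cell' layer, then keep
    only the k most active units (winner-take-all), giving a sparse binary code."""
    activations = proj @ x                      # expansion to many Kenyon cells
    code = np.zeros(proj.shape[0], dtype=np.uint8)
    code[np.argsort(-activations)[:k]] = 1      # APL-like inhibition: only top-k fire
    return code

rng = np.random.default_rng(0)
d_in, d_kc, k = 300, 2000, 32                   # hypothetical sizes
proj = (rng.random((d_kc, d_in)) < 0.1).astype(np.float32)   # sparse random projection
word_vec = rng.standard_normal(d_in).astype(np.float32)      # placeholder word/context input
print(fly_hash(word_vec, proj, k).sum())        # exactly k active bits
```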

    Hamming Sentence Embeddings for Information Retrieval

    In retrieval applications, binary hashes are known to offer significant improvements in terms of both memory and speed. We investigate the compression of sentence embeddings using a neural encoder-decoder architecture, which is trained by minimizing reconstruction error. Instead of employing the original real-valued embeddings, we use the latent representations in Hamming space produced by the encoder for similarity calculations. In quantitative experiments on several benchmarks for semantic similarity tasks, we show that our compressed Hamming embeddings yield performance comparable to the uncompressed embeddings (Sent2Vec, InferSent, GloVe-BoW) at compression ratios of up to 256:1. We further demonstrate that our model strongly decorrelates input features, and that the compressor generalizes well when pre-trained on Wikipedia sentences. We publish the source code and all experimental results on GitHub. Comment: 4 pages, 9 figures, 1 table
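
    For intuition only, here is a minimal PyTorch sketch of an encoder-decoder that compresses real-valued sentence embeddings into binary codes with a straight-through estimator and a reconstruction loss; the dimensions and architecture are placeholders, not the paper's.

```python
import torch
import torch.nn as nn

class BinaryAutoencoder(nn.Module):
    """Encoder-decoder that maps sentence embeddings to binary codes and back."""
    def __init__(self, dim_in=700, dim_code=256):
        super().__init__()
        self.encoder = nn.Linear(dim_in, dim_code)
        self.decoder = nn.Linear(dim_code, dim_in)

    def binarize(self, h):
        # Straight-through trick: hard {0,1} codes on the forward pass,
        # smooth sigmoid gradients on the backward pass.
        soft = torch.sigmoid(h)
        return (h > 0).float() + soft - soft.detach()

    def forward(self, x):
        code = self.binarize(self.encoder(x))
        return code, self.decoder(code)

model = BinaryAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 700)                        # placeholder sentence embeddings
opt.zero_grad()
code, recon = model(x)
loss = nn.functional.mse_loss(recon, x)         # reconstruction error objective
loss.backward()
opt.step()
# At retrieval time, only `code` is kept and compared in Hamming space.
```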

    Learning Compressed Sentence Representations for On-Device Text Processing

    Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner product operation between continuous embeddings. Detailed analysis and a case study further validate the effectiveness of the proposed methods. Comment: To appear at ACL 2019
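
    A small sketch of the simplest binarization idea (thresholding each dimension, in the spirit of a hard-threshold baseline) and of scoring with Hamming distance; the data and dimensions are placeholders, and the paper's learned strategies are not reproduced here.

```python
import numpy as np

def binarize_by_threshold(embs: np.ndarray) -> np.ndarray:
    """Binarize each dimension by thresholding at its corpus-wide mean."""
    return (embs > embs.mean(axis=0)).astype(np.uint8)

rng = np.random.default_rng(0)
embs = rng.standard_normal((1000, 512)).astype(np.float32)    # placeholder sentence embeddings
codes = np.packbits(binarize_by_threshold(embs), axis=1)      # 512 bits -> 64 bytes per sentence

# Relatedness via Hamming distance: XOR + popcount instead of a float inner product.
dist = np.unpackbits(np.bitwise_xor(codes[0], codes[1])).sum()
print(dist, codes.nbytes, embs.nbytes)   # distance, ~64 KB of codes vs ~2 MB of float32
```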

    Extreme Model Compression for On-device Natural Language Understanding

    In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with huge vocabulary sizes. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance. Comment: Long paper at COLING 2020

    Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval

    The semantic matching capabilities of neural information retrieval can ameliorate the synonymy and polysemy problems of symbolic approaches. However, the dense representations of neural models are inefficient to search at scale and are therefore better suited to re-ranking. Sparse representations, either in symbolic or latent form, are more efficient with an inverted index. Taking the merits of both sparse and dense representations, we propose an ultra-high dimensional (UHD) representation scheme equipped with directly controllable sparsity. UHD's large capacity and minimal noise and interference among the dimensions allow for binarized representations, which are highly efficient for storage and search. Also proposed is a bucketing method, where the embeddings from multiple layers of BERT are selected/merged to represent diverse linguistic aspects. We test our models on MS MARCO and TREC CAR, showing that they outperform other sparse models. Comment: To appear at EMNLP 2021
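
    The following sketch shows why binarized ultra-high dimensional sparse vectors pair naturally with an inverted index: each active dimension acts like a term posting list, and retrieval scores documents by the number of active dimensions they share with the query. Dimensions, sparsity, and data below are hypothetical.

```python
from collections import defaultdict
import numpy as np

def build_inverted_index(doc_codes):
    """Map each active (non-zero) dimension to the documents that activate it,
    exactly as a term-based inverted index maps terms to documents."""
    index = defaultdict(list)
    for doc_id, code in enumerate(doc_codes):
        for dim in np.flatnonzero(code):
            index[dim].append(doc_id)
    return index

def retrieve(query_code, index, num_docs):
    """Score documents by the number of binary dimensions shared with the query."""
    scores = np.zeros(num_docs, dtype=np.int32)
    for dim in np.flatnonzero(query_code):
        scores[index.get(dim, [])] += 1
    return np.argsort(-scores)

rng = np.random.default_rng(0)
D, N, active = 50_000, 1000, 64                  # ultra-high dim, controlled sparsity
docs = [np.zeros(D, dtype=np.uint8) for _ in range(N)]
for d in docs:
    d[rng.choice(D, active, replace=False)] = 1
index = build_inverted_index(docs)
print(retrieve(docs[0], index, N)[:5])           # document 0 should rank itself first
```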

    Supervised Understanding of Word Embeddings

    Pre-trained word embeddings are widely used for transfer learning in natural language processing. The embeddings are continuous and distributed representations of the words that preserve their similarities in compact Euclidean spaces. However, the dimensions of these spaces do not provide any clear interpretation. In this study, we obtain supervised projections in the form of linear keyword-level classifiers on word embeddings. We show that the method creates interpretable projections of the original embedding dimensions. Activations of the trained classifier nodes correspond to a subset of the words in the vocabulary; thus, they behave similarly to dictionary features while having the merit of continuous-valued output. Additionally, such dictionaries can be grown iteratively over multiple rounds by adding expert labels on top-scoring words to an initial collection of keywords. The same classifiers can also be applied to aligned word embeddings in other languages to obtain corresponding dictionaries. In our experiments, we show that initializing higher-order networks with these classifier weights gives more accurate models for downstream NLP tasks. We further demonstrate the usefulness of supervised dimensions in revealing the polysemous nature of a keyword of interest by projecting its embedding with the learned classifiers in different sub-spaces.
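
    A hedged sketch of the keyword-level classifier idea: fit a linear classifier on word embeddings from a small seed dictionary, then use its activations over the vocabulary as an interpretable, continuous dictionary feature and as a source of candidates for the next expert-labelling round. The embeddings and seed set below are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: a pre-trained embedding matrix and a small seed dictionary.
rng = np.random.default_rng(0)
vocab = [f"word_{i}" for i in range(5000)]
embeddings = rng.standard_normal((5000, 300)).astype(np.float32)
seed_positive = set(range(50))               # indices of expert-labelled keywords

# One keyword-level linear classifier acts as one supervised, interpretable dimension.
y = np.array([1 if i in seed_positive else 0 for i in range(len(vocab))])
clf = LogisticRegression(max_iter=1000).fit(embeddings, y)

# Its activation over the vocabulary behaves like a continuous dictionary feature;
# top-scoring new words can be shown to an expert to grow the dictionary iteratively.
scores = clf.predict_proba(embeddings)[:, 1]
top_candidates = [vocab[i] for i in np.argsort(-scores)[:20] if i not in seed_positive]
```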

    Sparse associative memory based on contextual code learning for disambiguating word senses

    In recent literature, contextual pre-trained language models (LMs) have demonstrated their potential for generalizing knowledge to several Natural Language Processing (NLP) tasks, including supervised Word Sense Disambiguation (WSD), a challenging problem in the field of Natural Language Understanding (NLU). However, word representations from these models are still very dense, costly in terms of memory footprint, and minimally interpretable. To address these issues, we propose a new supervised, biologically inspired technique for transferring large pre-trained language model representations into a compressed representation for the case of WSD. The resulting representation increases the overall interpretability of the framework and decreases the memory footprint, while enhancing performance.

    Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

    Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive, potentially occupying hundreds of gigabytes in large-scale settings. To help manage this outsized memory consumption, we explore mixed dimension embeddings, an embedding layer architecture in which a particular embedding vector's dimension scales with its query frequency. Through theoretical analysis and systematic experiments, we demonstrate that using mixed dimensions can drastically reduce memory usage while maintaining and even improving ML performance. Empirically, we show that the proposed mixed dimension layers improve accuracy by 0.1% using half as many parameters, or maintain accuracy using 16x fewer parameters, for the click-through rate prediction task on the Criteo Kaggle dataset.
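
    The sketch below shows one way a mixed dimension embedding layer can be organised: IDs are split into blocks by frequency, each block gets its own embedding width, and a per-block linear projection maps everything to a common output dimension. The block sizes and widths are invented, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class MixedDimEmbedding(nn.Module):
    """Frequent IDs get wide embeddings, rare IDs get narrow ones, and a linear
    projection maps every block back to a shared output dimension."""
    def __init__(self, block_sizes, block_dims, out_dim=64):
        super().__init__()
        self.offsets = torch.tensor([0] + list(block_sizes)).cumsum(0)
        self.tables = nn.ModuleList(nn.Embedding(n, d) for n, d in zip(block_sizes, block_dims))
        self.projs = nn.ModuleList(nn.Linear(d, out_dim, bias=False) for d in block_dims)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        out = torch.empty(ids.shape[0], self.projs[0].out_features)
        for b, (table, proj) in enumerate(zip(self.tables, self.projs)):
            mask = (ids >= self.offsets[b]) & (ids < self.offsets[b + 1])
            if mask.any():
                out[mask] = proj(table(ids[mask] - self.offsets[b]))
        return out

# Hypothetical split: 10k hot IDs at dim 64, 1M rare IDs at dim 4.
layer = MixedDimEmbedding(block_sizes=[10_000, 1_000_000], block_dims=[64, 4])
vectors = layer(torch.tensor([3, 42, 500_000]))   # every output is 64-dimensional
```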

    General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference

    The state of the art on many NLP tasks is currently achieved by large pre-trained language models, which require a considerable amount of computation. We explore a setting where many different predictions are made on a single piece of text. In that case, some of the computational cost during inference can be amortized over the different tasks using a shared text encoder. We compare approaches for training such an encoder and show that encoders pre-trained over multiple tasks generalize well to unseen tasks. We also compare ways of extracting fixed- and limited-size representations from this encoder, including different ways of pooling features extracted from multiple layers or positions. Our best approach compares favorably to knowledge distillation, achieving higher accuracy and lower computational cost once the system handles around 7 tasks. Further, we show that through binary quantization we can reduce the size of the extracted representations by a factor of 16, making it feasible to store them for later use. The resulting method offers a compelling solution for using large-scale pre-trained models at a fraction of the computational cost when multiple tasks are performed on the same text.
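
    As a simplified illustration of storing extracted representations compactly, the sketch below sign-quantizes pooled encoder features to one bit per dimension and packs them into bytes. The paper's actual factor-of-16 quantization scheme may differ from this, and the pooled features here are random placeholders rather than outputs of a real encoder.

```python
import numpy as np

def binarize_features(pooled: np.ndarray) -> np.ndarray:
    """Quantize pooled encoder features to 1 bit per dimension (sign) and pack
    them into bytes so they can be stored cheaply for later tasks."""
    return np.packbits((pooled > 0).astype(np.uint8), axis=-1)

def unpack_for_task(codes: np.ndarray, dim: int) -> np.ndarray:
    """Recover {-1, +1} features for a lightweight downstream task head."""
    bits = np.unpackbits(codes, axis=-1)[..., :dim]
    return bits.astype(np.float32) * 2.0 - 1.0

# Placeholder for features pooled from several layers/positions of a shared encoder.
rng = np.random.default_rng(0)
pooled = rng.standard_normal((8, 768)).astype(np.float32)
codes = binarize_features(pooled)         # 768 bits = 96 bytes per text, vs 3 KB of float32
features = unpack_for_task(codes, 768)    # fed to per-task classifiers at a later time
```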