Learning Compressed Sentence Representations for On-Device Text Processing
Vector representations of sentences, trained on massive text corpora, are
widely used as generic sentence embeddings across a variety of NLP problems.
The learned representations are generally assumed to be continuous and
real-valued, giving rise to a large memory footprint and slow retrieval speed,
which hinders their applicability to low-resource (memory and computation)
platforms, such as mobile devices. In this paper, we propose four different
strategies to transform continuous and generic sentence embeddings into a
binarized form, while preserving their rich semantic information. The
introduced methods are evaluated across a wide range of downstream tasks, where
the binarized sentence embeddings are demonstrated to degrade performance by
only about 2% relative to their continuous counterparts, while reducing the
storage requirement by over 98%. Moreover, with the learned binary
representations, the semantic relatedness of two sentences can be evaluated by
simply calculating their Hamming distance, which is more computationally
efficient than the inner product operation between continuous
embeddings. Detailed analysis and a case study further validate the effectiveness
of the proposed methods.
Comment: To appear at ACL 2019.
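A rough illustration of the retrieval claim (not the paper's implementation; the paper proposes four binarization strategies, and the hard thresholding below is just one plausible choice): once embeddings are binarized and packed into machine words, semantic relatedness reduces to an XOR followed by a bit count.

```python
import numpy as np

def binarize(embedding: np.ndarray) -> np.ndarray:
    # Hard-threshold at zero and pack the bits into uint8 words;
    # 512 float32 values (2 KB) shrink to 64 bytes.
    bits = (embedding > 0).astype(np.uint8)
    return np.packbits(bits)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    # Hamming distance between packed codes: XOR, then count set bits.
    return int(np.unpackbits(a ^ b).sum())

# Two hypothetical 512-dimensional sentence embeddings.
u, v = np.random.randn(512), np.random.randn(512)
print(hamming_distance(binarize(u), binarize(v)))
```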
General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference
The state of the art on many NLP tasks is currently achieved by large
pre-trained language models, which require a considerable amount of
computation. We explore a setting where many different predictions are made on
a single piece of text. In that case, some of the computational cost during
inference can be amortized over the different tasks using a shared text
encoder. We compare approaches for training such an encoder and show that
encoders pre-trained over multiple tasks generalize well to unseen tasks. We
also compare ways of extracting fixed- and limited-size representations from
this encoder, including different ways of pooling features extracted from
multiple layers or positions. Our best approach compares favorably to knowledge
distillation, achieving higher accuracy and lower computational cost once the
system is handling around 7 tasks. Further, we show that through binary
quantization, we can reduce the size of the extracted representations by a
factor of 16 making it feasible to store them for later use. The resulting
method offers a compelling solution for using large-scale pre-trained models at
a fraction of the computational cost when multiple tasks are performed on the
same text.
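A minimal sketch of the layer-pooling idea (assuming a Hugging Face style encoder that returns per-layer hidden states; the function name and layer selection are illustrative, and the paper compares several such schemes):

```python
import torch

def pool_layers(hidden_states, layer_ids=(-4, -3, -2, -1)):
    # Mean-pool tokens within each selected layer, then concatenate,
    # yielding one fixed-size vector per input text.
    pooled = [hidden_states[i].mean(dim=1) for i in layer_ids]
    return torch.cat(pooled, dim=-1)  # [batch, dim * len(layer_ids)]

# Stand-in for the tuple returned by an encoder called with
# output_hidden_states=True (13 layers of [batch, seq_len, 768]).
fake_states = tuple(torch.randn(2, 10, 768) for _ in range(13))
print(pool_layers(fake_states).shape)  # torch.Size([2, 3072])
```

Binary quantization of such vectors, e.g. packing sign bits as in the first sketch above, is one way to obtain the kind of 16x storage reduction the abstract reports.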
FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders
Pretrained text encoders, such as BERT, have been applied increasingly in
various natural language processing (NLP) tasks, and have recently demonstrated
significant performance gains. However, recent studies have demonstrated the
existence of social bias in these pretrained NLP models. Although prior works
have made progress on word-level debiasing, sentence-level fairness of
pretrained encoders remains largely unexplored. In this paper, we propose the
first neural debiasing method for a pretrained sentence encoder, which
transforms the pretrained encoder outputs into debiased representations via a
fair filter (FairFil) network. To learn the FairFil, we introduce a contrastive
learning framework that not only minimizes the correlation between filtered
embeddings and bias words but also preserves rich semantic information of the
original sentences. On real-world datasets, our FairFil effectively reduces the
bias degree of pretrained text encoders while maintaining desirable
performance on downstream tasks. Moreover, our post-hoc method requires no
retraining of the text encoders, further widening FairFil's applicability.
Comment: Accepted by the 9th International Conference on Learning Representations (ICLR 2021).
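A highly simplified sketch of the filter-plus-contrastive-loss setup (module shapes, the loss weighting, and the use of bias-word directions are assumptions for illustration; the actual FairFil objective follows the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FairFilter(nn.Module):
    # Small post-hoc MLP applied to frozen encoder outputs; the
    # encoder itself is never retrained.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, z):
        return self.net(z)

def debias_loss(filtered, augmented, bias_dirs, temp=0.1, lam=1.0):
    # Contrastive term: each filtered embedding should stay close to
    # the filtered embedding of an augmented view of the same
    # sentence, preserving semantic information...
    f = F.normalize(filtered, dim=-1)
    g = F.normalize(augmented, dim=-1)
    logits = f @ g.t() / temp
    labels = torch.arange(f.size(0))
    contrastive = F.cross_entropy(logits, labels)
    # ...while a penalty suppresses correlation between the filtered
    # embeddings and bias-word directions in the embedding space.
    bias_penalty = (f @ F.normalize(bias_dirs, dim=-1).t()).pow(2).mean()
    return contrastive + lam * bias_penalty
```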
Contextual Lensing of Universal Sentence Representations
What makes a universal sentence encoder universal? The notion of a generic
encoder of text appears to be at odds with the inherent contextualization and
non-permanence of language use in a dynamic world. However, mapping sentences
into generic fixed-length vectors for downstream similarity and retrieval tasks
has been fruitful, particularly for multilingual applications. How do we manage
this dilemma? In this work we propose Contextual Lensing, a methodology for
inducing context-oriented universal sentence vectors. We break the construction
of universal sentence vectors into a core, variable length, sentence matrix
representation equipped with an adaptable 'lens' from which fixed-length
vectors can be induced as a function of the lens context. We show that it is
possible to focus notions of language similarity into a small number of lens
parameters given a core universal matrix representation. For example, we
demonstrate the ability to encode translation similarity of sentences across
several languages into a single weight matrix, even when the core encoder has
not seen parallel data.
Comment: 10 pages.
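One way to picture a lens (an illustrative attention-style pooling head; the class name and single-query design are assumptions, not the paper's exact parameterization):

```python
import torch
import torch.nn as nn

class Lens(nn.Module):
    # Maps a variable-length sentence matrix [seq_len, dim] from a
    # frozen core encoder to one fixed-length vector; only these few
    # parameters are trained for a given notion of similarity.
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))

    def forward(self, sentence_matrix):
        weights = torch.softmax(sentence_matrix @ self.query, dim=0)
        return weights @ sentence_matrix  # [dim]

lens = Lens(dim=768)
print(lens(torch.randn(17, 768)).shape)  # torch.Size([768])
```

Because the core matrix representation stays fixed, swapping in a different lens, e.g. one trained on translation pairs, refocuses the same encoder on a different notion of similarity.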