Efficient Sentence Embedding via Semantic Subspace Analysis
A novel sentence embedding method built upon semantic subspace analysis,
called semantic subspace sentence embedding (S3E), is proposed in this work.
Given that word embeddings capture semantic relationships and that semantically
similar words tend to form semantic groups in a high-dimensional embedding
space, we develop a sentence representation scheme that analyzes the semantic
subspaces spanned by a sentence's constituent words. Specifically, we construct
the sentence model from two aspects. First, we represent the words that lie in
the same semantic group with an intra-group descriptor. Second, we characterize
the interaction between multiple semantic groups with an inter-group
descriptor.
The proposed S3E method is evaluated on both textual similarity tasks and
supervised tasks. Experimental results show that it offers performance
comparable to or better than the state-of-the-art, while its complexity is much
lower than that of other parameterized models.
Comment: 7 pages, 2 figures
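To make the two descriptors concrete, here is a minimal sketch of the general
idea rather than the authors' exact formulation: word vectors are clustered
into semantic groups with k-means (the paper constructs groups over the
vocabulary offline; clustering per sentence here is a simplification), each
group contributes a VLAD-style residual descriptor (intra-group), and
group-to-group correlations supply the inter-group part. The cluster count and
the unweighted residuals are assumptions.

```python
# Minimal sketch of the S3E idea (not the authors' exact formulation).
import numpy as np
from sklearn.cluster import KMeans

def s3e_embed(word_vectors, num_groups=5, seed=0):
    """word_vectors: (num_words, dim) array of one sentence's word vectors."""
    km = KMeans(n_clusters=num_groups, n_init=10, random_state=seed)
    labels = km.fit_predict(word_vectors)

    # Intra-group descriptor: sum of residuals of member words to their
    # centroid (a VLAD-style encoding; the weighting here is illustrative).
    intra = np.zeros((num_groups, word_vectors.shape[1]))
    for g in range(num_groups):
        members = word_vectors[labels == g]
        if len(members) > 0:
            intra[g] = (members - km.cluster_centers_[g]).sum(axis=0)

    # Inter-group descriptor: correlations between group descriptors,
    # characterizing how semantic groups interact within the sentence.
    inter = intra @ intra.T
    upper = np.triu_indices(num_groups)  # keep the upper triangle only
    return np.concatenate([intra.ravel(), inter[upper]])

# Usage: embed a toy "sentence" of 12 random 50-dimensional word vectors.
vecs = np.random.default_rng(0).normal(size=(12, 50))
print(s3e_embed(vecs).shape)  # (5*50 + 15,) = (265,)
```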
Structural-Aware Sentence Similarity with Recursive Optimal Transport
Measuring sentence similarity is a classic topic in natural language
processing. Lightweight similarity measures remain of particular practical
significance even though deep learning models have succeeded in many other
tasks, and some lightweight similarities with stronger theoretical grounding
have been shown to outperform even supervised deep learning approaches.
However, successful lightweight models such as Word Mover's Distance [Kusner
et al., 2015] and Smooth Inverse Frequency [Arora et al., 2017] fail to detect
differences in sentence structure, i.e., word order.
To address this issue, we present the Recursive Optimal Transport (ROT)
framework, which incorporates structural information into classic OT. Building
on the semantic connection between optimal transport and the cosine similarity
of weighted averages of word vectors, we further develop Recursive Optimal
Transport Similarity (ROTS) for sentences. ROTS is structure-aware and has low
time complexity compared to full optimal transport. Our experiments over 20
semantic textual similarity (STS) datasets show a clear advantage of ROTS over
all weakly supervised approaches, and a detailed ablation study demonstrates
the effectiveness of ROT and of the underlying semantic insights.
Comment: 7 pages, 2 figures
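The framework builds on classic optimal transport between the word vectors of
two sentences. The sketch below shows only that base ingredient: an
entropy-regularized (Sinkhorn) OT similarity with a cosine ground cost. The
recursive, structure-aware partitioning that defines ROT/ROTS is omitted, and
the regularization strength and iteration count are assumptions.

```python
# Sketch of the classic OT similarity that ROT builds on (base case only).
import numpy as np

def sinkhorn_similarity(X, Y, wx, wy, reg=0.1, iters=200):
    """X: (m, d) and Y: (n, d) word vectors; wx, wy: weights summing to 1."""
    # Cosine distance as the ground cost between words.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T

    K = np.exp(-C / reg)              # Gibbs kernel for entropic OT
    u = np.ones(len(wx)) / len(wx)
    for _ in range(iters):            # Sinkhorn fixed-point iterations
        v = wy / (K.T @ u)
        u = wx / (K @ v)
    P = u[:, None] * K * v[None, :]   # approximate optimal transport plan

    return 1.0 - float((P * C).sum())  # similarity = 1 - transported cost

# Usage on two toy sentences with uniform word weights.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(4, 50)), rng.normal(size=(6, 50))
print(sinkhorn_similarity(X, Y, np.full(4, 1 / 4), np.full(6, 1 / 6)))
```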
Universal Sentence Representation Learning with Conditional Masked Language Model
This paper presents a novel training method, Conditional Masked Language
Modeling (CMLM), to effectively learn sentence representations on large-scale
unlabeled corpora. CMLM integrates sentence representation learning into MLM
training by conditioning on the encoded vectors of adjacent sentences. Our
English CMLM model achieves state-of-the-art performance on SentEval, even
outperforming models learned using (semi-)supervised signals. As a fully
unsupervised learning method, CMLM can be conveniently extended to a broad
range of languages and domains. We find that a multilingual CMLM model
co-trained with bitext retrieval (BR) and natural language inference (NLI)
tasks outperforms the previous state-of-the-art multilingual models by a large
margin. We explore the same-language bias of the learned representations and
propose a principal-component-based approach to remove language-identifying
information from the representations while still retaining sentence semantics.
Comment: preprint, updated license
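The principal-component step mentioned above can be illustrated roughly as
follows: estimate the leading principal directions of the multilingual
embedding matrix, which tend to carry language-identifying information, and
project them out. The number of removed components k is an assumption, and the
paper's exact procedure for selecting language-identifying components may
differ.

```python
# Rough illustration of principal-component-based language debiasing.
import numpy as np

def remove_top_components(embeddings, k=2):
    """embeddings: (num_sentences, dim). Returns debiased embeddings."""
    centered = embeddings - embeddings.mean(axis=0)
    # Principal directions via SVD of the centered embedding matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:k]                                  # (k, dim) leading components
    # Subtract each embedding's projection onto the top-k components.
    return embeddings - (embeddings @ top.T) @ top

# Usage: debias 100 toy multilingual embeddings of dimension 64.
emb = np.random.default_rng(0).normal(size=(100, 64))
print(remove_top_components(emb).shape)  # (100, 64)
```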
A Comparative Study on Structural and Semantic Properties of Sentence Embeddings
Sentence embeddings encode natural language sentences as low-dimensional
dense vectors. A great deal of effort has been put into using sentence
embeddings to improve several important natural language processing tasks.
Relation extraction is one such NLP task: it aims to identify, from
unstructured text, structured relations defined in a knowledge base. A
promising and
more efficient approach would be to embed both the text and structured
knowledge in low-dimensional spaces and discover semantic alignments or
mappings between them. Although a number of techniques have been proposed in
the literature for embedding both sentences and knowledge graphs, little is
known about the structural and semantic properties of these embedding spaces in
terms of relation extraction. In this paper, we investigate the aforementioned
properties by evaluating the extent to which sentences carrying similar senses
are embedded in nearby subspaces, and whether that structure can be exploited
to align sentences to a knowledge graph. We propose a set of
experiments using a widely-used large-scale data set for relation extraction
and focusing on a set of key sentence embedding methods. We additionally
provide the code for reproducing these experiments at
https://github.com/akalino/semantic-structural-sentences. These embedding
methods cover a wide variety of techniques ranging from simple word embedding
combination to transformer-based BERT-style models. Our experimental results
show that different embedding spaces exhibit the structural and semantic
properties to different degrees. These results provide useful guidance for
developing embedding-based relation extraction methods.
Comment: 10 pages, 3 figures
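One simple way to probe whether sentences carrying similar senses land in
nearby subspaces, in the spirit of the experiments above, is to compare average
intra-relation and inter-relation cosine similarity. The sketch below uses
synthetic embeddings and labels as stand-ins for a relation extraction dataset;
it is illustrative, not the paper's exact protocol.

```python
# Hedged sketch: do sentences sharing a relation embed close together?
import numpy as np

def relation_cohesion(embeddings, labels):
    """embeddings: (n, d); labels: (n,) relation ids. Returns (intra, inter)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(embeddings), dtype=bool)
    intra = sims[same & off_diag].mean()  # same relation, different sentences
    inter = sims[~same].mean()            # different relations
    return intra, inter

# Usage on synthetic data: 3 relation types, 30 sentences, 64-d embeddings.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=30)
emb = rng.normal(size=(30, 64)) + labels[:, None]  # crude cluster structure
print(relation_cohesion(emb, labels))  # intra should exceed inter
```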
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
Sentence embedding is an important research topic in natural language
processing (NLP) since it can transfer knowledge to downstream tasks.
Meanwhile, BERT, a contextualized word representation model, achieves
state-of-the-art performance in quite a few NLP tasks. Yet, generating a
high-quality sentence representation from BERT-based word models remains an
open problem. Previous studies have shown that different layers of BERT capture
different linguistic properties, which allows information to be fused across
layers to find better sentence representations. In this work, we study the
layer-wise patterns of word representations in deep contextualized models. We
then propose a new sentence embedding method, called SBERT-WK, that dissects
BERT-based word models through a geometric analysis of the space spanned by the
word representations. SBERT-WK requires no further training. We evaluate
SBERT-WK on semantic textual similarity and downstream supervised tasks, and
further present ten sentence-level probing tasks for detailed linguistic
analysis. Experiments show that SBERT-WK achieves state-of-the-art performance.
Our code is publicly available.
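As a rough illustration of fusing information across layers, the sketch below
weights each layer's representation of a token by its deviation from the
token's cross-layer mean, then average-pools tokens into a sentence vector. The
actual SBERT-WK weights derive from alignment and novelty measures in the
paper's geometric analysis; this variance-style weighting is a simplified
stand-in.

```python
# Simplified sketch of layer fusion (not the exact SBERT-WK weighting).
import numpy as np

def fuse_layers(hidden_states):
    """hidden_states: (num_layers, seq_len, dim) per-layer word vectors."""
    mean = hidden_states.mean(axis=0, keepdims=True)
    # Per-layer, per-token deviation from the cross-layer mean.
    dev = np.linalg.norm(hidden_states - mean, axis=2)  # (layers, tokens)
    weights = dev / dev.sum(axis=0, keepdims=True)      # normalize per token
    # Weighted sum over layers yields one vector per token.
    words = (weights[:, :, None] * hidden_states).sum(axis=0)
    return words.mean(axis=0)                           # average-pool tokens

# Usage on toy hidden states: 13 layers, 8 tokens, 768 dimensions.
h = np.random.default_rng(0).normal(size=(13, 8, 768))
print(fuse_layers(h).shape)  # (768,)
```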