Mimicking Word Embeddings using Subword RNNs
Word embeddings improve generalization over lexical features by placing each
word in a lower-dimensional space, using distributional information obtained
from unlabeled data. However, the effectiveness of word embeddings for
downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which
embeddings do not exist. In this paper, we present MIMICK, an approach to
generating OOV word embeddings compositionally, by learning a function from
spellings to distributional embeddings. Unlike prior work, MIMICK does not
require re-training on the original word embedding corpus; instead, learning is
performed at the type level. Intrinsic and extrinsic evaluations demonstrate
the power of this simple approach. On 23 languages, MIMICK improves performance
over a word-based baseline for tagging part-of-speech and morphosyntactic
attributes. It is competitive with (and complementary to) a supervised
character-based model in low-resource settings. Comment: EMNLP 2017
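The core of MIMICK can be sketched compactly: a character-level BiLSTM reads a word's spelling and is trained to reproduce the word's pre-trained embedding, so that unseen words can be embedded from their spelling alone. The sketch below is a minimal PyTorch rendition; the layer sizes, character vocabulary, and use of mean-squared error are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Mimick(nn.Module):
    def __init__(self, n_chars, char_dim=20, hidden=50, emb_dim=300):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden, bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, emb_dim))

    def forward(self, char_ids):                  # char_ids: (batch, word_len)
        # assumes unpadded batches of equal-length words, for simplicity
        out, _ = self.lstm(self.char_emb(char_ids))
        fwd = out[:, -1, :out.size(2) // 2]       # last step, forward direction
        bwd = out[:, 0, out.size(2) // 2:]        # first step, backward direction
        return self.mlp(torch.cat([fwd, bwd], dim=1))

model = Mimick(n_chars=128)
opt = torch.optim.Adam(model.parameters())

def train_step(char_ids, target_vecs):
    # type-level objective: match the pre-trained vector of each known word;
    # no re-training on the original embedding corpus is needed
    loss = nn.functional.mse_loss(model(char_ids), target_vecs)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At test time, an OOV word's embedding is simply `model(chars_of(word))`, which is what makes the approach attractive for morphologically rich, low-resource languages.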
Think Globally, Embed Locally - Locally Linear Meta-embedding of Words
Distributed word embeddings have shown superior performance in numerous
Natural Language Processing (NLP) tasks. However, their performance varies
significantly across different tasks, implying that the word embeddings learnt
by those methods capture complementary aspects of lexical semantics. Therefore,
we believe that it is important to combine the existing word embeddings to
produce more accurate and complete meta-embeddings of words. For this
purpose, we propose an unsupervised locally linear meta-embedding learning
method that takes pre-trained word embeddings as the input, and produces more
accurate meta-embeddings. Unlike previously proposed meta-embedding learning
methods that learn a global projection over all words in a vocabulary, our
proposed method is sensitive to the differences in local neighbourhoods of the
individual source word embeddings. Moreover, we show that vector concatenation,
a previously proposed highly competitive baseline approach for integrating word
embeddings, can be derived as a special case of the proposed method.
Experimental results on semantic similarity, word analogy, relation
classification, and short-text classification tasks show that our
meta-embeddings significantly outperform prior methods on several benchmark
datasets, establishing a new state of the art for meta-embeddings.
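The two steps of the locally linear recipe can be illustrated with a small NumPy sketch: first solve for the weights that reconstruct each word from its nearest neighbours within each source embedding, then compute meta-embeddings that preserve those weights across all sources. The projection step below uses the classical LLE eigendecomposition as a stand-in for the paper's own optimisation; the neighbourhood size, regulariser, and output dimensionality are assumptions.

```python
import numpy as np

def reconstruction_weights(X, k=10, reg=1e-3):
    """Weights reconstructing each row of X from its k nearest neighbours."""
    n = X.shape[0]
    W = np.zeros((n, n))
    # pairwise distances; fine for a small vocabulary in a sketch
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    for i in range(n):
        nbrs = np.argsort(dists[i])[1:k + 1]   # nearest neighbours, excluding i
        Z = X[nbrs] - X[i]                     # neighbours centred on word i
        G = Z @ Z.T + reg * np.eye(k)          # regularised local Gram matrix
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()               # reconstruction weights sum to 1
    return W

def meta_embed(sources, dim=100):
    """Embed words so that neighbour weights from every source are preserved."""
    n = sources[0].shape[0]
    M = np.zeros((n, n))
    for X in sources:                          # one weight matrix per source
        IW = np.eye(n) - reconstruction_weights(X)
        M += IW.T @ IW
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:dim + 1]                  # drop the trivial constant eigenvector
```

Because each word's weights are solved within its own neighbourhood, the method stays sensitive to local structure in each source space, which is exactly what a single global projection cannot capture.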
Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing
Embedding models typically associate each word with a single real-valued
vector, representing its different properties. Evaluation methods, therefore,
need to analyze the accuracy and completeness of these properties in
embeddings. This requires fine-grained analysis of embedding subspaces.
Multi-label classification is an appropriate way to do so. We propose a new
evaluation method for word embeddings based on multi-label classification given
a word embedding. The task we use is fine-grained name typing: given a large
corpus, find all types that a name can refer to based on the name embedding.
Given the scale of entities in knowledge bases, we can build datasets for this
task that are complementary to current embedding evaluation datasets in that
they are very large, contain fine-grained classes, and allow the direct
evaluation of embeddings without confounding factors like sentence context. Comment: 6 pages, The 3rd Workshop on Representation Learning for NLP
(RepL4NLP @ ACL 2018)
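In outline, the evaluation reduces to training a multi-label classifier on frozen name embeddings and scoring its predictions. The scikit-learn sketch below shows one plausible instantiation; the data layout and the choice of one-vs-rest logistic regression with micro-F1 are assumptions, not the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score

def evaluate_embeddings(emb, train, test):
    """emb: dict word -> vector; train/test: lists of (name, set_of_types)."""
    types = sorted({t for _, ts in train for t in ts})

    def featurize(pairs):
        X = np.stack([emb[name] for name, _ in pairs])
        # binary indicator matrix: one column per fine-grained type
        Y = np.array([[t in ts for t in types] for _, ts in pairs], dtype=int)
        return X, Y

    Xtr, Ytr = featurize(train)
    Xte, Yte = featurize(test)
    # the embeddings stay frozen; only the linear classifier is trained
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(Xtr, Ytr)
    return f1_score(Yte, clf.predict(Xte), average="micro")
```

Keeping the classifier linear matters for the evaluation's purpose: a high score then indicates that the type information is linearly recoverable from the embedding subspaces themselves.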
Distilling word vectors from contextualised language models
Although contextualised language models (CLMs) have reduced the need for static word embeddings in many NLP tasks, static representations of word meaning remain crucial in tasks where words must be encoded without context, as happens in domains such as information retrieval. Compared to learning static word embeddings from scratch, distilling such representations from CLMs has advantages in downstream tasks [68], [2]. Typically, the embedding of a word w is distilled by feeding random sentences that mention w to a CLM and pooling the resulting contextualised representations. In this thesis, we hypothesise that distilling word embeddings from CLMs can be improved by feeding more informative mentions to the CLM. As a first contribution, we therefore propose a sentence selection strategy based on a topic model.
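Concretely, mention-based distillation amounts to running sentences that contain the target word through a CLM and pooling the contextualised vectors at the target's token positions. The sketch below uses Hugging Face Transformers; the choice of bert-base-uncased, last-layer hidden states, and mean pooling are illustrative assumptions, and the thesis's sentence selection happens upstream of this function.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clm = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def distill_vector(word, sentences):
    """Average the CLM's contextual vectors of `word` over its mentions."""
    word_ids = tok(word, add_special_tokens=False)["input_ids"]
    vecs = []
    for sent in sentences:
        enc = tok(sent, return_tensors="pt", truncation=True)
        hidden = clm(**enc).last_hidden_state[0]       # (seq_len, dim)
        ids = enc["input_ids"][0].tolist()
        # naive subword matching of the target word inside the sentence
        for i in range(len(ids) - len(word_ids) + 1):
            if ids[i:i + len(word_ids)] == word_ids:
                vecs.append(hidden[i:i + len(word_ids)].mean(dim=0))
    return torch.stack(vecs).mean(dim=0) if vecs else None
```

The quality of the output vector depends directly on which `sentences` are passed in, which is why selecting informative mentions, rather than random ones, is the lever this thesis pulls.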
Since distilling high-quality word embeddings from CLMs normally requires many mentions of each word, we investigate whether decent word embeddings can be obtained from a few carefully selected mentions per word. As our second contribution, we explore a range of sentence selection strategies and test the resulting word embeddings on various evaluation tasks. We find that 20 informative sentences per word
are sufficient to obtain competitive word embeddings, especially when the sentences are selected by our proposed strategies.
Besides improving the sentence selection strategy, as our third contribution we also study other strategies for obtaining word embeddings. We find that SBERT embeddings capture an aspect of word meaning that is highly complementary to the mention embeddings we previously focused on. We therefore propose combining the vectors produced by these two methods through a contrastive learning model. The results
confirm that combining these vectors leads to more informative word embeddings.
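One plausible reading of this combination step is a pair of projection heads trained with an InfoNCE-style objective that treats the SBERT vector and the mention-based vector of the same word as a positive pair. The sketch below is a hedged illustration of that idea, not the thesis's exact model; the dimensions, temperature, and head architecture are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Fuse(nn.Module):
    """Project the two views of a word into a shared space."""
    def __init__(self, dim_sbert, dim_mention, out=256):
        super().__init__()
        self.proj_a = nn.Linear(dim_sbert, out)
        self.proj_b = nn.Linear(dim_mention, out)

    def forward(self, a, b):
        return (F.normalize(self.proj_a(a), dim=1),
                F.normalize(self.proj_b(b), dim=1))

def info_nce(za, zb, temp=0.05):
    # rows of za and zb correspond to the same words (positive pairs);
    # every other row in the batch serves as an in-batch negative
    logits = za @ zb.T / temp
    labels = torch.arange(za.size(0))
    return F.cross_entropy(logits, labels)
```

After training, concatenating or averaging the two projected views gives a single static vector that carries information from both sources.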
In conclusion, this thesis shows that better static word embeddings can be efficiently distilled from CLMs by strategically selecting sentences and by combining complementary methods.
Knowledge-aware Complementary Product Representation Learning
Learning product representations that reflect complementary relationships
plays a central role in e-commerce recommender systems. In the absence of a
product relationship graph, on which existing methods rely, complementary
relationships must be detected directly from noisy and sparse customer
purchase activity. Furthermore, unlike simple relationships such as
similarity, complementariness is asymmetric and non-transitive. Standard
representation learning relies on a single set of embeddings, which is
ill-suited to modelling these properties of complementariness. We propose
using knowledge-aware learning with dual product embedding to solve the above
challenges. We encode contextual knowledge into product representations via
multi-task learning to alleviate the sparsity issue. By explicitly modelling
with user bias terms, we separate the noise of customer-specific preferences
from the complementariness. Furthermore, we adopt the dual embedding framework
to capture the intrinsic properties of complementariness and provide a
geometric interpretation motivated by the classic separating hyperplane theory. Finally,
we propose a Bayesian network structure that unifies all the components and
subsumes several popular models as special cases. The proposed method
compares favourably to state-of-the-art methods in downstream classification and
recommendation tasks. We also develop an implementation that scales efficiently
to a dataset with millions of items and customers.
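The dual-embedding idea can be made concrete with a small PyTorch sketch: each product receives separate "source" and "target" vectors, so the complementarity score is asymmetric by construction, and a per-user bias term absorbs customer-specific preference noise. All names, dimensions, and the dot-product scoring function are illustrative assumptions; the paper embeds this in a fuller Bayesian network with multi-task knowledge encoding.

```python
import torch
import torch.nn as nn

class DualEmbed(nn.Module):
    def __init__(self, n_items, n_users, dim=64):
        super().__init__()
        self.src = nn.Embedding(n_items, dim)   # product as the anchor purchase
        self.dst = nn.Embedding(n_items, dim)   # product as the complement
        self.user_bias = nn.Embedding(n_users, 1)

    def score(self, user, anchor, candidate):
        # asymmetric by design: swapping anchor and candidate switches
        # between the two embedding tables, so score(i->j) != score(j->i)
        dot = (self.src(anchor) * self.dst(candidate)).sum(-1)
        return dot + self.user_bias(user).squeeze(-1)

model = DualEmbed(n_items=50_000, n_users=10_000)
# trained e.g. with a ranking loss over observed co-purchases
# against sampled negative candidate items
```

Separating the two roles of a product is what lets the model express that, say, a phone implies a phone case far more strongly than a phone case implies a phone.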