Exploring phrase-compositionality in skip-gram models
In this paper, we introduce a variation of the skip-gram model which jointly
learns distributed word vector representations and the way they compose to
form phrase embeddings. In particular, we propose a learning procedure that
incorporates a phrase-compositionality function capturing how phrase vectors
are composed from their component word vectors. Our experiments show
improvements on word and phrase similarity tasks, as well as on syntactic
tasks such as dependency parsing, using the proposed joint models.
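As an illustration of the idea (not the paper's actual method), here is a
minimal sketch of a skip-gram-with-negative-sampling update in which the input
vector is a composed phrase vector; simple averaging stands in for the learned
compositionality function, and all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 50                              # toy vocabulary size, embedding dim
W_in = rng.normal(scale=0.1, size=(V, D))    # input (word) vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # output (context) vectors

def compose(word_ids):
    # Stand-in compositionality function: average of component word vectors.
    # The paper learns the composition jointly; averaging is one simple instance.
    return W_in[word_ids].mean(axis=0)

def sgd_step(phrase_ids, ctx_id, neg_ids, lr=0.025):
    # One negative-sampling update where the "input" is a composed phrase vector.
    h = compose(phrase_ids)
    for c, label in [(ctx_id, 1.0)] + [(n, 0.0) for n in neg_ids]:
        p = 1.0 / (1.0 + np.exp(-h @ W_out[c]))
        g = lr * (label - p)
        grad_in = g * W_out[c] / len(phrase_ids)  # averaging splits the gradient
        W_out[c] += g * h
        W_in[phrase_ids] += grad_in

sgd_step(phrase_ids=[3, 17], ctx_id=42, neg_ids=[5, 99, 250])
```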
Learning Semantically and Additively Compositional Distributional Representations
This paper connects a vector-based composition model to a formal semantics,
the Dependency-based Compositional Semantics (DCS). We show theoretical
evidence that the vector compositions in our model conform to the logic of DCS.
Experimentally, we show that vector-based composition brings a strong ability
to calculate similar phrases as similar vectors, achieving
near-state-of-the-art performance on a wide range of phrase similarity tasks
and relation classification; meanwhile, DCS can guide the building of vectors
for structured queries that can be directly executed. We evaluate this utility
on a sentence completion task and report a new state-of-the-art.
Comment: to appear in ACL201
What are the Goals of Distributional Semantics?
Distributional semantic models have become a mainstay in NLP, providing
useful features for downstream tasks. However, assessing long-term progress
requires explicit long-term goals. In this paper, I take a broad linguistic
perspective, looking at how well current models can deal with various semantic
challenges. Given stark differences between models proposed in different
subfields, a broad perspective is needed to see how we could integrate them. I
conclude that, while linguistic insights can guide the design of model
architectures, future progress will require balancing the often conflicting
demands of linguistic expressiveness and computational tractability.
Comment: To be published in Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics (ACL
Words are not Equal: Graded Weighting Model for building Composite Document Vectors
Despite the success of distributional semantics, composing phrases from word
vectors remains an important challenge. Several methods have been tried for
benchmark tasks such as sentiment classification, including word vector
averaging, matrix-vector approaches based on parsing, and on-the-fly learning
of paragraph vectors. Most models omit stop words from the composition.
Instead of such a yes-no decision, we consider several graded schemes where
words are weighted according to their discriminatory relevance with respect to
their use in the document (e.g., idf). Some of these methods (particularly
tf-idf) are seen to result in a significant improvement in performance over
the prior state of the art. Further, combining such approaches into an
ensemble based on alternate classifiers such as the RNN model yields a 1.6%
performance improvement on the standard IMDB movie review dataset and a 7.01%
improvement on Amazon product reviews. Since these are language-free models
that can be obtained in an unsupervised manner, they are also of interest for
under-resourced languages such as Hindi. We demonstrate this language-free
aspect by showing a gain of 12% for two review datasets over earlier results,
and also release a new, larger dataset for future testing (Singh, 2015).
Comment: 10 pages, 2 figures, 11 tables
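To make the contrast concrete, here is a minimal sketch of graded tf-idf
weighting versus a hard stop-word cut; the toy corpus and random embeddings
are stand-ins for real pretrained vectors.

```python
import numpy as np
from collections import Counter

def idf_weights(docs):
    # Inverse document frequency for each token across the corpus.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: np.log(n / df[tok]) for tok in df}

def doc_vector(doc, emb, idf):
    # Graded composition: every word contributes, weighted by tf-idf,
    # instead of the hard keep/drop decision of a stop-word list.
    # (Here "the" and "was" appear in every document, so idf drives
    # their weight to zero without an explicit stop list.)
    tf = Counter(doc)
    w = np.array([tf[t] * idf.get(t, 0.0) for t in doc])
    vecs = np.array([emb[t] for t in doc])
    return (w[:, None] * vecs).sum(axis=0) / max(w.sum(), 1e-9)

rng = np.random.default_rng(0)
docs = [["the", "movie", "was", "great"], ["the", "plot", "was", "dull"]]
emb = {t: rng.normal(size=50) for d in docs for t in d}
print(doc_vector(docs[0], emb, idf_weights(docs)).shape)  # (50,)
```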
Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition
Building meaningful phrase representations is challenging because phrase
meanings are not simply the sum of their constituent meanings. Lexical
composition can shift the meanings of the constituent words and introduce
implicit information. We tested a broad range of textual representations for
their capacity to address these issues. We found that, as expected,
contextualized word representations perform better than static word
embeddings, more so in detecting meaning shift than in recovering implicit
information, where their performance still falls far short of humans'. Our
evaluation suite, including 5 tasks related to lexical composition effects,
can serve future research aiming to improve such representations.
Comment: TACL 201
Toward Mention Detection Robustness with Recurrent Neural Networks
One of the key challenges in natural language processing (NLP) is to yield
good performance across application domains and languages. In this work, we
investigate the robustness of mention detection systems, one of the
fundamental tasks in information extraction, via recurrent neural networks
(RNNs). The advantage of RNNs over traditional approaches is their capacity to
capture long-range context and to implicitly adapt word embeddings trained on
a large corpus into task-specific representations while preserving their
original semantic generalization, which helps across domains. Our systematic
evaluation of RNN architectures demonstrates that RNNs not only outperform the
best reported systems (up to 9% relative error reduction) in the general
setting but also achieve state-of-the-art performance in the cross-domain
setting for English. For other languages, RNNs are significantly better than
traditional methods on the related task of named entity recognition for Dutch
(up to 22% relative error reduction).
Comment: 13 pages, 11 tables, 3 figures
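As a rough illustration of the architecture family described (not the authors'
exact model), here is a minimal bidirectional RNN tagger in PyTorch that
fine-tunes pretrained embeddings into a task-specific representation; the
dimensions and tag count are arbitrary.

```python
import torch
import torch.nn as nn

class BiRNNTagger(nn.Module):
    # Pretrained embeddings are fine-tuned (freeze=False) into a task-specific
    # representation; the bidirectional LSTM captures long-range context.
    def __init__(self, pretrained, n_tags, hidden=100):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.rnn = nn.LSTM(pretrained.size(1), hidden,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):               # (batch, seq_len)
        h, _ = self.rnn(self.emb(token_ids))    # (batch, seq_len, 2*hidden)
        return self.out(h)                      # per-token mention-tag scores

pretrained = torch.randn(1000, 50)              # stand-in for corpus-trained vectors
model = BiRNNTagger(pretrained, n_tags=3)       # e.g. BIO-style mention tags
print(model(torch.randint(0, 1000, (2, 7))).shape)  # torch.Size([2, 7, 3])
```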
Unsupervised Sentence Representations as Word Information Series: Revisiting TF--IDF
Sentence representation at the semantic level is a challenging task for
Natural Language Processing and Artificial Intelligence. Despite the advances
in word embeddings (i.e. word vector representations), capturing sentence
meaning is an open question due to complexities of semantic interactions among
words. In this paper, we present an embedding method, which is aimed at
learning unsupervised sentence representations from unlabeled text. We propose
an unsupervised method that models a sentence as a weighted series of word
embeddings. The weights of the word embeddings are fitted by using Shannon's
word entropies provided by the Term Frequency--Inverse Document Frequency
(TF--IDF) transform. The hyperparameters of the model can be selected according
to the properties of the data (e.g. sentence length and textual genre).
Hyperparameter selection involves word embedding methods and dimensionalities,
as well as weighting schemata. Our method offers advantages over existing
methods: identifiable modules, short training times, online inference of
(unseen) sentence representations, and independence from domain, external
knowledge, and language resources. Results show that our model outperforms the
state of the art on well-known Semantic Textual Similarity (STS) benchmarks.
Moreover, it reaches state-of-the-art performance even when compared to
supervised and knowledge-based STS systems.
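A minimal sketch of the weighted-series idea under simplifying assumptions:
sklearn's TfidfVectorizer supplies the weights (standing in for the paper's
entropy-based TF-IDF transform), random vectors stand in for real word
embeddings, and unseen sentences are embedded online after a single
unsupervised fit.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["a cat sat on the mat", "dogs chase cats", "the mat was new"]
tfidf = TfidfVectorizer().fit(corpus)      # one unsupervised fit on unlabeled text
vocab = tfidf.vocabulary_
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(vocab), 50))    # stand-in word embeddings

def sentence_vector(sentence):
    # Sentence as a TF-IDF-weighted series of word embeddings; works online
    # for sentences never seen during fitting.
    tokens = [t for t in tfidf.build_analyzer()(sentence) if t in vocab]
    row = tfidf.transform([sentence]).toarray().ravel()
    w = np.array([row[vocab[t]] for t in tokens])
    vecs = np.array([emb[vocab[t]] for t in tokens])
    return (w[:, None] * vecs).sum(axis=0) / max(w.sum(), 1e-9)

print(sentence_vector("the cat was on the mat").shape)  # (50,)
```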
Neural Lattice Language Models
In this work, we propose a new language modeling paradigm that has the
ability to perform both prediction and moderation of information flow at
multiple granularities: neural lattice language models. These models construct
a lattice of possible paths through a sentence and marginalize across this
lattice to calculate sequence probabilities or optimize parameters. This
approach allows us to seamlessly incorporate linguistic intuitions - including
polysemy and the existence of multi-word lexical items - into our language model.
Experiments on multiple language modeling tasks show that English neural
lattice language models that utilize polysemous embeddings are able to improve
perplexity by 9.95% relative to a word-level baseline, and that a Chinese model
that handles multi-character tokens is able to improve perplexity by 20.94%
relative to a character-level baseline.
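To make the marginalization concrete, here is a small sketch of a forward pass
over a token lattice; the edge probabilities are invented for illustration,
where in the actual model they would come from the neural language model's
predictions.

```python
import math

def lattice_logprob(edges, n):
    # Forward algorithm over a lattice spanning n positions.
    # edges[j] lists (i, logp): an edge covering positions i..j-1 (a single-
    # or multi-word token) with log-probability logp.
    alpha = [0.0] + [-math.inf] * n
    for j in range(1, n + 1):
        terms = [alpha[i] + lp for i, lp in edges.get(j, [])
                 if alpha[i] > -math.inf]
        if terms:                                   # log-sum-exp over incoming paths
            m = max(terms)
            alpha[j] = m + math.log(sum(math.exp(t - m) for t in terms))
    return alpha[n]                                 # marginal over all lattice paths

# "new york times": single words plus a multi-word token "new_york"
edges = {
    1: [(0, math.log(0.10))],                       # "new"
    2: [(1, math.log(0.20)), (0, math.log(0.05))],  # "york" | "new_york"
    3: [(2, math.log(0.30))],                       # "times"
}
print(math.exp(lattice_logprob(edges, 3)))          # (0.1*0.2 + 0.05) * 0.3 = 0.021
```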
Towards Semantic Query Segmentation
Query Segmentation is one of the critical components for understanding users'
search intent in Information Retrieval tasks. It involves grouping tokens in
the search query into meaningful phrases which help downstream tasks like
search relevance and query understanding. In this paper, we propose a novel
approach to segment user queries using distributed query embeddings. Our key
contribution is a supervised approach to the segmentation task using
low-dimensional feature vectors for queries, dispensing with traditional
hand-tuned and heuristic NLP features, which are expensive to engineer.
We benchmark on a 50,000 human-annotated web search engine query corpus
achieving accuracy comparable to state-of-the-art techniques. The advantage of
our technique is that it is fast and does not use an external knowledge base
such as Wikipedia for score boosting. This helps us generalize our approach to other
domains like eCommerce without any fine-tuning. We demonstrate the
effectiveness of this method on another 50,000 human-annotated eCommerce query
corpus from eBay search logs. Our approach is easy to implement and
generalizes well across different search domains, demonstrating the power of
low-dimensional embeddings in the query segmentation task and opening up a new
direction of research for this problem.
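The abstract does not spell out the model, so the following is a hypothetical
sketch of one natural reading: segmentation as binary classification of each
gap between adjacent query tokens, using only the neighboring tokens'
embeddings as low-dimensional features; all tokens, labels, and embeddings
below are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
words = ["samsung", "galaxy", "phone", "case", "red", "running", "shoes"]
emb = {w: rng.normal(size=25) for w in words}      # stand-in query-term embeddings

def gap_features(query):
    # One feature vector per gap between adjacent tokens: the concatenated
    # embeddings of the tokens on either side.
    toks = query.split()
    return [np.concatenate([emb[a], emb[b]]) for a, b in zip(toks, toks[1:])]

# Toy labels: 1 = segment boundary after the left token, 0 = same segment.
queries = ["samsung galaxy phone case", "red running shoes"]
labels = [[0, 1, 0], [1, 0]]   # "samsung galaxy | phone case", "red | running shoes"
X = np.vstack([f for q in queries for f in gap_features(q)])
y = [b for bs in labels for b in bs]
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(gap_features("samsung galaxy case")))
```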
AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus
In recent years, word embeddings have been surprisingly effective at
capturing intuitive characteristics of the words they represent. These vectors
achieve the best results when training corpora are extremely large, sometimes
billions of words. Clinical natural language processing datasets, however, tend
to be much smaller. Even the largest publicly-available dataset of medical
notes is three orders of magnitude smaller than the dataset of the oft-used
"Google News" word vectors. In order to make up for limited training data
sizes, we encode expert domain knowledge into our embeddings. Building on a
previous extension of word2vec, we show that generalizing the notion of a
word's "context" to include arbitrary features creates an avenue for encoding
domain knowledge into word embeddings. We show that the word vectors produced
by this method outperform their text-only counterparts across the board in
correlation with clinical experts' judgments.
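A toy sketch of the generalized-context idea the abstract describes: alongside
ordinary (word, nearby-word) skip-gram pairs, emit (word, feature) pairs for
arbitrary features attached to the note, such as a metathesaurus concept; the
concept identifier below should be treated as hypothetical.

```python
def training_pairs(tokens, features, window=2):
    # Generalized word2vec "context": ordinary window co-occurrences plus
    # arbitrary expert-knowledge features attached to the note.
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield w, tokens[j]      # standard skip-gram pair
        for f in features:
            yield w, f                  # domain-knowledge pair

note = ["patient", "denies", "chest", "pain"]
concepts = ["CUI:C0008031"]             # hypothetical concept id for chest pain
print(list(training_pairs(note, concepts))[:6])
```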