123,643 research outputs found
WordRank: Learning Word Embeddings via Robust Ranking
Embedding words in a vector space has gained a lot of attention in recent
years. While state-of-the-art methods provide efficient computation of word
similarities via a low-dimensional matrix embedding, their motivation is often
left unclear. In this paper, we argue that word embedding can be naturally
viewed as a ranking problem due to the ranking nature of the evaluation
metrics. Based on this insight, we propose WordRank, a novel framework that
efficiently estimates word representations via robust ranking, in which the
attention mechanism and robustness to noise are readily achieved via DCG-like
ranking losses. The performance of WordRank is measured on word similarity and
word analogy benchmarks, and the results are compared to state-of-the-art word
embedding techniques. Our algorithm is very competitive with the state of the
art on large corpora, and outperforms it by a significant margin when the
training set is limited (i.e., sparse and noisy).
With 17 million tokens, WordRank performs almost as well as existing methods
using 7.2 billion tokens on a popular word similarity benchmark. Our multi-node
distributed implementation of WordRank is publicly available for general usage.
Comment: Conference on Empirical Methods in Natural Language Processing
(EMNLP), November 1-5, 2016, Austin, Texas, US
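The ranking view lends itself to a compact illustration. Below is a minimal sketch (ours, not the authors' code) of a DCG-like ranking loss over word-context pairs: a soft rank of the true context is estimated against sampled negatives, and a concave log transform discounts poorly ranked (likely noisy) pairs. Shapes, sampling, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a DCG-like ranking loss for word embeddings,
# in the spirit of WordRank; not the authors' implementation.
import torch

def wordrank_style_loss(word_vecs, ctx_vecs, words, contexts, n_neg=5):
    """word_vecs/ctx_vecs: (V, d) embedding matrices;
    words/contexts: (B,) indices of observed (word, context) pairs."""
    u = word_vecs[words]                      # (B, d) word embeddings
    v = ctx_vecs[contexts]                    # (B, d) true context embeddings
    pos = (u * v).sum(-1)                     # (B,) scores of true contexts

    # Score each word against randomly sampled negative contexts.
    neg_ids = torch.randint(0, ctx_vecs.size(0), (words.size(0), n_neg))
    neg = torch.einsum('bd,bnd->bn', u, ctx_vecs[neg_ids])   # (B, n_neg)

    # A sigmoid upper-bounds the 0-1 loss, giving a soft rank estimate;
    # the concave log1p transform discounts pairs that rank poorly, which
    # yields the robustness-to-noise / attention effect described above.
    soft_rank = torch.sigmoid(neg - pos.unsqueeze(-1)).sum(-1)
    return torch.log1p(soft_rank).mean()
```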
Relevance-based Word Embedding
Learning a high-dimensional dense representation for vocabulary terms, also
known as a word embedding, has recently attracted much attention in natural
language processing and information retrieval tasks. The embedding vectors are
typically learned based on term proximity in a large corpus. This means that
the objective in well-known word embedding algorithms, e.g., word2vec, is to
accurately predict adjacent word(s) for a given word or context. However, this
objective is not necessarily equivalent to the goal of many information
retrieval (IR) tasks. The primary objective in various IR tasks is to capture
relevance instead of term proximity, syntactic, or even semantic similarity.
This is the motivation for developing unsupervised relevance-based word
embedding models that learn word representations based on query-document
relevance information. In this paper, we propose two learning models with
different objective functions; one learns a relevance distribution over the
vocabulary set for each query, and the other classifies each term as belonging
to the relevant or non-relevant class for each query. To train our models, we
used over six million unique queries and the top-ranked documents retrieved in
response to each query, which are assumed to be relevant to the query. We
extrinsically evaluate our learned word representation models using two IR
tasks: query expansion and query classification. Both query expansion
experiments on four TREC collections and query classification experiments on
the KDD Cup 2005 dataset suggest that the relevance-based word embedding models
significantly outperform state-of-the-art proximity-based embedding models,
such as word2vec and GloVe.
Comment: To appear in the proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)
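As a rough illustration of the first model variant (a relevance distribution over the vocabulary per query), here is a hedged sketch. Averaging query terms, the layer sizes, and the use of pseudo-relevant document term distributions as targets are our assumptions about one plausible instantiation, not the paper's exact model.

```python
# Hedged sketch: a model that learns a relevance distribution over the
# vocabulary for each query; sizes and details are illustrative.
import torch
import torch.nn as nn

class RelevanceEmbedding(nn.Module):
    def __init__(self, vocab_size, dim=200):
        super().__init__()
        self.query_emb = nn.Embedding(vocab_size, dim)  # input (query) vectors
        self.out = nn.Linear(dim, vocab_size)           # output (vocab) vectors

    def forward(self, query_ids):                       # (B, L) padded queries
        q = self.query_emb(query_ids).mean(dim=1)       # average query terms
        return torch.log_softmax(self.out(q), dim=-1)   # log P(term | query)

# Training idea: match the predicted distribution to the term distribution
# of the top-ranked (pseudo-relevant) documents for each query, e.g.
#   loss = nn.KLDivLoss(reduction='batchmean')(model(query_ids), target_dist)
```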
Retrieving Multi-Entity Associations: An Evaluation of Combination Modes for Word Embeddings
Word embeddings have gained significant attention as learnable
representations of semantic relations between words, and have been shown to
improve upon the results of traditional word representations. However, little
effort has been devoted to using embeddings for the retrieval of entity
associations beyond pairwise relations. In this paper, we use popular embedding
methods to train vector representations of an entity-annotated news corpus, and
evaluate their performance on the task of predicting entity participation in
news events, with a traditional word cooccurrence network as a baseline. To
support queries for events with multiple participating entities, we test a
number of combination modes for the embedding vectors. While we find that even
the best combination modes for word embeddings do not quite reach the
performance of the full cooccurrence network, especially for rare entities, we
observe that different embedding methods model different types of relations,
thereby indicating the potential for ensemble methods.
Comment: 4 pages; Accepted at SIGIR'1
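To make the idea of combination modes concrete, the sketch below combines entity vectors with a few simple element-wise operators and ranks candidates by cosine similarity. The operators shown are common choices and purely illustrative; the paper evaluates its own set of modes.

```python
# Illustrative combination modes for multi-entity queries: each mode
# maps a set of entity vectors to a single query vector.
import numpy as np

def combine(vectors, mode="mean"):
    """vectors: (n_entities, dim) array of entity embeddings."""
    v = np.asarray(vectors)
    if mode == "sum":
        return v.sum(axis=0)
    if mode == "mean":
        return v.mean(axis=0)
    if mode == "max":                  # element-wise maximum
        return v.max(axis=0)
    raise ValueError(f"unknown mode: {mode}")

def rank_candidates(query_vec, cand_vecs):
    """Rank candidate entity/event vectors by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))        # indices, best match first
```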
Using Word Embeddings in Twitter Election Classification
Word embeddings and convolutional neural networks (CNNs) have attracted
extensive attention in various classification tasks for Twitter, e.g.
sentiment classification. However, the effect of the configuration used to
train and generate the word embeddings on the classification performance has
not been studied in the existing literature. In this paper, using a Twitter
election classification task that aims to detect election-related tweets, we
investigate the impact of the background dataset used to train the embedding
models, the context window size, and the dimensionality of word embeddings on
classification performance. By comparing the classification results of two
word embedding models trained on different background corpora (e.g. Wikipedia
articles and Twitter microposts), we show that the background data type should
align with the Twitter classification dataset to achieve better performance.
Moreover, by evaluating the results of word embedding models trained using
various context window sizes and dimensionalities, we find that larger context
windows and dimensionalities are preferable for improving performance. Our
experimental results also show that using word embeddings and a CNN leads to
statistically significant improvements over various baselines such as random,
SVM with TF-IDF, and SVM with word embeddings.
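A study of this kind can be reproduced in outline with gensim's word2vec; the sketch below trains a small grid of models over context window sizes and dimensionalities. The grid values and training parameters are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the configuration study using gensim's word2vec
# (API as of gensim >= 4.0); `sentences` is a tokenized background corpus.
from gensim.models import Word2Vec

def train_variants(sentences):
    models = {}
    for window in (1, 5, 10):           # context window sizes to compare
        for dim in (100, 300, 800):     # embedding dimensionalities
            models[(window, dim)] = Word2Vec(
                sentences, vector_size=dim, window=window,
                min_count=5, workers=4)
    return models
```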
Using Multi-Sense Vector Embeddings for Reverse Dictionaries
Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the reverse dictionary task. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.
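The attention-based integration can be sketched in a few lines: given all sense vectors of a word and a context representation (e.g. an RNN state), softmax weights softly select the appropriate sense. This is a minimal sketch of the general idea, not the paper's exact architecture.

```python
# Minimal sketch of attention over sense vectors; hedged, not the
# paper's architecture.
import torch

def attend_senses(sense_vecs, context_vec):
    """sense_vecs: (n_senses, d); context_vec: (d,)."""
    scores = sense_vecs @ context_vec        # (n_senses,) dot-product scores
    weights = torch.softmax(scores, dim=0)   # soft sense selection
    return weights @ sense_vecs              # (d,) context-weighted mixture
```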
Exploiting Sentence Embedding for Medical Question Answering
Despite the great success of word embedding, sentence embedding remains an
open problem. In this paper, we present a supervised learning
framework to exploit sentence embedding for the medical question answering
task. The learning framework consists of two main parts: 1) a sentence
embedding producing module, and 2) a scoring module. The former is developed
with contextual self-attention and multi-scale techniques to encode a sentence
into an embedding tensor. This module is called Contextual self-Attention
Multi-scale Sentence Embedding (CAMSE) for short. The latter employs two
scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association
Scoring (SAS). SMS measures similarity while SAS captures association between
sentence pairs: a medical question concatenated with a candidate choice, and a
piece of corresponding supportive evidence. The proposed framework is evaluated
on two Medical Question Answering (MedicalQA) datasets collected from
real-world applications: medical exam and clinical diagnosis based on
electronic medical records (EMR). The results show that our proposed framework
achieves significant improvements over competitive baseline approaches.
Additionally, a series of controlled experiments is conducted to illustrate
that the multi-scale strategy and the contextual self-attention layer play
important roles in producing effective sentence embeddings, and that the two
kinds of scoring strategies are highly complementary for question answering
problems.
Comment: 8 pages