104 research outputs found
Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing
Embedding models typically associate each word with a single real-valued
vector, representing its different properties. Evaluation methods, therefore,
need to analyze the accuracy and completeness of these properties in
embeddings. This requires fine-grained analysis of embedding subspaces.
Multi-label classification is an appropriate way to do so. We propose a new
evaluation method for word embeddings based on multi-label classification given
a word embedding. The task we use is fine-grained name typing: given a large
corpus, find all types that a name can refer to based on the name embedding.
Given the scale of entities in knowledge bases, we can build datasets for this
task that are complementary to the current embedding evaluation datasets in:
they are very large, contain fine-grained classes, and allow the direct
evaluation of embeddings without confounding factors like sentence contextComment: 6 pages, The 3rd Workshop on Representation Learning for NLP
(RepL4NLP @ ACL2018
Retrieving Multi-Entity Associations: An Evaluation of Combination Modes for Word Embeddings
Word embeddings have gained significant attention as learnable
representations of semantic relations between words, and have been shown to
improve upon the results of traditional word representations. However, little
effort has been devoted to using embeddings for the retrieval of entity
associations beyond pairwise relations. In this paper, we use popular embedding
methods to train vector representations of an entity-annotated news corpus, and
evaluate their performance for the task of predicting entity participation in
news events versus a traditional word cooccurrence network as a baseline. To
support queries for events with multiple participating entities, we test a
number of combination modes for the embedding vectors. While we find that even
the best combination modes for word embeddings do not quite reach the
performance of the full cooccurrence network, especially for rare entities, we
observe that different embedding methods model different types of relations,
thereby indicating the potential for ensemble methods.Comment: 4 pages; Accepted at SIGIR'1
Concept Embedding for Relevance Detection of Search Queries Regarding CHOP.
Automatic encoding of diagnosis and procedures can increase the interoperability and efficacy of the clinical cooperation. The concept, rule-based and machine learning classification methods for automatic code generation can easily reach their limit due to the handcrafted rules and a limited coverage of the vocabulary in a concept library. As the first step to apply deep learning methods in automatic encoding in the clinical domain, a suitable semantic representation should be generated. In this work, we will focus on the embedding mechanism and dimensional reduction method for text representation, which mitigate the sparseness of the data input in the clinical domain. Different methods such as word embedding and random projection will be evaluated based on logs of query-document matching
A Mixture Model for Learning Multi-Sense Word Embeddings
Word embeddings are now a standard technique for inducing meaning
representations for words. For getting good representations, it is important to
take into account different senses of a word. In this paper, we propose a
mixture model for learning multi-sense word embeddings. Our model generalizes
the previous works in that it allows to induce different weights of different
senses of a word. The experimental results show that our model outperforms
previous models on standard evaluation tasks.Comment: *SEM 201
Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
Word embeddings have recently seen a strong increase in interest as a result
of strong performance gains on a variety of tasks. However, most of this
research also underlined the importance of benchmark datasets, and the
difficulty of constructing these for a variety of language-specific tasks.
Still, many of the datasets used in these tasks could prove to be fruitful
linguistic resources, allowing for unique observations into language use and
variability. In this paper we demonstrate the performance of multiple types of
embeddings, created with both count and prediction-based architectures on a
variety of corpora, in two language-specific tasks: relation evaluation, and
dialect identification. For the latter, we compare unsupervised methods with a
traditional, hand-crafted dictionary. With this research, we provide the
embeddings themselves, the relation evaluation task benchmark for use in
further research, and demonstrate how the benchmarked embeddings prove a useful
unsupervised linguistic resource, effectively used in a downstream task.Comment: in LREC 201
Comparative Analysis of Word Embeddings for Capturing Word Similarities
Distributed language representation has become the most widely used technique
for language representation in various natural language processing tasks. Most
of the natural language processing models that are based on deep learning
techniques use already pre-trained distributed word representations, commonly
called word embeddings. Determining the most qualitative word embeddings is of
crucial importance for such models. However, selecting the appropriate word
embeddings is a perplexing task since the projected embedding space is not
intuitive to humans. In this paper, we explore different approaches for
creating distributed word representations. We perform an intrinsic evaluation
of several state-of-the-art word embedding methods. Their performance on
capturing word similarities is analysed with existing benchmark datasets for
word pairs similarities. The research in this paper conducts a correlation
analysis between ground truth word similarities and similarities obtained by
different word embedding methods.Comment: Part of the 6th International Conference on Natural Language
Processing (NATP 2020
- …