
    Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing

    Embedding models typically associate each word with a single real-valued vector, representing its different properties. Evaluation methods, therefore, need to analyze the accuracy and completeness of these properties in embeddings. This requires fine-grained analysis of embedding subspaces. Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embeddings based on multi-label classification of a given word embedding. The task we use is fine-grained name typing: given a large corpus, find all types that a name can refer to based on the name embedding. Given the scale of entities in knowledge bases, we can build datasets for this task that are complementary to the current embedding evaluation datasets in that they are very large, contain fine-grained classes, and allow the direct evaluation of embeddings without confounding factors like sentence context.
    Comment: 6 pages; The 3rd Workshop on Representation Learning for NLP (RepL4NLP @ ACL 2018)
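    As a hedged sketch of the evaluation setup this abstract describes, the snippet below trains a multi-label classifier that predicts entity types from name embeddings. The names, types, and vectors are invented placeholders, not the authors' dataset or model.

```python
# Minimal sketch of multi-label name typing over embeddings:
# one binary classifier per type, so a name can receive several types.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical name embeddings (one 50-d vector per name) and type labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))
y = [["person", "artist"], ["location"], ["person"], ["organization", "company"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y)  # binary indicator matrix, one column per type

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(mlb.inverse_transform(clf.predict(X)))  # predicted type sets per name
```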

    Retrieving Multi-Entity Associations: An Evaluation of Combination Modes for Word Embeddings

    Word embeddings have gained significant attention as learnable representations of semantic relations between words, and have been shown to improve upon the results of traditional word representations. However, little effort has been devoted to using embeddings for the retrieval of entity associations beyond pairwise relations. In this paper, we use popular embedding methods to train vector representations of an entity-annotated news corpus and evaluate their performance on the task of predicting entity participation in news events, with a traditional word co-occurrence network as a baseline. To support queries for events with multiple participating entities, we test a number of combination modes for the embedding vectors. While we find that even the best combination modes for word embeddings do not quite reach the performance of the full co-occurrence network, especially for rare entities, we observe that different embedding methods model different types of relations, thereby indicating the potential for ensemble methods.
    Comment: 4 pages; accepted at SIGIR'19
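    The sketch below illustrates what "combination modes" might look like in practice: several ways to merge per-entity vectors into one query vector, ranked against candidate events by cosine similarity. All vectors here are random stand-ins; the paper trains them on an entity-annotated news corpus, and its exact set of modes may differ.

```python
# Illustrative combination modes for a multi-entity query.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
entity_vecs = rng.normal(size=(3, 100))  # embeddings of the query entities
event_vecs = rng.normal(size=(10, 100))  # embeddings of candidate events

modes = {
    "sum": entity_vecs.sum(axis=0),
    "mean": entity_vecs.mean(axis=0),
    "max": entity_vecs.max(axis=0),  # element-wise maximum
}

for name, query in modes.items():
    scores = [cosine(query, e) for e in event_vecs]
    best = int(np.argmax(scores))
    print(f"{name}: top event {best} (score {scores[best]:.3f})")
```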

    Concept Embedding for Relevance Detection of Search Queries Regarding CHOP

    Automatic encoding of diagnoses and procedures can improve the interoperability and efficacy of clinical cooperation. Concept-based, rule-based, and machine learning classification methods for automatic code generation can quickly reach their limits due to handcrafted rules and the limited coverage of the vocabulary in a concept library. As a first step towards applying deep learning methods to automatic encoding in the clinical domain, a suitable semantic representation must be generated. In this work, we focus on embedding mechanisms and dimensionality reduction methods for text representation, which mitigate the sparseness of input data in the clinical domain. Different methods, such as word embedding and random projection, are evaluated on logs of query-document matching.
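    As a small sketch of the dimensionality-reduction step mentioned above, the snippet applies random projection to sparse bag-of-words query vectors. The queries are invented examples, not CHOP data, and the target dimension is arbitrary here.

```python
# Random projection of sparse query vectors into a low-dimensional space.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.random_projection import SparseRandomProjection

queries = ["appendectomy laparoscopic", "hip replacement total", "appendix removal"]
X = CountVectorizer().fit_transform(queries)  # sparse term-count vectors

# In practice the target dimension would follow the Johnson-Lindenstrauss bound.
proj = SparseRandomProjection(n_components=2, random_state=0)
X_low = proj.fit_transform(X)
print(X_low.toarray() if hasattr(X_low, "toarray") else X_low)
```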

    A Mixture Model for Learning Multi-Sense Word Embeddings

    Word embeddings are now a standard technique for inducing meaning representations for words. To obtain good representations, it is important to take into account the different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings. Our model generalizes previous work in that it allows inducing different weights for the different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.
    Comment: *SEM 2017
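    The snippet below is a hedged sketch of the general mixture idea only, not the paper's actual model or training procedure: a word keeps several sense vectors, a context-dependent softmax assigns each sense a weight, and the word's representation is the weighted mixture.

```python
# Context-weighted mixture of per-word sense embeddings (toy vectors).
import numpy as np

rng = np.random.default_rng(2)
K, d = 3, 50
sense_vecs = rng.normal(size=(K, d))  # K sense embeddings for one word
context_vec = rng.normal(size=d)      # embedding of the surrounding context

# Weight each sense by its match to the context (softmax over dot products).
logits = sense_vecs @ context_vec
weights = np.exp(logits - logits.max())
weights /= weights.sum()

mixture = weights @ sense_vecs  # the word's representation in this context
print(weights.round(3), mixture.shape)
```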

    Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

    Word embeddings have recently seen a surge of interest as a result of strong performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets and the difficulty of constructing these for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique observations into language use and variability. In this paper, we demonstrate the performance of multiple types of embeddings, created with both count- and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves and the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings serve as a useful unsupervised linguistic resource that can be applied effectively in a downstream task.
    Comment: in LREC 2016
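    The sketch below shows one common form a relation-evaluation query can take, solving "a is to b as c is to ?" by vector offset and nearest-neighbour search; it is an assumption about the task style, and the tiny vocabulary with random vectors is purely illustrative.

```python
# Analogy-style relation query via vector offset (b - a + c).
import numpy as np

rng = np.random.default_rng(3)
vocab = ["koning", "koningin", "man", "vrouw", "fiets"]
E = {w: rng.normal(size=50) for w in vocab}

def analogy(a, b, c):
    target = E[b] - E[a] + E[c]
    def cos(v, w):
        return v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
    # Exclude the query words themselves, as analogy benchmarks usually do.
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(E[w], target))

print(analogy("koning", "koningin", "man"))  # ideally "vrouw" with real embeddings
```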

    Comparative Analysis of Word Embeddings for Capturing Word Similarities

    Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most natural language processing models based on deep learning techniques use pre-trained distributed word representations, commonly called word embeddings. Determining the highest-quality word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches to creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods, analysing their performance in capturing word similarities on existing benchmark datasets of word-pair similarities. We conduct a correlation analysis between ground-truth word similarities and the similarities obtained by the different word embedding methods.
    Comment: Part of the 6th International Conference on Natural Language Processing (NATP 2020)
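    A minimal sketch of the intrinsic evaluation this abstract describes: Spearman correlation between human word-pair similarity ratings and cosine similarities from an embedding. The ratings and vectors below are placeholders, not a real benchmark.

```python
# Spearman correlation between human similarity ratings and cosine similarity.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
pairs = [("car", "automobile", 9.5), ("car", "train", 6.3), ("car", "banana", 0.8)]
E = {w: rng.normal(size=50) for p in pairs for w in p[:2]}  # toy embeddings

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

human = [score for _, _, score in pairs]
model = [cos(E[a], E[b]) for a, b, _ in pairs]
rho, _ = spearmanr(human, model)
print(f"Spearman rho: {rho:.3f}")
```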