Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing
Embedding models typically associate each word with a single real-valued
vector, representing its different properties. Evaluation methods, therefore,
need to analyze the accuracy and completeness of these properties in
embeddings. This requires fine-grained analysis of embedding subspaces.
Multi-label classification is an appropriate way to do so. We propose a new
evaluation method for word embeddings based on multi-label classification given
a word embedding. The task we use is fine-grained name typing: given a large
corpus, find all types that a name can refer to based on the name embedding.
Given the scale of entities in knowledge bases, we can build datasets for this
task that are complementary to the current embedding evaluation datasets in that
they are very large, contain fine-grained classes, and allow the direct
evaluation of embeddings without confounding factors like sentence context.
Comment: 6 pages, The 3rd Workshop on Representation Learning for NLP
(RepL4NLP @ ACL2018)
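
The multi-label setup described in this abstract can be illustrated with a minimal sketch: each type gets its own independent binary classifier over the name embedding, so one name can receive several types at once. The linear-plus-sigmoid classifier, the function name `predict_types`, the type labels, and all numeric values below are assumptions made for illustration; the abstract does not state which classifier the authors use.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_types(embedding, type_weights, type_biases, threshold=0.5):
    """Return every type whose independent sigmoid score reaches the threshold.

    Hypothetical helper: one linear classifier per type, scored separately,
    so the predicted types are not mutually exclusive (multi-label).
    """
    predicted = []
    for type_name, weights in type_weights.items():
        logit = sum(w * x for w, x in zip(weights, embedding))
        logit += type_biases[type_name]
        if sigmoid(logit) >= threshold:
            predicted.append(type_name)
    return predicted


# Toy, made-up values: a 3-dimensional name embedding and three types.
name_embedding = [1.0, -1.0, 0.5]
type_weights = {
    "person": [2.0, 0.0, 0.0],
    "location": [0.0, 2.0, 0.0],
    "organization": [0.0, 0.0, 2.0],
}
type_biases = {"person": 0.0, "location": 0.0, "organization": -0.5}

predicted = predict_types(name_embedding, type_weights, type_biases)
```

Because each type is thresholded separately, a name that refers to both a person and an organization can be assigned both labels, which is what distinguishes this setting from single-label classification.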
Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking
Extraction from raw text to a knowledge base of entities and fine-grained
types is often cast as prediction into a flat set of entity and type labels,
neglecting the rich hierarchies over types and entities contained in curated
ontologies. Previous attempts to incorporate hierarchical structure have
yielded little benefit and are restricted to shallow ontologies. This paper
presents new methods using real and complex bilinear mappings for integrating
hierarchical information, yielding substantial improvement over flat
predictions in entity linking and fine-grained entity typing, and achieving new
state-of-the-art results for end-to-end models on the benchmark FIGER dataset.
We also present two new human-annotated datasets containing wide and deep
hierarchies which we will release to the community to encourage further
research in this direction: MedMentions, a collection of PubMed abstracts in
which 246k mentions have been mapped to the massive UMLS ontology; and TypeNet,
which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k
entity types. In experiments on all three datasets we show substantial gains
from hierarchy-aware training.
Comment: ACL 201
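
As a rough illustration of what a complex bilinear mapping can look like, the sketch below scores a (child, parent) pair with a diagonal complex bilinear form, Re(sum_k c_k * r_k * conj(p_k)). This particular form, the function name `complex_bilinear_score`, and the toy vectors are assumptions; the abstract does not give the exact scoring function. The point of the example is only that a complex-valued score can be asymmetric, which suits directed relations such as "is a subtype of".

```python
def complex_bilinear_score(child, relation, parent):
    """Real part of sum_k child_k * relation_k * conj(parent_k).

    Hypothetical scoring function over complex-valued vectors; swapping
    child and parent generally changes the score, so direction matters.
    """
    total = sum(c * r * p.conjugate() for c, r, p in zip(child, relation, parent))
    return total.real


# Toy, made-up complex embeddings for a type, its parent, and the relation.
child_vec = [1 + 1j, 2 + 0j]
parent_vec = [1 + 0j, 1 + 0j]
is_a_relation = [1j, 1 + 0j]

forward = complex_bilinear_score(child_vec, is_a_relation, parent_vec)
backward = complex_bilinear_score(parent_vec, is_a_relation, child_vec)
```

With a purely real relation vector the same form becomes symmetric in its two arguments, which is one reason a complex variant may be preferred when modeling a hierarchy.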