ScienceExamCER: A High-Density Fine-Grained Science-Domain Corpus for Common Entity Recognition
Named entity recognition identifies common classes of entities in text, but
these entity labels are generally sparse, limiting utility to downstream tasks.
In this work we present ScienceExamCER, a densely-labeled semantic
classification corpus of 133k mentions in the science exam domain where nearly
all (96%) of content words have been annotated with one or more fine-grained
semantic class labels including taxonomic groups, meronym groups, verb/action
groups, properties and values, and synonyms. Semantic class labels are drawn
from a manually-constructed fine-grained typology of 601 classes generated
through a data-driven analysis of 4,239 science exam questions. We show that an
off-the-shelf BERT-based named entity recognition model modified for
multi-label classification achieves an F1 score of 0.85 on this task,
suggesting strong utility for downstream tasks in science-domain question
answering that require densely-labeled semantic classification.
Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking
Extraction from raw text to a knowledge base of entities and fine-grained
types is often cast as prediction into a flat set of entity and type labels,
neglecting the rich hierarchies over types and entities contained in curated
ontologies. Previous attempts to incorporate hierarchical structure have
yielded little benefit and are restricted to shallow ontologies. This paper
presents new methods using real and complex bilinear mappings for integrating
hierarchical information, yielding substantial improvement over flat
predictions in entity linking and fine-grained entity typing, and achieving new
state-of-the-art results for end-to-end models on the benchmark FIGER dataset.
We also present two new human-annotated datasets containing wide and deep
hierarchies which we will release to the community to encourage further
research in this direction: MedMentions, a collection of PubMed abstracts in
which 246k mentions have been mapped to the massive UMLS ontology; and TypeNet,
which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k
entity types. In experiments on all three datasets we show substantial gains
from hierarchy-aware training.
Comment: ACL 201