BioRED: A Comprehensive Biomedical Relation Extraction Dataset
Automated relation extraction (RE) from biomedical literature is critical for
many downstream text mining applications in both research and real-world
settings. However, most existing benchmarking datasets for biomedical RE only
focus on relations of a single type (e.g., protein-protein interactions) at the
sentence level, greatly limiting the development of RE systems in biomedicine.
In this work, we first review commonly used named entity recognition (NER) and
RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus
with multiple entity types (e.g., gene/protein, disease, chemical) and relation
pairs (e.g., gene-disease; chemical-chemical), on a set of 600 PubMed articles.
Further, we label each relation as describing either a novel finding or
previously known background knowledge, enabling automated algorithms to
differentiate between novel and background information. We assess the utility
of BioRED by benchmarking several existing state-of-the-art methods, including
BERT-based models, on the NER and RE tasks. Our results show that while
existing approaches can reach high performance on the NER task (F-score of
89.3%), there is much room for improvement for the RE task, especially when
extracting novel relations (F-score of 47.7%). Our experiments also demonstrate
that such a comprehensive dataset can successfully facilitate the development
of more accurate, efficient, and robust RE systems for biomedicine.
Document-Level Relation Extraction with Reconstruction
In document-level relation extraction (DocRE), graph structure is generally
used to encode relation information in the input document to classify the
relation category between each entity pair, and has greatly advanced the DocRE
task over the past several years. However, the learned graph representation
universally models relation information between all entity pairs regardless of
whether there are relationships between these entity pairs. Thus, entity
pairs without relationships divert the attention of the encoder-classifier
DocRE model away from pairs with relationships, which may hinder further
improvement of DocRE. To alleviate this issue, we propose a novel
encoder-classifier-reconstructor model for DocRE. The reconstructor
reconstructs the ground-truth path dependencies from the graph representation,
so that the proposed DocRE model pays more attention to encoding entity
pairs with relationships during training. Furthermore, the reconstructor is
used as a relationship indicator to assist relation classification during
inference, which can further improve the performance of the DocRE model.
Experimental results on a large-scale DocRE dataset show that the proposed
model can significantly improve the accuracy of relation extraction on a strong
heterogeneous graph-based baseline.
Comment: 9 pages, 5 figures, 6 tables. Accepted by AAAI 2021 (Long Paper
- …