Search CORE

171,980 research outputs found

Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking

Author: McCallum Andrew
Murty* Shikhar
Radovanovic Irena
Verga* Patrick
Vilnis Luke
Publication venue
Publication date: 01/01/2018
Field of study

Extraction from raw text to a knowledge base of entities and fine-grained types is often cast as prediction into a flat set of entity and type labels, neglecting the rich hierarchies over types and entities contained in curated ontologies. Previous attempts to incorporate hierarchical structure have yielded little benefit and are restricted to shallow ontologies. This paper presents new methods using real and complex bilinear mappings for integrating hierarchical information, yielding substantial improvement over flat predictions in entity linking and fine-grained entity typing, and achieving new state-of-the-art results for end-to-end models on the benchmark FIGER dataset. We also present two new human-annotated datasets containing wide and deep hierarchies which we will release to the community to encourage further research in this direction: MedMentions, a collection of PubMed abstracts in which 246k mentions have been mapped to the massive UMLS ontology; and TypeNet, which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k entity types. In experiments on all three datasets we show substantial gains from hierarchy-aware training.Comment: ACL 201

arXiv.org e-Print Archive

Crossref

Building a semantically annotated corpus of clinical texts

Author: Andrea Setzer
Angus Roberts
Denny
Franzén
Friedman
Gennari
George Demetriou
Hersh
Hripcsak
Ian Roberts
Kim
Lindberg
Mark Hepple
Meystre
Pestian
Robert Gaizauskas
Roberts
Tanabe
Yikun Guo
Publication venue: 'Elsevier BV'
Publication date: 01/10/2009
Field of study

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains

Elsevier - Publisher Connector

Crossref

White Rose Research Online

Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

Author: Belousov Maksim
Dixon William
Milosevic Nikola
Nenadic Goran
Publication venue
Publication date: 01/01/2018
Field of study

Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a certain drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mix of rule-based, machine learning (CRF) and deep learning (BLSTM with word2vec embeddings) methodologies in order to annotate the data. The systems were submitted to adverse drug reaction shared task, organised during Text Analytics Conference in 2017 by National Institute for Standards and Technology, archiving F1-scores of 76.00 and 75.61 respectively.Comment: Paper describing submission for TAC ADR shared tas

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository